@ocelotbot/tinyld
1.1.8 • Public • Published

TinyLD

Tiny Language Detector: detects the language of a Unicode (UTF-8) text.

  • pure JavaScript, no API calls, and no dependencies (Node.js and browser compatible)
  • an alternative to libraries like CLD
  • fast, with a low memory footprint (unlike ML-based methods)
  • supports 62 languages (30 in the web version)
  • returns language codes in ISO 639-1 format

Getting Started

Install

yarn add tinyld # or npm install --save tinyld

API

import { detect, detectAll } from 'tinyld'

// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en

// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]
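
Because `detectAll` returns several candidates with an accuracy score, a caller may want to accept the top candidate only when it is clearly ahead. The following is a minimal sketch of that idea, not part of TinyLD: `pickConfident`, `minAccuracy`, and `minMargin` are hypothetical names, and the thresholds are arbitrary. It operates on the result shape shown above, so it runs without the library installed.

```typescript
// Shape of one detectAll() candidate, as shown in the example output above.
interface Candidate {
  lang: string
  accuracy: number
}

// Hypothetical helper (not part of TinyLD): return the top language only if
// it is confident enough and clearly ahead of the runner-up.
function pickConfident(
  candidates: Candidate[],
  minAccuracy = 0.5,
  minMargin = 0.1
): string | null {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...candidates].sort((a, b) => b.accuracy - a.accuracy)
  const best = sorted[0]
  if (!best) return null
  if (best.accuracy < minAccuracy) return null
  const second = sorted[1]
  if (second && best.accuracy - second.accuracy < minMargin) return null
  return best.lang
}

// With tinyld installed, this input would come from detectAll(text).
const sample: Candidate[] = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 },
]
console.log(pickConfident(sample)) // fr
console.log(pickConfident(sample, 0.9)) // null
```

Returning `null` instead of a guess lets the caller fall back to a default language or ask the user.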


TinyLD CLI

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]


Benchmark

Benchmark run on the Tatoeba dataset (~9M sentences), covering 16 of the most common languages.

Library        | Script                    | Properly Identified | Improperly Identified | Not Identified | Avg Execution Time | Disk Size
---------------|---------------------------|---------------------|-----------------------|----------------|--------------------|----------
TinyLD         | yarn bench:tinyld         | 96.1747%            | 2.6938%               | 1.1315%        | 0.1315ms           | 778KB
TinyLD Web     | yarn bench:tinyld-light   | 92.1169%            | 3.9536%               | 3.9295%        | 0.0616ms           | 89KB
node-cld       | yarn bench:cld            | 88.9148%            | 1.7489%               | 9.3363%        | 0.0612ms           | > 10MB
node-lingua    | yarn bench:lingua         | 82.3157%            | 0.2158%               | 17.4685%       | 0.7085ms           | ~100MB
franc          | yarn bench:franc          | 68.7783%            | 26.3432%              | 4.8785%        | 0.1381ms           | 267KB
franc-min      | yarn bench:franc-min      | 65.5163%            | 23.5794%              | 10.9044%       | 0.0614ms           | 119KB
languagedetect | yarn bench:languagedetect | 61.6068%            | 12.295%               | 26.0982%       | 0.1585ms           | 240KB

Remark

  • For each category, the top 3 results are in bold
  • Languages evaluated in this benchmark:
    • Asia: jpn, cmn, kor, hin
    • Europe: fra, spa, por, ita, nld, eng, deu, fin, rus
    • Middle East: tur, heb, ara
  • This kind of benchmark is not perfect and the percentages can vary over time, but it gives a good idea of overall performance
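
The three percentage columns in each row sum to 100%, which suggests every sentence falls into exactly one of the three outcomes. The sketch below shows one way such a tally could be computed; it is an illustration, not the benchmark's actual code. `score`, `Sample`, and `stub` are hypothetical names, and the convention that a detector returns an empty string when it cannot decide is an assumption.

```typescript
// One labelled sentence: the text and the language it is known to be in.
interface Sample {
  text: string
  lang: string
}

// Percentages for the three outcomes reported in the table above.
interface Tally {
  properly: number
  improperly: number
  notIdentified: number
}

// Hypothetical scorer: `detectFn` stands in for any detector that returns a
// language code, or an empty string when it cannot decide (an assumption).
function score(samples: Sample[], detectFn: (text: string) => string): Tally {
  let ok = 0
  let wrong = 0
  let none = 0
  for (const s of samples) {
    const guess = detectFn(s.text)
    if (guess === '') none++
    else if (guess === s.lang) ok++
    else wrong++
  }
  // Convert a count to a percentage, guarding against an empty sample set.
  const pct = (n: number): number =>
    samples.length === 0 ? 0 : (100 * n) / samples.length
  return { properly: pct(ok), improperly: pct(wrong), notIdentified: pct(none) }
}

// Stub detector used only to make the example runnable without a library.
const stub = (text: string): string =>
  text.includes('bonjour') ? 'fr' : text.includes('hello') ? 'en' : ''

const result = score(
  [
    { text: 'hello world', lang: 'en' },
    { text: 'bonjour tout le monde', lang: 'fr' },
    { text: 'hello amigo', lang: 'es' },
    { text: '???', lang: 'ja' },
  ],
  stub
)
console.log(result) // { properly: 50, improperly: 25, notIdentified: 25 }
```

To score a real detector, its output codes would first need to match the label format, since the benchmark lists three-letter codes (jpn, cmn) while TinyLD returns ISO 639-1 codes.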

Conclusion

Recommended

  • For Node.js: TinyLD or node-cld (fast and accurate)
  • For the browser: TinyLD Light or franc-min (small, with decent accuracy; franc is less accurate but supports more languages)

Not recommended

  • node-lingua is too large (~100MB) and too slow (0.7085ms on average)
  • languagedetect is lightweight but not accurate enough (about 62% properly identified), and it is focused on Indo-European languages (it supports Kazakh but not Chinese, Korean, or Japanese)

Package Sidebar

Install

npm i @ocelotbot/tinyld

Weekly Downloads

3,499

License

MIT

Unpacked Size

14 MB

Total Files

42

Collaborators

  • jun.masui
  • koshea