
yeondudad/franc

 
 


franc


Detect the language of text.

What’s so cool about franc?

  1. franc supports more languages (†) than any other library, or Google;
  2. franc is easily forked to support 335 languages;
  3. franc is just as fast as the competition.

† If humans write in the language on the web, and the language has more than one million speakers, franc detects it.

Installation

npm:

$ npm install franc

Component:

$ component install wooorm/franc

Bower:

$ bower install franc

Usage

var franc = require('franc');

franc('Alle menslike wesens word vry'); // "afr"
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট'); // "ben"
franc('Alle mennesker er født frie og'); // "nno"
franc(''); // "und"

franc.all('O Brasil caiu 26 posições em');
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'glg', 0.7362599377808503 ],
 *   [ 'src', 0.7286553750432078 ],
 *   [ 'lav', 0.6944348427238161 ],
 *   [ 'cat', 0.6802627030763913 ],
 *   [ 'spa', 0.6633252678880055 ],
 *   [ 'bos', 0.6536467334946423 ],
 *   [ 'tpi', 0.6477704804701002 ],
 *   [ 'hrv', 0.6456965088143796 ],
 *   [ 'snn', 0.6374006221914967 ],
 *   [ 'bam', 0.5900449360525406 ],
 *   [ 'sco', 0.5893536121673004 ],
 *   ...
 * ]
 */

/* "und" is returned for too-short input: */
franc.all(''); // [ [ 'und', 1 ] ]

/* Provide a whitelist: */
franc.all('O Brasil caiu 26 posições em', {
    'whitelist' : ['por', 'src', 'glg', 'spa']
});
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'glg', 0.7362599377808503 ],
 *   [ 'src', 0.7286553750432078 ],
 *   [ 'spa', 0.6633252678880055 ]
 * ]
 */

/* Provide a blacklist: */
franc.all('O Brasil caiu 26 posições em', {
    'blacklist' : ['src', 'glg', 'lav']
});
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'cat', 0.6802627030763913 ],
 *   [ 'spa', 0.6633252678880055 ],
 *   [ 'bos', 0.6536467334946423 ],
 *   [ 'tpi', 0.6477704804701002 ],
 *   [ 'hrv', 0.6456965088143796 ],
 *   [ 'snn', 0.6374006221914967 ],
 *   [ 'bam', 0.5900449360525406 ],
 *   [ 'sco', 0.5893536121673004 ],
 *   ...
 * ]
 */
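
In the whitelist and blacklist examples above, the remaining entries keep the scores they had in the unfiltered call. A minimal sketch of that behaviour, applied to an already-computed result list, might look like the following. The helper name `filterLanguages` is hypothetical, and this is not franc's own implementation, which may well filter before scoring.

```javascript
// Hypothetical helper, not part of franc: filter a list shaped like
// the output of franc.all() by a whitelist and/or a blacklist.
function filterLanguages(results, options) {
    var whitelist = (options && options.whitelist) || [];
    var blacklist = (options && options.blacklist) || [];

    return results.filter(function (pair) {
        var code = pair[0];

        // Assumption: an empty whitelist means "allow everything".
        if (whitelist.length > 0 && whitelist.indexOf(code) === -1) {
            return false;
        }

        return blacklist.indexOf(code) === -1;
    });
}

// The first six entries of the franc.all() output shown above.
var results = [
    ['por', 1],
    ['glg', 0.7362599377808503],
    ['src', 0.7286553750432078],
    ['lav', 0.6944348427238161],
    ['cat', 0.6802627030763913],
    ['spa', 0.6633252678880055]
];

var allowed = filterLanguages(results, {
    'whitelist': ['por', 'src', 'glg', 'spa']
});

console.log(allowed.map(function (pair) { return pair[0]; }));
// [ 'por', 'glg', 'src', 'spa' ]
```

The order of the input list is preserved, so the best match stays first.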

CLI

Install:

$ npm install --global franc

Use:

Usage: franc [options] string

Detect the language of text

Options:

  -h, --help                    output usage information
  -v, --version                 output version number
  -w, --whitelist <string>      allow languages
  -b, --blacklist <string>      disallow languages

Usage:

# output language of value
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# blacklist certain languages
$ franc --blacklist por,glg "O Brasil caiu 26 posições em"
# src

# whitelist certain languages and use stdin
$ echo "Alle mennesker er født frie og" | franc --whitelist nob,dan
# nob
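
The `--whitelist` and `--blacklist` options take a comma-separated string, whereas the JavaScript API takes arrays. A wrapper script that needs to convert between the two could do so roughly as follows. `parseLanguageList` is a hypothetical name, and this sketch is not taken from franc's CLI source.

```javascript
// Hypothetical helper: turn a comma-separated option value such as
// "nob,dan" into an array of language codes.
function parseLanguageList(value) {
    return String(value || '')
        .split(',')
        .map(function (code) { return code.trim(); })
        // Drop empty entries caused by stray or trailing commas.
        .filter(function (code) { return code.length > 0; });
}

console.log(parseLanguageList('por,glg')); // [ 'por', 'glg' ]
console.log(parseLanguageList(''));        // []
```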

Supported languages

franc supports 175 “languages”. For a complete list, check out Supported-Languages.md.

Supporting more or fewer languages

Supporting more or fewer languages is easy: fork the project and run the following:

$ npm install # Install development dependencies.
$ THRESHOLD=100000 npm run build # Run the `build` script with an environment variable.

The above would create a version of franc with support for any language with 100,000 or more speakers. To support all languages, even dead ones like Latin, set THRESHOLD to -1.
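
To illustrate what a speaker threshold means, here is a rough sketch of selecting languages by speaker count. The function name, the language keys, and the speaker counts are all made up for illustration, and how the real build script reads `THRESHOLD` and compares counts is an assumption.

```javascript
// Made-up speaker counts, for illustration only.
var speakers = {
    'langA': 5000000,
    'langB': 250000,
    'langC': 40000,
    'deadLang': 0
};

// Hypothetical helper: keep languages with at least `threshold`
// speakers. Assumption: a threshold of -1 means "include everything".
function selectLanguages(speakers, threshold) {
    return Object.keys(speakers).filter(function (code) {
        return threshold === -1 || speakers[code] >= threshold;
    });
}

console.log(selectLanguages(speakers, 100000));
// [ 'langA', 'langB' ]

console.log(selectLanguages(speakers, -1));
// [ 'langA', 'langB', 'langC', 'deadLang' ]
```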

Benchmark

On a MacBook Air, franc processes a set of 175 paragraphs 2 times per second (350 paragraphs per second in total).

         benchmarks * 175 paragraphs in different languages
  2 op/s » franc -- this module
  2 op/s » guesslanguage
  2 op/s » languagedetect
  2 op/s » vac

(I’ll work on a better benchmark soon)

Derivation

Franc is a derivative work from guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.

License

MIT © Titus Wormer
