midobal/mecab

MeCab

Japanese morphological analyzer and tokenizer.

What is MeCab?

MeCab is an open-source morphological analysis engine developed as a joint research project between the Graduate School of Informatics, Kyoto University and the NTT (Nippon Telegraph and Telephone Corporation) Communication Science Laboratories. It is designed for general-purpose use and does not depend on any particular language, dictionary, or corpus.

More information about MeCab can be found in the documentation (Japanese), or in its English translation.

Installation

To install MeCab, follow these steps:

MeCab installation

Create a directory in which to install MeCab:

mkdir MeCab_installation_directory

Configure, compile and install MeCab:

cd mecab
./configure --prefix=path_to_MeCab_installation_directory --with-charset=utf8
make
make install

Dictionary installation

Configure, compile and install the dictionary:

cd mecab-ipadic
./configure --with-mecab-config=../mecab/mecab-config --prefix=path_to_MeCab_installation_directory --with-charset=utf8
make
make install
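
Once both steps have finished, the installation can be checked with a short script. The following is a minimal sketch, not part of the original instructions: the MECAB_PREFIX variable is a hypothetical name introduced here, and its default assumes the installation directory was created in the current working directory. It adds the installed bin directory to PATH and, if the mecab binary is found, prints its version and the information for the default dictionary.

```shell
#!/bin/sh
# Assumption: MeCab was installed under this prefix. MECAB_PREFIX is a
# hypothetical variable; adjust it to the directory passed to --prefix.
MECAB_PREFIX="${MECAB_PREFIX:-$PWD/MeCab_installation_directory}"

# Make the installed binaries visible to the current shell.
PATH="$MECAB_PREFIX/bin:$PATH"
export PATH

if command -v mecab >/dev/null 2>&1; then
    # Print the MeCab version.
    mecab --version
    # Print information about the default dictionary (file name, charset, ...).
    mecab -D
else
    echo "mecab not found; check the prefix: $MECAB_PREFIX" >&2
fi
```

On some systems the shared library in path_to_MeCab_installation_directory/lib may also need to be made visible to the dynamic loader (for example through LD_LIBRARY_PATH on Linux) before mecab will start.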

Tokenizer

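To use MeCab as a tokenizer, select the wakati output format with the -O option. MeCab then prints each input line with its tokens separated by spaces: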

mecab -O wakati < raw_file > tokenized_file
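
When several files need to be tokenized, the command above can be wrapped in a loop. The following is a minimal sketch under stated assumptions: the tokenize_dir function, the TOKENIZER variable, the .txt input extension, and the .tok output extension are all hypothetical names chosen for illustration. TOKENIZER defaults to the command shown above and is a variable only so that another command can be substituted.

```shell
#!/bin/sh
# Hypothetical helper: tokenize every *.txt file in a directory.
# TOKENIZER defaults to the command from this README.
TOKENIZER="${TOKENIZER:-mecab -O wakati}"

tokenize_dir() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for raw_file in "$in_dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$raw_file" ] || continue
        name="$(basename "$raw_file" .txt)"
        # $TOKENIZER is left unquoted on purpose so its options are split into words.
        $TOKENIZER < "$raw_file" > "$out_dir/$name.tok"
    done
}
```

For example, tokenize_dir raw_dir tokenized_dir would write one .tok file into tokenized_dir for each .txt file found in raw_dir.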
