Tokyo University of Foreign Studies
- http://www.tufs.ac.jp/ts/personal/nomoto/
Starred repositories
[MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision Language Models (LVLMs).
A simple library for querying the URIEL typological database.
Sarawak Malay speech and text data for speech technology research. The data was collected by the Faculty of Computer Science and Information Technology, Universiti Malaysia Sar…
The first Indonesian structurally ambiguous utterances corpus
VerbInd: A corpus-based database of Indonesian verbs.
Neural Question Generation using the SQuAD and NewsQA datasets
A Vietnamese natural language processing toolkit (NAACL 2018)
Docker file to build the Indonesian TreeTagger.
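Support page for the book 「BERTによる自然言語処理入門: Transformersを使った実践プログラミング」 (Introduction to Natural Language Processing with BERT: Practical Programming with Transformers)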
Stemmer and lemmatizer for Indonesian (Bahasa Indonesia)
A curated list of research papers and resources on Indonesian languages
A collaborative project to collect datasets in Indonesian languages.
High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)
The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained IndoGPT and IndoBART models, and starter code!…
A constituency treebank that conforms to the Penn Treebank format
🐍🍑 Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)
Aksara is an Indonesian morphological analyzer that conforms to the UD v2 annotation guidelines
Python package for Korean natural language processing.
Natural language processing resources for multiple languages, with an eye towards use for digital humanities.