Kim et al., 2020 - Google Patents

Comparison of Korean preprocessing performance according to tokenizer in NMT transformer model

Document ID: 2978657861924220703
Author: Kim G; Lee S
Publication year: 2020
Publication venue: Journal of Advances in Information Technology Vol

Snippet

Machine translation using neural networks is making rapid progress in natural language processing. As natural language processing models and tokenizers improve, accurate translation is becoming possible. In this paper, we create a transformer model …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Similar Documents

Publication	Publication Date	Title
Hong et al.	2019	FASPell: A fast, adaptable, simple, powerful Chinese spell checker based on DAE-decoder paradigm
Faruqui et al.	2015	Morphological inflection generation using character sequence to sequence learning
Ikeda	2017	Japanese text normalization with encoder-decoder model
Watson et al.	2018	Utilizing character and word embeddings for text normalization with sequence-to-sequence models
Mori	2010	Word-based partial annotation for efficient corpus construction
Na	2015	Conditional random fields for Korean morpheme segmentation and POS tagging
Belinkov	2018	On internal language representations in deep learning: An analysis of machine translation and speech recognition
Min et al.	2015	BosonNLP: An ensemble approach for word segmentation and POS tagging
Silfverberg et al.	2016	Data-driven spelling correction using weighted finite-state methods
Gambäck et al.	2009	Methods for Amharic part-of-speech tagging
Beckley	2015	Bekli: A Simple Approach to Twitter Text Normalization
Ballesteros et al.	2017	Greedy transition-based dependency parsing with stack LSTMs
Kim et al.	2020	Comparison of Korean preprocessing performance according to tokenizer in NMT transformer model
Liu et al.	2021	Morphological segmentation for Seneca
Elshafei et al.	2006	Machine Generation of Arabic Diacritical Marks
Tran et al.	2021	Hierarchical transformer encoders for Vietnamese spelling correction
Cho et al.	2018	Real-time automatic word segmentation for user-generated text
Mahdhaoui et al.	2023	Optimizing Arabic Named Entity Recognition through Active Learning and AraBERT
Pailai et al.	2013	A comparative study on different techniques for Thai part-of-speech tagging
Mammadov et al.	2018	Part-of-speech tagging for Azerbaijani language
Ramesh et al.	2020	Interpretable natural language segmentation based on link grammar
Farooq et al.	2009	Phrase-based correction model for improving handwriting recognition accuracies
Shekhar et al.	2020	Computational linguistic retrieval framework using negative bootstrapping for retrieving transliteration variants
Vylomova et al.	2019	Contextualization of morphological inflection
Singvongsa et al.	2016	Lao-Thai machine translation using statistical model