Kim et al., 2020 - Google Patents
Comparison of korean preprocessing performance according to tokenizer in nmt transformer modelKim et al., 2020
View PDF- Document ID
- 2978657861924220703
- Author
- Kim G
- Lee S
- Publication year
- Publication venue
- Journal of Advances in Information Technology Vol
External Links
Snippet
Mechanical translation using neural networks in natural language processing is making rapid progress. With the development of natural language processing model and tokenizer, accurate translation is becoming possible. In this paper, we will create a transformer model …
- 238000007781 pre-processing 0 title description 5
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hong et al. | FASPell: A fast, adaptable, simple, powerful Chinese spell checker based on DAE-decoder paradigm | |
Faruqui et al. | Morphological inflection generation using character sequence to sequence learning | |
Ikeda | Japanese text normalization with encoder-decoder model | |
Watson et al. | Utilizing character and word embeddings for text normalization with sequence-to-sequence models | |
Mori | Word-based partial annotation for efficient corpus construction | |
Na | Conditional random fields for Korean morpheme segmentation and POS tagging | |
Belinkov | On internal language representations in deep learning: An analysis of machine translation and speech recognition | |
Min et al. | BosonNLP: An ensemble approach for word segmentation and POS tagging | |
Silfverberg et al. | Data-driven spelling correction using weighted finite-state methods | |
Gambäck et al. | Methods for Amharic part-of-speech tagging | |
Beckley | Bekli: A Simple Approach to Twitter Text Normalization. | |
Ballesteros et al. | Greedy transition-based dependency parsing with stack lstms | |
Kim et al. | Comparison of korean preprocessing performance according to tokenizer in nmt transformer model | |
Liu et al. | Morphological segmentation for Seneca | |
Elshafei et al. | Machine Generation of Arabic Diacritical Marks. | |
Tran et al. | Hierarchical transformer encoders for Vietnamese spelling correction | |
Cho et al. | Real-time automatic word segmentation for user-generated text | |
Mahdhaoui et al. | Optimizing Arabic Named Entity Recognition through Active Learning and AraBERT | |
Pailai et al. | A comparative study on different techniques for thai part-of-speech tagging | |
Mammadov et al. | Part-of-speech tagging for azerbaijani language | |
Ramesh et al. | Interpretable natural language segmentation based on link grammar | |
Farooq et al. | Phrase-based correction model for improving handwriting recognition accuracies | |
Shekhar et al. | Computational linguistic retrieval framework using negative bootstrapping for retrieving transliteration variants | |
Vylomova et al. | Contextualization of morphological inflection | |
Singvongsa et al. | Lao-Thai machine translation using statistical model |