
Kim et al., 2020 - Google Patents

Comparison of Korean preprocessing performance according to tokenizer in NMT transformer model

Document ID
2978657861924220703
Author
Kim G
Lee S
Publication year
2020
Publication venue
Journal of Advances in Information Technology Vol

Snippet

Machine translation using neural networks in natural language processing is making rapid progress. With the development of natural language processing models and tokenizers, accurate translation is becoming possible. In this paper, we will create a transformer model …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/277 Lexical analysis, e.g. tokenisation, collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G06F17/271 Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2809 Data driven translation
    • G06F17/2827 Example based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2872 Rule based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/274 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/289 Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2863 Processing of non-latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634 Querying
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Similar Documents

Publication Publication Date Title
Hong et al. FASPell: A fast, adaptable, simple, powerful Chinese spell checker based on DAE-decoder paradigm
Faruqui et al. Morphological inflection generation using character sequence to sequence learning
Ikeda Japanese text normalization with encoder-decoder model
Watson et al. Utilizing character and word embeddings for text normalization with sequence-to-sequence models
Mori Word-based partial annotation for efficient corpus construction
Na Conditional random fields for Korean morpheme segmentation and POS tagging
Belinkov On internal language representations in deep learning: An analysis of machine translation and speech recognition
Min et al. BosonNLP: An ensemble approach for word segmentation and POS tagging
Silfverberg et al. Data-driven spelling correction using weighted finite-state methods
Gambäck et al. Methods for Amharic part-of-speech tagging
Beckley Bekli: A Simple Approach to Twitter Text Normalization.
Ballesteros et al. Greedy transition-based dependency parsing with stack LSTMs
Kim et al. Comparison of Korean preprocessing performance according to tokenizer in NMT transformer model
Liu et al. Morphological segmentation for Seneca
Elshafei et al. Machine Generation of Arabic Diacritical Marks.
Tran et al. Hierarchical transformer encoders for Vietnamese spelling correction
Cho et al. Real-time automatic word segmentation for user-generated text
Mahdhaoui et al. Optimizing Arabic Named Entity Recognition through Active Learning and AraBERT
Pailai et al. A comparative study on different techniques for Thai part-of-speech tagging
Mammadov et al. Part-of-speech tagging for Azerbaijani language
Ramesh et al. Interpretable natural language segmentation based on link grammar
Farooq et al. Phrase-based correction model for improving handwriting recognition accuracies
Shekhar et al. Computational linguistic retrieval framework using negative bootstrapping for retrieving transliteration variants
Vylomova et al. Contextualization of morphological inflection
Singvongsa et al. Lao-Thai machine translation using statistical model