Huang et al., 2012 - Google Patents
Words without boundaries: Computational approaches to Chinese word segmentationHuang et al., 2012
- Document ID
- 7969446719758994437
- Author
- Huang C
- Xue N
- Publication year
- Publication venue
- Language and Linguistics Compass
External Links
Snippet
The fact that words are not conventionally demarcated in Chinese orthography makes the process of word segmentation non‐trivial. Chinese word segmentation remains a challenging topic in Chinese computational linguistics. We survey previous approaches to …
- 230000011218 segmentation 0 title abstract description 88
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
- G06F17/278—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Antony et al. | Parts of speech tagging for Indian languages: a literature survey | |
Müller et al. | Robust morphological tagging with word representations | |
Kumar et al. | Part of speech taggers for morphologically rich indian languages: a survey | |
Reimers et al. | Germeval-2014: Nested named entity recognition with neural networks | |
Gómez-Adorno et al. | Improving feature representation based on a neural network for author profiling in social media texts | |
Jhamtani et al. | Word-level language identification in bi-lingual code-switched texts | |
Huang et al. | Words without boundaries: Computational approaches to Chinese word segmentation | |
Fu et al. | Chinese word segmentation as morpheme-based lexical chunking | |
Cing et al. | Improving accuracy of part-of-speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language | |
Hu et al. | Self-supervised synonym extraction from the web. | |
Wong et al. | iSentenizer‐μ: Multilingual Sentence Boundary Detection Model | |
Shafi et al. | UNLT: Urdu natural language toolkit | |
Naz et al. | Urdu part of speech tagging using transformation based error driven learning | |
Attia et al. | Gwu-hasp: Hybrid arabic spelling and punctuation corrector | |
Nooralahzadeh et al. | Part of speech tagging for french social media data | |
Alsayadi et al. | Integrating semantic features for enhancing arabic named entity recognition | |
Florian et al. | Improving mention detection robustness to noisy input | |
Wang et al. | Conditional Random Field-based Parser and Language Model for Tradi-tional Chinese Spelling Checker | |
Wanjari et al. | Sentence boundary detection for Marathi language | |
Pailai et al. | A comparative study on different techniques for thai part-of-speech tagging | |
Doostmohammadi et al. | Joint Persian word segmentation correction and zero-width non-joiner recognition using BERT | |
JP5454763B2 (en) | Device for associating words in a sentence pair and computer program therefor | |
Li et al. | A unified model for solving the OOV problem of chinese word segmentation | |
AP et al. | Deep learning based deep level tagger for malayalam | |
Al-Arfaj et al. | Arabic NLP tools for ontology construction from Arabic text: An overview |