CN103440236A - United labeling method for syntax of Tibet language and semantic roles - Google Patents
United labeling method for syntax of Tibet language and semantic roles
- Publication number
- CN103440236A
- Authority
- CN
- China
- Prior art keywords
- verb
- semantic
- predicate
- role
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to methods for processing a minority language into Chinese, and in particular to a joint labeling method for Tibetan syntax and semantic roles. The method comprises the following steps: a) distinguishing simple and compound sentences; b) labeling semantic roles; c) recognizing the predicate; d) classifying verb semantics; e) labeling the syntactic structure; f) editing and revising the semantic role labeling results. The method extracts Tibetan syntactic and semantic features. On the one hand, semantic role information expressed in a sentence, such as agent, patient, time, place, and manner, can be labeled by directly using the grammatical markers of Tibetan. On the other hand, the semantic role labeling result for the predicate can feed back into the syntactic analysis process, reducing the influence of syntactic markers that cannot be uniquely determined and thereby improving the performance of the sentence processing system.
Description
Technical field
The present invention relates to methods for processing a minority language into Chinese, and in particular to a joint labeling method for Tibetan syntax and semantic roles.
Background art
Research in Tibetan information processing is flourishing. Breakthroughs have been achieved successively in character, word, and phrase processing, and work on the sentence-processing stage has begun.
Semantic analysis is one of the most challenging problems in computational linguistics, and it is also the main bottleneck restricting the large-scale application of language information technology. Semantic analysis derives the actual meaning of a sentence from its syntactic structure and the senses of its content words; this is the main goal of sentence processing.
The task of semantic role labeling is to find the semantic role constituents that correspond to the predicate of a sentence, such as agent, patient, time, place, and manner. Labeling these constituents plays an important role in understanding the meaning of a sentence.
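To make the task concrete, a labeled sentence can be represented as a predicate plus a set of role-bearing spans. The sketch below is illustrative only: the class names, the English example sentence, and the role strings are assumptions chosen for readability, not data from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSpan:
    role: str    # e.g. "agent", "patient", "time", "place", "manner"
    start: int   # index of the first token in the span
    end: int     # index one past the last token in the span


@dataclass
class LabeledSentence:
    tokens: list[str]
    predicate_index: int
    roles: list[RoleSpan] = field(default_factory=list)

    def text_of(self, role: str) -> list[str]:
        """Return the surface text of every span that carries the given role."""
        return [
            " ".join(self.tokens[span.start:span.end])
            for span in self.roles
            if span.role == role
        ]


# Hypothetical English example, used only to show the data layout.
sentence = LabeledSentence(
    tokens=["the", "teacher", "read", "a", "book", "yesterday"],
    predicate_index=2,
    roles=[
        RoleSpan("agent", 0, 2),
        RoleSpan("patient", 3, 5),
        RoleSpan("time", 5, 6),
    ],
)
print(sentence.text_of("agent"))  # ['the teacher']
```

Storing roles as spans relative to one predicate keeps the representation independent of any syntax tree, which matters here because the background notes that deep Tibetan parses are hard to obtain.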
Syntactic analysis derives the syntactic structure of a sentence from a given grammar. It has two parts: determining the hierarchical structure that the sentence contains, and determining the constituents of the sentence. The result of syntactic analysis is expressed as a syntax tree.
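A syntax tree of this kind can be sketched as a small recursive structure. The node class, the constituent labels (S, NP, VP), and the English example are assumptions for illustration; the patent does not prescribe a tree format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # constituent label, or the word for a leaf
    children: list["Node"] = field(default_factory=list)

    def bracketed(self) -> str:
        """Render the tree in bracketed notation."""
        if not self.children:
            return self.label
        inner = " ".join(child.bracketed() for child in self.children)
        return f"({self.label} {inner})"

    def leaves(self) -> list[str]:
        """Return the words of the sentence in order."""
        if not self.children:
            return [self.label]
        return [word for child in self.children for word in child.leaves()]


# Hypothetical example: hierarchy (nesting) and constituents (NP, VP) in one structure.
tree = Node("S", [
    Node("NP", [Node("the"), Node("teacher")]),
    Node("VP", [Node("read"), Node("NP", [Node("a"), Node("book")])]),
])
print(tree.bracketed())  # (S (NP the teacher) (VP read (NP a book)))
```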
Conventional semantic role labeling methods assume a given syntax tree and study how to apply various features in machine learning algorithms.
Traditional semantic role labeling research is therefore generally carried out on the basis of syntactic processing. At present, however, it is difficult to obtain deep syntactic analysis results for Tibetan, and existing Tibetan syntactic analysis systems do not perform satisfactorily in the general domain.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a joint labeling method for Tibetan syntax and semantic roles.
To achieve this purpose, the present invention adopts the following technical solution:
A joint labeling method for Tibetan syntax and semantic roles, comprising the following steps:
A) Simple and compound sentence distinction: divide a long sentence into several short sentences;
B) Semantic role labeling: case marking, including the labeling of grammatical role constituents, nominalizations, or non-predicate verb chunks, and removal of content that is not to be labeled;
C) Predicate recognition: based on the predicate features, determine whether the semantic structure category of the predicate is an adjectival-predicate sentence or a verbal-predicate sentence;
D) Verb semantic classification: based on the verb suffix marker features, determine the semantic structure type of the verb;
E) Syntactic structure labeling: for the semantic structure type of the verb, use shallow semantic parsing to screen and identify semantic roles, then classify the semantic structure type again;
F) Edit and revise the semantic role labeling results.
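The six steps can be read as a pipeline in which each stage consumes the output of the previous one. The skeleton below is a sketch of that control flow only. Every name in it is hypothetical, the stages for steps b) to f) are stubs that merely record that they ran, and splitting at the Tibetan shad mark (U+0F0D) is an assumed placeholder for step a), not the segmentation rule of the patent.

```python
from typing import Callable

ClauseState = dict                                # working record for one clause
Stage = Callable[[ClauseState], ClauseState]

SHAD = "\u0f0d"  # Tibetan shad mark, used here as an assumed clause delimiter


def split_on_delimiter(tokens: list[str], delimiter: str = SHAD) -> list[list[str]]:
    """Step a) placeholder: cut a long sentence into clauses at a delimiter token."""
    clauses, current = [], []
    for token in tokens:
        if token == delimiter:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses


def make_stage(name: str) -> Stage:
    """Build a stub stage that only records that it ran (real logic is not shown)."""
    def stage(state: ClauseState) -> ClauseState:
        return {**state, "steps_done": state.get("steps_done", []) + [name]}
    return stage


# Steps b) to f), in the order listed above.
STAGES = [make_stage(name) for name in (
    "b_label_roles",
    "c_recognize_predicate",
    "d_classify_verb",
    "e_label_syntax",
    "f_revise_roles",
)]


def run_pipeline(tokens: list[str]) -> list[ClauseState]:
    results = []
    for clause in split_on_delimiter(tokens):     # step a)
        state: ClauseState = {"tokens": clause}
        for stage in STAGES:
            state = stage(state)
        results.append(state)
    return results


output = run_pipeline(["w1", "w2", SHAD, "w3", SHAD])
print(len(output))  # 2
```

Keeping each step as a separate function over a shared record makes it straightforward to let a later step (for example the revision in step f) overwrite fields written by an earlier one.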
In the above joint labeling method, step b) labels the grammatical role constituents that serve in the sentence as agent, patient, involved entity, possessor, target, purpose, place, material, source, or instrument; removes modal particles, demonstrative pronouns, indefinite demonstratives, interrogative pronouns, plural suffixes, and honorific morphemes; and does not consider temporal information.
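The removal part of step b) amounts to filtering tokens by category before roles are assigned. The following is a minimal sketch, assuming the clause has already been segmented and tagged; the category names and the placeholder tokens are hypothetical, since the patent does not define a tag set for the removed items.

```python
# Categories that step b) drops before labeling (names are assumed, not from the patent).
EXCLUDED_CATEGORIES = {
    "modal_particle",
    "demonstrative_pronoun",
    "indefinite_demonstrative",
    "interrogative_pronoun",
    "plural_suffix",
    "honorific_morpheme",
}


def remove_unlabeled(tagged_tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the (token, category) pairs that remain candidates for labeling."""
    return [
        (token, category)
        for token, category in tagged_tokens
        if category not in EXCLUDED_CATEGORIES
    ]


# Placeholder tokens w1..w5 stand in for real Tibetan words.
clause = [
    ("w1", "noun"),
    ("w2", "plural_suffix"),
    ("w3", "case_marker"),
    ("w4", "verb"),
    ("w5", "modal_particle"),
]
kept = remove_unlabeled(clause)
print(kept)  # [('w1', 'noun'), ('w3', 'case_marker'), ('w4', 'verb')]
```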
In the above joint labeling method, the nominalization labels in step b) cover: the doer; manner, method, or circumstance; craft, handicraft, material, or thing; action or deed; custom or rule; attitude or condition; mind or spirit; quantity or standard; place; time; leisure; alternation or turn; or a certain aspect.
In the above joint labeling method, the grammatical roles in step b) are Arg0-5: Arg0 denotes the agent of the action, Arg1 denotes the effect of the action, and Arg2-5 are assigned different semantic meanings depending on the predicate.
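The split between fixed roles (Arg0, Arg1) and predicate-dependent roles (Arg2-5) can be sketched as a two-layer lookup. The per-predicate table below is hypothetical: the English predicates and their Arg2 meanings are invented solely to show the mechanism and do not come from the patent.

```python
# Fixed meanings stated in the text.
CORE_ROLES = {
    "Arg0": "agent of the action",
    "Arg1": "effect of the action",
}

PREDICATE_DEPENDENT = {"Arg2", "Arg3", "Arg4", "Arg5"}

# Hypothetical per-predicate definitions, for illustration only.
FRAMES = {
    "give": {"Arg2": "recipient"},
    "go": {"Arg2": "destination"},
}


def describe(role: str, predicate: str) -> str:
    """Return the meaning of a role label for a given predicate."""
    if role in CORE_ROLES:
        return CORE_ROLES[role]
    if role in PREDICATE_DEPENDENT:
        return FRAMES.get(predicate, {}).get(role, "undefined for this predicate")
    raise ValueError(f"unknown role label: {role}")


print(describe("Arg0", "give"))  # agent of the action
print(describe("Arg2", "give"))  # recipient
print(describe("Arg2", "go"))    # destination
```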
In the above joint labeling method, the grammatical role constituents in step b) comprise the agentive subject, possessive subject, patient object, target object, product object, locative object, verbal predicate, and adjectival predicate.
In the above joint labeling method, the predicate in step c) comprises a verb, auxiliary verb, verb suffix, or modal particle.
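One way to picture the predicate as a group of verb, auxiliary verb, verb suffix, and modal particle is to collect the trailing run of such tokens in a clause. This is a sketch under two assumptions that the patent does not state: that the clause is already tagged with the hypothetical category names used below, and that the predicate group sits at the end of the clause.

```python
PREDICATE_CATEGORIES = {"verb", "auxiliary_verb", "verb_suffix", "modal_particle"}


def predicate_group(tagged_tokens: list[tuple[str, str]]) -> list[int]:
    """Return the indices of the clause-final run of predicate-type tokens."""
    indices = []
    for i in range(len(tagged_tokens) - 1, -1, -1):
        if tagged_tokens[i][1] in PREDICATE_CATEGORIES:
            indices.append(i)
        else:
            break  # stop at the first token that is not part of the predicate
    return indices[::-1]


# Placeholder tokens stand in for real Tibetan words.
clause = [
    ("n1", "noun"),
    ("c1", "case_marker"),
    ("v1", "verb"),
    ("s1", "verb_suffix"),
    ("a1", "auxiliary_verb"),
]
print(predicate_group(clause))  # [2, 3, 4]
```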
In the above joint labeling method, the predicate in step c) comprises the absolutive (common) case, accusative (patient) case, agentive case, possessive case, locative case, dative case, objective case, genitive case, instrumental case, ablative case, and result (factitive) case.
In the above joint labeling method, the verbs in step d) comprise transitive verbs, intransitive verbs, volitional verbs, non-volitional verbs, modal auxiliaries, stative verbs, action verbs, psychological verbs, perception verbs, verbs of change, directional verbs, narrative verbs, copulas, verbs of possession, existential verbs, interactive verbs, and causative verbs.
Beneficial effect:
The present invention extracts Tibetan syntactic and semantic features. On the one hand, the grammatical markers of Tibetan can be used directly to label the semantic role information expressed in a sentence, such as agent, patient, time, place, and manner. On the other hand, the semantic role labeling result for the predicate can feed back into the syntactic analysis process, reducing the influence of syntactic markers that cannot be uniquely determined and thereby improving the performance of the sentence processing system.
Embodiment
To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to an embodiment.
The present invention comprises the following steps:
A) Simple and compound sentence distinction: divide a long sentence into several short sentences;
B) Semantic role labeling: case marking, including the labeling of grammatical role constituents, nominalizations, or non-predicate verb chunks, and removal of content that is not to be labeled;
The semantic roles of Tibetan are defined according to Tibetan case marking and the needs of semantic role labeling. The core semantic roles are Arg0-5: Arg0 denotes the agent of the action (agentive case), Arg1 denotes the effect of the action (result case), and Arg2-5 take different semantic meanings depending on the predicate verb. Some additional semantic roles are added, such as ArgM-LOC (locative case).
The grammatical role constituents that serve in the sentence as agent, patient, involved entity, possessor, target, purpose, place, material, source, or instrument are labeled; modal particles, demonstrative pronouns, indefinite demonstratives, interrogative pronouns, plural suffixes, and honorific morphemes are removed; temporal information is not considered.
Tibetan has rich case marking. More than 30 kinds of case marking are relevant to semantic role labeling, for example the agentive, objective, benefactive, similative (comparative), possessive, and purposive cases. Some cases correspond to a single semantic role (for example the agentive case). Some case markers may correspond to several semantic roles, and one semantic role may correspond to several case markers, for example Arg1 (result case, benefactive case).
C) Predicate recognition: based on the predicate features, determine whether the semantic structure category of the predicate is an adjectival-predicate sentence or a verbal-predicate sentence;
The part of speech of the predicate mainly distinguishes adjectival-predicate sentences from verbal-predicate sentences. Adjectival-predicate sentences are identified from sentence-pattern features. Verbal-predicate sentences are identified from the syntactic markers associated with the predicate verb, such as tense and aspect suffixes, modal particles, and auxiliary verbs.
D) Verb semantic classification: based on the verb suffix marker features, determine the semantic structure type of the verb;
Sentence-pattern analysis is based on the semantic information carried by the verb suffix.
E) Syntactic structure labeling: for the semantic structure type of the verb, use shallow semantic parsing to screen and identify semantic roles, then classify the semantic structure type again;
F) Edit and revise the semantic role labeling results.
The labeled content in the present invention comprises:
1. Syntactic constituent labels
Constituent | Tag | English term |
---|---|---|
Agentive subject | SUA | Subject agent |
Possessive subject | SUP | Subject possessive |
Patient object | OBP | Object patient |
Target object | OBT | Object target |
Product object | OBD | Object product |
Locative object | OBL | Object locative |
Verbal predicate | PRV | Verb predicate |
Adjectival predicate | PRA | Adjective predicate |
2. Case markers in syntax
Case | Tag | English term |
---|---|---|
Absolutive (common) case | ABS | absolutive |
Accusative case | PAT | patient |
Agentive case | AGN | agentive |
Possessive case | POS | possessive |
Locative case | LOC | locative |
Dative case | DAT | dative |
Objective case | OBJ | objective |
Genitive case | GEN | genitive |
Instrumental case | INS | instrumental |
Ablative case | ABL | ablative |
Result (factitive) case | FAT | factitive |
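The case tags above can drive a lookup from case marker to candidate semantic roles. The sketch below fills in only the three correspondences that the description states explicitly (agentive case with Arg0, result case with Arg1, locative case with ArgM-LOC); all other cases are left without candidates rather than guessed. The function and table names are hypothetical.

```python
# Tag inventory copied from the case marker table.
CASE_NAMES = {
    "ABS": "absolutive", "PAT": "patient", "AGN": "agentive",
    "POS": "possessive", "LOC": "locative", "DAT": "dative",
    "OBJ": "objective", "GEN": "genitive", "INS": "instrumental",
    "ABL": "ablative", "FAT": "factitive",
}

# Only the correspondences given in the description; a list is used because
# one case marker may map to several roles.
CASE_TO_ROLES = {
    "AGN": ["Arg0"],
    "FAT": ["Arg1"],
    "LOC": ["ArgM-LOC"],
}


def candidate_roles(case_tag: str) -> list[str]:
    """Return the candidate semantic roles for a case tag (empty if unspecified)."""
    if case_tag not in CASE_NAMES:
        raise KeyError(f"unknown case tag: {case_tag}")
    return CASE_TO_ROLES.get(case_tag, [])


print(candidate_roles("AGN"))  # ['Arg0']
print(candidate_roles("DAT"))  # []
```

Returning a list of candidates instead of a single role leaves room for the later steps, which screen and revise the roles once the verb's semantic structure type is known.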
3. Nominalization labels
4. Verb labels
Verb class | Level | Tag | English term |
---|---|---|---|
Transitive verb | Level 1 | VT | transitive verb |
Intransitive verb | Level 1 | VI | intransitive verb |
Volitional verb | Level 1 | VL | volition verb |
Non-volitional verb | Level 1 | IVL | in-volition verb |
Modal auxiliary | Level 1 | MAU | modal auxiliary |
Stative verb | Level 2 | STA | stative verb |
Action verb | Level 2 | ACT | action verb |
Psychological verb | Level 2 | COG | cognition verb |
Perception verb | Level 2 | PER | perception verb |
Verb of change | Level 2 | CHA | verb of change |
Directional verb | Level 2 | DIR | directional verb |
Narrative verb | Level 2 | NAR | narrate verb |
Copula | Level 2 | COU | copula |
Verb of possession | Level 2 | VOP | verb of possession |
Existential verb | Level 2 | EXI | existential verb |
Interactive verb | Level 2 | REL | interrelation verb |
Causative verb | Level 2 | CAV | causative verb |
In the table, level-1 and level-2 labels relate as follows:
A word with a level-2 label may also belong to one of the level-1 classes. For example, a word labeled "action verb" (level 2) may also be a transitive verb (level 1).
When the verbs in a sentence are labeled, if the label cannot be refined to level 2, at least the level-1 label must be given.
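The two-level rule can be captured by a label type in which the level-1 tag is mandatory and the level-2 tag is optional. This is one possible reading of the rule, sketched with a hypothetical `VerbLabel` class; the tag sets are copied from the verb table.

```python
from dataclasses import dataclass
from typing import Optional

LEVEL1_TAGS = {"VT", "VI", "VL", "IVL", "MAU"}
LEVEL2_TAGS = {
    "STA", "ACT", "COG", "PER", "CHA", "DIR",
    "NAR", "COU", "VOP", "EXI", "REL", "CAV",
}


@dataclass(frozen=True)
class VerbLabel:
    level1: str                    # always required
    level2: Optional[str] = None   # given only when the verb can be refined

    def __post_init__(self) -> None:
        if self.level1 not in LEVEL1_TAGS:
            raise ValueError(f"not a level-1 tag: {self.level1}")
        if self.level2 is not None and self.level2 not in LEVEL2_TAGS:
            raise ValueError(f"not a level-2 tag: {self.level2}")

    def __str__(self) -> str:
        return self.level1 if self.level2 is None else f"{self.level1}/{self.level2}"


print(VerbLabel("VT", "ACT"))  # VT/ACT  (transitive action verb)
print(VerbLabel("VI"))         # VI      (level 1 only)
```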
The foregoing shows and describes the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the invention is not limited by the above embodiment; the embodiment and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope. The claimed scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
1. A joint labeling method for Tibetan syntax and semantic roles, characterized in that it comprises the following steps:
A) Simple and compound sentence distinction: divide a long sentence into several short sentences;
B) Semantic role labeling: case marking, including the labeling of grammatical role constituents, nominalizations, or non-predicate verb chunks, and removal of content that is not to be labeled;
C) Predicate recognition: based on the predicate features, determine whether the semantic structure category of the predicate is an adjectival-predicate sentence or a verbal-predicate sentence;
D) Verb semantic classification: based on the verb suffix marker features, determine the semantic structure type of the verb;
E) Syntactic structure labeling: for the semantic structure type of the verb, use shallow semantic parsing to screen and identify semantic roles, then classify the semantic structure type again;
F) Edit and revise the semantic role labeling results.
2. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that step b) labels the grammatical role constituents that serve in the sentence as agent, patient, involved entity, possessor, target, purpose, place, material, source, or instrument; removes modal particles, demonstrative pronouns, indefinite demonstratives, interrogative pronouns, plural suffixes, and honorific morphemes; and does not consider temporal information.
3. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the nominalization labels in step b) cover: the doer; manner, method, or circumstance; craft, handicraft, material, or thing; action or deed; custom or rule; attitude or condition; mind or spirit; quantity or standard; place; time; leisure; alternation or turn; or a certain aspect.
4. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the grammatical roles in step b) are Arg0-5: Arg0 denotes the agent of the action, Arg1 denotes the effect of the action, and Arg2-5 are assigned different semantic meanings depending on the predicate.
5. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the grammatical role constituents in step b) comprise the agentive subject, possessive subject, patient object, target object, product object, locative object, verbal predicate, and adjectival predicate.
6. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the predicate in step c) comprises a verb, auxiliary verb, verb suffix, or modal particle.
7. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the predicate in step c) comprises the absolutive (common) case, accusative (patient) case, agentive case, possessive case, locative case, dative case, objective case, genitive case, instrumental case, ablative case, and result (factitive) case.
8. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the verbs in step d) comprise transitive verbs, intransitive verbs, volitional verbs, non-volitional verbs, modal auxiliaries, stative verbs, action verbs, psychological verbs, perception verbs, verbs of change, directional verbs, narrative verbs, verbs of possession, existential verbs, interactive verbs, and causative verbs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310421074.8A CN103440236B (en) | 2013-09-16 | 2013-09-16 | Tibetan language syntax and semantic role associating mask method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310421074.8A CN103440236B (en) | 2013-09-16 | 2013-09-16 | Tibetan language syntax and semantic role associating mask method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103440236A (en) | 2013-12-11 |
CN103440236B (en) | 2015-12-09 |
Family
ID=49693928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310421074.8A Expired - Fee Related CN103440236B (en) | 2013-09-16 | 2013-09-16 | Tibetan language syntax and semantic role associating mask method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103440236B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104217713A (en) * | 2014-07-15 | 2014-12-17 | 西北师范大学 | Tibetan-Chinese speech synthesis method and device |
CN104239294A (en) * | 2014-09-10 | 2014-12-24 | 华建宇通科技(北京)有限责任公司 | Multi-strategy Tibetan long sentence segmentation method for Tibetan to Chinese translation system |
CN106294311A (en) * | 2015-06-12 | 2017-01-04 | 科大讯飞股份有限公司 | A kind of Tibetan language tone Forecasting Methodology and system |
CN107818078A (en) * | 2017-07-20 | 2018-03-20 | 张宝华 | The semantic association and matching process of Chinese natural language dialogue |
CN108446268A (en) * | 2018-02-11 | 2018-08-24 | 青海师范大学 | Tibetan language personal pronoun reference resolution system |
CN111275094A (en) * | 2020-01-17 | 2020-06-12 | 厦门快商通科技股份有限公司 | Data labeling method, device and equipment based on machine learning |
CN115017902A (en) * | 2022-06-09 | 2022-09-06 | 青海师范大学 | Deep learning-based Tibetan phrase structure recognition model construction method and device |
CN115510869A (en) * | 2022-05-30 | 2022-12-23 | 青海师范大学 | End-to-end Tibetan La lattice shallow semantic analysis method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110119050A1 (en) * | 2009-11-18 | 2011-05-19 | Koen Deschacht | Method for the automatic determination of context-dependent hidden word distributions |
CN102662931A (en) * | 2012-04-13 | 2012-09-12 | 厦门大学 | Semantic role labeling method based on synergetic neural network |
Non-Patent Citations (2)
Title |
---|
CAROL GENETTI: "Syntactic aspects of nominalization in five Tibeto-Burman languages of the Himalayan area", 《LINGUISTICS OF THE TIBETO-BURMAN AREA》, vol. 31, no. 2, 31 October 2008 (2008-10-31), pages 97 - 143 * |
江 荻: "现代藏语谓语动词的识别与信息提取", 《20TH INTERNATIONAL CONFERENCE ON COMPUTER PROCESSING OF ORIENTAL LANGUAGES》, 31 August 2003 (2003-08-31), pages 154 - 160 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104217713A (en) * | 2014-07-15 | 2014-12-17 | 西北师范大学 | Tibetan-Chinese speech synthesis method and device |
CN104239294A (en) * | 2014-09-10 | 2014-12-24 | 华建宇通科技(北京)有限责任公司 | Multi-strategy Tibetan long sentence segmentation method for Tibetan to Chinese translation system |
CN106294311A (en) * | 2015-06-12 | 2017-01-04 | 科大讯飞股份有限公司 | A kind of Tibetan language tone Forecasting Methodology and system |
CN106294311B (en) * | 2015-06-12 | 2019-03-19 | 科大讯飞股份有限公司 | A kind of Tibetan language tone prediction technique and system |
CN107818078A (en) * | 2017-07-20 | 2018-03-20 | 张宝华 | The semantic association and matching process of Chinese natural language dialogue |
CN107818078B (en) * | 2017-07-20 | 2021-08-17 | 张宝华 | Semantic association and matching method for Chinese natural language dialogue |
CN108446268A (en) * | 2018-02-11 | 2018-08-24 | 青海师范大学 | Tibetan language personal pronoun reference resolution system |
CN111275094A (en) * | 2020-01-17 | 2020-06-12 | 厦门快商通科技股份有限公司 | Data labeling method, device and equipment based on machine learning |
CN115510869A (en) * | 2022-05-30 | 2022-12-23 | 青海师范大学 | End-to-end Tibetan La lattice shallow semantic analysis method |
CN115510869B (en) * | 2022-05-30 | 2023-08-01 | 青海师范大学 | End-to-end Tibetan Lager shallow semantic analysis method |
CN115017902A (en) * | 2022-06-09 | 2022-09-06 | 青海师范大学 | Deep learning-based Tibetan phrase structure recognition model construction method and device |
Also Published As
Publication number | Publication date |
---|---|
CN103440236B (en) | 2015-12-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103440236B (en) | Tibetan language syntax and semantic role associating mask method | |
JP5356197B2 (en) | Word semantic relation extraction device | |
Pettersson et al. | A multilingual evaluation of three spelling normalisation methods for historical text | |
US20150154184A1 (en) | Morphology analysis for machine translation | |
CN105068990B (en) | A kind of English long sentence dividing method of more strategies of Machine oriented translation | |
Zhang et al. | HANSpeller++: A unified framework for Chinese spelling correction | |
JPWO2017163346A1 (en) | Sentence analysis system and program | |
Zampieri et al. | Colonia: Corpus of historical portuguese | |
Stallard et al. | Unsupervised morphology rivals supervised morphology for arabic mt | |
CN103678288A (en) | Automatic proper noun translation method | |
Tyers et al. | Annotation schemes in North Sámi dependency parsing | |
Falkenjack et al. | Classifying easy-to-read texts without parsing | |
Tuggener et al. | The sentence end and punctuation prediction in nlg text (sepp-nlg) shared task 2021 | |
CN109241521A (en) | A kind of high attention rate sentence extracting method of scientific and technical literature based on adduction relationship | |
JP6418975B2 (en) | Difficulty level estimation model learning device, difficulty level estimation device, method, and program | |
Dinu et al. | Dealing with the grey sheep of the Romanian gender system, the neuter | |
Çetinoglu | Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing. | |
Wang et al. | A light rule-based approach to English subject-verb agreement errors on the third person singular forms | |
Kaur et al. | Deadwood detection and elimination in text summarization for Punjabi language | |
Øvrelid et al. | Lexical categories for improved parsing of web data | |
CN102184171B (en) | Method for checking mechanical translation | |
Ribeyre et al. | Accurate deep syntactic parsing of graphs: The case of french | |
Salaberri et al. | First approach toward Semantic Role Labeling for Basque. | |
JP6298780B2 (en) | Difficulty level learning device, difficulty level estimation model learning device, difficulty level estimation device, method, and program | |
KR101739393B1 (en) | Specialty eojeol analysis method considering punctuation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20151209; Termination date: 20200916 |