CN103440236A - United labeling method for syntax of Tibet language and semantic roles - Google Patents

United labeling method for syntax of Tibet language and semantic roles

Info

Publication number
CN103440236A
Authority
CN
China
Prior art keywords
verb
semantic
predicate
role
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013104210748A
Other languages
Chinese (zh)
Other versions
CN103440236B (en)
Inventor
邱莉榕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minzu University of China
Original Assignee
Minzu University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minzu University of China
Priority to CN201310421074.8A
Publication of CN103440236A
Application granted
Publication of CN103440236B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to methods for processing minority languages into Chinese, and in particular to a joint labeling method for Tibetan syntax and semantic roles. The method comprises the following steps: a) distinguishing simple and compound sentences; b) labeling semantic roles; c) recognizing the predicate; d) classifying verb semantics; e) labeling the syntactic structure; f) editing and revising the semantic role labeling results. The method extracts Tibetan syntactic and semantic features. On the one hand, semantic role information expressed in a sentence, such as agent, patient, time, place and manner, can be labeled directly from the grammatical markers of Tibetan. On the other hand, the semantic role labeling result for the predicate can feed back into syntactic analysis, reducing the impact of syntactic markers that cannot be uniquely determined and thereby improving the performance of a sentence processing system.

Description

Joint labeling method for Tibetan syntax and semantic roles
Technical field
The present invention relates to methods for processing minority languages into Chinese, and in particular to a joint labeling method for Tibetan syntax and semantic roles.
Background
Research in Tibetan information processing is flourishing. Breakthroughs have been made successively in character, word and phrase processing, and work on the sentence-processing stage has begun.
Semantic analysis is one of the most challenging problems in computational linguistics and the main bottleneck restricting large-scale application of language information technology. Semantic analysis derives the actual meaning of a sentence from its syntactic structure and the senses of its content words; this is the main goal of sentence processing.
The task of semantic role labeling is to find the semantic role constituents that correspond to the predicate of a sentence, such as agent, patient, time, place and manner. Labeling these constituents plays an important part in understanding the meaning of a sentence.
Syntactic analysis derives the syntactic structure of a sentence according to a given grammar: it determines the hierarchical structure the sentence contains, and it determines the constituents of the sentence. The result of syntactic analysis is expressed as a syntax tree.
General semantic role labeling methods study how, given a syntax tree, various features can be applied in machine learning algorithms.
Traditional semantic role labeling research is generally carried out on top of syntactic processing. At present, however, deep parsing results for Tibetan are difficult to obtain, and existing Tibetan syntactic analysis systems do not perform satisfactorily in the general domain.
Summary of the invention
To address the deficiencies of the prior art, the invention provides a joint labeling method for Tibetan syntax and semantic roles.
To achieve this purpose, the invention adopts the following technical solution:
A joint labeling method for Tibetan syntax and semantic roles, comprising the following steps:
A) Simple and compound sentence distinction: divide a long sentence into several short sentences;
B) Semantic role labeling: case marking, including the labeling of grammatical role constituents, nominalizations, or non-predicate verb chunks, and removal of content that is not to be labeled;
C) Predicate recognition: based on predicate features, determine whether the semantic structure class of the predicate is an adjectival-predicate sentence or a verbal-predicate sentence;
D) Verb semantic classification: based on verbal-suffix marker features, determine the semantic structure type of the verb;
E) Syntactic structure labeling: based on the semantic structure type of the verb, use shallow semantic parsing to screen and identify semantic roles, then reclassify the semantic structure type;
F) Edit and revise the semantic role labeling results.
In the above method, step b) labels the grammatical role constituents that serve in the sentence as agent, patient, involved entity, possessor, object, purpose, place, material, source or instrument; it removes modal particles, demonstrative pronouns, indefinite demonstratives, interrogative pronouns, plural suffixes and honorific morphemes; temporal information is not considered.
In the above method, the nominalization labels in step b) cover: the doer of an action; manner, method or circumstance; craft, handiwork, material or thing; action or deed; custom or rule; attitude or condition; mind or spirit; quantity, standard, place or time; leisure; alternation or taking turns; or a particular aspect.
In the above method, the grammatical roles in step b) are Arg0-5: Arg0 denotes the agent of the action, Arg1 denotes what the action affects, and Arg2-5 are given different semantic meanings depending on the predicate.
In the above method, the grammatical role constituents in step b) comprise agentive subject, possessive subject, patient object, target object, product object, locative object, verbal predicate and adjectival predicate.
In the above method, the predicate in step c) comprises a verb, auxiliary verb, verbal suffix or modal particle.
In the above method, the cases associated with the predicate in step c) comprise the absolutive (common) case, accusative case, agentive case, possessive case, locative case, dative case, objective case, genitive case, instrumental case, ablative case, or result (factitive) case.
In the above method, the verbs in step d) comprise transitive verbs, intransitive verbs, volitional verbs, non-volitional verbs, modal auxiliaries, stative verbs, action verbs, psychological verbs, perception verbs, verbs of change, directional verbs, narrative verbs, copulas, verbs of possession, existential verbs, interrelation verbs and causative verbs.
Beneficial effects:
The invention extracts Tibetan syntactic and semantic features. On the one hand, the grammatical markers of Tibetan can be used directly to label semantic role information expressed in a sentence, such as agent, patient, time, place and manner. On the other hand, the semantic role labeling result for the predicate can feed back into the syntactic analysis process, reducing the impact of syntactic markers that cannot be uniquely determined and thereby improving the performance of the sentence processing system.
Embodiment
To make the technical means, inventive features, purpose and effect of the invention easy to understand, the invention is further described below with reference to an embodiment.
The present invention comprises the following steps:
A) Simple and compound sentence distinction: divide a long sentence into several short sentences;
B) Semantic role labeling: case marking, including the labeling of grammatical role constituents, nominalizations, or non-predicate verb chunks, and removal of content that is not to be labeled;
The semantic roles of Tibetan are defined according to Tibetan case marking and the needs of semantic role labeling. The core semantic roles are Arg0-5: Arg0 denotes the agent of the action (agentive case), Arg1 denotes what the action affects (result case), and Arg2-5 take different semantic meanings depending on the predicate verb. Some additional semantic roles are added, such as ArgM-LOC (locative case).
The grammatical role constituents that serve in the sentence as agent, patient, involved entity, possessor, object, purpose, place, material, source or instrument are labeled; modal particles, demonstrative pronouns, indefinite demonstratives, interrogative pronouns, plural suffixes and honorific morphemes are removed; temporal information is not considered.
Tibetan has rich case marking, and more than 30 kinds of case markers are relevant to semantic role labeling, for example the agentive, objective, benefactive, similative-comparative, possessive and purposive cases. Some cases correspond to a single semantic role (the agentive case, for example). Other case markers may correspond to several semantic roles, and one semantic role may correspond to several case markers, as with Arg1 (result case, benefactive case).
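The many-to-many relation between case markers and semantic roles described above can be sketched as a lookup table. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the tag `BEN` for the benefactive case is an assumption (it does not appear in the patent's tag tables), and only the correspondences mentioned in the text are included.

```python
from collections import defaultdict

# Hypothetical case-tag -> candidate semantic roles table.
# Only correspondences stated in the text are listed.
CASE_TO_ROLES = {
    "AGN": ["Arg0"],      # agentive case -> agent of the action
    "FAT": ["Arg1"],      # result (factitive) case -> what the action affects
    "BEN": ["Arg1"],      # benefactive case (tag name is an assumption)
    "LOC": ["ArgM-LOC"],  # locative case -> additional location role
}


def candidate_roles(case_tag):
    """Return the candidate semantic roles for a case tag (empty if unknown)."""
    return list(CASE_TO_ROLES.get(case_tag, []))


def invert(case_to_roles):
    """Build the reverse index: semantic role -> case tags that can express it."""
    role_to_cases = defaultdict(list)
    for case_tag, roles in case_to_roles.items():
        for role in roles:
            role_to_cases[role].append(case_tag)
    return dict(role_to_cases)


ROLE_TO_CASES = invert(CASE_TO_ROLES)
```

Keeping both directions of the index makes the one-role-to-many-cases situation (Arg1 here) explicit, which is where a later disambiguation step would be needed.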
C) Predicate recognition: based on predicate features, determine whether the semantic structure class of the predicate is an adjectival-predicate sentence or a verbal-predicate sentence;
The part of speech of the predicate mainly distinguishes adjectival-predicate sentences from verbal-predicate sentences. Adjectival-predicate sentences are identified from sentence-pattern features. Verbal-predicate sentences are identified from the syntactic markers associated with the predicate verb, such as tense and aspect suffixes, modal particles and auxiliary verbs.
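A rough sketch of this decision is given below. It assumes the short sentence has already been segmented and tagged; the tag names (`V`, `AUX`, `VSUF`, `MOD`, `ADJ`) are hypothetical, and the "sentence ends in an adjective" rule is only a stand-in for the sentence-pattern features, which the text does not spell out.

```python
# Hypothetical POS-like tags; the patent does not specify a tag set for this step.
VERB_MARKERS = {"V", "AUX", "VSUF", "MOD"}  # verb, auxiliary, tense/aspect suffix, modal particle


def classify_predicate(tagged_tokens):
    """Classify a short sentence as verbal (PRV) or adjectival (PRA) predicate.

    tagged_tokens: list of (token, tag) pairs.
    Returns "PRV", "PRA", or None when neither rule applies.
    """
    tags = [tag for _, tag in tagged_tokens]
    # Any verb-related marker makes this a verbal-predicate sentence.
    if any(tag in VERB_MARKERS for tag in tags):
        return "PRV"
    # Assumed stand-in for "sentence-pattern features": sentence ends in an adjective.
    if tags and tags[-1] == "ADJ":
        return "PRA"
    return None
```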
D) Verb semantic classification: based on verbal-suffix marker features, determine the semantic structure type of the verb;
Sentence-pattern analysis is based on the semantic information carried by the verbal suffix.
E) Syntactic structure labeling: based on the semantic structure type of the verb, use shallow semantic parsing to screen and identify semantic roles, then reclassify the semantic structure type;
F) Edit and revise the semantic role labeling results.
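The six steps listed above can be arranged as a simple pipeline. The sketch below is only a skeleton under stated assumptions: the `Clause` record, the function names, and the choice of the Tibetan shad (U+0F0D) as the split point for step a) are all hypothetical, and steps b) to f) are passed in as callables because the text does not specify how each is implemented.

```python
from dataclasses import dataclass, field

SHAD = "\u0f0d"  # Tibetan mark shad; using it as the split point is an assumption


@dataclass
class Clause:
    text: str
    roles: list = field(default_factory=list)  # filled by steps b and e
    predicate_type: str = ""                   # filled by step c
    verb_type: str = ""                        # filled by step d


def split_sentences(text):
    """Step a: divide a long sentence into short ones."""
    return [Clause(part.strip()) for part in text.split(SHAD) if part.strip()]


def run_pipeline(text, steps):
    """Apply the remaining steps (b to f), in order, to every clause.

    `steps` is a list of callables Clause -> Clause supplied by the caller.
    """
    clauses = split_sentences(text)
    for step in steps:
        clauses = [step(clause) for clause in clauses]
    return clauses
```

Because each step receives and returns a `Clause`, the result of predicate-level role labeling can be stored on the record and consulted by a later syntactic step, which mirrors the feedback the method describes.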
The labels used in the present invention comprise:
1. Syntactic constituent tags
Constituent            Tag   English gloss
Agentive subject       SUA   Subject agent
Possessive subject     SUP   Subject possessive
Patient object         OBP   Object patient
Target object          OBT   Object target
Product object         OBD   Object product
Locative object        OBL   Object locative
Verbal predicate       PRV   Verb predicate
Adjectival predicate   PRA   Adjective predicate
2. Case tags
Case                       Tag   English gloss
Absolutive (common) case   ABS   absolutive
Accusative case            PAT   patient
Agentive case              AGN   agentive
Possessive case            POS   possessive
Locative case              LOC   locative
Dative case                DAT   dative
Objective case             OBJ   objective
Genitive case              GEN   genitive
Instrumental case          INS   instrumental
Ablative case              ABL   ablative
Result (factitive) case    FAT   factitive
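The constituent and case tags listed in sections 1 and 2 could be held in code as enumerations, so that an unknown tag is rejected rather than silently accepted. This is a hedged sketch: the tag values come from the tables, but the class names, the `parse_label` helper and the combined `CONSTITUENT/CASE` label format are assumptions made for illustration.

```python
from enum import Enum


class Constituent(Enum):
    SUA = "subject agent"
    SUP = "subject possessive"
    OBP = "object patient"
    OBT = "object target"
    OBD = "object product"
    OBL = "object locative"
    PRV = "verb predicate"
    PRA = "adjective predicate"


class Case(Enum):
    ABS = "absolutive"
    PAT = "patient"
    AGN = "agentive"
    POS = "possessive"
    LOC = "locative"
    DAT = "dative"
    OBJ = "objective"
    GEN = "genitive"
    INS = "instrumental"
    ABL = "ablative"
    FAT = "factitive"


def parse_label(label):
    """Split a label such as 'SUA/AGN' into (Constituent, Case).

    The 'CONSTITUENT/CASE' format is an assumption; a KeyError is raised
    for any tag that is not in the tables.
    """
    constituent, case = label.split("/")
    return Constituent[constituent], Case[case]
```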
3. Nominalization tags
4. Verb tags
Verb class            Level   Tag   English gloss
Transitive verb       1       VT    transitive verb
Intransitive verb     1       VI    intransitive verb
Volitional verb       1       VL    volition verb
Non-volitional verb   1       IVL   in-volition verb
Modal auxiliary       1       MAU   modal auxiliary
Stative verb          2       STA   stative verb
Action verb           2       ACT   action verb
Psychological verb    2       COG   cognition verb
Perception verb       2       PER   perception verb
Verb of change        2       CHA   verb of change
Directional verb      2       DIR   directional verb
Narrative verb        2       NAR   narrate verb
Copula                2       COU   copula
Verb of possession    2       VOP   verb of possession
Existential verb      2       EXI   existential verb
Interrelation verb    2       REL   interrelation verb
Causative verb        2       CAV   causative verb
The level-1 and level-2 tags in the table are to be understood as follows:
A word that carries a level-2 tag may also belong to one of the level-1 classes. For example, a word tagged "action verb" (level 2) may also be a transitive verb (level 1).
When the verbs in a sentence are labeled, at least the level-1 tag must be assigned if the label cannot be refined to level 2.
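The two-level rule can be expressed as a small helper that always requires a level-1 tag and adds a level-2 tag only when one is available. The tag sets are taken from the verb table; the function name and the `VT+ACT` output format are assumptions for illustration.

```python
LEVEL1 = {"VT", "VI", "VL", "IVL", "MAU"}
LEVEL2 = {"STA", "ACT", "COG", "PER", "CHA", "DIR",
          "NAR", "COU", "VOP", "EXI", "REL", "CAV"}


def verb_label(level1, level2=None):
    """Return the label for a verb: 'L1' alone, or 'L1+L2' when refined.

    Level 1 is mandatory; level 2 is added only when it can be determined.
    The 'VT+ACT' string format is an assumption.
    """
    if level1 not in LEVEL1:
        raise ValueError(f"unknown level-1 tag: {level1!r}")
    if level2 is None:
        return level1
    if level2 not in LEVEL2:
        raise ValueError(f"unknown level-2 tag: {level2!r}")
    return f"{level1}+{level2}"
```

For instance, `verb_label("VT", "ACT")` covers the action verb that is also transitive, while `verb_label("VI")` is the fallback when no level-2 class can be assigned.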
The above shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the invention is not limited to the above embodiment; the embodiment and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope of the invention. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A joint labeling method for Tibetan syntax and semantic roles, characterized in that it comprises the following steps:
A) Simple and compound sentence distinction: divide a long sentence into several short sentences;
B) Semantic role labeling: case marking, including the labeling of grammatical role constituents, nominalizations, or non-predicate verb chunks, and removal of content that is not to be labeled;
C) Predicate recognition: based on predicate features, determine whether the semantic structure class of the predicate is an adjectival-predicate sentence or a verbal-predicate sentence;
D) Verb semantic classification: based on verbal-suffix marker features, determine the semantic structure type of the verb;
E) Syntactic structure labeling: based on the semantic structure type of the verb, use shallow semantic parsing to screen and identify semantic roles, then reclassify the semantic structure type;
F) Edit and revise the semantic role labeling results.
2. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that step b) labels the grammatical role constituents that serve in the sentence as agent, patient, involved entity, possessor, object, purpose, place, material, source or instrument; removes modal particles, demonstrative pronouns, indefinite demonstratives, interrogative pronouns, plural suffixes and honorific morphemes; and does not consider temporal information.
3. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the nominalization labels in step b) cover: the doer of an action; manner, method or circumstance; craft, handiwork, material or thing; action or deed; custom or rule; attitude or condition; mind or spirit; quantity, standard, place or time; leisure; alternation or taking turns; or a particular aspect.
4. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the grammatical roles in step b) are Arg0-5, where Arg0 denotes the agent of the action, Arg1 denotes what the action affects, and Arg2-5 are given different semantic meanings depending on the predicate.
5. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the grammatical role constituents in step b) comprise agentive subject, possessive subject, patient object, target object, product object, locative object, verbal predicate and adjectival predicate.
6. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the predicate in step c) comprises a verb, auxiliary verb, verbal suffix or modal particle.
7. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the cases associated with the predicate in step c) comprise the absolutive (common) case, accusative case, agentive case, possessive case, locative case, dative case, objective case, genitive case, instrumental case, ablative case, or result (factitive) case.
8. The joint labeling method for Tibetan syntax and semantic roles according to claim 1, characterized in that the verbs in step d) comprise transitive verbs, intransitive verbs, volitional verbs, non-volitional verbs, modal auxiliaries, stative verbs, action verbs, psychological verbs, perception verbs, verbs of change, directional verbs, narrative verbs, verbs of possession, existential verbs, interrelation verbs and causative verbs.
CN201310421074.8A 2013-09-16 2013-09-16 Joint labeling method for Tibetan syntax and semantic roles Expired - Fee Related CN103440236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310421074.8A CN103440236B (en) 2013-09-16 2013-09-16 Joint labeling method for Tibetan syntax and semantic roles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310421074.8A CN103440236B (en) 2013-09-16 2013-09-16 Joint labeling method for Tibetan syntax and semantic roles

Publications (2)

Publication Number Publication Date
CN103440236A true CN103440236A (en) 2013-12-11
CN103440236B CN103440236B (en) 2015-12-09

Family

ID=49693928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310421074.8A Expired - Fee Related CN103440236B (en) 2013-09-16 2013-09-16 Joint labeling method for Tibetan syntax and semantic roles

Country Status (1)

Country Link
CN (1) CN103440236B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217713A (en) * 2014-07-15 2014-12-17 西北师范大学 Tibetan-Chinese speech synthesis method and device
CN104239294A (en) * 2014-09-10 2014-12-24 华建宇通科技(北京)有限责任公司 Multi-strategy Tibetan long sentence segmentation method for Tibetan to Chinese translation system
CN106294311A (en) * 2015-06-12 2017-01-04 科大讯飞股份有限公司 A kind of Tibetan language tone Forecasting Methodology and system
CN107818078A (en) * 2017-07-20 2018-03-20 张宝华 The semantic association and matching process of Chinese natural language dialogue
CN108446268A (en) * 2018-02-11 2018-08-24 青海师范大学 Tibetan language personal pronoun reference resolution system
CN111275094A (en) * 2020-01-17 2020-06-12 厦门快商通科技股份有限公司 Data labeling method, device and equipment based on machine learning
CN115017902A (en) * 2022-06-09 2022-09-06 青海师范大学 Deep learning-based Tibetan phrase structure recognition model construction method and device
CN115510869A (en) * 2022-05-30 2022-12-23 青海师范大学 End-to-end Tibetan La lattice shallow semantic analysis method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119050A1 (en) * 2009-11-18 2011-05-19 Koen Deschacht Method for the automatic determination of context-dependent hidden word distributions
CN102662931A (en) * 2012-04-13 2012-09-12 厦门大学 Semantic role labeling method based on synergetic neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAROL GENETTI: "Syntactic aspects of nominalization in five Tibeto-Burman languages of the Himalayan area", 《LINGUISTICS OF THE TIBETO-BURMAN AREA》, vol. 31, no. 2, 31 October 2008 (2008-10-31), pages 97 - 143 *
江 荻: "现代藏语谓语动词的识别与信息提取", 《20TH INTERNATIONAL CONFERENCE ON COMPUTER PROCESSING OF ORIENTAL LANGUAGES》, 31 August 2003 (2003-08-31), pages 154 - 160 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217713A (en) * 2014-07-15 2014-12-17 西北师范大学 Tibetan-Chinese speech synthesis method and device
CN104239294A (en) * 2014-09-10 2014-12-24 华建宇通科技(北京)有限责任公司 Multi-strategy Tibetan long sentence segmentation method for Tibetan to Chinese translation system
CN106294311A (en) * 2015-06-12 2017-01-04 科大讯飞股份有限公司 A kind of Tibetan language tone Forecasting Methodology and system
CN106294311B (en) * 2015-06-12 2019-03-19 科大讯飞股份有限公司 A kind of Tibetan language tone prediction technique and system
CN107818078A (en) * 2017-07-20 2018-03-20 张宝华 The semantic association and matching process of Chinese natural language dialogue
CN107818078B (en) * 2017-07-20 2021-08-17 张宝华 Semantic association and matching method for Chinese natural language dialogue
CN108446268A (en) * 2018-02-11 2018-08-24 青海师范大学 Tibetan language personal pronoun reference resolution system
CN111275094A (en) * 2020-01-17 2020-06-12 厦门快商通科技股份有限公司 Data labeling method, device and equipment based on machine learning
CN115510869A (en) * 2022-05-30 2022-12-23 青海师范大学 End-to-end Tibetan La lattice shallow semantic analysis method
CN115510869B (en) * 2022-05-30 2023-08-01 青海师范大学 End-to-end Tibetan Lager shallow semantic analysis method
CN115017902A (en) * 2022-06-09 2022-09-06 青海师范大学 Deep learning-based Tibetan phrase structure recognition model construction method and device

Also Published As

Publication number Publication date
CN103440236B (en) 2015-12-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20151209

Termination date: 20200916
