
CN107330011A - Multi-strategy fusion named entity recognition method and device - Google Patents

Multi-strategy fusion named entity recognition method and device

Info

Publication number
CN107330011A
Authority
CN
China
Prior art keywords
recognition result
name entity
recognition
identification
language material
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710447439.2A
Other languages
Chinese (zh)
Other versions
CN107330011B (en
Inventor
赵红红
王萌萌
晋耀红
蒋宏飞
杨凯程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co., Ltd
Original Assignee
China Science And Technology (beijing) Co Ltd
Beijing Shenzhou Taiyue Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Science And Technology (beijing) Co Ltd and Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201710447439.2A
Publication of CN107330011A
Application granted
Publication of CN107330011B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Discrimination (AREA)

Abstract

This application discloses a multi-strategy fusion method and device for named entity recognition. A first identification model recognizes the named entities in the obtained language material to produce a first recognition result. In the method provided by this application, the first identification model can update and expand its corpus, so it can identify newly coined named entities in the language material, which gives the first recognition result a higher accuracy rate. A method that fuses multiple identification models then recognizes the named entities in the language material to produce a second recognition result. The first recognition result and the second recognition result are fused into a third recognition result, and a semantic mining system assigns roles to the third recognition result and outputs named entities with roles. The method thereby recognizes named entities reliably when data volumes are massive, entity types are diverse, and new words keep emerging, and it assigns roles to the recognized named entities.

Description

Multi-strategy fusion named entity recognition method and device
Technical field
This application relates to the field of natural language processing, and in particular to a multi-strategy fusion method and device for named entity recognition.
Background
Named entities are person names, organization names, place names, and all other entities identified by a name. They are the basic information elements of a text and an important carrier of the information it expresses, and they are the basis for correctly understanding and processing text. Chinese named entity recognition is one of the basic tasks of natural language processing. Its main task is to identify and classify the named entities and meaningful numeral-classifier phrases that occur in a text, mainly person names, place names, organization names, time expressions, dates, and numerical expressions.
In natural language processing research, named entity recognition plays an important role in applications such as information retrieval, information extraction, machine translation, and text classification. It can significantly improve the performance of information retrieval, abstract extraction, information extraction, machine translation, and text classification systems, and it lays a foundation for acquiring knowledge from text automatically. The accuracy rate and recall rate of named entity recognition directly determine the performance of the whole language understanding process, including syntactic analysis and semantic analysis.
Over the past decade or so, researchers in China and abroad have studied entity recognition in text extensively and in depth. However, with the rapid development of the Internet, large volumes of informal, multi-domain text data keep growing, which places new demands on the accuracy rate and recall rate of named entity recognition. The market also needs roles to be assigned to the recognized named entities. Named entity recognition methods therefore need further improvement, both to meet market demand and to raise the accuracy rate and recall rate of recognition.
Commonly used named entity recognition methods currently fall into two broad categories: methods based on rules and knowledge, and methods based on statistics. Rule- and knowledge-based methods were the earliest in use. They are simple and convenient, but they require a great deal of manual observation and port poorly. Statistics-based methods either treat named entity recognition as a classification problem, using classifiers such as support vector machines or Bayesian models, or treat it as a sequence labeling problem, using machine learning models such as hidden Markov models, maximum entropy Markov models, and conditional random fields to obtain a sequence labeling model. These methods, however, either struggle to recognize named entities in today's large volumes of informal, multi-domain, rapidly changing text, or have low accuracy and recall rates.
For example, Chinese patent CN201610943210.3 discloses a named entity recognition method and device based on artificial intelligence. The method performs named entity recognition on the text to be recognized using both a conditional random field model and a function model generated from the retrieval logs of a preset time period. The drawback of this scheme lies in the function model for preset entity vocabulary used in its second recognition: it first obtains all candidate named entity words in the text to be recognized through methods such as dictionary and rule matching, and then judges the confidence that each is a named entity word. Rule-based methods tend to depend on the specific language, domain, and text format; writing the rules is time-consuming and error-prone and requires experienced linguists; and dictionary coverage is low. This method therefore struggles to recognize named entities in today's large volumes of informal, multi-domain, rapidly changing text.
As another example, Chinese patent CN201510889318.4 discloses a named entity recognition method suited to social networks. The method uses an initially constructed first sequence labeling model to obtain a first entity probability distribution for the training documents and a second entity probability distribution for the test documents, extracts similarity features from social network information, trains a second sequence labeling model on those similarity features, and then performs sequence labeling on the test documents with the second sequence labeling model to obtain the named entity recognition result. The accuracy and recall rates of this method are low: the F value of its recognition is only 33.19%.
There is therefore an urgent need for a named entity recognition method and a named entity recognition device that can cope with the new situation of massive data, diverse entity types, and a constant stream of new words, that achieve high recall and accuracy rates, and that can assign roles to the recognized named entities.
Summary of the invention
This application provides a multi-strategy fusion method and device for named entity recognition, to solve the problems that, when data volumes are massive, entity types are diverse, and new words keep emerging, the accuracy rate and recall rate of named entity recognition are low and roles cannot be assigned to named entities.
In a first aspect, this application provides a multi-strategy fusion named entity recognition method, comprising:
Obtaining language material;
Recognizing the named entities in the language material using a first identification model to obtain a first recognition result;
Recognizing the named entities in the language material using a second identification model to obtain a second recognition result;
Fusing the first recognition result and the second recognition result to obtain a third recognition result.
Optionally, the first identification model is a conditional random field model.
Optionally, before the step of recognizing the named entities in the language material using the first identification model to obtain the first recognition result, the method further comprises:
Setting up a corpus;
Applying part-of-speech tagging and sequence labeling to the language material in the corpus;
Using the labeled language material as training data and training with a CRF toolkit to obtain the first identification model.
Optionally, the step of recognizing the named entities in the language material using the second identification model to obtain the second recognition result comprises:
Recognizing the language material using at least two identification models, each of which produces one sub-recognition result, and generating a sub-recognition result list;
Judging whether the recognition results in the sub-recognition result list meet an output condition and, if so, outputting the second recognition result;
The output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value, where the preset value is the mode of the at least two identification models.
Optionally, the step of recognizing the named entities in the language material using the first identification model to obtain the first recognition result comprises:
Recognizing the language material using at least two identification models, each of which produces one sub-recognition result, and generating a sub-recognition result list;
Judging whether the recognition results in the sub-recognition result list meet an output condition and, if so, outputting the first recognition result;
The output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value, where the preset value is the mode of the at least two identification models.
Optionally, the second identification model is a conditional random field model;
Before the step of recognizing the language material using the second identification model to obtain the second recognition result, the method further comprises:
Setting up a corpus;
Applying part-of-speech tagging and sequence labeling to the language material in the corpus;
Using the labeled language material as training data and training with a CRF toolkit to obtain the second identification model.
The step of fusing the first recognition result and the second recognition result to obtain the third recognition result comprises:
Judging whether the first recognition result and the second recognition result meet a fusion condition and, if so, fusing them and outputting the fused result, that is, the third recognition result.
Optionally, the fusion condition is that the first recognition result and the second recognition result contain identical named entities.
Optionally, after the third recognition result is obtained, the method further comprises: assigning roles to the third recognition result using a semantic mining system to generate named entities with roles.
Optionally, the role assignment uses the semantic mining system to mark a role on each named entity in the third recognition result and to output each named entity with its role.
Optionally, the semantic mining system includes regular expressions and text.
In a second aspect, this application also provides a multi-strategy fusion named entity recognition device, comprising:
A language material acquiring unit, for obtaining language material;
A first recognition unit, for recognizing the named entities in the language material using a first identification model to obtain a first recognition result;
A second recognition unit, for recognizing the named entities in the language material using a second identification model to obtain a second recognition result;
A recognition result fusion unit, for fusing the first recognition result and the second recognition result to obtain a third recognition result.
Optionally, the first identification model is a conditional random field model.
Optionally, the first recognition unit further includes a model training unit, which is used for:
Setting up a corpus;
Applying part-of-speech tagging and sequence labeling to the language material in the corpus;
Using the labeled language material as training data and training with a CRF toolkit to obtain the first identification model.
Optionally, the second recognition unit includes the following subunits:
A multi-strategy recognition unit, for recognizing the named entities in the language material using at least two identification models, each of which produces one sub-recognition result, and generating a sub-recognition result list;
A recognition result output unit, for judging whether the recognition results in the sub-recognition result list meet an output condition and, if so, outputting the second recognition result.
Optionally, the output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value, where the preset value is the mode of the at least two identification models.
Optionally, the first recognition unit includes the following subunits:
A multi-strategy recognition unit, for recognizing the named entities in the language material using at least two identification models, each of which produces one sub-recognition result, and generating a sub-recognition result list;
A recognition result output unit, for judging whether the recognition results in the sub-recognition result list meet an output condition and, if so, outputting the first recognition result;
The output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value, where the preset value is the mode of the at least two identification models.
Optionally, the second identification model is a conditional random field model;
The second recognition unit further includes a model training unit, which is used for:
Setting up a corpus;
Applying part-of-speech tagging and sequence labeling to the language material in the corpus;
Using the labeled language material as training data and training with a CRF toolkit to obtain the second identification model.
Optionally, the recognition result fusion unit is used for judging whether the first recognition result and the second recognition result meet a fusion condition and, if so, fusing them and outputting the fused result, that is, the third recognition result.
Optionally, the fusion adds to the first recognition result the named entities that are new in the second recognition result.
Optionally, the fusion condition is that the second recognition result contains named entities that are new relative to the first recognition result.
Optionally, the device further includes a role assignment unit, for assigning roles to the third recognition result using a semantic mining system to generate named entities with roles.
Optionally, the role assignment unit uses the semantic mining system to mark a role on each named entity in the third recognition result and to output each named entity with its role.
Optionally, the semantic mining system includes regular expressions and text.
Brief description of the drawings
To explain the technical solution of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a multi-strategy fusion named entity recognition method provided by an embodiment of this application;
Fig. 2 is a flowchart of the method for the conditional random field model provided by an embodiment of this application;
Fig. 3 is a structural diagram of the named entity recognition device provided by an embodiment of this application;
Fig. 4 is a structural diagram of the computer system 400 provided by an embodiment of this application;
Fig. 5 is a line chart of the accuracy rate, recall rate, and F value results of Experimental Example 1;
Fig. 6 is a line chart of the accuracy rate, recall rate, and F value results of Experimental Example 2.
Detailed description
The application is described in detail below; its features and advantages will become clearer from this description.
The word "exemplary" here means "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" should not be construed as preferred over or better than other embodiments. Although the drawings show various aspects of the embodiments, they are not necessarily drawn to scale unless otherwise indicated.
The application is described below.
According to the first aspect of this application, a multi-strategy fusion named entity recognition method is provided. A first identification model recognizes the named entities in the obtained language material to produce a first recognition result. In the method provided by this application, the first identification model can update and expand its corpus, so it can identify newly coined named entities in the language material, which gives the first recognition result a higher accuracy rate. A method that fuses multiple identification models then recognizes the named entities in the language material to produce a second recognition result. Fusing the first recognition result and the second recognition result gives a third recognition result, so that named entities are recognized reliably when data volumes are massive, entity types are diverse, and new words keep emerging. Optionally, a semantic mining system then assigns roles to the third recognition result and outputs named entities with roles, so that roles are assigned to the recognized named entities.
Specifically, as shown in Fig. 1, the named entity recognition method includes:
S101: obtain language material;
S102: recognize the named entities in the language material using the first identification model to obtain the first recognition result;
S103: recognize the named entities in the language material using the second identification model to obtain the second recognition result;
S104: fuse the first recognition result and the second recognition result to obtain the third recognition result;
Optionally, the method further includes S105: assign roles to the third recognition result using the semantic mining system to generate named entities with roles.
In this application, the language material is the text used for training or recognition.
In a preferred embodiment of this application, the first identification model is a conditional random field model, that is, a CRF (Conditional Random Fields) model. It models a global probability and, when normalizing, considers the distribution of the data globally rather than normalizing only locally, which avoids the label bias problem.
In this application, as shown in Fig. 2, when a CRF model is selected as the first identification model, the following steps precede recognizing the language material using the first identification model to obtain the first recognition result:
S301: set up a corpus;
S302: apply part-of-speech tagging and sequence labeling to the language material in the corpus;
S303: use the labeled language material as training data and train with a CRF toolkit to obtain the first identification model.
In this application, the corpus is the set of language material in which named entities are to be recognized. For example, in a person-name recognition method for a public security system, the corpus is a collection of written records; in a named entity recognition method for a medical system, the corpus is a collection of medical case records. A corpus with no specific domain can be a collection of language material gathered from the network by a web crawler.
In this application, setting up the corpus includes importing language material, that is, importing language material into the corpus described above.
In this application, the language material in the corpus is first processed into a form that the CRF can recognize: part-of-speech tagging and sequence labeling are applied to the language material to obtain training text strings and test text strings. The labeled training text strings serve as training data, and the labeled test text strings serve as test data.
In this application, when the CRF model is trained, specific features of the training data are obtained according to feature templates, and the model is trained on those features together with the part-of-speech tags and sequence labels to obtain the CRF model. The specific features include contextual features, part-of-speech features, and so on.
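As an illustration of how a feature template yields contextual and part-of-speech features, the following is a minimal sketch, not the applicant's implementation. The template layout, the feature names, the `<PAD>` symbol, and the sample tags are all assumptions made for this example; an actual CRF toolkit defines its own template syntax.

```python
from typing import Dict, List, Tuple

# Hypothetical feature template: (feature name, relative offset, field),
# where field 0 is the token and field 1 is its part-of-speech tag.
TEMPLATE = [
    ("tok[-1]", -1, 0),
    ("tok[0]", 0, 0),
    ("tok[+1]", 1, 0),
    ("pos[-1]", -1, 1),
    ("pos[0]", 0, 1),
    ("pos[+1]", 1, 1),
]


def extract_features(sentence: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Expand the template at every position of a (token, pos) sequence."""
    rows = []
    for i in range(len(sentence)):
        feats = {}
        for name, offset, field in TEMPLATE:
            j = i + offset
            # Positions outside the sentence receive a padding symbol.
            feats[name] = sentence[j][field] if 0 <= j < len(sentence) else "<PAD>"
        rows.append(feats)
    return rows


# Illustrative tagged sentence; the tag names are assumptions.
sentence = [("Xu", "nr"), ("visited", "v"), ("Beijing", "ns")]
features = extract_features(sentence)
```

Each position ends up with one feature per template entry, so the neighbors of a token (contextual features) and the tags around it (part-of-speech features) are available to the model together with the sequence label for that position.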
In this application, after the CRF model is trained, the training result is tested with the test data. When the F value of the recognition result is below 0.8, training data and test data are reacquired and training continues; after that training, the newly acquired test data is used for testing. These steps repeat while the F value of the recognition result is below 0.8, and training stops once the F value of the training result reaches 0.8 or above, which yields the first identification model.
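The stopping rule above can be sketched as follows. This is a hedged illustration only: the entity tuple layout, the `train_round` callback, and the `max_rounds` safety limit are assumptions that do not appear in the application, and the F value is taken here to be the usual harmonic mean of accuracy rate (precision) and recall rate over exact matches.

```python
from typing import Callable, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, text); this layout is an assumption


def f_value(predicted: Set[Entity], gold: Set[Entity]) -> float:
    """Harmonic mean of precision and recall over exact entity matches."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def train_until_good(train_round: Callable[[], float],
                     threshold: float = 0.8,
                     max_rounds: int = 10) -> int:
    """Repeat train_round (train on fresh data, return the test F value)
    until the F value reaches the threshold; return the round count."""
    for round_no in range(1, max_rounds + 1):
        if train_round() >= threshold:
            return round_no
    raise RuntimeError("F value never reached the threshold")


gold = {(0, 2, "A"), (5, 7, "B"), (9, 12, "C")}
predicted = {(0, 2, "A"), (5, 7, "B"), (20, 22, "D")}
score = f_value(predicted, gold)  # two of three correct on both sides

# Simulated F values for three successive training rounds.
simulated = iter([0.6, 0.75, 0.85])
rounds = train_until_good(lambda: next(simulated))
```

In the simulated run, training stops at the third round, the first one whose F value is not below 0.8.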
In this embodiment, the named entities recognized by the first identification model are marked with first position information.
In this embodiment, the step of recognizing the named entities in the language material using the second identification model to obtain the second recognition result includes:
Recognizing the language material using at least two identification models, each of which produces one sub-recognition result, and generating a sub-recognition result list;
Judging whether the recognition results in the sub-recognition result list meet an output condition and, if so, outputting the second recognition result;
The output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value, where the preset value is the mode of the at least two identification models.
In this application, the at least two identification models include word segmentation models and named entity recognition models.
In this application, the word segmentation models include nGram word segmentation models (first-order Markov chains), HMM word segmentation models (hidden Markov models), and word segmentation models with new word discovery.
In this application, the named entity recognition models include a named entity recognition model based on maximum entropy and a named entity recognition model based on a structured perceptron.
In this application, the nGram word segmentation model first gathers nGram statistics, then segments the language material in which named entities are to be recognized according to those statistics. This approach covers all possible segmentations but also increases the number of index entries. For example, 2-gram segmentation divides "coming into search engine" into: come into, enter to search, search for, indexing, engine.
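The 2-gram example can be reproduced with a short sketch. It shows only how overlapping two-character units are generated, not the statistical segmentation itself. The Chinese input string is an assumed back-translation of the example phrase; it does not appear in this translated text.

```python
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Return every run of n consecutive characters, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Assumed original of the example phrase ("coming into search engine").
units = char_ngrams("进入搜索引擎")
print(units)  # ['进入', '入搜', '搜索', '索引', '引擎']
```

A six-character input yields five overlapping units, which illustrates why this strategy covers every candidate but enlarges the index.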
In this application, the HMM word segmentation model obtains its HMM parameters from a labeled segmentation training set, then decodes the language material in which named entities are to be recognized with the Viterbi algorithm to obtain the segmentation result. The model rests on the output independence assumption and does not consider contextual features.
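A minimal sketch of Viterbi decoding for HMM word segmentation follows. It assumes the common B/M/E/S tag set (begin, middle, end, single) and uses toy start and transition probabilities with uniform emissions; real parameters would be estimated from a labeled segmentation training set, and none of the numbers below come from the application.

```python
import math
from typing import Dict, List

STATES = "BMES"  # begin, middle, end of a word, or single-character word


def _log(p: float) -> float:
    """Log-probability, with impossible events mapped to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: str, start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]],
            floor: float = 1e-6) -> List[str]:
    """Return the most probable tag sequence for the characters in obs."""
    if not obs:
        return []
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: _log(start.get(s, 0)) + _log(emit[s].get(obs[0], floor))
             for s in STATES}
    back = []
    for ch in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(trans[p].get(s, 0)))
            new_score[s] = (score[prev] + _log(trans[prev].get(s, 0))
                            + _log(emit[s].get(ch, floor)))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # A word cannot stop at B or M, so the path must finish in E or S.
    path = [max("ES", key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def segment(text: str, tags: List[str]) -> List[str]:
    """Cut the text after every character tagged E or S."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Toy parameters (assumptions for illustration only).
START = {"B": 0.6, "S": 0.4}
TRANS = {"B": {"M": 0.3, "E": 0.7}, "M": {"M": 0.3, "E": 0.7},
         "E": {"B": 0.6, "S": 0.4}, "S": {"B": 0.6, "S": 0.4}}
EMIT = {s: {} for s in STATES}  # every character falls back to the floor

tags = viterbi("xy", START, TRANS, EMIT)
words = segment("xy", tags)
```

Because emissions depend only on the current state, the sketch reflects the output independence assumption mentioned above: no feature of the neighboring characters enters the score.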
In this application, the word segmentation model with new word discovery finds named entities in the language material through rule-based or statistical identification models, but it depends heavily on the training corpus.
In this application, the maximum-entropy-based named entity recognition model obtains, among all models that satisfy the constraints, the one with maximum information entropy. Setting the constraints adjusts how well the model adapts to unknown data and how closely it fits known data, and the model also handles parameter smoothing in statistical models naturally. However, its computational cost in time and space is large, and its data sparseness problem is relatively serious.
In this application, feature extraction in the structured-perceptron-based named entity recognition model considers the global structured output, so the model can learn structure globally.
In this embodiment, the named entities recognized by each of the at least two identification models are marked with second position information.
In this application, the output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value. Whether the named entities recognized by the different identification models are identical is judged by whether their second position information is identical. The preset value is the mode of the at least two identification models.
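The output condition can be sketched as a vote over position information. This is an illustrative sketch only: the `(start, end, text)` tuple layout is an assumption, and the preset value is passed in as a plain threshold rather than derived, since the text does not spell out how the mode is computed.

```python
from collections import Counter
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start, end, text); this layout is an assumption


def vote(sub_results: List[List[Entity]], preset: int) -> List[Entity]:
    """Keep entities whose position is reported by at least `preset` models."""
    counts: Counter = Counter()
    surface: Dict[Tuple[int, int], str] = {}
    for result in sub_results:
        # Each model votes at most once per entity, hence the set.
        for start, end, text in set(result):
            counts[(start, end)] += 1
            surface.setdefault((start, end), text)
    return sorted((s, e, surface[(s, e)])
                  for (s, e), n in counts.items() if n >= preset)


# Hypothetical sub-recognition results for "Xu Sanduo lives in Beijing".
model_a = [(0, 9, "Xu Sanduo"), (19, 26, "Beijing")]
model_b = [(0, 9, "Xu Sanduo")]
model_c = [(0, 9, "Xu Sanduo"), (19, 26, "Beijing"), (10, 15, "lives")]

two_votes = vote([model_a, model_b, model_c], preset=2)
three_votes = vote([model_a, model_b, model_c], preset=3)
```

With a preset value of 2, the spurious entity reported by a single model is dropped; raising the preset value to 3 keeps only the entity on which all three models agree.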
Fusing the recognition results obtained by the above models therefore compensates for the inherent weaknesses of each individual model and gives the best recognition result.
In this application, the mode is determined by the F value of experimental results. As Experimental Example 1 of this application shows, when an accurate segmentation algorithm (combining a language model, sequence labeling, and a hidden Markov model), a segmentation algorithm with new word discovery, and a structured-perceptron named entity recognition algorithm are used, a mode of 3 gives the best result.
Applicants have found that using the output condition to decide whether to output a recognition result in the sub-recognition result list removes misrecognized results, such as wrong identifications, to the greatest extent, which improves the recall rate of the final recognition result.
Applicants have found that recognizing the language material with at least two identification models identifies named entities more accurately: multiple weak identification models combine into one strong identification model that supplements the basic result and thus improves the recognition result.
In another preferred embodiment of this application, the step of recognizing the named entities in the language material using the first identification model to obtain the first recognition result can also be:
Recognizing the language material using at least two identification models, each of which produces one sub-recognition result, and generating a sub-recognition result list;
Judging whether the recognition results in the sub-recognition result list meet an output condition and, if so, outputting the first recognition result;
In this embodiment, the named entities recognized by each of the at least two identification models are marked with first position information.
The output condition is that the number of identical named entities in the sub-recognition result list reaches a preset value. Whether the named entities recognized by the different identification models are identical is judged by whether their first position information is identical. The preset value is the mode of the at least two identification models.
In this embodiment, the second identification model is preferably a conditional random field model.
In this embodiment, the second recognition result is marked with position information.
In this application, the step of fusing the first recognition result and the second recognition result to obtain the third recognition result includes:
Judging whether the first recognition result and the second recognition result meet a fusion condition and, if so, fusing them and outputting the fused result, that is, the third recognition result.
Applicants have found that fusing the first recognition result with the second recognition result amounts to removing the named entities repeated in the two results, which avoids redundant data and thereby improves the accuracy rate and recall rate of recognition.
In this application, the fusion refers to increase on the basis of the first recognition result what is increased newly in the second recognition result Name entity.
In this application, the fusion conditions be the second recognition result in exist on the basis of the first recognition result increase newly Name entity.
In the application is a kind of preferred embodiment, judge whether second place information and first position information are identical, If it is different, the name entity increased newly in then judging the name entity for the second recognition result.
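The fusion step can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Entity` tuple, the `fuse` function, and the character offsets are hypothetical, and returning the first result unchanged when the fusion condition is not met is an assumption, since the text does not state what is output in that case.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """Hypothetical record: a named entity with its position information."""
    start: int  # offset of the first character (illustrative value)
    end: int    # offset one past the last character (illustrative value)
    text: str


def fuse(first: List[Entity], second: List[Entity]) -> List[Entity]:
    """Add to the first result the entities of the second result whose
    position information does not already occur in the first result."""
    seen_positions = {(e.start, e.end) for e in first}
    newly_added = [e for e in second if (e.start, e.end) not in seen_positions]
    if not newly_added:
        # Fusion condition not met (assumption: keep the first result as is).
        return list(first)
    return sorted(first + newly_added, key=lambda e: e.start)


first_result = [Entity(7, 18, "Ni Chengang")]
second_result = [Entity(7, 18, "Ni Chengang"), Entity(60, 66, "Qinghe")]
third_result = fuse(first_result, second_result)
```

Comparing positions rather than entity strings follows the preferred embodiment above, and it also means an entity repeated in both results appears only once in the fused output.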
Optionally, the semantic mining system performs role labeling on each named entity in the third recognition result and outputs each named entity with its role.
In this application, the semantic mining system can not only assign roles but also judge a named entity recognition result to determine whether it is a named entity.
The semantic mining system includes regular expressions and text.
To give a fuller understanding of the multi-strategy fusion named entity recognition method described herein, a specific embodiment is set out below.
Establish a corpus.
Perform part-of-speech tagging and sequence labeling on each clause of the corpus text in the corpus; during sequence labeling, the words corresponding to a named entity are labeled with B, M, and E, and the remaining words are labeled with S, which yields the training text strings. Assuming that one training text string is " check to have in discovery satchel through people's police and three see identity card perhaps ", the labeling result is as shown in Table 1.
Table 1. Example of text string labeling
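The B/M/E/S labeling described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `bmes_tags`, the romanized tokens, and the entity span are hypothetical, and tagging a one-token entity with B alone is an assumption, since the text does not cover that case.

```python
from typing import List, Tuple


def bmes_tags(tokens: List[str], entity_spans: List[Tuple[int, int]]) -> List[str]:
    """Label entity tokens with B (first), M (inner), E (last); all others with S.

    Each span is (start, end) over token indices, with end exclusive.
    """
    tags = ["S"] * len(tokens)
    for start, end in entity_spans:
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "M"
        if end - start > 1:
            tags[end - 1] = "E"
    return tags


# Hypothetical tokenization of a person name spanning three tokens.
tokens = ["victim", "Ni", "Chen", "gang", "reported", "to", "the", "police"]
tags = bmes_tags(tokens, [(1, 4)])
```

Pairs of tokens and tags in this shape are the kind of training data a CRF toolkit consumes.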
The labeling results of a large number of training text strings are used as training data, and training is performed using CRF.
Suppose the currently received user-input corpus text is "victim Ni Chengang reported to the police that he found his mobile phone missing at Qinghe Oak Tree Bay". Applying the CRF model obtained in the preceding step to this user-input corpus text yields the named entity "Ni Chengang".
The CRF result is then supplemented and corrected by an ensemble of several methods. For example, precise word segmentation recognizes the named entity in the example above as "Ni Chen", the structured perceptron recognizes it as "Ni Chengang", and the segmentation with new-word discovery recognizes it as "Ni Chengang"; taking the mode of the recognition results of these methods determines that the named entity recognition result is "Ni Chengang" rather than "Ni Chen".
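The voting in this example can be sketched in Python as follows. The sketch treats the preset value as the minimum number of recognition models that must agree, which is one reading of "taking the mode"; the `vote` function, the tuple layout, and the character offsets are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical entity record: (start offset, end offset, entity text).
Entity = Tuple[int, int, str]


def vote(sub_results: List[List[Entity]], preset: int) -> List[Entity]:
    """Output the entities reported by at least `preset` recognition models.

    Two entities count as identical only when both position and text match.
    """
    counts = Counter(e for result in sub_results for e in set(result))
    return sorted(e for e, n in counts.items() if n >= preset)


sub_results = [
    [(7, 14, "Ni Chen")],      # precise word segmentation
    [(7, 18, "Ni Chengang")],  # structured perceptron
    [(7, 18, "Ni Chengang")],  # segmentation with new-word discovery
]
second_result = vote(sub_results, preset=2)
```

With a preset value of 2, only "Ni Chengang" is output, because "Ni Chen" is supported by a single model.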
Using a regular expression or text present in the semantic mining system, such as "victim reports to the police", it can be determined on the one hand that "Ni Chengang" is a correct named entity recognition result, and on the other hand that its role is "victim".
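A rule of this kind can be sketched with Python's standard `re` module. The English pattern below is a hypothetical stand-in for the rule in the semantic mining system, whose actual form is not given in the text; `VICTIM_RULE` and `assign_role` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical rule: the text between "victim" and "reported to the police"
# is taken to be the victim's name.
VICTIM_RULE = re.compile(r"\bvictim\s+(?P<name>.+?)\s+reported to the police\b")


def assign_role(sentence: str, entity: str) -> Optional[str]:
    """Return "victim" if the rule confirms the candidate entity, else None."""
    match = VICTIM_RULE.search(sentence)
    if match and match.group("name") == entity:
        return "victim"
    return None


sentence = "victim Ni Chengang reported to the police that his mobile phone was missing"
role = assign_role(sentence, "Ni Chengang")
```

The same match serves both purposes described above: a candidate that the rule confirms is accepted as a named entity and receives its role, while a truncated candidate such as "Ni Chen" gets no role.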
According to a second aspect of the application, as shown in Fig. 3, a multi-strategy fusion named entity recognition device is also provided, the device including:
Corpus text acquisition unit 201, for obtaining corpus text;
First recognition unit 202, for recognizing the named entities in the corpus text using the first recognition model to obtain the first recognition result;
Second recognition unit 203, for recognizing the named entities in the corpus text using the second recognition model to obtain the second recognition result;
Recognition result fusion unit 204, for fusing the first recognition result and the second recognition result to obtain the third recognition result;
Optionally, the device further includes role assignment unit 205, for performing role assignment on the third recognition result using the semantic mining system to generate named entities with roles.
In an optional embodiment of the application, the first recognition model is a conditional random field model.
Optionally, the first recognition unit further includes a model training unit, the model training unit being used for:
Establishing a corpus;
Performing part-of-speech tagging and sequence labeling on the corpus text in the corpus;
Using the labeled corpus text as training data and training with a CRF toolkit to obtain the first recognition model.
Optionally, the second recognition unit includes the following subunits:
Multi-strategy recognition unit, for recognizing the named entities in the corpus text using at least two recognition models, each recognition model producing one sub-recognition result, and generating a sub-recognition result list;
Recognition result output unit, for judging whether the recognition results in the sub-recognition result list meet the output condition, and outputting the second recognition result if they do.
Optionally, the output condition is that, in the sub-recognition result list, the number of identical named entities reaches a preset value, where the preset value is the mode of the at least two recognition models.
In another optional embodiment of the application, the first recognition unit includes the following subunits:
Multi-strategy recognition unit, for recognizing the named entities in the corpus text using at least two recognition models, each recognition model producing one sub-recognition result, and generating a sub-recognition result list;
Recognition result output unit, for judging whether the recognition results in the sub-recognition result list meet the output condition, and outputting the first recognition result if they do.
The output condition is that, in the sub-recognition result list, the number of identical named entities reaches a preset value, where the preset value is the mode of the at least two recognition models.
Optionally, the second recognition model is a conditional random field model;
The second recognition unit further includes a model training unit, the model training unit being used for:
Establishing a corpus;
Performing part-of-speech tagging and sequence labeling on the corpus text in the corpus;
Using the labeled corpus text as training data and training with a CRF toolkit to obtain the second recognition model.
Optionally, the recognition result fusion unit is used for judging whether the first recognition result and the second recognition result meet the fusion condition and, if so, fusing them and outputting the fused result, namely the third recognition result.
Optionally, the fusion condition is that the second recognition result and the first recognition result have identical named entities.
Optionally, the role assignment unit is used for performing role labeling, using the semantic mining system, on each named entity in the third recognition result and outputting each named entity with its role.
Optionally, the semantic mining system includes regular expressions and text.
Fig. 4 shows a block diagram of a computer system 400 on which embodiments may be implemented. Computer system 400 includes a processor 410, a storage medium 420, a system memory 430, a monitor 440, a keyboard 450, a mouse 460, a network interface 470, and a video adapter 480. These components are coupled by a system bus 490.
Storage medium 420 (such as a hard disk) stores a number of programs, including an operating system, application programs, and other program modules. A user can enter commands and information into computer system 400 through input devices such as keyboard 450, a touch pad (not shown), and mouse 460. Text and graphical information are displayed on monitor 440.
The operating system runs on processor 410 and is used to coordinate and provide control of the various components in the personal computer system 400 in Fig. 4. In addition, computer programs can be used on computer system 400 to implement the various embodiments described above.
It will be appreciated that the hardware components shown in Fig. 4 are for illustrative purposes only, and the actual components may vary depending on the computing device deployed to implement the application.
In addition, computer system 400 may be, for example, a desktop computer, a server computer, a laptop computer, or a wireless device such as a mobile phone, a personal digital assistant (PDA), or a handheld computer.
The embodiments provide an effective method of extracting named entities from a given document collection. The embodiments address the problem of extracting entities of any type from generally organized web pages at minimal cost. The proposed weighted named entity graph can encode the complex relationships between each named entity and the types of other entities, so propagating seed confidence over the graph can compensate for the lack of web-scale redundancy and support effective extraction at organization scale. Furthermore, confidence propagation on the named entity graph can be converted into efficient matrix computations, which support efficient extraction on large-scale collections.
It will be appreciated that embodiments within the scope of the application can be implemented in the form of a computer program product comprising computer-executable instructions, such as program code, which can run in any suitable computing environment in conjunction with a suitable operating system, for example Microsoft Windows, Linux, or UNIX. Embodiments within the scope of the application may also include program products comprising computer-readable media for carrying or storing computer-executable instructions or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media may include RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium that can be used to carry or store desired program code in the form of computer-executable instructions and that can be accessed by a general-purpose or special-purpose computer.
Experimental examples
Experimental example 1: Influence of the mode value on the F value in the second recognition
In this experimental example, different preset values used in the second recognition step lead to markedly different final named entity recognition results; this experimental example investigates the influence of the preset value on the named entity recognition result.
The preset value is the mode of the at least two recognition models;
The named entity recognition result is measured by the F value; the higher the F value, the more reliable the recognition result, where
Accuracy rate (P) = number of correctly recognized named entities / number of named entities recognized by the machine,
Recall rate (R) = number of correctly recognized named entities / number of named entities in the test corpus text,
F value = 2*P*R/(P+R).
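The three measures can be computed as in the following Python sketch. The function name `prf` and the counts in the usage line are made up for illustration and are not taken from the experiments; returning 0.0 when a denominator is zero is an assumption.

```python
from typing import Tuple


def prf(correct: int, recognized: int, in_test_corpus: int) -> Tuple[float, float, float]:
    """Return (accuracy rate P, recall rate R, F value) as defined above."""
    precision = correct / recognized if recognized else 0.0
    recall = correct / in_test_corpus if in_test_corpus else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Illustrative counts only: 80 correct out of 100 recognized, 120 in the test text.
p, r, f = prf(correct=80, recognized=100, in_test_corpus=120)
```

The F value is the harmonic mean of P and R, so it is high only when both the accuracy rate and the recall rate are high.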
The recognition models used in the second recognition in this experimental example include a precise word segmentation algorithm, a word segmentation algorithm with new-word discovery, and a structured perceptron named entity recognition algorithm, where
Precise word segmentation is a segmentation algorithm that combines a language model, sequence labeling, and a hidden Markov model (HMM); preferably, coarse segmentation is first performed using N-gram and HMM, and fine segmentation is then performed using CRF;
The word segmentation algorithm with new-word discovery finds new words in the text through rule-based or statistical recognition models;
The structured perceptron is used to solve the sequence labeling problem.
The results of this experimental example are shown in Fig. 5 and Table 1.
Table 1. Influence of the preset value on the named entity recognition result
In Fig. 5, line A is the recall rate for each preset value, line B is the F value for each preset value, and line C is the accuracy rate for each preset value.
As can be seen from Fig. 5 and Table 1, in this experimental example the F value reaches its maximum when the mode value is 3.
Experimental example 2: Named entity recognition results when each recognition model is used alone
This experimental example tests the named entity recognition result of each recognition model used alone, in order to compare the reliability of single-model named entity recognition with that of multi-model fusion.
The recognition models used in this experimental example are the CRF recognition model used in the preliminary recognition and, from the second recognition, the precise word segmentation algorithm, the word segmentation algorithm with new-word discovery, and the structured perceptron named entity recognition algorithm; the results are shown in Fig. 6 and Table 2.
Table 2. Reliability of single-model named entity recognition methods
In Fig. 6, line A is the recall rate for each recognition method, line B is the F value for each recognition method, and line C is the accuracy rate for each recognition method.
As can be seen from Fig. 6 and Table 2, compared with the multi-model fusion named entity recognition method (that is, the multi-strategy fusion named entity recognition method; Experimental example 1, result with a mode of 3), the single-model named entity recognition methods have lower F values; that is, the named entity recognition results obtained with the multi-model fusion method provided by the application are more reliable and stable.
Experimental example 3: Named entity recognition results of each recognition model in the method of the application
Using the method provided by the application, this experimental example calculates the accuracy rate, recall rate, and F value of the first recognition result, the second recognition result, and the third recognition result; the results are shown in Table 3 below.
Table 3. Named entity recognition results of each recognition model in the method of the application
As Table 3 shows, with the method provided by the application, the third recognition result obtained on the basis of the first and second recognition results shows a considerable improvement in accuracy rate, recall rate, and F value; that is, the method provided by the application can cope with new conditions such as massive data scale, diversified entity types, and constantly emerging new words, with high recall and accuracy rates.
The multi-strategy fusion named entity recognition method and recognition device provided by the application have the following beneficial effects:
(1) Through the preliminary recognition step, the scheme provided by the application can perform named entity recognition on new data or new domains, and so meets the demand for named entity recognition when data scale is massive, entity types are diverse, and new words emerge constantly;
(2) The second recognition step fuses several named entity recognition models, combining multiple weak recognition models into one strong recognition model that supplements the first recognition result, thereby improving the accuracy rate and recall rate of the recognition result;
(3) Role labeling is performed, using the semantic mining system, on the named entities obtained by the second recognition, yielding named entities with assigned roles;
(4) The method provided by the application can be migrated easily to new data and new domains;
(5) The method provided by the application has high accuracy and recall rates, and its F value can exceed 0.8.
The application has been described in detail above with reference to embodiments and illustrative examples, but these descriptions should not be construed as limiting the application. Those skilled in the art will understand that, without departing from the spirit and scope of the application, various equivalent substitutions, modifications, or improvements can be made to the technical solution and its embodiments, all of which fall within the scope of the application. The scope of protection of the application is determined by the appended claims.

Claims (10)

1. A multi-strategy fusion named entity recognition method, characterized by comprising:
Obtaining corpus text;
Recognizing the named entities in the corpus text using a first recognition model to obtain a first recognition result;
Recognizing the named entities in the corpus text using a second recognition model to obtain a second recognition result;
Fusing the first recognition result and the second recognition result to obtain a third recognition result.
2. The recognition method according to claim 1, characterized in that
The first recognition model is a conditional random field model;
Before the step of recognizing the named entities in the corpus text using the first recognition model to obtain the first recognition result, the method further comprises:
Establishing a corpus;
Performing part-of-speech tagging and sequence labeling on the corpus text in the corpus;
Using the labeled corpus text as training data and training with a CRF toolkit to obtain the first recognition model.
3. The recognition method according to claim 2, characterized in that the step of recognizing the named entities in the corpus text using the second recognition model to obtain the second recognition result comprises:
Recognizing the named entities in the corpus text using at least two recognition models, each recognition model producing one sub-recognition result, and generating a sub-recognition result list;
Judging whether the recognition results in the sub-recognition result list meet an output condition, and outputting the second recognition result if they do;
The output condition is that, in the sub-recognition result list, the number of identical named entities reaches a preset value, where the preset value is the mode of the at least two recognition models.
4. The recognition method according to claim 3, characterized in that the step of fusing the first recognition result and the second recognition result to obtain the third recognition result comprises:
Judging whether the first recognition result and the second recognition result meet a fusion condition and, if so, fusing them and outputting the fused result, namely the third recognition result;
The fusion means adding, on the basis of the first recognition result, the named entities newly added in the second recognition result;
The fusion condition is that the second recognition result contains named entities that are newly added relative to the first recognition result.
5. The recognition method according to claim 1, characterized in that, after the third recognition result is obtained, the method further comprises:
Performing role assignment on the third recognition result using a semantic mining system to generate named entities with roles, where
The role assignment consists of performing role labeling, using the semantic mining system, on each named entity in the third recognition result and outputting each named entity with its role;
The semantic mining system includes regular expressions and text.
6. A multi-strategy fusion named entity recognition device, characterized in that the named entity recognition device comprises:
A corpus text acquisition unit, for obtaining corpus text;
A first recognition unit, for recognizing the named entities in the corpus text using a first recognition model to obtain a first recognition result;
A second recognition unit, for recognizing the named entities in the corpus text using a second recognition model to obtain a second recognition result;
A recognition result fusion unit, for fusing the first recognition result and the second recognition result to obtain a third recognition result.
7. The recognition device according to claim 6, characterized in that
The first recognition model is a conditional random field model;
The first recognition unit further includes a model training unit, the model training unit being used for:
Establishing a corpus;
Performing part-of-speech tagging and sequence labeling on the corpus text in the corpus;
Using the labeled corpus text as training data and training with a CRF toolkit to obtain the first recognition model.
8. The recognition device according to claim 7, characterized in that the second recognition unit includes the following subunits:
A multi-strategy recognition unit, for recognizing the named entities in the corpus text using at least two recognition models, each recognition model producing one sub-recognition result, and generating a sub-recognition result list;
A recognition result output unit, for judging whether the recognition results in the sub-recognition result list meet an output condition, and outputting the second recognition result if they do;
The output condition is that, in the sub-recognition result list, the number of identical named entities reaches a preset value, where the preset value is the mode of the at least two recognition models.
9. The recognition device according to claim 8, characterized in that
The recognition result fusion unit is used for judging whether the first recognition result and the second recognition result meet a fusion condition and, if so, fusing them and outputting the fused result, namely the third recognition result;
The fusion means adding, on the basis of the first recognition result, the named entities newly added in the second recognition result;
The fusion condition is that the second recognition result contains named entities that are newly added relative to the first recognition result.
10. The recognition device according to claim 6, characterized by further comprising a role assignment unit, for performing role assignment on the third recognition result using a semantic mining system to generate named entities with roles, where
The role assignment unit is used for performing role labeling, using the semantic mining system, on each named entity in the third recognition result and outputting each named entity with its role;
The semantic mining system includes regular expressions and text.
CN201710447439.2A 2017-06-14 2017-06-14 The recognition methods of the name entity of more strategy fusions and device Active CN107330011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710447439.2A CN107330011B (en) 2017-06-14 2017-06-14 The recognition methods of the name entity of more strategy fusions and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710447439.2A CN107330011B (en) 2017-06-14 2017-06-14 The recognition methods of the name entity of more strategy fusions and device

Publications (2)

Publication Number Publication Date
CN107330011A true CN107330011A (en) 2017-11-07
CN107330011B CN107330011B (en) 2019-03-26

Family

ID=60195026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710447439.2A Active CN107330011B (en) 2017-06-14 2017-06-14 The recognition methods of the name entity of more strategy fusions and device

Country Status (1)

Country Link
CN (1) CN107330011B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108350A (en) * 2017-11-29 2018-06-01 北京小米移动软件有限公司 Name word recognition method and device
CN108170674A (en) * 2017-12-27 2018-06-15 东软集团股份有限公司 Part-of-speech tagging method and apparatus, program product and storage medium
CN108363701A (en) * 2018-04-13 2018-08-03 达而观信息科技(上海)有限公司 Name entity recognition method and system
CN108388638A (en) * 2018-02-26 2018-08-10 出门问问信息科技有限公司 Semantic analytic method, device, equipment and storage medium
CN108829681A (en) * 2018-06-28 2018-11-16 北京神州泰岳软件股份有限公司 A kind of name entity extraction method and device
CN109086274A (en) * 2018-08-23 2018-12-25 电子科技大学 English social media short text time expression recognition method based on restricted model
CN109543153A (en) * 2018-11-13 2019-03-29 成都数联铭品科技有限公司 A kind of sequence labelling system and method
CN109791570A (en) * 2018-12-13 2019-05-21 香港应用科技研究院有限公司 Efficiently and accurately name entity recognition method and device
CN109815296A (en) * 2018-12-29 2019-05-28 北京中科闻歌科技股份有限公司 The personage's construction of knowledge base method, apparatus and storage medium of notarization document
CN109886270A (en) * 2019-01-17 2019-06-14 大连理工大学 A kind of case element recognition methods towards electronics folder notes text
CN110489727A (en) * 2019-07-12 2019-11-22 深圳追一科技有限公司 Name recognition methods and relevant apparatus
CN110569332A (en) * 2019-09-09 2019-12-13 腾讯科技(深圳)有限公司 Sentence feature extraction processing method and device
CN110688467A (en) * 2019-08-23 2020-01-14 北京百度网讯科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN110750990A (en) * 2019-09-17 2020-02-04 平安科技(深圳)有限公司 Entity identification corpus labeling method, system, device and storage medium
CN110750991A (en) * 2019-09-18 2020-02-04 平安科技(深圳)有限公司 Entity identification method, device, equipment and computer readable storage medium
CN111125438A (en) * 2019-12-25 2020-05-08 北京百度网讯科技有限公司 Entity information extraction method and device, electronic equipment and storage medium
CN111178073A (en) * 2018-10-23 2020-05-19 北京嘀嘀无限科技发展有限公司 Text processing method and device, electronic equipment and storage medium
CN111178075A (en) * 2019-12-19 2020-05-19 厦门快商通科技股份有限公司 Online customer service log analysis method, device and equipment
CN111368541A (en) * 2018-12-06 2020-07-03 北京搜狗科技发展有限公司 Named entity identification method and device
CN111382570A (en) * 2018-12-28 2020-07-07 深圳市优必选科技有限公司 Text entity recognition method and device, computer equipment and storage medium
CN111400429A (en) * 2020-03-09 2020-07-10 北京奇艺世纪科技有限公司 Text entry searching method, device, system and storage medium
CN111488737A (en) * 2019-01-09 2020-08-04 阿里巴巴集团控股有限公司 Text recognition method, device and equipment
CN111797629A (en) * 2020-06-23 2020-10-20 平安医疗健康管理股份有限公司 Medical text data processing method and device, computer equipment and storage medium
WO2020215456A1 (en) * 2019-04-26 2020-10-29 网宿科技股份有限公司 Text labeling method and device based on teacher forcing
CN112270173A (en) * 2020-10-27 2021-01-26 北京百度网讯科技有限公司 Character mining method and device in text, electronic equipment and storage medium
EP3748548A4 (en) * 2019-04-26 2021-03-10 Wangsu Science & Technology Co., Ltd. Adversarial learning-based text annotation method and device
CN112541065A (en) * 2020-12-11 2021-03-23 浙江汉德瑞智能科技有限公司 Medical new word discovery processing method based on representation learning
CN113051918A (en) * 2019-12-26 2021-06-29 北京中科闻歌科技股份有限公司 Named entity identification method, device, equipment and medium based on ensemble learning
CN113127060A (en) * 2021-04-09 2021-07-16 中通服软件科技有限公司 Software function point identification method based on natural language pre-training model (BERT)
CN113127645A (en) * 2021-04-09 2021-07-16 厦门渊亭信息科技有限公司 Automatic extraction method of large-scale knowledge graph ontology, terminal equipment and storage medium
CN113971216A (en) * 2021-10-22 2022-01-25 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and memory

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110035210A1 (en) * 2009-08-10 2011-02-10 Benjamin Rosenfeld Conditional random fields (crf)-based relation extraction system
CN102033879A (en) * 2009-09-27 2011-04-27 腾讯科技(深圳)有限公司 Method and device for identifying Chinese name
CN103309926A (en) * 2013-03-12 2013-09-18 中国科学院声学研究所 Chinese and English-named entity identification method and system based on conditional random field (CRF)
CN104572631A (en) * 2014-12-03 2015-04-29 北京捷通华声语音技术有限公司 Training method and system for language model
CN104933152A (en) * 2015-06-24 2015-09-23 北京京东尚科信息技术有限公司 Named entity recognition method and device
CN106202255A (en) * 2016-06-30 2016-12-07 昆明理工大学 Merge the Vietnamese name entity recognition method of physical characteristics
CN106326206A (en) * 2015-06-24 2017-01-11 北京京东尚科信息技术有限公司 Entity extraction method based on grammar templates
CN106503192A (en) * 2016-10-31 2017-03-15 北京百度网讯科技有限公司 Name entity recognition method and device based on artificial intelligence
CN106570132A (en) * 2016-10-27 2017-04-19 浙江大学 Document vector learning method with fusion of mentioned entity information
CN106649272A (en) * 2016-12-23 2017-05-10 东北大学 Named entity recognizing method based on mixed model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
栾悉道: "《多媒体情报处理技术》", 31 May 2016, 国防工业出版社 *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108350A (en) * 2017-11-29 2018-06-01 北京小米移动软件有限公司 Name word recognition method and device
CN108170674A (en) * 2017-12-27 2018-06-15 东软集团股份有限公司 Part-of-speech tagging method and apparatus, program product and storage medium
CN108388638A (en) * 2018-02-26 2018-08-10 出门问问信息科技有限公司 Semantic analytic method, device, equipment and storage medium
CN108388638B (en) * 2018-02-26 2020-09-18 出门问问信息科技有限公司 Semantic parsing method, device, equipment and storage medium
CN108363701A (en) * 2018-04-13 2018-08-03 达而观信息科技(上海)有限公司 Name entity recognition method and system
CN108829681A (en) * 2018-06-28 2018-11-16 北京神州泰岳软件股份有限公司 A kind of name entity extraction method and device
CN108829681B (en) * 2018-06-28 2022-11-11 鼎富智能科技有限公司 Named entity extraction method and device
CN109086274A (en) * 2018-08-23 2018-12-25 电子科技大学 English social media short text time expression recognition method based on constraint model
CN109086274B (en) * 2018-08-23 2020-06-26 电子科技大学 English social media short text time expression recognition method based on constraint model
CN111178073A (en) * 2018-10-23 2020-05-19 北京嘀嘀无限科技发展有限公司 Text processing method and device, electronic equipment and storage medium
CN111178073B (en) * 2018-10-23 2024-06-04 北京嘀嘀无限科技发展有限公司 Text processing method, device, electronic equipment and storage medium
CN109543153A (en) * 2018-11-13 2019-03-29 成都数联铭品科技有限公司 Sequence labeling system and method
CN109543153B (en) * 2018-11-13 2023-08-18 成都数联铭品科技有限公司 Sequence labeling system and method
CN111368541B (en) * 2018-12-06 2024-06-11 北京搜狗科技发展有限公司 Named entity identification method and device
CN111368541A (en) * 2018-12-06 2020-07-03 北京搜狗科技发展有限公司 Named entity identification method and device
CN109791570A (en) * 2018-12-13 2019-05-21 香港应用科技研究院有限公司 Efficient and accurate named entity recognition method and device
CN111382570B (en) * 2018-12-28 2024-05-03 深圳市优必选科技有限公司 Text entity recognition method, device, computer equipment and storage medium
CN111382570A (en) * 2018-12-28 2020-07-07 深圳市优必选科技有限公司 Text entity recognition method and device, computer equipment and storage medium
CN109815296A (en) * 2018-12-29 2019-05-28 北京中科闻歌科技股份有限公司 Person knowledge base construction method, apparatus and storage medium for notarization documents
CN111488737A (en) * 2019-01-09 2020-08-04 阿里巴巴集团控股有限公司 Text recognition method, device and equipment
CN111488737B (en) * 2019-01-09 2023-04-14 阿里巴巴集团控股有限公司 Text recognition method, device and equipment
CN109886270A (en) * 2019-01-17 2019-06-14 大连理工大学 Case element identification method for electronic file record text
CN109886270B (en) * 2019-01-17 2022-03-01 大连理工大学 Case element identification method for electronic file record text
WO2020215456A1 (en) * 2019-04-26 2020-10-29 网宿科技股份有限公司 Text labeling method and device based on teacher forcing
EP3748548A4 (en) * 2019-04-26 2021-03-10 Wangsu Science & Technology Co., Ltd. Adversarial learning-based text annotation method and device
EP3751445A4 (en) * 2019-04-26 2021-03-10 Wangsu Science & Technology Co., Ltd. Text labeling method and device based on teacher forcing
CN110489727B (en) * 2019-07-12 2023-07-07 深圳追一科技有限公司 Person name recognition method and related device
CN110489727A (en) * 2019-07-12 2019-11-22 深圳追一科技有限公司 Person name recognition method and related device
CN110688467A (en) * 2019-08-23 2020-01-14 北京百度网讯科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN110569332A (en) * 2019-09-09 2019-12-13 腾讯科技(深圳)有限公司 Sentence feature extraction processing method and device
CN110569332B (en) * 2019-09-09 2023-01-06 腾讯科技(深圳)有限公司 Sentence feature extraction processing method and device
CN110750990B (en) * 2019-09-17 2024-09-06 平安科技(深圳)有限公司 Labeling method, system, device and storage medium for entity recognition corpus
CN110750990A (en) * 2019-09-17 2020-02-04 平安科技(深圳)有限公司 Entity identification corpus labeling method, system, device and storage medium
CN110750991A (en) * 2019-09-18 2020-02-04 平安科技(深圳)有限公司 Entity identification method, device, equipment and computer readable storage medium
CN110750991B (en) * 2019-09-18 2022-04-15 平安科技(深圳)有限公司 Entity identification method, device, equipment and computer readable storage medium
CN111178075A (en) * 2019-12-19 2020-05-19 厦门快商通科技股份有限公司 Online customer service log analysis method, device and equipment
CN111125438A (en) * 2019-12-25 2020-05-08 北京百度网讯科技有限公司 Entity information extraction method and device, electronic equipment and storage medium
CN111125438B (en) * 2019-12-25 2023-06-27 北京百度网讯科技有限公司 Entity information extraction method and device, electronic equipment and storage medium
CN113051918B (en) * 2019-12-26 2024-05-14 北京中科闻歌科技股份有限公司 Named entity recognition method, device, equipment and medium based on ensemble learning
CN113051918A (en) * 2019-12-26 2021-06-29 北京中科闻歌科技股份有限公司 Named entity identification method, device, equipment and medium based on ensemble learning
CN111400429B (en) * 2020-03-09 2023-06-30 北京奇艺世纪科技有限公司 Text entry searching method, device, system and storage medium
CN111400429A (en) * 2020-03-09 2020-07-10 北京奇艺世纪科技有限公司 Text entry searching method, device, system and storage medium
CN111797629A (en) * 2020-06-23 2020-10-20 平安医疗健康管理股份有限公司 Medical text data processing method and device, computer equipment and storage medium
CN111797629B (en) * 2020-06-23 2022-07-29 平安医疗健康管理股份有限公司 Method and device for processing medical text data, computer equipment and storage medium
CN112270173A (en) * 2020-10-27 2021-01-26 北京百度网讯科技有限公司 Character mining method and device in text, electronic equipment and storage medium
CN112270173B (en) * 2020-10-27 2021-10-26 北京百度网讯科技有限公司 Character mining method and device in text, electronic equipment and storage medium
CN112541065A (en) * 2020-12-11 2021-03-23 浙江汉德瑞智能科技有限公司 Medical new word discovery processing method based on representation learning
CN113127645A (en) * 2021-04-09 2021-07-16 厦门渊亭信息科技有限公司 Automatic extraction method of large-scale knowledge graph ontology, terminal equipment and storage medium
CN113127060A (en) * 2021-04-09 2021-07-16 中通服软件科技有限公司 Software function point identification method based on natural language pre-training model (BERT)
CN113971216A (en) * 2021-10-22 2022-01-25 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and memory

Also Published As

Publication number Publication date
CN107330011B (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN107330011B (en) Multi-strategy fusion named entity recognition method and device
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN110097085B (en) Lyric text generation method, training method, device, server and storage medium
CN106919673B (en) Text mood analysis system based on deep learning
US11113323B2 (en) Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering
CN109960800A (en) Weakly supervised text classification method and device based on active learning
CN104050160B (en) Translation method and apparatus combining machine translation and human translation
CN109635108B (en) Man-machine interaction based remote supervision entity relationship extraction method
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN110457689B (en) Semantic processing method and related device
CN108647225A (en) Automatic mining method and system for e-commerce gray and black industry public opinion
CN108563703A (en) Criminal charge determination method, device, computer equipment and storage medium
CN107122349A (en) Text feature word extraction method based on word2vec-LDA model
CN106570180A (en) Artificial intelligence based voice searching method and device
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN114818717B (en) Chinese named entity recognition method and system integrating vocabulary and syntax information
CN110321561A (en) Keyword extraction method and device
CN113360582B (en) Relation classification method and system based on BERT model fusion multi-entity information
CN107357765A (en) Word document fragmentation method and device
CN105988978B (en) Method and system for determining text focus
CN104317882A (en) Decision-based Chinese word segmentation and fusion method
CN115714002B (en) Training method for depression risk detection model, depression symptom early warning method and related equipment
CN106933802A (en) Social security entity recognition method and device for multiple data sources
CN110610003A (en) Method and system for assisting text annotation
CN112036186A (en) Corpus labeling method and device, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhao Honghong

Inventor after: Wang Mengmeng

Inventor after: Jin Yaohong

Inventor after: Jiang Hongfei

Inventor after: Yang Kaicheng

Inventor after: Dong Mingtao

Inventor before: Zhao Honghong

Inventor before: Wang Mengmeng

Inventor before: Jin Yaohong

Inventor before: Jiang Hongfei

Inventor before: Yang Kaicheng

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190904

Address after: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Patentee after: China Science and Technology (Beijing) Co., Ltd.

Address before: Room 601, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Co-patentee before: China Science and Technology (Beijing) Co., Ltd.

Patentee before: Beijing Shenzhou Taiyue Software Co., Ltd.

CP03 Change of name, title or address

Address after: 230000 Zone B, 19th Floor, Building A1, 3333 Xiyou Road, High-tech Zone, Hefei City, Anhui Province

Patentee after: Dingfu Intelligent Technology Co., Ltd

Address before: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Patentee before: DINFO (BEIJING) SCIENCE DEVELOPMENT Co.,Ltd.
