CN113158654B - Domain model extraction method and device and readable storage medium

Info

Publication number: CN113158654B
Application number: CN202011301741.5A
Authority: CN
Inventors: 杜佳诺; 连小利; 张莉; 赵子岩; 张航; 樊志强; 李华莹; 刘必欣; 张捷
Original assignee: Beihang University; CETC 15 Research Institute; Research Institute of War of PLA Academy of Military Science
Current assignee: Beihang University; CETC 15 Research Institute; Research Institute of War of PLA Academy of Military Science
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2022-04-29
Anticipated expiration: 2040-11-19
Also published as: CN113158654A

Abstract

The invention discloses a domain model extraction method, a device, and a readable storage medium. The method comprises: performing syntactic analysis on a requirements document to determine the dependencies between tokens; determining the semantic relationships between concepts according to the dependencies between the tokens; and determining the corresponding domain model according to the semantic relationships between the concepts. By determining the dependencies between the tokens in the requirements document, deriving the semantic relationships between concepts from those dependencies, and determining the domain model from those semantic relationships, the invention improves the extraction accuracy of the domain model.

Description

Domain model extraction method, device and readable storage medium

Technical Field

The present invention relates to the technical field of natural language recognition, and in particular to a domain model extraction method, a device, and a readable storage medium.

Background

A domain model is a visual representation of the important concepts in a domain and the relationships among them. It is used in the analysis phase of software development to analyze how the functional requirements of a system can be met. Depending on need, a domain model can be expressed as a UML class diagram, a use case diagram, an ontology, and so on. A domain model consists mainly of concepts, attributes, and relationships. A concept represents an entity or event in the real world. The attributes of a concept are the logical data contained in the entity that the concept represents. The relationships between concepts represent the semantic connections or interactions between the entities that the concepts represent; common relationships include association, aggregation, and inheritance.

A domain model provides structured knowledge about the basic terms that make up a domain. In addition, the design of a system, especially in a model-based development environment, usually takes shape around the domain model. Correctly identifying concepts and the relationships between them helps to analyze the system architecture during software development, lowers development difficulty, reduces code redundancy, and helps developers find inconsistencies and gaps in the requirements themselves. When building a domain model, developers must check the requirements document repeatedly to ensure that the model is consistent with the requirements and that every concept and relationship relevant to the requirements is included in it. For large applications, building a domain model by hand is a very demanding task.

Summary of the Invention

Embodiments of the present invention provide a domain model extraction method, a device, and a readable storage medium, in order to improve the accuracy of domain model extraction.

In a first aspect, an embodiment of the present invention provides a domain model extraction method, comprising:

performing syntactic analysis on a requirements document to determine the dependencies between tokens;

determining the semantic relationships between concepts according to the dependencies between the tokens;

determining the corresponding domain model according to the semantic relationships between the concepts.

Optionally, performing syntactic analysis on the requirements document comprises:

decomposing the requirements document to obtain the corresponding tokens;

performing part-of-speech tagging on the tokens, and determining the type of each token from the tagging result;

determining the dependencies between the tokens based on the token types.

Optionally, after the token types are determined from the part-of-speech tagging result, the method further comprises:

cleaning the tokens;

extracting the stems of the tokens that remain after cleaning;

lemmatizing the token stems.

Optionally, determining the semantic relationships between concepts according to the dependencies between the tokens comprises:

traversing the noun phrases among the tokens, and determining the dependencies between phrases and words and between phrases;

extracting the semantic relationships between concepts according to the dependencies between phrases and words and between phrases.

Optionally, traversing the noun phrases among the tokens and deriving the dependencies between phrases and words and between phrases comprises:

if the target node of a dependency whose source node is a word in the current noun phrase falls inside the current noun phrase, performing no derivation for the current noun phrase;

if the target node of a dependency whose source node is a word in the current noun phrase falls outside the current noun phrase, performing derivation for the current noun phrase.

Optionally, performing derivation for the current noun phrase comprises: if the derived word is a source node word in a noun phrase other than the current noun phrase, deriving a dependency between the two phrases; otherwise, deriving a dependency between the phrase and the word.

Optionally, extracting the semantic relationships between concepts according to the dependencies between phrases and words and between phrases comprises:

extracting the association relationships between concepts from the source nodes corresponding to different grammatical structures, according to the dependencies between phrases and words and between phrases; and

matching the dependencies between phrases and words and between phrases against preset word structures to identify the aggregation, cardinality, and attribute relationships between concepts.

Optionally, determining the corresponding domain model according to the association relationships between the concepts comprises:

traversing the boundary concepts in the association relationships between the concepts;

correcting the association relationships of those boundary concepts that match preset fields;

wherein a boundary concept is a concept that has a semantic relationship with exactly one other concept.

In a second aspect, an embodiment of the present invention provides a domain model extraction device, comprising:

an analysis unit, configured to perform syntactic analysis on a requirements document and determine the dependencies between tokens;

a relationship determination unit, configured to determine the semantic relationships between concepts according to the dependencies between the tokens;

a domain model determination unit, configured to determine the corresponding domain model according to the semantic relationships between the concepts.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the foregoing domain model extraction method are implemented.

Embodiments of the present invention determine the dependencies between the tokens in the requirements document, determine the semantic relationships between concepts according to those dependencies, and determine the corresponding domain model according to those semantic relationships. This improves the extraction accuracy of the domain model and achieves a positive technical effect.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention are easier to understand, specific embodiments of the invention are set out below.

Description of Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

FIG. 1 is a flow chart of the first embodiment of the present invention;

FIG. 2 is a flow chart of the syntactic analysis in the first embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the device of the second embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.

Embodiment 1

The first embodiment of the present invention provides a domain model extraction method which, as shown in FIG. 1, comprises the following specific steps:

S101. Perform syntactic analysis on the requirements document to determine the dependencies between tokens;

S102. Determine the semantic relationships between concepts according to the dependencies between the tokens;

S103. Determine the corresponding domain model according to the semantic relationships between the concepts.

This embodiment determines the dependencies between the tokens in the requirements document, determines the semantic relationships between concepts according to those dependencies, and determines the corresponding domain model according to those semantic relationships, thereby improving the extraction accuracy of the domain model.

Specifically, in this embodiment, syntactic analysis includes preprocessing the requirement statements: tokenization, sentence splitting, part-of-speech tagging, phrase structure analysis, and dependency parsing. The main flow of the syntactic analysis of the input requirements document is shown in FIG. 2 and includes the following:

Sentence splitting: divide the input text into individual sentences.

Tokenization: divide each input sentence into individual tokens. A token can be a word, a number, a punctuation mark, or a space.

Part-of-speech tagging: label the part of speech of each token produced by the tokenizer, such as noun (NN), verb (VB), adjective (JJ), preposition (IN), article (DT), or conjunction (CC).

Phrase structure analysis: infer the type of each structural unit in a sentence, such as noun phrase (NP), verb phrase (VP), prepositional phrase (PP), verb (VB), article (DT), or preposition (IN).

Dependency parsing: determine the grammatical relationships between the individual words of a sentence, expressed as dependencies. The input of dependency parsing is a sentence; the output is a directed acyclic graph made up of relation triples, each written for example as <word, dependency type, word>. Following the Universal Dependencies framework for defining dependency relations, the dependency types used in this embodiment mainly include: nominal subject (nsubj), passive nominal subject (nsubjpass), direct object (dobj), adjectival modifier (amod), nominal modifier (nmod), clausal modifier of a noun (acl), and relative clause modifier (acl:relcl).

Here, the nominal modifier (nmod) represents a prepositional phrase structure in the sentence. The clausal modifier of a noun (acl) represents a complement structure in infinitive or participle form; the relative clause modifier (acl:relcl) represents a relative clause that modifies a noun.
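The triple form of the parser output can be illustrated with a minimal sketch. The class name `Dep`, the helper `outgoing`, and the example sentence are assumptions made for illustration; the triples are written by hand rather than produced by a parser, and the first element of each triple is treated as the source node (the governor).

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One word-level dependency triple <word, dependency type, word>."""
    source: str   # source node (governor)
    label: str    # dependency type, e.g. "nsubj"
    target: str   # target node (dependent)


# Hand-written parse of "The operator monitors the radar."
# A real dependency parser would produce triples of this shape.
deps = [
    Dep("monitors", "nsubj", "operator"),
    Dep("monitors", "dobj", "radar"),
    Dep("operator", "det", "The"),
    Dep("radar", "det", "the"),
]


def outgoing(word, deps):
    """Return the dependencies whose source node is `word`."""
    return [d for d in deps if d.source == word]
```

Storing the edges as a flat list of triples keeps the later derivation and extraction steps simple, since each of them only filters edges by source node or by dependency type.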

Cleaning the tokens:

In this embodiment, after the tokens are obtained, the syntactic analysis further includes removing stop words. Stop words are words that occur frequently in the text and carry no specific meaning, such as "a", "the", and "any".

Stemming and lemmatization: convert inflected or derived forms, such as plural nouns, verb participles, and adverbs formed from adjectives, to the base form of the word.

Atomic noun phrases and verbs are then extracted, in preparation for extracting the concepts and relationships of the domain model.

Specifically, in this embodiment, building on the tokens obtained by the syntactic analysis described above, the word-level dependencies produced by the analysis are further derived into phrase-level dependencies. A phrase-level dependency can be written as a relation triple <phrase, dependency type, phrase> or <phrase, dependency type, word>.

The pseudocode of the dependency derivation algorithm used in this embodiment is shown in Table 1.

Table 1 Dependency derivation algorithm

The input of the dependency derivation algorithm in this embodiment is all the words, the noun phrases, and the word-level dependencies obtained by syntactic analysis. The output is the dependencies between phrases and between phrases and words. The specific flow is as follows:

Check every noun phrase NP in the requirements document:

For each word token₁ in the noun phrase NP: if the target node of a dependency dep(token₁, token₂) whose source node is this word still falls inside the noun phrase, the dependency is not derived.

If the target node of the dependency falls outside the noun phrase, the dependency dep is derived:

If token₂ is part of another noun phrase NP₂, the derived dependency is dep(NP, NP₂); otherwise the derived dependency is dep(NP, token₂).

In this way the dependencies between phrases and words and between phrases are determined.
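The derivation flow described above can be sketched as follows. This is a minimal illustration and not the content of Table 1: the function name `derive_phrase_deps`, the use of token positions as identifiers, and the example phrase are assumptions.

```python
def derive_phrase_deps(noun_phrases, word_deps):
    """Lift word-level dependencies to phrase level.

    noun_phrases: list of tuples of token positions, one tuple per phrase.
    word_deps: list of (source_position, label, target_position) triples.
    Returns (phrase, label, target) triples, where target is either another
    noun phrase (a tuple) or a single token position.
    """
    # Map each token position to the noun phrase that contains it.
    phrase_of = {tok: phrase for phrase in noun_phrases for tok in phrase}
    derived = []
    for phrase in noun_phrases:
        for src, label, tgt in word_deps:
            if src not in phrase:
                continue          # dependency does not start in this phrase
            if tgt in phrase:
                continue          # stays inside the phrase: not derived
            # Target lies outside: use its phrase if it has one, else the word.
            derived.append((phrase, label, phrase_of.get(tgt, tgt)))
    return derived


# Hypothetical example: "the status of the device"
tokens = ["the", "status", "of", "the", "device"]
noun_phrases = [(0, 1), (3, 4)]           # "the status", "the device"
word_deps = [
    (1, "det", 0),    # status -> the      (inside the first phrase)
    (1, "nmod", 4),   # status -> device   (crosses into the second phrase)
    (4, "case", 2),   # device -> of       (target is a bare word)
    (4, "det", 3),    # device -> the      (inside the second phrase)
]
derived = derive_phrase_deps(noun_phrases, word_deps)
```

In this example the `nmod` edge becomes a phrase-to-phrase dependency and the `case` edge becomes a phrase-to-word dependency, while both `det` edges are dropped because they stay inside their phrase.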

Specifically, in this embodiment, the semantic relationships between concepts include association, aggregation, cardinality, and attribute relationships. Association relationships include direct and indirect relationships. A direct relationship is one in which two concepts are connected directly by a verb or verb phrase (including its participle or infinitive form) or by a preposition. An indirect relationship is obtained by transitivity of direct relationships: if concept A has a direct relationship with concept B, and concept B has a direct relationship with concept C, then concept A has an indirect relationship with concept C.

On this basis, extracting the association relationships between concepts from the source nodes corresponding to different grammatical structures, according to the dependencies between phrases and words and between phrases, includes: first identifying the direct relationships between concepts, then deriving the indirect relationships from the direct ones, and thereby obtaining all association relationships.

Specifically, extracting the association relationships between concepts from the source nodes corresponding to different grammatical structures, according to the dependencies between phrases and words and between phrases, includes:

For a direct relationship expressed by a subject-verb-object structure, the subject is taken as the source concept of the relationship, the object as the target concept, and the predicate verb as the content of the relationship.

For a subject-verb-object relationship inside a relative clause, the acl:relcl dependency is used to find the noun phrase referred to by the subject "that" or "which" of the relative clause; that noun phrase is taken as the source concept, and the object and predicate of the clause are taken as the target concept and the content of the relationship, respectively.
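The two subject-verb-object rules above can be sketched as follows. The function name `extract_svo`, the representation of phrases by their head words, and the example sentence are assumptions; the sketch also assumes that each verb has at most one subject and one direct object.

```python
RELATIVE_PRONOUNS = {"that", "which"}


def extract_svo(deps):
    """Return (source concept, content, target concept) triples.

    deps: list of (source, label, target) triples in which the verb is the
    source node of its nsubj and dobj edges, and the modified noun is the
    source node of an acl:relcl edge pointing at the clause verb.
    """
    subject = {s: t for s, label, t in deps if label == "nsubj"}
    obj = {s: t for s, label, t in deps if label == "dobj"}
    # For each relative-clause verb, the noun that the clause modifies.
    relcl_head = {t: s for s, label, t in deps if label == "acl:relcl"}

    relations = []
    for verb, subj in subject.items():
        if verb not in obj:
            continue  # no direct object: not a subject-verb-object structure
        if subj.lower() in RELATIVE_PRONOUNS and verb in relcl_head:
            subj = relcl_head[verb]  # resolve "that"/"which" to its noun
        relations.append((subj, verb, obj[verb]))
    return relations


# Hypothetical parse of "The system stores records that describe users."
svo_deps = [
    ("stores", "nsubj", "system"),
    ("stores", "dobj", "records"),
    ("records", "acl:relcl", "describe"),
    ("describe", "nsubj", "that"),
    ("describe", "dobj", "users"),
]
svo = extract_svo(svo_deps)
```

Here the main clause yields a relationship from "system" to "records", and the relative clause yields a second relationship whose source concept is "records" rather than the pronoun "that".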

For a direct relationship expressed by a prepositional phrase structure, the nominal modifier (nmod) is used for extraction. The pseudocode of the extraction algorithm is shown in Table 2.

Table 2 Pseudocode of the prepositional phrase extraction algorithm

The set of all atomic noun phrases and verbs is taken as input, and each noun phrase or verb is checked to see whether it is the source node of an nmod dependency.

If the source node of the nmod dependency is a noun phrase, the noun phrase at the source node is taken as the source concept of the relationship, the noun phrase at the target node as the target concept, and the preposition as the content of the relationship.

If the source node of the nmod dependency is a verb, the direct object of that verb is taken as the source concept of the relationship, the noun phrase at the target node of the nmod dependency as the target concept, and the preposition as the content of the relationship.
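A minimal sketch of the two nmod rules follows; it is not the content of Table 2. The function name `extract_nmod_relations` and the example edges are assumptions, phrases are represented by their head words, and the preposition is assumed to be attached to the target noun by a `case` edge, as in Universal Dependencies.

```python
def extract_nmod_relations(noun_phrases, verbs, deps):
    """Return (source concept, preposition, target concept) triples.

    noun_phrases, verbs: sets of head words.
    deps: list of (source, label, target) triples.
    """
    direct_object = {s: t for s, label, t in deps if label == "dobj"}
    preposition = {s: t for s, label, t in deps if label == "case"}

    relations = []
    for src, label, tgt in deps:
        if label != "nmod":
            continue
        if src in noun_phrases:
            source = src                       # rule 1: noun phrase source
        elif src in verbs and src in direct_object:
            source = direct_object[src]        # rule 2: verb's direct object
        else:
            continue
        # preposition.get returns None if no case edge was found.
        relations.append((source, preposition.get(tgt), tgt))
    return relations


# Hypothetical edges for "The operator sends a report to the commander"
# and "the status of the device".
nmod_deps = [
    ("sends", "nsubj", "operator"),
    ("sends", "dobj", "report"),
    ("sends", "nmod", "commander"),
    ("commander", "case", "to"),
    ("status", "nmod", "device"),
    ("device", "case", "of"),
]
nmod_relations = extract_nmod_relations(
    {"operator", "report", "commander", "status", "device"},
    {"sends"},
    nmod_deps,
)
```

The sketch iterates over the nmod edges instead of over the phrases and verbs; the two orders visit the same edges, so the extracted relationships are the same.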

For a direct relationship expressed by a verbal complement structure, the clausal modifier of a noun (acl) is used for extraction. The pseudocode of the extraction algorithm is shown in Table 3.

Table 3 Pseudocode of the verbal complement extraction algorithm

The set of all atomic noun phrases is taken as input, and each noun phrase is checked to see whether it is the source node of an acl dependency.

If the noun phrase is the source node of an acl dependency, and the target node of that dependency is a transitive verb or verb phrase, the noun phrase at the source node is taken as the source concept of the relationship, the object of the verb or verb phrase at the target node as the target concept, and the verb or verb phrase as the content of the relationship.
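The acl rule can be sketched in the same style; this is an illustration and not the content of Table 3. The function name `extract_acl_relations` and the example are assumptions, and a verb is treated as transitive when it has a `dobj` edge.

```python
def extract_acl_relations(noun_phrases, deps):
    """Return (source concept, verb, target concept) triples from acl edges.

    noun_phrases: set of noun phrase head words.
    deps: list of (source, label, target) triples.
    """
    direct_object = {s: t for s, label, t in deps if label == "dobj"}
    return [
        (src, verb, direct_object[verb])
        for src, label, verb in deps
        if label == "acl" and src in noun_phrases and verb in direct_object
    ]


# Hypothetical edges for "a module to process requests"
acl_deps = [
    ("module", "acl", "process"),
    ("process", "dobj", "requests"),
]
acl_relations = extract_acl_relations({"module", "requests"}, acl_deps)
```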

From the direct relationships extracted above, the indirect relationships between concepts are derived, which yields all the association relationships between concepts.
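One way to derive the indirect relationships is a reachability search over the direct ones. The sketch below is an assumption about how the transitivity could be applied: it also treats chains longer than two steps as indirect, which the text does not state explicitly, and the function name `indirect_relations` is hypothetical.

```python
def indirect_relations(direct):
    """Return the set of (source, target) pairs linked only indirectly.

    direct: iterable of (source concept, content, target concept) triples.
    """
    direct = list(direct)
    successors = {}
    for src, _, tgt in direct:
        successors.setdefault(src, set()).add(tgt)
    direct_pairs = {(src, tgt) for src, _, tgt in direct}

    indirect = set()
    for start in successors:
        # Depth-first search for every concept reachable from `start`.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        for node in seen:
            if node != start and (start, node) not in direct_pairs:
                indirect.add((start, node))
    return indirect
```

For example, with direct relationships from A to B and from B to C, the only indirect pair returned is (A, C).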

A specific implementation of aggregation relationship identification may include:

Word structures such as "contain", "include", and "type of", as well as the possessive form of nouns, express aggregation relationships. Taking the sentence "A contains B" or "A's B" as an example, an aggregation relationship with source concept B and target concept A can be extracted.
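The aggregation rule can be sketched as a simple pattern match. The names `AGGREGATION_VERBS` and `find_aggregations` are assumptions, verbs are assumed to be lemmatized already, and possessives are assumed to appear as `nmod:poss` edges from the possessed noun to the possessor, as in Universal Dependencies. The "type of" pattern is left out because the text does not spell out its direction.

```python
AGGREGATION_VERBS = {"contain", "include"}


def find_aggregations(relations, deps=()):
    """Return (source concept, target concept) aggregation pairs.

    relations: (subject, verb lemma, object) triples, e.g. from
               "A contains B" -> ("A", "contain", "B").
    deps: (source, label, target) triples; "A's B" is assumed to appear
          as ("B", "nmod:poss", "A").
    """
    found = [
        (part, whole)
        for whole, verb, part in relations
        if verb in AGGREGATION_VERBS
    ]
    found += [
        (part, whole)
        for part, label, whole in deps
        if label == "nmod:poss"
    ]
    return found


aggregations = find_aggregations(
    [("train", "contain", "carriage"), ("driver", "operate", "train")],
    [("schedule", "nmod:poss", "station")],
)
```

In both cases the part is the source concept and the whole is the target concept, matching the "A contains B" example in the text.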

A specific implementation of cardinality relationship identification may include:

Indefinite articles, ordinal numbers, and the singular or plural form of nouns in a word structure express cardinality.

If the source concept and the target concept of an association relationship are both singular, the relationship is one-to-one.

If the source concept and the target concept of an association relationship are both plural, the relationship is many-to-many.

If the source concept of an association relationship is singular and the target concept is plural, the relationship is one-to-many.

If the source concept of an association relationship is plural and the target concept is singular, the relationship is many-to-one.

If the source concept or the target concept of an association relationship is preceded by an explicit number, that number gives the cardinality.
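The cardinality rules map directly onto a small lookup. The sketch assumes that the head noun of each concept carries a Penn Treebank tag (NNS and NNPS for plural nouns) and that any explicit number has already been parsed into an integer; the function names are hypothetical.

```python
PLURAL_TAGS = {"NNS", "NNPS"}  # Penn Treebank tags for plural nouns


def side(tag, count=None):
    """Describe one end of a relationship: an explicit count wins."""
    if count is not None:
        return str(count)
    return "many" if tag in PLURAL_TAGS else "one"


def cardinality(source_tag, target_tag, source_count=None, target_count=None):
    """Return a label such as 'one-to-many' for an association."""
    return f"{side(source_tag, source_count)}-to-{side(target_tag, target_count)}"
```

For instance, `cardinality("NN", "NNS")` describes a singular source and a plural target, and passing `target_count=4` replaces the generic "many" with the stated number.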

A specific implementation of attribute relationship identification may include:

Word structures such as "identified by" and "recognized by" can express attributes. Taking the sentence "A is identified by B" as an example, it can be extracted that B is an attribute of concept A.

An adjective that modifies a concept expresses an attribute of that concept. In natural language this appears as an attributive modifier or as a subject-linking verb-predicative structure. An attributive modifier expresses an attribute of the noun phrase it modifies, and a predicative expresses an attribute of the subject.

An intransitive verb modified by an adverb or complement expresses an attribute. Taking the sentence "The train arrives in the morning at 10am." as an example, the intransitive verb "arrives" together with the complement that follows implies that the concept "train" should have an attribute "arrival time".
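The first two attribute rules can be sketched as follows. The names `ATTRIBUTE_PATTERNS` and `find_attributes` are assumptions, the relationship content is assumed to be available as a normalized string, and attributive adjectives are assumed to appear as `amod` edges. The third rule (intransitive verbs such as "arrives") needs a mapping from verb to attribute name that the text does not specify, so it is not covered.

```python
ATTRIBUTE_PATTERNS = {"identified by", "recognized by"}


def find_attributes(relations, deps=()):
    """Return (concept, attribute) pairs.

    relations: (concept, content, other) triples, e.g. from
               "A is identified by B" -> ("A", "identified by", "B").
    deps: (noun, label, modifier) triples; attributive adjectives are
          assumed to appear as amod edges.
    """
    attributes = [
        (concept, other)
        for concept, content, other in relations
        if content in ATTRIBUTE_PATTERNS
    ]
    attributes += [
        (noun, adjective)
        for noun, label, adjective in deps
        if label == "amod"
    ]
    return attributes


attributes = find_attributes(
    [("user", "identified by", "username"), ("user", "send", "message")],
    [("message", "amod", "encrypted")],
)
```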

After the semantic relationships between concepts are obtained, aggregation, cardinality, and attribute relationships can be further distinguished from one another, which improves the accuracy with which all three are identified.

In this embodiment, a concept is called a boundary concept if exactly one other concept has a relationship with it. Among the semantic relationships obtained, every association relationship that involves a boundary concept is checked; if the content of the association matches a structure such as "include in", "including", or "consist of", the association is corrected to an aggregation relationship or an attribute. For example, starting from an existing domain model extraction result, all boundary concepts of the domain model and the associations connected to them can be checked. If the content of such an association matches a synonym of a pattern that expresses aggregation, such as "contain" or "include", the association is changed to an aggregation relationship or an attribute.
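The boundary concept check can be sketched as below. The function names, the pattern list `AGGREGATION_HINTS`, and the example relations are assumptions. The text allows the correction to produce either an aggregation or an attribute without saying how to choose, so the sketch always relabels to "aggregation".

```python
AGGREGATION_HINTS = {"include in", "including", "consist of", "contain", "include"}


def boundary_concepts(relations):
    """Return the concepts that are related to exactly one other concept."""
    neighbours = {}
    for src, _, tgt in relations:
        neighbours.setdefault(src, set()).add(tgt)
        neighbours.setdefault(tgt, set()).add(src)
    return {c for c, others in neighbours.items() if len(others) == 1}


def correct_boundary_relations(relations):
    """Label each relation, relabelling matching boundary associations."""
    boundary = boundary_concepts(relations)
    labelled = []
    for src, content, tgt in relations:
        touches_boundary = src in boundary or tgt in boundary
        if touches_boundary and content in AGGREGATION_HINTS:
            kind = "aggregation"   # corrected (assumed choice, see lead-in)
        else:
            kind = "association"   # left unchanged
        labelled.append((src, content, tgt, kind))
    return labelled


# Hypothetical extraction result: only "item" is a boundary concept.
example = [
    ("customer", "place", "order"),
    ("order", "consist of", "item"),
    ("order", "include", "invoice"),
    ("invoice", "reference", "customer"),
]
corrected = correct_boundary_relations(example)
```

In the example, the "consist of" association is corrected because "item" is a boundary concept, while the "include" association between "order" and "invoice" is left alone because neither concept is a boundary concept.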

Compared with the extraction results of domain modeling experts, the method of the present invention can extract 95% of the relationships in the requirements document.

In summary, the method of the present invention extends the extraction rules for domain models and introduces several new dependency types and grammatical structures for domain model extraction, so that the information expressed by prepositional phrase structures and complement structures can be extracted more completely and accurately. The method also introduces the notion of a boundary concept in the domain model, together with a way of checking the association relationships that involve boundary concepts, which improves the accuracy of identifying association relationships, aggregation relationships, and attributes.

实施例二Embodiment 2

本发明第二实施例提供一种领域模型提取装置，如图3所示，包括：The second embodiment of the present invention provides a domain model extraction device, as shown in FIG. 3 , including:

本发明实施例通过确定需求文档中分词之间的依赖关系；根据所述分词之间的依赖关系确定概念之间的语义关系；根据所述概念之间的语义关系确定对应的领域模型，由此提高了领域模型的提取准确率。In the embodiment of the present invention, the dependency relationship between word segments in the requirement document is determined; the semantic relationship between concepts is determined according to the dependency relationship between the word segments; the corresponding domain model is determined according to the semantic relationship between the concepts, thereby The extraction accuracy of the domain model is improved.

实施例三Embodiment 3

本发明第三实施例提供一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机程序，所述计算机程序被处理器执行时实现第一实施例的领域模型提取方法的步骤。A third embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored thereon, and when the computer program is executed by a processor, implements the steps of the domain model extraction method of the first embodiment.

在一个可选的实施方式中，所述计算机程序被处理器执行时实现：In an optional embodiment, when the computer program is executed by a processor, it realizes:

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or apparatus that includes the element.

上述本发明实施例序号仅仅为了描述，不代表实施例的优劣。The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages or disadvantages of the embodiments.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本发明各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course hardware can also be used, but in many cases the former is better implementation. Based on this understanding, the technical solutions of the present invention can be embodied in the form of software products in essence or the parts that make contributions to the prior art, and the computer software products are stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, and without departing from the spirit of the present invention and the scope protected by the claims, those of ordinary skill in the art can devise many further forms, all of which fall within the protection of the present invention.

Claims

1. A domain model extraction method, characterized by comprising:

performing grammatical analysis on a requirement document to determine the dependencies between word segments;

determining the semantic relationships between concepts according to the dependencies between the word segments;

determining the corresponding domain model according to the semantic relationships between the concepts;

wherein determining the semantic relationships between concepts according to the dependencies between the word segments comprises:

traversing the noun phrases in the word segments, and deriving the dependencies between phrases and words and between phrases;

extracting the semantic relationships between concepts based on the dependencies between phrases and words and between phrases;

wherein traversing the noun phrases in the word segments and deriving the dependencies between phrases and words and between phrases comprises:

if the target node of a dependency whose source node is a word in the current noun phrase falls within the current noun phrase, not performing derivation for the current noun phrase;

if the target node of a dependency whose source node is a word in the current noun phrase falls outside the current noun phrase, performing derivation for the current noun phrase;

wherein performing derivation for the current noun phrase comprises: if the derived word is a source-node word in a noun phrase other than the current noun phrase, deriving a dependency between phrases; otherwise, deriving a dependency between the phrase and the word;

wherein extracting the semantic relationships between concepts based on the dependencies between phrases and words and between phrases comprises:

extracting the association relationships between concepts from the dependencies between phrases and words and between phrases, according to the source nodes corresponding to different grammatical structures; and

matching the dependencies between phrases and words and between phrases against preset word structures, to identify the aggregation relationships, cardinality relationships and attribute relationships between concepts;

wherein determining the corresponding domain model according to the association relationships comprises:

traversing the boundary concepts in the association relationships between the concepts;

modifying the association relationships of those boundary concepts that match preset fields;

wherein a boundary concept is a concept that has a semantic relationship with only one other concept.
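
As an illustration only, the derivation step and the boundary-concept definition of claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `derive_dependencies` and `boundary_concepts`, the triple layout `(source, label, target)`, the word indices, and the dependency labels are all hypothetical. The sketch also simplifies the claim's condition "source-node word in another noun phrase" to "word that belongs to another noun phrase".

```python
from collections import defaultdict


def derive_dependencies(noun_phrases, word_deps):
    """Lift word-level dependencies to phrase level.

    noun_phrases: list of tuples of word indices (assumed non-overlapping).
    word_deps: list of (source, label, target) word-index triples (assumed layout).
    Returns (phrase_phrase, phrase_word) lists of derived dependencies.
    """
    # Map every word index to the position of the noun phrase that contains it.
    owner = {w: i for i, phrase in enumerate(noun_phrases) for w in phrase}
    phrase_phrase, phrase_word = [], []
    for i, phrase in enumerate(noun_phrases):
        members = set(phrase)
        for src, label, tgt in word_deps:
            if src not in members:
                continue  # only dependencies whose source is in the current phrase
            if tgt in members:
                continue  # target falls inside the phrase: nothing is derived
            if tgt in owner:
                # Simplifying assumption: target belongs to another noun phrase.
                phrase_phrase.append((i, label, owner[tgt]))
            else:
                phrase_word.append((i, label, tgt))
    return phrase_phrase, phrase_word


def boundary_concepts(relations):
    """Return concepts related to exactly one other concept.

    relations: iterable of (concept_a, relation_name, concept_b) triples.
    """
    neighbours = defaultdict(set)
    for a, _, b in relations:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return sorted(c for c, others in neighbours.items() if len(others) == 1)


# Hypothetical input: words 0..5 = user, account, of, library, member, expires.
phrases = [(0, 1), (3, 4)]
deps = [(0, "compound", 1), (3, "compound", 4), (4, "nmod", 1), (1, "nsubj", 5)]
pp, pw = derive_dependencies(phrases, deps)

# Hypothetical concept relations for the boundary-concept check.
rels = [("Library", "has", "Member"), ("Member", "borrows", "Book"),
        ("Book", "has", "Title")]
edge_concepts = boundary_concepts(rels)
```

In this hypothetical input, the dependency from word 4 to word 1 crosses from the second noun phrase into the first, so it is lifted to a phrase-to-phrase dependency, while the dependency from word 1 to word 5 targets a word outside every noun phrase and is lifted to a phrase-to-word dependency.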

2. The domain model extraction method according to claim 1, characterized in that performing grammatical analysis on the requirement document comprises:

decomposing the requirement document to obtain the corresponding word segments;

performing part-of-speech tagging on the word segments, and determining the corresponding word segment type according to the part-of-speech tagging result;

determining, based on the word segment types, the dependency relationship between the word segments.

3. The domain model extraction method according to claim 2, characterized in that, after determining the corresponding word segment type according to the part-of-speech tagging result, the method further comprises:

cleaning the word segments;

extracting the word stems from the cleaning result;

performing a reduction process on the word stems.
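
The three preprocessing steps of claim 3 can be sketched as follows. This is a toy illustration under loud assumptions: the stop-word list, the suffix rules, and the `BASE_FORMS` lookup are hypothetical and far smaller than any real stemmer or lemmatizer, and reading the "reduction process" as restoring a stem to a dictionary base form is an interpretation, not something the claim states.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and"}  # hypothetical, tiny stop list
SUFFIXES = ("ing", "es", "ed", "s")                 # toy rules, not a real stemmer
BASE_FORMS = {"stor": "store", "manag": "manage"}   # hypothetical base-form lookup


def clean(tokens):
    """Lowercase tokens and drop punctuation, numbers and stop words."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered
            if re.fullmatch(r"[a-z]+", t) and t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def restore(stemmed):
    """Map a stem back to a base form when the lookup knows it (assumed step)."""
    return BASE_FORMS.get(stemmed, stemmed)


def preprocess(tokens):
    """Apply cleaning, stem extraction and reduction in the order of claim 3."""
    return [restore(stem(t)) for t in clean(tokens)]


result = preprocess(["The", "system", "stores", "records", "."])
```

A production pipeline would more plausibly rely on an established NLP toolkit for stemming and lemmatization; the lookup table here only shows where the reduction step sits relative to cleaning and stem extraction.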

4. A domain model extraction device, characterized by comprising:

an analysis unit, configured to perform grammatical analysis on a requirement document and determine the dependencies between word segments;

a relationship determining unit, configured to determine the semantic relationships between concepts according to the dependencies between the word segments;

a domain model determining unit, configured to determine the corresponding domain model according to the semantic relationships between the concepts;

if the target node of a dependency whose source node is a word in the current noun phrase falls outside the current noun phrase, performing derivation for the current noun phrase;

wherein performing derivation for the current noun phrase comprises: if the derived word is a source-node word in a noun phrase other than the current noun phrase, deriving a dependency between phrases; otherwise, deriving a dependency between the phrase and the word;

5. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 3 are implemented.