CN105335447A

CN105335447A - Computer network-based expert question-answering system and construction method thereof

Info

Publication number: CN105335447A
Application number: CN201410400821.4A
Authority: CN
Inventors: 白明
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Priority date: 2014-08-14
Filing date: 2014-08-14
Publication date: 2016-02-17

Abstract

The present invention discloses a computer network-based expert question answering system and a construction method thereof, suitable for recommending experts to users asking questions in different domains. The question answering system includes: a knowledge base construction unit, configured to build a domain knowledge base that includes concepts and entities; a domain expert determination unit, configured to determine, according to an information set of the domain, the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or entities and obtained from domain-related websites; a question receiving unit, configured to receive a question input by a user; a similarity determination unit, configured to determine a first similarity between each expert and the question; and an expert matching unit, configured to have the expert with the highest first similarity answer the question. The expert question answering system and construction method of the present invention can objectively and accurately recommend to the user experts in the domain to which the question belongs, improving the accuracy with which users' questions are resolved.

Description

Expert question answering system and its construction method based on computer network

Technical field

The present invention belongs to the field of computer networks, and in particular relates to a computer network-based expert question answering system and a construction method thereof.

Background

With the development of Internet information technology, the ways in which Internet users exchange information, and the parties they exchange it with, have become increasingly diverse. A user who has a question can obtain an answer in many ways. The traditional way is to ask familiar people or an existing social circle by telephone, e-mail, or other instant messaging tools.

A more recent and common way is to submit the question directly on a website that has a question answering system, for example by typing the question into a search website; the question answering system then matches keywords against an existing question-and-answer database and returns search results. In the main implementation, the question answering system first builds its own knowledge database and then retrieves answers by matching keywords extracted from the input, which may be given in different forms such as text or images.

However, the traditional way cannot reach beyond familiar or acquainted people: when nobody in the existing circle knows the answer, or nobody can be contacted, the user gets no answer. The second, more recent way overcomes this drawback but has its own: it can only match by keywords, so the retrieved answer often does not match the answer the user actually wants.

In view of this, a technical problem to be solved is how, when a question is asked on a website that has a question answering system, to obtain an answer from an expert in the field to which the question belongs.

Summary of the invention

To address the above defects in the prior art, the present invention provides a computer network-based expert question answering system and a construction method thereof.

In a first aspect, an embodiment of the present invention provides a computer network-based expert question answering system, including:

a knowledge base construction unit, configured to construct a domain knowledge base, the domain knowledge base including at least one concept of the domain and a plurality of entities corresponding to each concept;

a domain expert determination unit, configured to determine, according to an information set of the domain, the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or the entities and obtained from websites or comments related to the domain, and an expert is the sender or the receiver of the information;

a question receiving unit, configured to receive a question input by a user;

a similarity determination unit, configured to determine a first similarity between each expert determined by the domain expert determination unit and the question received by the question receiving unit;

an expert matching unit, configured to sort the first similarities determined by the similarity determination unit in descending order and select the experts corresponding to the top N first similarities to answer the question, where N is a natural number greater than or equal to 1.

Optionally, the knowledge base construction unit is specifically configured to:

perform targeted crawling of websites corresponding to the domain and build a form set of 2-tuple forms, each form in the form set including a navigation word and an element set composed of a plurality of elements corresponding to the navigation word;

determine whether the navigation word of each form in the form set matches the at least one concept, and if the navigation word of a form matches the at least one concept, take the elements of that form as core entities of the at least one concept, the core entities of each concept forming the entity set of that concept.

Optionally, the knowledge base construction unit is further configured to:

when the navigation word of at least one form in the form set does not match the at least one concept, obtain, for each form whose navigation word is unmatched, a second similarity between the element set of that form and the entity set of each concept;

for the plurality of second similarities of each unmatched navigation word, sort them in descending order and take the elements of the form to which that navigation word belongs as non-core entities of the concepts corresponding to the top M second similarities, where M is a natural number greater than or equal to 1;

when a concept includes neither core entities nor non-core entities, supplement core entities for that concept;

where the plurality of entities corresponding to a concept include the core entities and/or the non-core entities.

Optionally, the domain expert determination unit is specifically configured to:

obtain information from social networking sites corresponding to the domain, and determine whether the content of the information includes a concept name or an entity name from the domain knowledge base;

if the content of the information includes the concept name or entity name, generate an expert candidate set from the senders and receivers of the information, and

calculate a third similarity between the information and the domain, and take the sender of the information, the receiver of the information, and the third similarity of the information as a triple, thereby generating the information set;

obtain a ranking of each expert in the expert candidate set according to the experts in the expert candidate set and the information in the information set;

and/or,

select the top X experts in the ranking as the experts to whom the information in the information set belongs, where X is a natural number greater than or equal to 1.
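The text does not say how candidates are ranked from the triples (FIG. 7 suggests a graph built from the candidate set P and the message set E). The minimal sketch below assumes the simplest option: a candidate's score is the sum of the third similarities of every message it sent or received, which is a weighted degree on that graph. The function name `rank_experts` and the tie-breaking rule are illustrative assumptions. A link-analysis method such as PageRank would be an alternative choice.

```python
from collections import defaultdict


def rank_experts(candidates, triples, top_x):
    """Return the top X candidates by weighted degree (an assumed scoring rule).

    candidates: set of user ids forming the expert candidate set
    triples:    iterable of (sender, receiver, third_similarity)
    top_x:      X, a natural number >= 1
    """
    if top_x < 1:
        raise ValueError("X must be a natural number >= 1")
    score = defaultdict(float)
    for sender, receiver, sim in triples:
        # The text allows an expert to be either endpoint of a message,
        # so each message credits both its sender and its receiver.
        for person in (sender, receiver):
            if person in candidates:
                score[person] += sim
    # Highest score first; ties broken by id so the order is deterministic.
    ranked = sorted(candidates, key=lambda p: (-score[p], p))
    return ranked[:top_x]


triples = [("alice", "bob", 0.9), ("carol", "alice", 0.6), ("bob", "dave", 0.1)]
print(rank_experts({"alice", "bob", "carol", "dave"}, triples, top_x=2))  # ['alice', 'bob']
```

Here alice scores 0.9 + 0.6 and bob scores 0.9 + 0.1, so they form the top two.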

Optionally, the domain expert determination unit is further configured to:

for each expert in the expert candidate set, obtain all information of that expert in the information set;

obtain, according to all information of each expert in the information set and all concepts in the domain knowledge base, a concept similarity vector of that expert over all concepts.
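The text does not define how an expert's messages are turned into a concept similarity vector. The sketch below assumes one plausible reading: for each concept, the fraction of the expert's messages that mention the concept name or any of its entities. The knowledge base is assumed to be a dict from concept name to entity names, and the function name is hypothetical.

```python
def concept_similarity_vector(messages, knowledge_base):
    """Concept similarity vector of one expert (an assumed definition).

    messages:       list of message texts belonging to the expert
    knowledge_base: dict mapping concept name -> set of entity names
    Returns one score per concept, in the knowledge base's iteration order:
    the fraction of the expert's messages that mention the concept name or
    any of its entities.
    """
    concepts = list(knowledge_base)
    if not messages:
        return [0.0] * len(concepts)
    vector = []
    for concept in concepts:
        terms = {concept} | set(knowledge_base[concept])
        # Plain substring matching also works on unsegmented Chinese text.
        hits = sum(1 for msg in messages if any(term in msg for term in terms))
        vector.append(hits / len(messages))
    return vector


kb = {"camera": {"lens", "aperture"}, "phone": {"battery", "screen"}}
msgs = ["Which lens for portraits?", "f/1.8 aperture vs f/2.8",
        "My battery drains fast", "hello"]
print(concept_similarity_vector(msgs, kb))  # [0.5, 0.25]
```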

Optionally, the similarity determination unit is specifically configured to:

perform word segmentation on the question received by the question receiving unit to obtain a first set of words corresponding to the question;

obtain a question similarity vector between the first set and all concepts in the domain knowledge base;

determine the first similarity between the expert and the question according to the concept similarity vector and the question similarity vector.
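A minimal sketch of these three steps follows. It assumes that each question vector component is the fraction of the question's words that are a concept name or one of its entities, and that the first similarity is the cosine similarity between the expert's concept similarity vector and the question similarity vector. The text names neither measure. The regex tokenizer is only a stand-in; Chinese questions would need a real segmenter such as jieba (`jieba.lcut`). All function names are hypothetical.

```python
import math
import re


def segment(question):
    """Stand-in tokenizer: lowercases and splits on non-word characters."""
    return [w for w in re.split(r"\W+", question.lower()) if w]


def question_similarity_vector(words, knowledge_base):
    """Per concept: fraction of question words that are the concept or an entity."""
    concepts = list(knowledge_base)
    if not words:
        return [0.0] * len(concepts)
    vector = []
    for concept in concepts:
        terms = {concept} | set(knowledge_base[concept])
        vector.append(sum(1 for w in words if w in terms) / len(words))
    return vector


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector shares nothing with any concept, so report similarity 0.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def first_similarity(question, expert_vector, knowledge_base):
    words = segment(question)  # the "first set" of words
    return cosine(expert_vector, question_similarity_vector(words, knowledge_base))


kb = {"camera": {"lens", "aperture"}, "phone": {"battery", "screen"}}
print(round(first_similarity("Which lens has the widest aperture?", [0.5, 0.25], kb), 4))  # 0.8944
```

Cosine similarity ignores vector length. An expert who posts a lot is therefore not favored merely for volume, only for how their concept profile aligns with the question's.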

In a second aspect, the present invention provides an expert question answering system, including:

a receiving unit, configured to receive a question input by a user;

a similarity determination unit, configured to determine the similarity between the question and each expert in the expert question answering system, where an expert is a person skilled in the domain to which the question belongs;

an expert selection unit, configured to sort the similarities in descending order and select the experts corresponding to the top N similarities, where N is a natural number greater than or equal to 1;

a question answering unit, configured to have the experts selected by the expert selection unit answer the question for the user.

perform word segmentation on the question to obtain a first set of words corresponding to the question;

obtain a question similarity vector between the first set and all concepts in the domain knowledge base, the domain knowledge base being a knowledge base, acquired in advance in the expert question answering system, that includes at least one concept and a plurality of entities corresponding to the at least one concept;

determine the similarity between each expert in the expert question answering system and the question according to that expert's concept similarity vector and the question similarity vector, where each expert's concept similarity vector is obtained in advance from all information sent by that expert and all concepts in the domain knowledge base, and all information sent by the expert is information associated with the concepts or the entities and obtained from domain-related websites or comments.

In a third aspect, an embodiment of the present invention provides a method for constructing a computer network-based expert question answering system, including:

constructing a domain knowledge base, the domain knowledge base including at least one concept of the domain and a plurality of entities corresponding to each concept;

determining, according to an information set of the domain, the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or the entities and obtained from websites or comments related to the domain, and an expert is the sender or the receiver of the information;

if the expert question answering system receives a question, determining a first similarity between each expert and the question, sorting the first similarities in descending order, and selecting the experts corresponding to the top N first similarities to answer the question, where N is a natural number greater than or equal to 1.

Optionally, constructing the domain knowledge base includes:

Optionally, constructing the domain knowledge base further includes:

if the navigation word of at least one form in the form set does not match the at least one concept, obtaining, for each form whose navigation word is unmatched, a second similarity between the element set of that form and the entity set of each concept;

for the plurality of second similarities of each unmatched navigation word, sorting them in descending order and taking the elements of the form to which that navigation word belongs as non-core entities of the concepts corresponding to the top M second similarities;

where M is a natural number greater than or equal to 1;

if a concept includes neither core entities nor non-core entities, supplementing core entities for that concept;

where the plurality of entities corresponding to a concept include the core entities and/or the non-core entities.

Optionally, determining, according to the information set of the domain, the experts to whom the information in the information set belongs includes:

calculating a third similarity between the information and the domain, and taking the sender of the information, the receiver of the information, and the third similarity of the information as a triple, thereby generating the information set;

and/or,

Optionally, the method further includes:

Optionally, if the expert question answering system receives a question, determining the first similarity between the expert and the question includes:

In a fourth aspect, the present invention provides an automatic question answering method, including:

receiving a question input by a user, and determining the similarity between the question and each expert in an expert question answering system, where an expert is a person skilled in the domain to which the question belongs;

sorting the similarities in descending order, and selecting the experts corresponding to the top N similarities to answer the question, where N is a natural number greater than or equal to 1.

Optionally, determining the similarity between the question and the experts in the expert question answering system includes:

determining the similarity between each expert in the expert question answering system and the question according to that expert's concept similarity vector and the question similarity vector, where each expert's concept similarity vector is obtained in advance from all information sent by that expert and all concepts in the domain knowledge base, and all information sent by the expert is information associated with the concepts or the entities and obtained from domain-related websites or comments.

As can be seen from the above technical solutions, the computer network-based expert question answering system and construction method of the present invention build a domain knowledge base and determine the experts related to its concepts or entities. When a user's question is received, the first similarity between each expert and the question is determined, and one of the most similar experts is selected to answer it. Because the method combines professional websites with social networks, it can objectively and accurately recommend experts in the domain to which the question belongs, improving the accuracy with which users' questions are resolved.

The above description is only an overview of the technical solution of the present invention. Specific embodiments are described below. They make the technical means of the invention clearer, so that it can be implemented according to the description, and they make the above and other objects, features, and advantages of the invention more apparent.

Description of drawings

The drawings used in the description of the embodiments are briefly introduced below, to illustrate more clearly the technical solutions in those embodiments or in the prior art. The drawings described below are merely some examples of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for constructing an expert question answering system according to an embodiment of the present invention;

FIG. 2A is a schematic flowchart of constructing a domain knowledge base according to an embodiment of the present invention;

FIG. 2B is a schematic flowchart of determining the experts of a domain according to an embodiment of the present invention;

FIG. 2C is a schematic flowchart of matching a question with experts in the domain according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of an automatic question answering method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an expert question answering system according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an expert question answering system according to another embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an expert question answering system according to another embodiment of the present invention;

FIG. 7 is a diagram of the social network formed by the expert candidate set P and the message set E according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and its scope fully conveyed to those skilled in the art.

FIG. 1 shows a schematic flowchart of a method for constructing an expert question answering system according to an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment proceeds as follows.

101. Construct a domain knowledge base, the domain knowledge base including at least one concept of the domain and a plurality of entities corresponding to each concept.

102. According to an information set of the domain, determine the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or the entities and obtained from websites or comments related to the domain, and an expert is the sender or the receiver of the information.

103. If the expert question answering system receives a question, determine a first similarity between each expert and the question, sort the first similarities in descending order, and select the experts corresponding to the top N first similarities to answer the question.

Here N is a natural number greater than or equal to 1, and its value can be set according to actual needs.

In this embodiment, the top-ranked expert preferably answers the user's question. However, if the top-ranked expert is busy answering another user's question, the second-ranked expert can be selected instead.
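Step 103, together with the fallback just described, amounts to a sort followed by a scan. The minimal sketch below assumes that the first similarities have already been computed into a dict and that the set of busy experts is known. Both function names are hypothetical.

```python
def top_n_experts(similarities, n):
    """Experts with the N highest first similarities, best first."""
    if n < 1:
        raise ValueError("N must be a natural number >= 1")
    ranked = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))
    return [expert for expert, _ in ranked[:n]]


def pick_answerer(similarities, n, busy):
    """First expert among the top N who is not busy, or None if all are busy."""
    for expert in top_n_experts(similarities, n):
        if expert not in busy:
            return expert
    return None


sims = {"alice": 0.89, "bob": 0.95, "carol": 0.40}
print(top_n_experts(sims, 2))               # ['bob', 'alice']
print(pick_answerer(sims, 2, busy={"bob"}))  # alice
```

Returning `None` when the whole top N is busy leaves the caller free to queue the question or enlarge N.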

The construction method of this embodiment works as follows. It builds a domain knowledge base and determines the experts related to its concepts or entities. When a user's question is received, it determines the first similarity between each expert and the question and selects one of the most similar experts to answer it. The method combines professional websites with social networks. It can objectively and accurately recommend experts in the domain to which the question belongs, and it improves the accuracy with which users' questions are resolved.

In this embodiment, the domain knowledge base may include concepts and entities, with relationships established between concepts and entities. The key to building the domain knowledge base is obtaining the entities under each concept.

Usually, at least one concept of the domain can be created manually in the domain knowledge base, and the entities under each concept can also be supplemented manually. However, because there are many domains and many concepts per domain knowledge base, building it manually is time-consuming and costly. This embodiment therefore obtains the entities under the concepts automatically, for example through the following sub-steps.

Step 101 may include sub-steps 1011 and 1012 shown in FIG. 2A:

1011. Perform targeted crawling of websites corresponding to the domain and build a form set of 2-tuple forms, each form in the form set including a navigation word and an element set composed of a plurality of elements corresponding to the navigation word.

For example, various information is crawled from a professional website in a targeted way: the category navigation of the site is extracted as navigation words, and forms in a fixed format are built.

In this embodiment, a form is a 2-tuple $t = \langle \text{navigation word}, t_s \rangle$, where $t_s$ denotes the set of elements appearing in the form. Multiple such forms make up a form set $H = \{\, t \mid t = \langle \text{navigation word}, t_s \rangle \,\}$.

1012. Determine whether the navigation word of each form in the form set matches the at least one concept. If the navigation word of a form matches the at least one concept, take the elements of that form as core entities of the at least one concept. The core entities of each concept form the entity set of that concept.

For example, each form t in the form set H is taken in turn. Suppose the navigation word of form t is related to the name of a concept in the domain knowledge base, for example identical to it. The element set of t is then assigned to that concept as its core entities, and the similarity between the concept and each core entity is recorded as 1. The core entities under a concept form the entity set $c_t$.

Optionally, to simplify subsequent calculation, form t can be deleted from the form set H.
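Sub-steps 1011 and 1012 can be sketched as below. The sketch assumes the crawled forms are already available as (navigation word, element set) pairs; the crawler itself is out of scope. It also reads the matching rule through its "for example, identical" case, that is, as exact string equality. `Form` and `assign_core_entities` are hypothetical names.

```python
from typing import FrozenSet, NamedTuple


class Form(NamedTuple):
    """One 2-tuple form t = <navigation word, t_s>."""
    nav_word: str
    elements: FrozenSet[str]


def assign_core_entities(form_set, concepts):
    """Sub-step 1012: attach the elements of matching forms as core entities.

    Returns (knowledge_base, unmatched), where knowledge_base maps each concept
    to {entity: similarity} and unmatched holds the forms that remain in H.
    """
    knowledge_base = {concept: {} for concept in concepts}
    unmatched = []
    for form in form_set:
        if form.nav_word in knowledge_base:  # assumed rule: exact name match
            for entity in form.elements:
                knowledge_base[form.nav_word][entity] = 1.0  # core entity, sim = 1
        else:
            unmatched.append(form)  # matched forms are effectively removed from H
    return knowledge_base, unmatched


H = [Form("camera", frozenset({"lens", "tripod"})),
     Form("phones & tablets", frozenset({"battery", "screen"}))]
kb, rest = assign_core_entities(H, ["camera", "phone"])
print(kb["camera"] == {"lens": 1.0, "tripod": 1.0}, kb["phone"], len(rest))  # True {} 1
```

The second form fails the exact-match rule ("phones & tablets" is not "phone") and is left for the similarity-based sub-steps that follow.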

In practice, the navigation words of one or more forms may match no concept. Step 101 may therefore further include sub-steps 1013 and 1014 shown in FIG. 2A. They keep entities of the domain from being missed and ensure that the entities under each concept are sufficient and complete.

1013. If the navigation word of at least one form in the form set does not match the at least one concept, obtain, for each form whose navigation word is unmatched, a second similarity between the element set of that form and the entity set of each concept.

1014. For the plurality of second similarities of each unmatched navigation word, sort them in descending order and take the elements of the form to which that navigation word belongs as non-core entities of the concepts corresponding to the top M second similarities.

M is a natural number greater than or equal to 1; in specific applications, M is preferably 5, 6, 8, or the like.

In this embodiment, the plurality of entities corresponding to each concept may include core entities and/or non-core entities.

For example, in sub-steps 1013 and 1014, each form t in the form set H is taken in turn. The second similarity sim is computed between the element set $t_s$ of form t and the entity set $c_t$ of core entities of each concept in the domain knowledge base.

The n concepts with the highest second similarity sim are selected (here n = 5, playing the role of M), and the element set $t_s$ is assigned to these concepts. For each assigned entity, sim is recorded as its similarity to the concept.
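The exact formula for the second similarity is not given in this text. The sketch below uses the Jaccard similarity $|t_s \cap c_t| / |t_s \cup c_t|$ as a stand-in. It assigns each unmatched form's elements to that form's top-M concepts as non-core entities and records sim for each one. Two further rules are assumptions not stated in the text: concepts with zero similarity are skipped, and an existing core entity is never demoted. Function names are hypothetical.

```python
def jaccard(a, b):
    """Assumed second similarity: |a & b| / |a | b| (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_non_core(unmatched_forms, core_sets, m=5):
    """Sub-steps 1013-1014.

    unmatched_forms: list of (navigation word, element set t_s)
    core_sets:       concept -> set of core entities c_t
    Returns concept -> {non-core entity: second similarity}.
    """
    non_core = {concept: {} for concept in core_sets}
    for _nav_word, elements in unmatched_forms:
        scored = sorted(
            ((jaccard(set(elements), set(core)), concept)
             for concept, core in core_sets.items()),
            key=lambda pair: (-pair[0], pair[1]),
        )
        for sim, concept in scored[:m]:
            if sim == 0.0:
                break  # assumption: never attach entities to unrelated concepts
            for entity in elements:
                if entity in core_sets[concept]:
                    continue  # already a core entity of this concept
                best = non_core[concept].get(entity, 0.0)
                non_core[concept][entity] = max(best, sim)
    return non_core


core = {"camera": {"lens", "tripod", "flash"},
        "phone": {"battery", "screen"},
        "laptop": {"keyboard"}}
forms = [("accessories", {"lens", "battery", "case"})]
result = assign_non_core(forms, core, m=2)
print(result["phone"], result["camera"], result["laptop"])
```

With m = 2, the "accessories" form scores 0.25 against "phone" and 0.2 against "camera", so its elements become non-core entities of those two concepts only.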

Note that "second" in "second similarity" has no special meaning; it only distinguishes this similarity from the other similarities that appear in the present invention.

At this point, the domain knowledge base built above is already usable. Each concept in it is then traversed to check that it has corresponding entities, namely core entities or non-core entities.

If a concept includes neither core entities nor non-core entities, the core entities corresponding to that concept need to be supplemented manually.

That is, some concepts in the domain knowledge base may still contain no entities (core or non-core). Core entities can then be supplemented manually for them, so that every concept in the domain knowledge base contains entities.
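The traversal check can be sketched as a filter over the two entity maps. The sketch assumes core and non-core entities are kept as concept → {entity: similarity} dictionaries, and the function name is hypothetical. It only lists the concepts that need attention; the supplementation itself remains manual, as the text says.

```python
def concepts_needing_entities(core, non_core):
    """Concepts with neither core nor non-core entities, for manual follow-up."""
    concepts = set(core) | set(non_core)
    return sorted(c for c in concepts if not core.get(c) and not non_core.get(c))


core = {"camera": {"lens": 1.0}, "phone": {}, "laptop": {}}
non_core = {"phone": {"case": 0.25}, "laptop": {}}
print(concepts_needing_entities(core, non_core))  # ['laptop']
```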

在具体的使用过程中，可重复上述步骤101中的所有子步骤，以获得稳定的领域知识库。In practice, all the sub-steps of step 101 above can be repeated to obtain a stable domain knowledge base.

当然，也可定期更新本实施例中构建的领域知识库，例如可一周更新一次，或者两天更新一次领域知识库，本实施例不对其进行限定，可根据实际需要设置。Of course, the domain knowledge base built in this embodiment can also be updated periodically, for example once a week or once every two days. The update interval is not limited in this embodiment and can be set according to actual needs.

由此，本实施例中构建的领域知识库中的概念中都包括有实体，且所述概念对应的多个实体包括：所述核心实体和/或所述非核心实体。Thus, every concept in the domain knowledge base constructed in this embodiment contains entities, and the entities corresponding to each concept include core entities and/or non-core entities.

在其他实施例中，构建领域知识库还可通过领域本体论模型(ontology)的方式构建。例如，现有的各种领域相关的网站已经存在大量的本体论模型，可方便直接抓取使用。In other embodiments, the domain knowledge base can also be constructed from a domain ontology model (ontology). For example, many websites in various domains already provide ontology models that can conveniently be crawled and used directly.

在一种可能的实现方式中，前述的步骤102可包括下述的图2B中示出的子步骤1021至子步骤1025；In a possible implementation, step 102 above may include sub-steps 1021 to 1025 shown in FIG. 2B:

1021、获取所述领域对应的社交网站中的信息，确定所述信息内容是否包括所述领域知识库中的概念名称或实体名称；1021. Obtain information from the social networking sites corresponding to the domain, and determine whether the information content includes a concept name or entity name from the domain knowledge base;

1022、若所述信息内容包括所述概念名称或实体名称，则根据所述信息的发送者、接收者生成专家候选集合，以及1022. If the information content includes the concept name or entity name, generate an expert candidate set according to the sender and receiver of the information, and

1023、计算所述信息与所述领域的第三相似度，将所述信息的发送者、接收者和所述信息的第三相似度作为一个三元组信息，生成所述信息集合；1023. Calculate a third similarity between the information and the domain. Take the sender, the receiver and the third similarity of the information as a triple, and use these triples to generate the information set;

也就是说，上述子步骤1021至子步骤1023需要确定领域内的专家候选集合P，领域内的消息集合/信息集合E。That is, sub-steps 1021 to 1023 above determine the expert candidate set P and the message set (information set) E of the domain.

举例来说，通过抓取社交网络中的消息(微博)，对消息内容进行切词，如果该消息中含有领域知识库的概念名称或者实体名称，则将消息的发送者和接收者加入专家候选集合P；同时，还可计算该消息与领域的第三相似度：For example, messages (such as microblog posts) are crawled from a social network and their content is segmented into words. If a message contains a concept name or entity name from the domain knowledge base, its sender and receiver are added to the expert candidate set P. At the same time, the third similarity between the message and the domain can be calculated as:

$$m\_sim = \frac{1}{n}\sum_{w} sim(w),$$

其中，sim(w)表示消息中的词w与领域的第三相似度，n表示消息中词的个数。where sim(w) denotes the similarity between a word w in the message and the domain, and n denotes the number of words in the message.

由此，可将三元组<消息发送者，消息接收者，该消息的第三相似度m_sim>加入消息集合E。Thus, the triplet <message sender, message receiver, third similarity m_sim of the message> can be added to the message set E.
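Here is a sketch of sub-steps 1021 to 1023. It assumes messages arrive already segmented into words (the segmenter itself is out of scope) and that the word-level sim(w) is supplied as a lookup. The patent does not say how sim(w) is computed, so `word_sim`, `build_candidates` and all sample data are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, NamedTuple


class Triple(NamedTuple):
    """One entry of the message set E."""
    sender: str
    receiver: str
    m_sim: float


def message_similarity(words: list[str], word_sim: Callable[[str], float]) -> float:
    """m_sim = sum(sim(w)) / n over the n words of a segmented message."""
    if not words:
        return 0.0
    return sum(word_sim(w) for w in words) / len(words)


def build_candidates(
    messages: Iterable[tuple[str, str, list[str]]],
    kb_names: set[str],
    word_sim: Callable[[str], float],
) -> tuple[set[str], list[Triple]]:
    """Sub-steps 1021-1023: return the expert candidate set P and message set E."""
    P: set[str] = set()
    E: list[Triple] = []
    for sender, receiver, words in messages:
        if kb_names.isdisjoint(words):  # no concept or entity name: not a domain message
            continue
        P.update((sender, receiver))
        E.append(Triple(sender, receiver, message_similarity(words, word_sim)))
    return P, E


# Hypothetical, already-segmented messages: (sender, receiver, words)
messages = [
    ("P1", "P2", ["mysql", "tuning", "redis", "tips"]),
    ("P3", "P4", ["lunch", "today"]),
]
kb_names = {"databases", "mysql", "redis"}
scores = {"mysql": 1.0, "redis": 0.5}  # hypothetical per-word sim(w) values


def word_sim(w: str) -> float:
    return scores.get(w, 0.0)


P, E = build_candidates(messages, kb_names, word_sim)
print(P, E)
```

The second message mentions no knowledge-base name, so neither P3 nor P4 becomes a candidate.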

该处的消息可为信息的一种。A message here is one kind of information.

1024、根据所述专家候选集合的专家和所述信息集合中的信息，获取所述专家候选集合中每一专家的排名。1024. Obtain the ranking of each expert in the expert candidate set according to the experts in the expert candidate set and the information in the information set.

举例来说，前述的专家候选集合P、消息集合E实际形成了一个领域内的社交网络图，利用PeopleRank算法可以获得专家候选集合P中每一个专家的排名分数。For example, the expert candidate set P and the message set E together form a social network graph within the domain. The PeopleRank algorithm can be applied to this graph to obtain a ranking score for each expert in P.

1025、选取排名靠前的X个专家作为所述信息集合中所述信息所属的专家，X为大于等于1的自然数。1025. Select top X experts as the experts to whom the information in the information set belongs, where X is a natural number greater than or equal to 1.

可理解的是，前述选取部分专家可为可选的步骤。It can be understood that selecting only some of the experts is an optional step.

也就是说，若专家候选集合中包括较多的专家，但部分专家可能不常出现，为此，可选取专家候选集合中的部分专家作为专家问答系统中的常用专家。That is, the expert candidate set may contain many experts, some of whom appear only rarely. In that case a subset of the candidates can be selected as the regular experts of the expert question answering system.

上述获取的专家属于在领域的专业网站中发出信息/消息的发出者，或者接收信息/消息的接收者，故，可能出现，某些专家只发出过一次或两三次的信息之后，不再出现在该专业网站中，鉴于此，可获取专家候选集合中所有专家的排名，并筛选部分的专家。The experts obtained above are the senders or receivers of information/messages on professional websites in the domain. Some experts may therefore post only once or a few times and then never appear on that website again. For this reason, the rankings of all experts in the expert candidate set can be obtained and only some of them retained.
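The patent names the PeopleRank algorithm without defining it. The sketch below substitutes a weighted PageRank over the message graph, and this substitution is an assumption. Each triple <sender, receiver, m_sim> becomes a directed edge from sender to receiver weighted by m_sim, the damping factor is 0.85, and a fixed number of iterations is run. The hypothetical helper `top_x` then performs the selection of sub-step 1025.

```python
from __future__ import annotations

from collections import defaultdict


def weighted_rank(E: list[tuple[str, str, float]], damping: float = 0.85,
                  iterations: int = 100) -> dict[str, float]:
    """Weighted PageRank over sender -> receiver edges weighted by m_sim.

    A stand-in for PeopleRank: scores sum to 1, and the rank held by experts
    with no outgoing weight is spread evenly over all experts.
    """
    nodes = sorted({p for s, r, _ in E for p in (s, r)})
    if not nodes:
        return {}
    edge_w: dict[tuple[str, str], float] = defaultdict(float)
    out_w: dict[str, float] = defaultdict(float)
    for s, r, w in E:
        if w > 0:  # zero-similarity messages carry no rank
            edge_w[(s, r)] += w
            out_w[s] += w
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in nodes if out_w[p] == 0)
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for (s, r), w in edge_w.items():
            new[r] += damping * rank[s] * w / out_w[s]
        rank = new
    return rank


def top_x(rank: dict[str, float], x: int) -> list[str]:
    """Sub-step 1025: keep the X highest-ranked experts (X >= 1)."""
    return sorted(rank, key=rank.get, reverse=True)[:x]


# Edges as described for Figure 7; the m_sim values are hypothetical.
E = [("P1", "P2", 0.8), ("P1", "P3", 0.4), ("P4", "P2", 0.6)]
scores = weighted_rank(E)
print(top_x(scores, 2))  # ['P2', 'P3']
```

Under this assumption, rank flows toward experts who receive domain messages from well-ranked senders. Reversing the edge direction would reward prolific senders instead. The source does not state which choice PeopleRank makes.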

可选地，还可预先获取每一专家对所有概念的概念相似向量，例如通过下述图2B中未示出的子步骤A01和子步骤A02的方式获取。Optionally, each expert's concept similarity vector over all concepts can also be obtained in advance, for example through sub-steps A01 and A02, which are not shown in FIG. 2B.

A01、针对所述专家候选集合中的每一专家，获取每一专家在所述信息集合中的所有信息；A01. For each expert in the expert candidate set, obtain all information of each expert in the information set;

A02、根据每一专家在所述信息集合中的所有信息和所述领域知识库中的所有概念，获取每一专家对所有概念的概念相似向量。A02. According to all information of each expert in the information set and all concepts in the domain knowledge base, obtain concept similarity vectors of each expert for all concepts.

可理解的是，对于专家候选集合中的某一个专家p，搜集该专家发出的领域相关的消息集合E_p，对于领域知识库中的每一个概念c，计算专家与所有概念的概念相似向量CSVec：It can be understood that, for an expert p in the expert candidate set, the set E_p of domain-related messages sent by that expert is collected. For each concept c in the domain knowledge base, the concept similarity vector CSVec between the expert and all concepts is then calculated:

其中，c_t表示概念c包含的实体。where c_t denotes the entities contained in concept c.

本实施例中，概念相似向量CSVec是一个多维的向量。In this embodiment, the concept similarity vector CSVec is multidimensional, with one component for each concept in the domain knowledge base.

通过上述步骤102的子步骤可以明确领域内各专家的排名以及对应的排名分数，进而获知专家在该领域内的影响力；Through the sub-steps of step 102 above, the ranking and ranking score of each expert in the domain can be determined, which reveals the expert's influence in the domain;

进一步地，还可根据前述的概念相似向量可更清楚的获悉专家在该领域内侧重哪些子方向。Further, the concept similarity vectors above show more clearly which sub-directions of the domain each expert focuses on.

在另一种可能的实现方式中，前述的步骤103可包括下述的图2C中示出的子步骤1031至子步骤1033；In another possible implementation, step 103 above may include sub-steps 1031 to 1033 shown in FIG. 2C:

1031、对所述问题进行切词处理，得到与所述问题对应的词的第一集合；1031. Perform word segmentation processing on the question to obtain a first set of words corresponding to the question;

1032、获取所述第一集合与所述领域知识库中所有概念的问题相似向量；1032. Obtain question similarity vectors between the first set and all concepts in the domain knowledge base;

1033、根据所述概念相似向量和所述问题相似向量，确定所述专家与所述问题的第一相似度。1033. Determine a first degree of similarity between the expert and the question according to the concept similarity vector and the question similarity vector.
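Sub-steps 1031 to 1033 leave the similarity measures open. The sketch below makes two assumptions. First, the question similarity vector uses the same Jaccard ratio as the knowledge base construction. Second, the first similarity is the inner product of CSVec and QSVec, mirroring rank_sim in the Figure 6 embodiment but without the PR factor. Here `str.split()` stands in for a real word segmenter (Chinese text needs one). All names and data are hypothetical.

```python
from __future__ import annotations


def question_similarity_vector(question_words: set[str],
                               concepts: dict[str, set[str]]) -> list[float]:
    """Sub-step 1032 (assumed Jaccard): one component per concept, in dict order."""
    vec = []
    for c_t in concepts.values():
        union = question_words | c_t
        vec.append(len(question_words & c_t) / len(union) if union else 0.0)
    return vec


def first_similarity(cs_vec: list[float], qs_vec: list[float]) -> float:
    """Sub-step 1033 (assumed inner product): expert CSVec · question QSVec."""
    if len(cs_vec) != len(qs_vec):
        raise ValueError("vectors must cover the same concepts in the same order")
    return sum(a * b for a, b in zip(cs_vec, qs_vec))


# Sub-step 1031: str.split() stands in for the word segmenter.
question = "why is my mysql index slow"
q_w = set(question.split())
concepts = {"databases": {"mysql", "index"}, "languages": {"python", "java"}}
qs_vec = question_similarity_vector(q_w, concepts)
print(qs_vec, first_similarity([0.9, 0.1], qs_vec))
```

The vector components follow the insertion order of `concepts`, so each expert's CSVec must be built over the concepts in the same order.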

由此，通过上述方法建立的专家问答系统可准确、客观的为用户提供用户问题所属领域内的专家，进而可提高专家与问题的匹配度，提高了解答问题的准确率。Therefore, the expert question answering system established by the above method can accurately and objectively provide users with experts in the field to which the user's question belongs, thereby improving the matching degree between experts and questions, and improving the accuracy of answering questions.

图3示出了本发明一实施例提供的自动问答方法的流程示意图，如图3所示，本实施例的自动问答方法如下所述。FIG. 3 shows a schematic flowchart of an automatic question answering method provided by an embodiment of the present invention. As shown in FIG. 3 , the automatic question answering method in this embodiment is as follows.

301、接收用户输入的问题，确定所述问题与专家问答系统中每一专家的相似度，所述专家为所述问题所属领域的技术熟悉人；301. Receive a question input by the user, and determine the similarity between the question and each expert in the expert question answering system, where an expert is a person skilled in the field to which the question belongs;

302、将所述相似度按照大小排序，选取排在前N位的相似度对应的专家解答所述问题，N为大于等于1的自然数。302. Sort the similarities in descending order and select the experts corresponding to the top N similarities to answer the question, where N is a natural number greater than or equal to 1.

举例来说，在步骤301中的确定所述问题与专家问答系统中专家的相似度可具体通过下述的图中未示出的子步骤实现。For example, determining the similarity between the question and the experts in step 301 may be implemented through the following sub-steps, which are not shown in the figures.

3011、对所述问题进行切词处理，得到与所述问题对应的词的第一集合；3011. Perform word segmentation processing on the question to obtain a first set of words corresponding to the question;

3012、获取所述第一集合与领域知识库中所有概念的问题相似向量，所述领域知识库为所述专家问答系统中预先获取的包括至少一个概念、所述至少一个概念对应的多个实体的知识库；3012. Obtain the question similarity vector between the first set and all concepts in the domain knowledge base. The domain knowledge base is acquired in advance by the expert question answering system and includes at least one concept and multiple entities corresponding to the at least one concept;

3013、根据每一专家的概念相似向量和所述问题相似向量，确定所述专家问答系统中专家与所述问题的相似度。3013. According to the concept similarity vector of each expert and the question similarity vector, determine the similarity between the expert and the question in the expert question answering system.

应说明的是，该子步骤3013中每一专家的概念相似向量可为专家问答系统中预先获取的。例如通过前述的一个方法实施例中的步骤102的子步骤A01和A02获取的概念相似向量。It should be noted that the concept similarity vector of each expert in sub-step 3013 may be acquired in advance by the expert question answering system, for example through sub-steps A01 and A02 of step 102 in the method embodiment described above.

可理解的是：每一专家的概念相似向量为根据该专家发送的所有信息和所述领域知识库中的所有概念预先获取的；所述专家发送的所有信息为从所述领域相关网站或评论中获取的与所述概念或者所述实体关联的信息。It can be understood that each expert's concept similarity vector is obtained in advance from all the information sent by the expert and all the concepts in the domain knowledge base. All the information sent by the expert is information associated with the concept or the entity, obtained from websites or comments related to the domain.

本实施例中的自动问答方法可实现本领域的专家能够在线自动回答本领域的技术问题，进而提高用户获取问题答案的准确率。The automatic question answering method of this embodiment automatically routes technical questions in a field online to experts in that field for answering, thereby improving the accuracy of the answers users obtain.

图4示出了本发明一实施例提供的专家问答系统的结构示意图，如图4所示，本实施例的专家问答系统可包括：知识库构建单元41、领域专家确定单元42、问题接收单元43、相似度确定单元44和专家匹配单元45；Figure 4 shows a schematic structural diagram of an expert question answering system provided by an embodiment of the present invention. As shown in Figure 4, the expert question answering system of this embodiment may include: a knowledge base construction unit 41, a domain expert determination unit 42, a question receiving unit 43, a similarity determination unit 44 and an expert matching unit 45;

其中，知识库构建单元41用于构建领域知识库，所述领域知识库包括：所述领域的至少一个概念、与每一概念对应的多个实体；Wherein, the knowledge base construction unit 41 is used to construct a domain knowledge base, and the domain knowledge base includes: at least one concept in the domain, and multiple entities corresponding to each concept;

领域专家确定单元42用于根据所述领域的信息集合，确定所述信息集合中所述信息所属的专家，所述信息集合中的信息为从所述领域相关的网站或评论中获取的与所述概念或所述实体关联的信息，所述专家为所述信息的发出者或所述信息的接收者；The domain expert determination unit 42 is configured to determine, according to the information set of the domain, the experts to whom the information in the information set belongs. The information in the information set is information associated with the concept or the entity, obtained from websites or comments related to the domain, and an expert is a sender or a receiver of the information;

问题接收单元43用于接收用户输入的问题；The question receiving unit 43 is used for receiving the question input by the user;

相似度确定单元44用于确定所述领域专家确定单元42确定的专家与所述问题接收单元43接收的问题的第一相似度；The similarity determining unit 44 is used to determine the first similarity between the expert determined by the domain expert determining unit 42 and the question received by the question receiving unit 43;

专家匹配单元45用于将所述相似度确定单元44确定的第一相似度按照大小排序，选取排在前N位的第一相似度对应的专家解答所述问题，N为大于等于1的自然数。The expert matching unit 45 is configured to sort the first similarities determined by the similarity determination unit 44 in descending order and select the experts corresponding to the top N first similarities to answer the question, where N is a natural number greater than or equal to 1.

举例来说，所述知识库构建单元41具体用于，向所述领域对应的网站进行定向抓取，建立二元组表单的表单集合，所述表单集合中的表单包括：导航词、所述导航词对应的多个元素组成的元素集合；For example, the knowledge base construction unit 41 is specifically configured to perform focused crawling of the websites corresponding to the domain and build a form set of two-tuple forms. Each form in the form set includes a navigation word and an element set composed of the multiple elements corresponding to that navigation word;

在一种可能的实现方式中，上述的知识库构建单元41还用于，在所述表单集合中存在至少一个表单的导航词未与所述至少一个概念相匹配时，则分别获取未与所述至少一个概念相匹配的导航词所属表单中的元素集合与每一概念的实体集合的第二相似度；In a possible implementation, the knowledge base construction unit 41 is further configured to act when the form set contains at least one form whose navigation word does not match the at least one concept. For each such form, it obtains the second similarity between the form's element set and the entity set of each concept;

可选地，所述知识库构建单元41还用于Optionally, the knowledge base construction unit 41 is also used to

在第二种可选的实现方式中，所述领域专家确定单元42具体用于，获取所述领域对应的社交网站中的信息，确定所述信息内容是否包括所述领域知识库中的概念名称或实体名称；In a second optional implementation manner, the domain expert determination unit 42 is specifically configured to acquire information in social networking sites corresponding to the domain, and determine whether the information content includes concept names in the domain knowledge base or entity name;

和/或，and/or,

可选地，所述领域专家确定单元42还用于，针对所述专家候选集合中的每一专家，获取每一专家在所述信息集合中的所有信息；Optionally, the domain expert determination unit 42 is further configured to, for each expert in the expert candidate set, obtain all information of each expert in the information set;

另外，前述相似度确定单元44具体用于，对所述问题接收单元接收的所述问题进行切词处理，得到与所述问题对应的词的第一集合；In addition, the aforementioned similarity determination unit 44 is specifically configured to perform word segmentation processing on the question received by the question receiving unit to obtain a first set of words corresponding to the question;

本实施例的专家问答系统能够实现在用户提出问题时，查找所述问题所属领域的专家，以使该专家解答用户提出的问题，进而提高了解答问题的准确率，同时提高了专家问答系统的效率。When a user asks a question, the expert question answering system of this embodiment finds an expert in the field to which the question belongs so that the expert answers it. This improves the accuracy of the answers and also the efficiency of the expert question answering system.

本实施例的装置，可以用于执行图1至图3所示方法实施例的技术方案，其实现原理和技术效果类似，相关之处参见方法实施例的部分说明即可，此处不再赘述。The device of this embodiment can be used to implement the technical solutions of the method embodiments shown in Figures 1 to 3, and its implementation principles and technical effects are similar, and for relevant parts, please refer to the part of the description of the method embodiments, and will not repeat them here .

图5示出了本发明另一实施例提供的专家问答系统的结构示意图，如图5所示，本实施例的专家问答系统可包括：接收单元51、相似度确定单元52、专家选取单元53和问题解答单元54；Figure 5 shows a schematic structural diagram of an expert question answering system provided by another embodiment of the present invention, as shown in Figure 5, the expert question answering system of this embodiment may include: a receiving unit 51, a similarity determining unit 52, and an expert selecting unit 53 and question answering unit 54;

其中，接收单元51用于接收用户输入的问题；Wherein, the receiving unit 51 is used for receiving the question input by the user;

相似度确定单元52用于确定所述问题与专家问答系统中每一专家的相似度，所述专家为所述问题所属领域的技术熟悉人；The similarity determining unit 52 is configured to determine the similarity between the question and each expert in the expert question answering system, where an expert is a person skilled in the field to which the question belongs;

专家选取单元53用于将所述相似度按照大小排序，选取排在前N位的相似度对应的专家，N为大于等于1的自然数；The expert selection unit 53 is configured to sort the similarities in descending order and select the experts corresponding to the top N similarities, where N is a natural number greater than or equal to 1;

问题解答单元54用于使所述专家选取单元53选取的专家为所述用户解答所述问题。The question answering unit 54 is configured to make the expert selected by the expert selecting unit 53 answer the question for the user.

在本实施例中，所述相似度确定单元52具体用于，对所述问题进行切词处理，得到与所述问题对应的词的第一集合；In this embodiment, the similarity determining unit 52 is specifically configured to perform word segmentation processing on the question to obtain a first set of words corresponding to the question;

本实施例中的专家问答系统可实现本领域的专家能够在线自动回答本领域的技术问题，进而提高用户获取问题答案的准确率。The expert question answering system of this embodiment automatically routes technical questions in a field online to experts in that field for answering, thereby improving the accuracy of the answers users obtain.

另外，本发明还可采用另一实施例进行说明前述构建的专家问答系统。如图6所示，专家问答系统按照不同领域的划分进行构建，所构建的专家问答系统包括离线处理部分和在线处理部分，对于离线处理部分，在每一特定领域，所述系统包括构建领域知识库模块、专家分数模块，在线处理部分为用户提问模块，如图6所示。In addition, the expert question answering system constructed above can also be described through another embodiment of the present invention. As shown in Figure 6, the expert question answering system is constructed separately for each domain and includes an offline processing part and an online processing part. For each specific domain, the offline part includes a domain knowledge base construction module and an expert score module; the online part is the user question module.

其中，构建领域知识库模块和专家分数模块由专家问答系统在后台构建，以便用户利用用户提问模块提出问题并获取答案和/或推荐专家。The domain knowledge base construction module and the expert score module are built by the expert question answering system in the background. Users can then ask questions through the user question module and obtain answers and/or recommended experts.

专家问答系统面向多个领域，对于每个特定领域，对应构建领域知识库模块和专家分数模块，构建领域知识库模块中包括概念、分配于概念之下的实体及两者之间的相互关系。对于领域知识库中的领域、概念和关系，可以按照专家问答系统的定位、功能等通过人工或半自动的方式确定。The expert question answering system covers multiple domains, and a domain knowledge base construction module and an expert score module are built for each specific domain. The domain knowledge base contains concepts, the entities assigned to those concepts, and the relationships between them. The domains, concepts and relationships in the domain knowledge base can be determined manually or semi-automatically according to the positioning and functions of the expert question answering system.

目前存在的一些领域本体论模型，也可以直接使用。对于知识库中的实体，是构建领域知识库模块的关键部分，也是提高推荐专家与所提问题相关性的基础内容，其内容主要来源于特定领域的专业网站。在本发明的一个优选实施例中，挖掘特定领域的实体通过以下几个图中未示出的步骤完成：Some existing domain ontology models can also be used directly. The entities in the knowledge base are the key part of building the domain knowledge base and the basis for making recommended experts relevant to the questions asked. They come mainly from professional websites in the specific domain. In a preferred embodiment of the present invention, the entities of a specific domain are mined through the following steps, which are not shown in the figures:

S601、对特定领域的相关专业网站进行定向抓取，抽取其类目导航以及固定格式的表单t，获得表单集合H。S601. Perform focused crawling of professional websites related to the specific domain, extract their category navigation and fixed-format forms t, and obtain the form set H.

本实施例中的每一个表单可为一个二元组信息，例如：表单t＝<导航词,{表单内出现的元素集合t_s}>，最终得到一个表单t的表单集合H＝{t|t＝<导航词,{表单内出现的元素集合t_s}>}。Each form in this embodiment can be a two-tuple, for example t = <navigation word, {set t_s of elements appearing in the form}>. This finally yields the form set H = {t | t = <navigation word, {set t_s of elements appearing in the form}>}.

专业网站的选择可以通过人工或半自动的方式确定，并随着技术和社会的发展不断更新。本实施例中的定向抓取为现有技术，本实施例不对其详述。Professional websites can be selected manually or semi-automatically, and the selection is continually updated as technology and society develop. Focused crawling is prior art and is not described in detail in this embodiment.

S602、针对表单集合H中的每一个表单t，判断表单t的导航词和领域知识库中的概念的名字是否相关，若表单的导航词与所述概念相关，则将该表单t的元素集合分配到该概念底下，作为核心实体。S602. For each form t in the form set H, determine whether the navigation word of form t is related to the name of a concept in the domain knowledge base. If it is, assign the element set of form t to that concept as core entities.

此外，本实施例中每一概念的核心实体可组成该概念的实体集合c_t。In addition, in this embodiment the core entities of each concept can form the entity set c_t of that concept.

本实施例中，还可记录概念与核心实体的相似度为1，进一步地可将该与概念相关的表单t从表单集合H中删除。In this embodiment, the similarity between the concept and each of its core entities can also be recorded as 1, and the form t related to the concept can then be deleted from the form set H.

S603、若表单集合H中还存在未与领域知识库中概念相关的表单，则抽取集合H中剩余的未匹配的每一个表单t，用表单t的元素集合t_s与S602中每一个概念的实体集合c_t计算相似度：S603. If the form set H still contains forms not related to any concept in the domain knowledge base, take each remaining unmatched form t in H. Then calculate the similarity between its element set t_s and the entity set c_t of each concept from S602:
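Step S602 might be sketched as follows. The patent does not define when a navigation word is "related" to a concept name, so this sketch assumes an exact, case-insensitive match. The `Concept` class, the `assign_core_entities` helper and the sample forms are hypothetical. The returned list is the remainder of H that step S603 then processes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    c_t: set[str] = field(default_factory=set)                  # entity set of the concept
    entity_sim: dict[str, float] = field(default_factory=dict)  # entity -> similarity to concept


def assign_core_entities(H: list[tuple[str, set[str]]],
                         concepts: dict[str, Concept]) -> list[tuple[str, set[str]]]:
    """S602: move forms whose navigation word names a concept into that concept.

    "Related" is approximated here by an exact, case-insensitive name match.
    Returns the forms still unmatched, i.e. what is left of H for S603.
    """
    by_name = {name.lower(): concept for name, concept in concepts.items()}
    remaining = []
    for nav_word, t_s in H:
        concept = by_name.get(nav_word.lower())
        if concept is None:
            remaining.append((nav_word, t_s))
            continue
        concept.c_t |= t_s
        for entity in t_s:
            concept.entity_sim[entity] = 1.0  # core entities: similarity recorded as 1
    return remaining


# Hypothetical crawled forms t = <navigation word, element set t_s>
H = [("Databases", {"MySQL", "Redis"}), ("Cloud", {"S3", "EC2"})]
concepts = {"databases": Concept("databases")}
H = assign_core_entities(H, concepts)
print(H, concepts["databases"].c_t)
```

Returning the unmatched forms, rather than deleting from H while iterating over it, keeps the loop safe and hands S603 exactly the leftover forms.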

$$sim = \frac{|t_s \cap c_t|}{|t_s \cup c_t|},$$

由此，可选择相似度sim最高的n个概念，将该表单的元素集合t_s分配到这些概念下作为非核心实体，并将计算的sim作为非核心实体与概念的相似度。The n concepts with the highest similarity sim can then be selected. The form's element set t_s is assigned to these concepts as non-core entities, and the calculated sim is recorded as the similarity between the non-core entities and each concept.

本实施例中，该步骤中的n根据专家问答系统的准确度和运算效率的综合考虑确定，优选n＝5。In this embodiment, n in this step is determined according to the comprehensive consideration of the accuracy and operation efficiency of the expert question answering system, preferably n=5.
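A worked instance of the S603 similarity, using hypothetical sets: a leftover form with t_s = {MySQL, Redis, MongoDB} is compared against a concept with c_t = {MySQL, PostgreSQL, Redis}.

```latex
|t_s \cap c_t| = |\{\text{MySQL}, \text{Redis}\}| = 2, \qquad
|t_s \cup c_t| = |\{\text{MySQL}, \text{PostgreSQL}, \text{Redis}, \text{MongoDB}\}| = 4, \qquad
sim = \frac{2}{4} = 0.5
```

If 0.5 ranks among this form's n = 5 highest values, the form's elements are assigned to that concept as non-core entities, with sim = 0.5 recorded as their similarity.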

S604、若经过前述的步骤S602和步骤S603之后，领域知识库中还存在部分概念未有实体，为确保每一概念都包含实体，可人工补充该概念下的实体。S604. If, after steps S602 and S603, some concepts in the domain knowledge base still have no entities, entities can be added to those concepts manually so that every concept contains entities.

经过以上S601～S604步骤遍历领域知识库中的每一概念，进而形成稳定的知识库。Steps S601 to S604 above traverse every concept in the domain knowledge base and thereby form a stable knowledge base.

根据以上步骤形成的知识库，立足于专业网站，对于特定领域和其中的每一个概念，根据相似度算法，量化了每一概念下的实体与概念的相关程度，从而为在后续中客观确定专家和用户所提问题的相关程度奠定了基础。The knowledge base formed by the above steps is grounded in professional websites. For the specific domain and each of its concepts, the similarity algorithm quantifies how strongly each entity under a concept relates to that concept. This lays the foundation for later determining, objectively, how relevant experts are to the questions users ask.

应当注意的是，本实施例中的专业网站并不局限于技术领域的专业网站，只要包含能够解决特定领域的问题的信息、内容的网站，无论所含信息量的大小，都可以作为候选的专业网站。本领域技术人员可以根据问答系统的覆盖面和精确度要求进行合理的选择。It should be noted that the professional websites in this embodiment are not limited to those in technical fields. Any website containing information or content that can solve problems in a specific domain, however much information it contains, can serve as a candidate professional website. Those skilled in the art can choose appropriately according to the coverage and accuracy requirements of the question answering system.

在专家分数模块中，专家分数包括两个部分：1)专家排名分数PR；2)专家概念相似分数CSVec。前者表示了专家在该领域内的影响力，后者更具体地量化表示专家在领域内偏重于哪些子方向，即与该领域中每一概念的相关程度。In the expert score module, the expert score has two parts: 1) the expert ranking score PR; 2) the expert concept similarity score CSVec. The former represents the expert's influence in the domain. The latter quantifies more specifically which sub-directions of the domain the expert focuses on, i.e., the expert's degree of relevance to each concept in the domain.

关于专家排名分数PR，首先，需要确定专家候选集合P，消息集合E。专家问答系统首先抓取社交网络中的消息，例如微博、BBS等，对消息内容进行切词，如果该消息中含有知识库的概念名称或者实体名称，则将消息的发送和接收者加入专家候选集合P；同时，计算该消息与领域的相似度：Computing the expert ranking score PR first requires determining the expert candidate set P and the message set E. The expert question answering system crawls messages in social networks, such as Weibo and BBS, and segments the message content into words. If a message contains a concept name or entity name from the knowledge base, its sender and receiver are added to the expert candidate set P. At the same time, the similarity between the message and the domain is calculated:

$$m\_sim = \frac{1}{n}\sum_{w} sim(w),$$

其中，sim(w)表示词w与领域的相似度，n表示消息中词的个数。将三元组<消息发送者，消息接收者，消息相似度m_sim>加入消息集合E。Among them, sim(w) represents the similarity between word w and domain, and n represents the number of words in the message. Add the triple <message sender, message receiver, message similarity m_sim> to the message set E.
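A worked instance with hypothetical word scores: suppose a message segments into n = 4 words whose similarities to the domain are 1.0, 0.5, 0 and 0.

```latex
m\_sim = \frac{1}{n}\sum_{w} sim(w) = \frac{1.0 + 0.5 + 0 + 0}{4} = 0.375
```

Because the sum is divided by the word count, off-topic words dilute m_sim. A long message with few domain terms therefore scores lower than a short, focused one.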

专家候选集合P、消息集合E实际形成了一个领域内的社交网络图，如图7所示，在该网络中以五位候选专家为例，对于P1、P2两位候选专家而言，P1、P2之间的箭头表示P1向P2通过社交网络发出了一条消息，那么P1作为消息发送者、P2作为消息接收者都加入到候选集合P中，同时，该消息经过切词后，得到n个词，计算出得到的每一个词与领域的相似度，从而得到该消息与所属领域的相似度m_sim12，由此得到三元组<P1,P2,m_sim12>，将该三元组加入消息集合E中。与此类似，P1向P3也发送了一条消息，所发送的消息与所属领域的相似度为m_sim13，P4向P2也发送了一条消息，所发送的消息与所属领域的相似度为m_sim42，将这些发送者、接收者和消息相似度作为三元组也相应地分别加入到消息集合E中。The expert candidate set P and the message set E in fact form a social network graph within the domain, as shown in Figure 7, which uses five candidate experts as an example. For candidate experts P1 and P2, the arrow between them indicates that P1 sent a message to P2 through the social network. P1, as the sender, and P2, as the receiver, are therefore both added to the candidate set P. Meanwhile, segmenting the message yields n words, and the similarity between each word and the domain is calculated. This gives the similarity m_sim12 between the message and the domain, and hence the triple <P1, P2, m_sim12>, which is added to the message set E. Similarly, P1 also sent a message to P3, whose similarity to the domain is m_sim13, and P4 sent a message to P2, whose similarity to the domain is m_sim42. These senders, receivers and message similarities are likewise added to the message set E as triples.

由此，上述得到的专家候选集合P、消息集合E形成了所属领域内的社交网络图，优选可利用现有的PeopleRank算法可以计算出每一个专家的排名分数。Thus, the above obtained expert candidate set P and message set E form a social network graph in the field, and preferably the existing PeopleRank algorithm can be used to calculate the ranking score of each expert.

关于专家的概念相似向量CSVec，对于专家候选集合P中的某一个专家p，搜集该专家p发出的在所属领域的所有消息的消息集合E_p，对于领域中的每一个概念c，计算专家与概念c的相似度：To compute the expert's concept similarity vector CSVec, the message set E_p of all domain messages sent by an expert p in the expert candidate set P is collected. Then, for each concept c in the domain, the similarity between the expert and concept c is calculated:

其中，c_t表示概念c包含的实体(该实体包括前述的核心实体和非核心实体)。where c_t denotes the entities contained in concept c (including the core and non-core entities described above).

由此得到专家的概念相似向量CSVec。From this, the concept similarity vector CSVec of the expert is obtained.

在现有技术的问答系统中，尚不存在由计算机网络通过计算客观确定的专家相关性和“重要性”指标。本发明实施例的专家问答系统中的专家分数模块，通过在社交网络中对消息进行切词，并将得到的词与知识库模块中的领域进行相关性分析，一方面量化确定了特定领域的专家排名(根据专家在社交网络中发出或接收的消息确定)，另一方面，量化确定了特定专家在每一概念上的相关程度，从而能够在为用户推荐用户问题所属领域内的专家时，具有客观的推荐顺序，同时在推荐时能考虑专家在不同领域的不同表现。Prior-art question answering systems have no expert relevance or "importance" indicators determined objectively by computation over a computer network. The expert score module in the embodiments of the present invention segments social network messages into words and analyzes how relevant the resulting words are to the domain in the knowledge base module. On the one hand, this quantifies the ranking of experts in a specific domain, based on the messages the experts send or receive in the social network. On the other hand, it quantifies how relevant a particular expert is to each concept. As a result, experts in the domain of the user's question can be recommended in an objective order, and the recommendation can take into account how differently experts perform across sub-domains.

值得注意的是，由于用户在计算机网络上提交问题时，用户最关注的往往是所推荐的专家尽可能快速地给出回答，而不是所推荐的专家在这个领域内有多强的学术能力或专业知识。因此，在本发明的实施例中，在确定专家排名分数时，主要的影响因素是专家在社交网络中的活跃程度(发消息的次数和对象)以及关注与受关注的程度，而不是专家实际的学术能力或对相关知识的掌握多少。但是，一般而言，在社交网络中排名分数高的专家，往往其所发消息被社会广泛认可的程度相应也越高，因此，本领域实施例中的专家排名方式既能确保用户的提问被专家解答的可能性大，同时也具有较高的客观性和准确性。It is worth noting that when users submit questions over a computer network, they usually care most about the recommended experts answering as quickly as possible, not about how strong those experts' academic ability or expertise in the field is. Therefore, in the embodiments of the present invention, the ranking score depends mainly on the expert's activity in the social network, i.e., the number and recipients of messages sent, and on how much the expert follows others and is followed. It does not depend on the expert's actual academic ability or mastery of the relevant knowledge. In general, however, the messages of experts with high ranking scores in a social network also tend to be more widely recognized. The ranking approach of the embodiments of the present invention therefore makes it likely that users' questions are answered by experts, while remaining highly objective and accurate.

用户问答模块中，用户可提交问题Q，该模块中的切词单元对问题Q进行切词，得到词的集合Q_w＝{w}，进而计算集合Q_w与所确定领域中每一个概念的相似度，得到问题概念相似度向量QSVec；In the user question-answering module, the user can submit a question Q, which the module's word segmentation unit segments into a word set Q_w = {w}. The similarity between Q_w and each concept of the determined domain is then calculated to obtain the question concept similarity vector QSVec;

对于每一个领域专家p，计算该专家与该问题的相似度：For each domain expert p, calculate the similarity between the expert and the problem:

$$rank\_sim = (CSVec \cdot QSVec) \times PR,$$

其中，CSVec*QSVec表示相似度向量内积，PR为专家领域排名分数，选取rank_sim最高的专家p，作为推荐专家。where CSVec·QSVec denotes the inner product of the two similarity vectors and PR is the expert's ranking score in the domain. The expert p with the highest rank_sim is selected as the recommended expert.
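The scoring and selection above can be sketched in Python. The sketch assumes each expert's CSVec and PR have already been computed offline and that the vectors share one concept ordering. `rank_sim` and `recommend` are hypothetical helper names, and the numbers are illustrative only.

```python
from __future__ import annotations


def rank_sim(cs_vec: list[float], qs_vec: list[float], pr: float) -> float:
    """rank_sim = (CSVec · QSVec) * PR for one expert."""
    if len(cs_vec) != len(qs_vec):
        raise ValueError("CSVec and QSVec must index the same concepts in the same order")
    return sum(a * b for a, b in zip(cs_vec, qs_vec)) * pr


def recommend(experts: dict[str, tuple[list[float], float]], qs_vec: list[float]) -> str:
    """Return the expert p with the highest rank_sim; experts maps p -> (CSVec, PR)."""
    return max(experts, key=lambda p: rank_sim(experts[p][0], qs_vec, experts[p][1]))


qs_vec = [0.5, 0.0, 0.25]  # hypothetical QSVec over three concepts
experts = {
    "P1": ([0.8, 0.1, 0.0], 0.30),  # strong on concept 1, lower PR
    "P2": ([0.2, 0.0, 0.9], 0.50),  # strong on concept 3, higher PR
}
print(recommend(experts, qs_vec))  # P2
```

In this example P1 fits the first concept better. However, P2's overlap on the third concept and its higher PR give it the larger rank_sim (0.1625 versus 0.12), which shows how PR reweights topical fit.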

此时，专家问答系统通过社交网络应用程序编程接口(ApplicationProgrammingInterface，简称API)接口，将用户的问题推送给专家，由专家解答，从而完成整个问答过程。At this time, the expert question answering system pushes the user's question to the expert through the social network application programming interface (Application Programming Interface, API for short), and the expert answers, thereby completing the entire question answering process.

在本发明实施例中的用户问答模块中，通过对专家概念相似度CSVec向量和问题概念相似度向量QSVec做内积计算，量化确定了问题、概念与专家的相关程度，同时结合专家分数模块中确定的专家“重要性”和相关性指标PR，从而最大可能地提高了用户所提问题与所确定领域的专家之间的相关程度，从而能够客观地为用户推荐专家。In the user question answering module in the embodiment of the present invention, by doing inner product calculation on the expert concept similarity CSVec vector and the question concept similarity vector QSVec, the degree of correlation between the question, the concept and the expert is quantified and determined, and at the same time combined with the expert score module The determined "importance" of experts and the correlation index PR can maximize the degree of correlation between the questions raised by users and the experts in the identified field, so that experts can be objectively recommended to users.

The foregoing embodiments illustrate C10, a method for constructing a computer network-based expert question answering system, which may include:

constructing a domain knowledge base, the domain knowledge base including at least one concept in the field and multiple entities corresponding to each concept;

determining, according to the information set of the field, the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or the entities and obtained from websites related to the field, and each expert is the sender or the receiver of the information;

if the expert question answering system receives a question, determining a first similarity between each expert and the question, sorting the first similarities in descending order, and selecting the experts corresponding to the top N first similarities to answer the question, where N is a natural number greater than or equal to 1.
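Selecting the top N experts by first similarity is a partial sort. A minimal sketch using Python's standard `heapq.nlargest`, assuming the first similarities are held in a dictionary keyed by a hypothetical expert identifier:

```python
import heapq
from typing import Dict, List, Tuple


def top_n_experts(first_similarity: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the N experts with the largest first similarity, largest first."""
    if n < 1:
        raise ValueError("N must be a natural number >= 1")
    # nlargest is equivalent to sorted(..., reverse=True)[:n] but avoids a full sort.
    return heapq.nlargest(n, first_similarity.items(), key=lambda item: item[1])


sims = {"alice": 0.12, "bob": 0.06, "carol": 0.30}
print(top_n_experts(sims, 2))  # [('carol', 0.3), ('alice', 0.12)]
```

With N = 1, this reduces to picking the single best expert. If N exceeds the number of experts, all experts are returned in order.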

Other embodiments of the present invention further disclose the following:

C11. The method according to C10, wherein constructing the domain knowledge base includes:

C12. The method according to C11, wherein constructing the domain knowledge base further includes:

where M is a natural number greater than or equal to 1.

C13. The method according to C11 or C12, wherein constructing the domain knowledge base further includes:

if a concept includes neither core entities nor non-core entities, supplementing the core entities corresponding to that concept;

C14. The method according to C10, wherein determining, according to the information set of the field, the experts to whom the information in the information set belongs includes: obtaining the information on the social networking sites corresponding to the field, and determining whether the content of the information includes a concept name or an entity name from the domain knowledge base;

and/or,

C15. The method according to C14, further including:

C16. The method according to C15, wherein, if the expert question answering system receives a question, determining the first similarity between the expert and the question includes:

D17. An automatic question answering method, including:

receiving a question input by a user;

determining the similarity between the question and each expert in an expert question answering system, where the experts are people skilled in the field to which the question belongs;

sorting the similarities in descending order and selecting the experts corresponding to the top N similarities, where N is a natural number greater than or equal to 1;

recommending the selected experts to answer the question for the user.

D18. The method according to D17, wherein determining the similarity between the question and the experts in the expert question answering system includes:

The algorithms and displays involved in the embodiments of the present invention are not inherently tied to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language; the inventive concepts described herein can be implemented in a variety of programming languages.

Numerous specific details are set forth in this description. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in the above description of exemplary embodiments, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid understanding of one or more of the inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. Modules, units, or components in the embodiments may be combined into one module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components of a browser terminal device according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the methods described herein. Such a program may be stored on a computer-readable medium or take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features therein. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and all of them shall fall within the scope of the claims and the description of the present invention.

Claims

1. A computer network-based expert question answering system, characterized in that it comprises:

a knowledge base construction unit, configured to construct a domain knowledge base, the domain knowledge base including at least one concept in the field and multiple entities corresponding to each concept;

a domain expert determining unit, configured to determine, according to the information set of the field, the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or the entities and obtained from websites related to the field, and each expert is the sender or the receiver of the information;

a question receiving unit, configured to receive a question input by a user;

a similarity determining unit, configured to determine a first similarity between each expert determined by the domain expert determining unit and the question received by the question receiving unit; and

an expert matching unit, configured to sort the first similarities determined by the similarity determining unit in descending order and select the experts corresponding to the top N first similarities to answer the question, where N is a natural number greater than or equal to 1.

2. The system according to claim 1, wherein the knowledge base construction unit is specifically configured to:

perform targeted crawling of websites corresponding to the field and build a form set of two-tuple forms, each form in the form set including a navigation word and an element set composed of multiple elements corresponding to the navigation word;

determine whether the navigation word of each form in the form set matches the at least one concept, and, if the navigation word of a form matches the at least one concept, take the elements of that form as core entities corresponding to the at least one concept, the core entities corresponding to each concept constituting the entity set of that concept.

3. The system according to claim 2, wherein the knowledge base construction unit is further configured to:

when the navigation word of at least one form in the form set does not match the at least one concept, obtain a second similarity between the element set of the form whose navigation word is unmatched and the entity set of each concept;

for each unmatched navigation word, sort its multiple second similarities in descending order and take the elements of the form to which that navigation word belongs as non-core entities of the concepts corresponding to the top M second similarities, where M is a natural number greater than or equal to 1.

4. The system according to claim 2 or 3, wherein the knowledge base construction unit is further configured to:

when a concept includes neither core entities nor non-core entities, supplement the core entities corresponding to the concept;

wherein the multiple entities corresponding to the concept include the core entities and/or the non-core entities.

5. The system according to claim 1, wherein the domain expert determination unit is specifically configured to:

obtain the information on the social networking sites corresponding to the field, and determine whether the content of the information includes a concept name or an entity name from the domain knowledge base;

if the content of the information includes the concept name or the entity name, generate an expert candidate set from the sender and the receiver of the information, and

compute a third similarity between the information and the field, and generate the information set by taking the sender, the receiver, and the third similarity of each piece of information as a triplet;

obtain a ranking of each expert in the expert candidate set according to the experts in the expert candidate set and the information in the information set;

and / or,

The top X experts are selected as the experts to whom the information in the information set belongs, and X is a natural number greater than or equal to 1.
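The steps of claim 5 can be sketched end to end as below. Several choices here are assumptions not stated in the claim:

- Messages are (sender, receiver, text) records.
- A name "appears" in a message when it is a substring of the text, which also works for unsegmented Chinese text.
- The third similarity is the fraction of domain names found in the message.
- The ranking is a weighted PageRank in which each triplet adds an edge from the receiver to the sender, weighted by the third similarity.

All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Message = Tuple[str, str, str]    # (sender, receiver, text), hypothetical record layout
Triplet = Tuple[str, str, float]  # (sender, receiver, third similarity)


def build_information_set(
    messages: Iterable[Message], domain_names: Set[str]
) -> Tuple[Set[str], List[Triplet]]:
    """Keep messages that mention a concept/entity name; return candidates and triplets."""
    candidates: Set[str] = set()
    triplets: List[Triplet] = []
    for sender, receiver, text in messages:
        hits = {name for name in domain_names if name in text}  # substring test
        if not hits:
            continue  # mentions no concept or entity name: not domain information
        candidates.update((sender, receiver))
        third_sim = len(hits) / len(domain_names)  # hypothetical third similarity
        triplets.append((sender, receiver, third_sim))
    return candidates, triplets


def rank_experts(
    candidates: Set[str], triplets: List[Triplet], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """Weighted PageRank over receiver -> sender edges (an assumed ranking method)."""
    if not candidates:
        return {}
    edges: Dict[Tuple[str, str], float] = defaultdict(float)
    out_weight: Dict[str, float] = defaultdict(float)
    for sender, receiver, weight in triplets:
        edges[(receiver, sender)] += weight  # receiving relevant info "endorses" the sender
        out_weight[receiver] += weight
    n = len(candidates)
    score = {p: 1.0 / n for p in candidates}
    for _ in range(iterations):
        # Nodes without outgoing weight spread their score uniformly.
        dangling = sum(score[p] for p in candidates if out_weight[p] == 0.0)
        new = {p: (1 - damping) / n + damping * dangling / n for p in candidates}
        for (src, dst), weight in edges.items():
            new[dst] += damping * score[src] * weight / out_weight[src]
        score = new
    return score


def top_x(score: Dict[str, float], x: int) -> List[str]:
    """The top X experts by rank score."""
    return sorted(score, key=score.get, reverse=True)[:x]


msgs = [
    ("alice", "bob", "use flask with python"),
    ("alice", "carol", "python tips"),
    ("bob", "dave", "lunch today?"),
]
cands, info = build_information_set(msgs, {"python", "flask", "java", "spring"})
print(top_x(rank_experts(cands, info), 1))  # ['alice']
```

The off-topic message is filtered out, so "dave" never becomes a candidate. Because both domain-relevant messages were sent by alice, she collects the endorsements and ranks first. Under this reading, the edge direction encodes who is treated as the knowledgeable party; reversing it would favor receivers instead.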

6. The system according to claim 5, wherein the domain expert determination unit is further configured to:

for each expert in the expert candidate set, obtain all of that expert's information in the information set;

obtain, for each expert, a concept similarity vector over all concepts according to all of that expert's information in the information set and all concepts in the domain knowledge base.

7. The system according to claim 6, wherein the similarity determining unit is specifically configured to:

perform word segmentation on the question received by the question receiving unit to obtain a first set of words corresponding to the question;

obtain a question similarity vector between the first set and all concepts in the domain knowledge base; and

determine the first similarity between each expert and the question according to the expert's concept similarity vector and the question similarity vector.

8. An expert question answering system, characterized in that, comprising:

a receiving unit, configured to receive a question input by a user;

a similarity determination unit, configured to determine the similarity between the question and each expert in the expert question answering system, where the experts are people skilled in the field to which the question belongs;

an expert selection unit, configured to sort the similarities in descending order and select the experts corresponding to the top N similarities, where N is a natural number greater than or equal to 1; and

a question answering unit, configured to recommend the experts selected by the expert selection unit to answer the question for the user.

9. The system according to claim 8, wherein the similarity determining unit is specifically configured to:

perform word segmentation on the question to obtain a first set of words corresponding to the question;

obtain a question similarity vector between the first set and all concepts in a domain knowledge base, the domain knowledge base being a knowledge base, acquired in advance by the expert question answering system, of at least one concept and the multiple entities corresponding to the at least one concept;

determine the similarity between each expert in the expert question answering system and the question according to the expert's concept similarity vector and the question similarity vector, where each expert's concept similarity vector is obtained in advance from all information sent by the expert and all concepts in the domain knowledge base, and all information sent by the expert is information associated with the concepts or the entities and obtained from websites or comments related to the field.

10. A method for constructing an expert question answering system based on a computer network, characterized in that it comprises:

constructing a domain knowledge base, the domain knowledge base including at least one concept in the field and multiple entities corresponding to each concept;

determining, according to the information set of the field, the experts to whom the information in the information set belongs, where the information in the information set is information associated with the concepts or the entities and obtained from websites related to the field, and each expert is the sender or the receiver of the information; and

if the expert question answering system receives a question, determining a first similarity between each expert and the question, sorting the first similarities in descending order, and selecting the experts corresponding to the top N first similarities to answer the question, where N is a natural number greater than or equal to 1.