CN112269909A

CN112269909A - Expert recommendation method based on multi-source information fusion technology

Info

Publication number: CN112269909A
Application number: CN202010964492.1A
Authority: CN
Inventors: 朱全银; 方强强; 李翔; 马甲林; 张柯文; 王文川; 胥心心; 王胜标; 丁行硕; 成洁怡
Original assignee: Huaiyin Institute of Technology
Current assignee: Greater Bay Area Technology Innovation Service Center (Guangzhou) Co.,Ltd.; Guangzhou Jingzhi Information Technology Co ltd
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2021-01-26
Anticipated expiration: 2040-09-15
Also published as: CN112269909B

Abstract

The invention discloses an expert recommendation method based on multi-source information fusion technology, comprising the following steps: crawl technical experts' scientific papers, invention patents, funded project information and Web page information to construct a knowledge base, and build a keyword dictionary keywords from the keyword fields of the knowledge base; extract the author fields of the knowledge base and perform word-frequency co-occurrence analysis to construct an expert cooperative relationship subnet; extract the research directions and personal information of experts from Web pages with a regular expression and a named entity recognition algorithm, respectively, to construct a Web subnet; extract document-topic and topic-keyword distributions from the abstract field of the knowledge base with the LDA algorithm, and extract the 5 highest-weighted words of the abstract field with the TF-IDF algorithm, which together construct a topic subnet; and, taking the expert name-institution pair as the constraint condition, combine the three subnets into an expert information network, calculate the centrality value of each expert in the network, sort the centrality values, and recommend the top 5 experts as the recommendation result.

Description

Expert recommendation method based on multi-source information fusion technology

Technical Field

The invention belongs to the field of multi-source information fusion and expert recommendation, and particularly relates to an expert recommendation method based on a multi-source information fusion technology.

Background

Traditional technical expert recommendation algorithms generally rely on a single data source. Single-source recommendation is constrained by that source: expert information is easily lost and cannot be presented comprehensively, each expert is recommended in isolation, and the cooperative, regional and institutional-affiliation relationships among experts cannot be effectively expanded, so a researcher can only retrieve experts by a single attribute. A multi-source information fusion method can fuse the multi-dimensional attributes of experts under given constraint conditions and present technical expert information comprehensively. The three subnets built with multi-source information fusion technology, namely an expert cooperative relationship subnet, a Web subnet and a topic subnet, expand the expert relationship information, making technical expert recommendation more comprehensive and accurate and improving the breadth and depth of the recommendation results.

Existing research by Zhu Quanyin, Li Xiang, Feng Wanli and colleagues includes: Zhao Yang, Zhu Quanyin, Hu Ronglin, Dianthus superbus. A hybrid recommendation algorithm based on autoencoder and clustering [J]. 2018, 35(11): 52-56; Li Xiang, Zhu-Mi. Collaborative filtering recommendation based on co-clustering and rating-matrix sharing [J]. Journal of Frontiers of Computer Science and Technology, 2014, 8(06): 751-; Liu Jinling, Feng Wanli, Zhang Yaohong. Chinese text clustering method based on rescaling [J]. Computer Engineering and Applications, 2012, 48(21): 146-; Web science and technology news classification and extraction algorithm [J]. Journal of Huaiyin Institute of Technology, 2015, 24(5): 18-24. Patents applied for, published and granted by Zhu Quanyin, Li Xiang, Feng Wanli and colleagues include: Feng Wanli, Zhu Quanyin, Shi Benmin, et al. A recommendation method for image review experts based on Pearson similarity and FP-Growth: CN106897370A, 2017.06.27; A logistics recommendation method based on clustering and cosine similarity: CN106886872A, 2017.06.23; Li Xiang, Zhu Quanyin, Hu Ronglin, Zhonghong. An intelligent recommendation method for cold-chain logistics stowage based on spectral clustering: CN105654267A, 2016.06.08; A deep-learning-based major recommendation method for university students: CN110188978A, 2019.08.30; Zhu Quanyin, Ji Rui, Ni Jinxun, et al. An expert combination recommendation method based on image quantity: CN110162638A, 2019.08.23; Zhu Quanyin, Yu Shimin, Hu Ronglin, Feng Wanli, et al. An expert combination recommendation method based on knowledge graph: CN109062961A, 2018.12.21.

Multi-source information fusion technology:

Information fusion, also called data fusion, sensor information fusion or multi-sensor information fusion, is an information processing process that associates, correlates and combines data and information obtained from single or multiple information sources to obtain accurate position and identity estimates and to assess situations, threats and their significance comprehensively and in a timely manner. It is a process of continuous refinement of estimates, of evaluations and of the need for additional information sources, in which the information processing continuously self-corrects to improve the result.

Prior patent applications on multi-source information fusion include: an information fusion method and system based on the information fusion engine of a ship networking gateway, CN 109814444A, 2019.05.28, which addresses redundant backup and the system architecture of the information fusion module in a sensor data acquisition system; a substation fire detection system based on multi-sensor information fusion and a detection information fusion method, CN 105185022A, 2015.12.23, which adapts flexibly to complex detection environments, expands the detection range, improves sensitivity, reduces the false alarm rate, and greatly improves both the ability to reliably distinguish true from false fires and the efficiency and accuracy of substation fire early warning; Li Yong, an intelligent information fusion image-type fire detector and detection information fusion method, CN 103630948A, 2014.03.12, which minimizes false alarms and missed alarms and effectively improves the accuracy and reliability of image-type fire detectors; and a complex-background target recognition method based on multi-dimensional information fusion, CN 109492700A, 2019.03.19, which improves the accuracy and reliability of target recognition.

Prior patent applications on expert recommendation include: Su Yurong and Li Shenghua, an expert recommendation method and system, CN 111160699A, 2020.05.15, which can give users more standardized recommendation results on the basis of multiple recommendation systems but may suffer from information redundancy and cannot avoid information loss; Zyongfeng, Tantaniusau and Li Zhenhua, an expert recommendation method and system based on multiple data sources, CN 111008330A, 2020.04.14, which adds score fields to experts and sorts the experts by the corresponding scores to generate recommendations; although it involves multi-source data, it only scores expert fields and does not mine the implicit relationships among experts across the data sources; and Wang Jian, Sun Jiao and Lin Hongfei, a community question-answering expert recommendation method based on a recurrent neural network, CN 108021616A, 2018.05.11, which effectively represents sentence syntax and semantics and reduces manual intervention in the results, but is sensitive to the original corpus, which in turn affects the recommendations.

Although the above patents improve recommendation results, they do not take expert relationships or regional relationships into account. Recommending experts in isolation, without jointly considering factors such as regional and cooperative relationships, limits the validity of the results and their applicability to real recommendation scenarios.

Disclosure of Invention

Purpose of the invention: to address the problems of the prior art, the invention provides an expert recommendation method based on multi-source information fusion technology. The method constructs an expert cooperative relationship subnet, a Web subnet and a topic subnet, fuses the three subnets into an expert information network using the expert name-institution pair as the constraint condition, calculates and sorts the centrality values of the experts in the network, and recommends experts according to the sorted result.

Technical solution: to solve the above technical problem, the invention provides an expert recommendation method based on multi-source information fusion technology, comprising the following specific steps:

(1) Crawl technical expert data to construct a knowledge base, and build a keyword dictionary keywords.

(2) Extract the author fields of the knowledge base and perform word-frequency co-occurrence analysis to construct an expert cooperative relationship subnet.

(3) Extract the research directions and personal information of experts from Web pages with a regular expression and a named entity recognition algorithm, respectively, to construct an expert Web subnet.

(4) Extract document-topic and topic-keyword distributions from the abstract field of the knowledge base with the LDA algorithm, and extract the 5 highest-weighted words of the abstract field with the TF-IDF algorithm; together these construct a topic subnet.

(5) Taking the expert name-institution pair as the constraint condition, combine the three subnets into an expert information network, calculate the centrality value of each expert in the network, sort the values, and recommend the top 5 experts as the recommendation result.

Further, step (1) comprises the following specific steps:

(1.1) Acquire the scientific papers W from the knowledge base, where W contains M documents in total, and create an empty keyword dictionary keywords;

(1.2) Define a global loop variable Vi for traversing W, initialized to 1, with 1 ≤ Vi ≤ M, where W_Vi denotes the Vi-th document;

(1.3) If Vi ≤ M, execute step (1.4); otherwise execute step (1.11);

(1.4) Define a loop variable Vij, initialized to 1, that indexes the keywords of document W_Vi, so that Vij denotes the j-th keyword of W_Vi, with 1 ≤ Vij ≤ N, where N is the number of keywords of W_Vi;

(1.5) If keyword Vij already belongs to keywords, execute step (1.6); otherwise execute step (1.10);

(1.6) Keyword Vij already exists in the keyword table, so it is not written again; execute step (1.7);

(1.7) Let Vij = Vij + 1;

(1.8) If Vij ≤ N, execute step (1.5); otherwise execute step (1.9);

(1.9) Let Vi = Vi + 1 and execute step (1.3);

(1.10) Write keyword Vij into the keyword table keywords and execute step (1.7);

(1.11) Obtain the keyword table keywords containing all keywords.
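Steps (1.1) to (1.11) amount to an order-preserving de-duplication of all paper keywords. The Python sketch below assumes, hypothetically, that each paper record is a dict with a "keywords" list; the function name build_keyword_dictionary is illustrative and not taken from the patent. A set stands in for the linear membership test of step (1.5) so that each check is constant-time on average.

```python
from typing import Dict, List


def build_keyword_dictionary(papers: List[Dict[str, List[str]]]) -> List[str]:
    """Collect every distinct keyword of the papers W in first-seen order.

    The outer loop plays the role of Vi (documents), the inner loop the role
    of Vij (keywords of one document); a keyword is written only once.
    """
    keywords: List[str] = []   # (1.1) empty keyword dictionary
    seen = set()               # fast membership test for step (1.5)
    for paper in papers:                         # Vi = 1 .. M
        for word in paper.get("keywords", []):   # Vij = 1 .. N
            if word in seen:                     # (1.6) already present: skip
                continue
            seen.add(word)                       # (1.10) write the new keyword
            keywords.append(word)
    return keywords                              # (1.11)


papers = [
    {"keywords": ["expert recommendation", "information fusion"]},
    {"keywords": ["information fusion", "LDA"]},
]
print(build_keyword_dictionary(papers))
```

With the two sample records, "information fusion" is kept once, so the dictionary has three entries in first-seen order.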

Further, step (2) comprises the following specific steps:

(2.1) Acquire the scientific papers W from the knowledge base, where W contains M documents in total; define the loop variable Vi, initialized to 1, and denote the Vi-th document by W_Vi;

(2.2) If Vi ≤ M, execute step (2.3); otherwise execute step (2.5);

(2.3) Split the author field of paper W_Vi to obtain the relation R = {W_Vi, W_Via}, where W_Via is the a-th author name of paper W_Vi;

(2.4) Let Vi = Vi + 1 and execute step (2.2);

(2.5) Obtain the separated author relations R of all documents;

(2.6) Count the occurrences of every author in the relations R to obtain the author frequency A = {m, Na}, where Na is an author name and m is the total number of occurrences of Na;

(2.7) Count the co-occurrence frequency G = {m, Nap, Naq}, meaning that authors Nap and Naq co-occur m times;

(2.8) Convert the author co-occurrence frequency G into a co-occurrence network to obtain the author relationship subnet, i.e., the expert cooperative relationship subnet.
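Steps (2.3) to (2.7) can be sketched with Python's collections.Counter and itertools.combinations. This is an illustrative sketch under assumptions the patent does not state: each paper's author field has already been split into a list of names, a name listed twice on one paper is counted once, and each unordered author pair is stored under a single alphabetically ordered key. The function name author_statistics is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def author_statistics(author_lists: List[List[str]]) -> Tuple[Counter, Counter]:
    """Return the author frequency A and the pairwise co-occurrence counts G.

    author_lists[i] holds the separated author names of one paper (step 2.3).
    A[name] is the number of papers the author appears on (step 2.6).
    G[(p, q)] is the number of papers co-authored by p and q, with p < q,
    so each unordered pair is counted under one key (step 2.7).
    """
    freq: Counter = Counter()
    cooc: Counter = Counter()
    for authors in author_lists:
        unique = sorted(set(authors))   # de-duplicate names within one paper
        freq.update(unique)
        cooc.update(combinations(unique, 2))
    return freq, cooc


freq, cooc = author_statistics([["Zhang", "Li", "Wang"], ["Li", "Wang"], ["Zhang"]])
```

Here Li and Wang co-author two papers, so their pair count is 2, while each of them co-occurs with Zhang once.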

Further, step (3) comprises the following specific steps:

(3.1) Acquire the expert Web page information from the knowledge base;

(3.2) Extract expert information from the expert Web pages with a named entity recognition algorithm;

(3.3) Obtain the personal information of the experts;

(3.4) Define a regular expression rule Ru;

(3.5) If applying rule Ru to the Web page yields an empty result, execute step (3.8); otherwise execute step (3.6);

(3.6) Obtain the research direction of the expert;

(3.7) Construct the Web subnet from the expert's research direction and personal information;

(3.8) Construct the Web subnet from the expert's personal information only.
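The patent discloses neither the rule Ru nor the named entity recognition model. The sketch below is a hypothetical stand-in for steps (3.4) to (3.8): RU is an illustrative pattern for a "Research interests:" or "Research direction:" label (it also accepts the full-width colon common on Chinese pages), and personal_info is assumed to be the output of the NER step. Real crawls would need site-specific rules.

```python
import re
from typing import Dict

# Hypothetical rule Ru (not from the patent): capture the text that follows a
# "Research direction(s):" or "Research interest(s):" label up to the end of
# the line. The character class also accepts the full-width colon.
RU = re.compile(r"research (?:directions?|interests?)\s*[:：]\s*([^\n]+)", re.IGNORECASE)


def build_web_record(page_text: str, personal_info: Dict[str, str]) -> Dict[str, str]:
    """Steps (3.4)-(3.8): add the research direction when rule Ru matches.

    personal_info is assumed to be the NER output of steps (3.2)-(3.3),
    e.g. {"name": ..., "institution": ...}.
    """
    record = dict(personal_info)   # copy so the caller's dict is not mutated
    match = RU.search(page_text)
    if match:                      # (3.5) non-empty -> (3.6)-(3.7)
        record["research_direction"] = match.group(1).strip()
    return record                  # empty match -> (3.8): personal info only
```

A page without a matching label falls through to step (3.8) and keeps only the personal information, which is the branch the flow above describes.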

Further, step (4) comprises the following specific steps:

(4.1) Acquire the scientific papers W from the knowledge base, where W contains M documents in total; define the loop variable Vi, initialized to 1, and create an empty abstract text Abstract;

(4.2) If Vi ≤ M, execute step (4.3); otherwise execute step (4.5);

(4.3) Write the abstract of document W_Vi into the abstract text Abstract;

(4.4) Let Vi = Vi + 1 and execute step (4.2);

(4.5) Obtain the abstract text Abstract containing the abstracts of all documents W;

(4.6) Load the keyword dictionary keywords and perform jieba word segmentation on Abstract to obtain the segmented abstract text Abstract';

(4.7) Compute the document-topic and topic-keyword distributions of Abstract' with the LDA algorithm;

(4.8) Obtain the document-topic and topic-keyword distributions of the abstract text Abstract;

(4.9) Compute word weights on Abstract' with the TF-IDF algorithm;

(4.10) Obtain the 5 words with the largest weights in Abstract;

(4.11) Construct the topic subnet from the document-topic distribution, the topic-keyword distribution and the 5 highest-weighted words of the abstracts.
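Steps (4.9) and (4.10) can be illustrated with a plain TF-IDF computation over segmented abstracts. This is a minimal sketch assuming tf = count / document length and idf = log(M / df) without smoothing; the patent does not state which TF-IDF variant it uses, and library implementations apply their own weighting. It also assumes the abstracts are kept as separate token lists and returns the top k words of each abstract; top_tfidf_words is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Tuple


def top_tfidf_words(docs: List[List[str]], k: int = 5) -> List[List[Tuple[str, float]]]:
    """Return the k highest TF-IDF words of each segmented abstract.

    tf = count / len(doc); idf = log(M / df), where M is the number of
    abstracts and df the number of abstracts that contain the word.
    """
    m = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        scores = {w: (c / total) * math.log(m / df[w]) for w, c in counts.items()}
        # Sort by descending weight, then alphabetically so ties are deterministic.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        result.append(ranked[:k])
    return result


docs = [
    ["expert", "recommendation", "expert", "network"],
    ["topic", "model", "network"],
    ["expert", "topic"],
]
top = top_tfidf_words(docs, k=5)
```

In the first abstract "recommendation" outranks the more frequent "expert" because it occurs in only one abstract, which is the effect the IDF factor is meant to have.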

Further, step (5) comprises the following specific steps:

(5.1) Take the expert cooperative relationship subnet, the Web subnet and the topic subnet;

(5.2) Associate the three subnets using the expert name-institution pair as the constraint condition;

(5.3) Obtain the expert information network;

(5.4) Calculate and sort the centrality values of the experts in the expert information network;

(5.5) Take the top 5 experts in the sorted result as the final recommendation result.
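The patent leaves the centrality measure and the node and edge layout of the merged network unspecified. The following sketch assumes degree centrality (number of neighbours divided by n - 1, where n is the number of nodes) and a hypothetical layout: cooperation edges link experts, Web edges link an expert to an attribute such as a research direction, and topic edges link an expert to a topic. Every expert node is keyed by its (name, institution) pair, so records from different subnets merge only when both fields match; this is how the sketch models the name-institution constraint of step (5.2). The function name fuse_and_rank and the sample institutions are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Expert = Tuple[str, str]  # (name, institution): the fusion constraint


def fuse_and_rank(coop_edges: Iterable[Tuple[Expert, Expert]],
                  web_edges: Iterable[Tuple[Expert, str]],
                  topic_edges: Iterable[Tuple[Expert, str]],
                  top_n: int = 5) -> List[Tuple[Expert, float]]:
    """Merge the three subnets and return the top_n experts by degree centrality."""
    adj: Dict[tuple, Set[tuple]] = defaultdict(set)

    def link(a: tuple, b: tuple) -> None:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)

    # Expert nodes are ("expert", name, institution); attribute nodes are
    # 2-tuples, so they can never collide with an expert key.
    for a, b in coop_edges:               # cooperative relationship subnet
        link(("expert",) + a, ("expert",) + b)
    for expert, direction in web_edges:   # Web subnet
        link(("expert",) + expert, ("web", direction))
    for expert, topic in topic_edges:     # topic subnet
        link(("expert",) + expert, ("topic", topic))

    n = len(adj)
    scores = [(node[1:], len(nbrs) / (n - 1))
              for node, nbrs in adj.items() if node[0] == "expert"]
    scores.sort(key=lambda item: (-item[1], item[0]))  # ties broken by (name, institution)
    return scores[:top_n]


coop = [(("Zhang San", "Inst A"), ("Li Si", "Inst A")),
        (("Zhang San", "Inst A"), ("Wang Wu", "Inst B"))]
web = [(("Zhang San", "Inst A"), "data mining"), (("Li Si", "Inst A"), "data mining")]
topics = [(("Zhang San", "Inst A"), "topic-0"), (("Zhang San", "Inst C"), "topic-1")]
ranked = fuse_and_rank(coop, web, topics)
```

In this example ("Zhang San", "Inst A") and ("Zhang San", "Inst C") stay separate nodes, so the namesake at another institution does not inherit the first expert's links.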

By adopting the above technical solution, the invention has the following beneficial effects:

The invention addresses the single-attribute representation of experts and the insufficient representation of expert relationships caused by the limited data sources of conventional recommendation systems. It uses multi-source information to construct an expert cooperative relationship subnet, a Web subnet and a topic subnet, and fuses the three subnets into a technical expert information network using the expert name-institution pair as the constraint condition. A recommendation system built on this network can expand expert information and strengthen the representation of cooperative and regional relationships among technical experts, making technical expert recommendation more comprehensive and accurate and improving the breadth and depth of the recommendation results.

Drawings

FIG. 1 is a general flow diagram of the present invention;

FIG. 2 is a flow diagram of constructing a keyword dictionary in an exemplary embodiment;

FIG. 3 is a flow diagram of constructing the expert cooperative relationship subnet in a specific embodiment;

FIG. 4 is a flow diagram of constructing a Web subnet in an exemplary embodiment;

FIG. 5 is a flow diagram of constructing a topic subnet in a specific embodiment;

FIG. 6 is a flow chart of expert recommendation in an exemplary embodiment.

Detailed Description

The invention is further illustrated below with specific embodiments in conjunction with national engineering standards. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims appended to this application.

As shown in FIGS. 1 to 6, the expert recommendation method based on multi-source information fusion technology according to the invention comprises the following steps:

Step 1: Crawl technical expert data to construct a knowledge base, and build the keyword dictionary keywords:

Step 1.1: Acquire the scientific papers W from the knowledge base, where W contains M documents in total, and create an empty keyword dictionary keywords;

Step 1.2: Define a global loop variable Vi for traversing W, initialized to 1, with 1 ≤ Vi ≤ M, where W_Vi denotes the Vi-th document;

Step 1.3: If Vi ≤ M, execute step 1.4; otherwise execute step 1.11;

Step 1.4: Define a loop variable Vij, initialized to 1, that indexes the keywords of document W_Vi, so that Vij denotes the j-th keyword of W_Vi, with 1 ≤ Vij ≤ N, where N is the number of keywords of W_Vi;

Step 1.5: If keyword Vij already belongs to keywords, execute step 1.6; otherwise execute step 1.10;

Step 1.6: Keyword Vij already exists in the keyword table, so it is not written again; execute step 1.7;

Step 1.7: Let Vij = Vij + 1;

Step 1.8: If Vij ≤ N, execute step 1.5; otherwise execute step 1.9;

Step 1.9: Let Vi = Vi + 1 and execute step 1.3;

Step 1.10: Write keyword Vij into the keyword table keywords and execute step 1.7;

Step 1.11: Obtain the keyword table keywords containing all keywords.

Step 2: Extract the author fields of the knowledge base and perform word-frequency co-occurrence analysis to construct the expert cooperative relationship subnet:

Step 2.1: Acquire the scientific papers W from the knowledge base, where W contains M documents in total; define the loop variable Vi, initialized to 1, and denote the Vi-th document by W_Vi;

Step 2.2: If Vi ≤ M, execute step 2.3; otherwise execute step 2.5;

Step 2.3: Split the author field of paper W_Vi to obtain the relation R = {W_Vi, W_Via}, where W_Via is the a-th author name of paper W_Vi;

Step 2.4: Let Vi = Vi + 1 and execute step 2.2;

Step 2.5: Obtain the separated author relations R of all documents;

Step 2.6: Count the occurrences of every author in the relations R to obtain the author frequency A = {m, Na}, where Na is an author name and m is the total number of occurrences of Na;

Step 2.7: Count the co-occurrence frequency G = {m, Nap, Naq}, meaning that authors Nap and Naq co-occur m times;

Step 2.8: Convert the author co-occurrence frequency G into a co-occurrence network to obtain the author relationship subnet.
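Step 2.8 leaves the graph representation open. A minimal sketch, assuming the pair counts G are held in a mapping from an (author, author) tuple to a count (a Counter works), stores the co-occurrence network as a symmetric weighted adjacency map. The function name cooccurrence_network and the min_weight pruning threshold are hypothetical additions not mentioned in the patent; the default of 1 keeps every pair.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple

Graph = Dict[str, Dict[str, int]]


def cooccurrence_network(cooc: Mapping[Tuple[str, str], int], min_weight: int = 1) -> Graph:
    """Turn author pair counts G into an undirected weighted adjacency map.

    graph[p][q] == graph[q][p] == number of papers co-authored by p and q.
    Pairs whose count is below min_weight (hypothetical threshold) are dropped.
    """
    graph: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (p, q), weight in cooc.items():
        if weight < min_weight:
            continue
        graph[p][q] = weight   # store the edge in both directions
        graph[q][p] = weight
    return dict(graph)


pairs = {("Li", "Wang"): 2, ("Li", "Zhang"): 1, ("Wang", "Zhang"): 1}
strong_ties = cooccurrence_network(pairs, min_weight=2)
```

Raising min_weight keeps only repeated collaborations; whether such pruning is desirable depends on how sparse the crawled author data is.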

Step 3: Extract the research directions and personal information of experts from Web pages with a regular expression and a named entity recognition algorithm, respectively, to construct the expert Web subnet:

Step 3.1: Acquire the expert Web page information from the knowledge base;

Step 3.2: Extract expert information from the expert Web pages with a named entity recognition algorithm;

Step 3.3: Obtain the personal information of the experts;

Step 3.4: Define a regular expression rule Ru;

Step 3.5: If applying rule Ru to the Web page yields an empty result, execute step 3.8; otherwise execute step 3.6;

Step 3.6: Obtain the research direction of the expert;

Step 3.7: Construct the Web subnet from the expert's research direction and personal information;

Step 3.8: Construct the Web subnet from the expert's personal information only.

Step 4: Extract document-topic and topic-keyword distributions from the abstract field of the knowledge base with the LDA algorithm, extract the 5 highest-weighted words of the abstract field with the TF-IDF algorithm, and jointly construct the topic subnet:

Step 4.1: Acquire the scientific papers W from the knowledge base, where W contains M documents in total; define the loop variable Vi, initialized to 1, and create an empty abstract text Abstract;

Step 4.2: If Vi ≤ M, execute step 4.3; otherwise execute step 4.5;

Step 4.3: Write the abstract of document W_Vi into the abstract text Abstract;

Step 4.4: Let Vi = Vi + 1 and execute step 4.2;

Step 4.5: Obtain the abstract text Abstract containing the abstracts of all documents W;

Step 4.6: Load the keyword dictionary keywords and perform jieba word segmentation on Abstract to obtain the segmented abstract text Abstract';

Step 4.7: Compute the document-topic and topic-keyword distributions of Abstract' with the LDA algorithm;

Step 4.8: Obtain the document-topic and topic-keyword distributions of the abstract text Abstract;

Step 4.9: Compute word weights on Abstract' with the TF-IDF algorithm;

Step 4.10: Obtain the 5 words with the largest weights in Abstract;

Step 4.11: Construct the topic subnet from the document-topic distribution, the topic-keyword distribution and the 5 highest-weighted words of the abstracts.
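Steps 4.7 and 4.8 do not name an LDA implementation. To show what the document-topic and topic-keyword outputs are, the sketch below implements LDA with collapsed Gibbs sampling in plain Python; in practice a library such as gensim would normally be used instead. It assumes the segmented abstracts are kept as one token list per paper, so that a per-document topic distribution exists, and the hyperparameters alpha and beta, the iteration count and the seed are illustrative choices. The names lda_gibbs and top_words are hypothetical.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, iterations: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0
              ) -> Tuple[List[List[float]], List[List[float]], List[str]]:
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] is the document-topic
    distribution, phi[k][v] the topic-word distribution over vocab.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab, k_topics = len(vocab), num_topics
    ndk = [[0] * k_topics for _ in docs]            # tokens of topic k in doc d
    nkw = [[0] * n_vocab for _ in range(k_topics)]  # tokens of word v in topic k
    nk = [0] * k_topics                             # tokens in topic k
    z: List[List[int]] = []                         # topic of every token

    for d, doc in enumerate(docs):                  # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(k_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token, then resample its topic with probability
                # proportional to (n_dt + alpha)(n_tv + beta) / (n_t + V * beta).
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                           for t in range(k_topics)]
                k = rng.choices(range(k_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + k_topics * alpha) for k in range(k_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + n_vocab * beta) for v in range(n_vocab)]
           for k in range(k_topics)]
    return theta, phi, vocab


def top_words(phi: List[List[float]], vocab: List[str], n: int = 5) -> List[List[str]]:
    """Topic-keyword view: the n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


docs = [["data", "mining", "cluster"], ["data", "cluster", "mining"],
        ["patent", "expert", "fusion"], ["expert", "fusion", "patent"]]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=50)
```

theta and phi correspond to the document-topic and topic-keyword outputs of step 4.8, and top_words turns phi into a readable keyword list per topic. Because the sampler is stochastic, the exact topic assignments depend on the seed.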

And 5: constructing and calculating expert centrality values in an expert information network by taking an expert name-mechanism as a constraint condition and combining three subnets, sequencing the expert centrality values and recommending the experts ranked 5 above as recommendation results:

step 5.1: taking an expert cooperative relationship subnet, a Web subnet and a subject subnet;

step 5.2: associating the expert cooperative relationship subnet, the Web subnet and the subject subnet by taking the expert name-mechanism as a constraint condition;

step 5.3: acquiring an expert information network;

step 5.4: calculating and sequencing the centrality values of the experts in the expert information network;

step 5.5: and taking the expert at the top 5 as a final recommendation result according to the sorting result.

The variables involved in the above steps are shown in the following table:

In this embodiment, 39,382 records were processed, and expert information, document abstracts, keywords and Web page information were extracted from the crawled data to construct the knowledge base. The expert cooperative relationship subnet, the Web subnet and the topic subnet were built with multi-source information fusion technology, the technical expert information network was constructed using the expert name-institution pair as the constraint condition, and an expert recommendation system was built on top of this network.

The invention provides an expert recommendation method based on multi-source information fusion technology that addresses the single-attribute representation of existing expert recommendation systems: it builds an expert cooperative relationship subnet, a Web subnet and a topic subnet from multi-source information and fuses the three subnets into a technical expert information network using the expert name-institution pair as the constraint condition. A technical expert recommendation system that fuses the three subnets can present expert information comprehensively and make deeper associative recommendations based on the cooperative and regional relationships among technical experts, broadening and deepening the recommendation range and giving expert recommendation higher accuracy and extensibility.

Claims

1. An expert recommendation method based on a multi-source information fusion technology is characterized by comprising the following specific steps:

(1) crawling technical expert data to construct a knowledge base, and constructing a keyword dictionary keywords;

(2) extracting the author fields of the knowledge base and performing word-frequency co-occurrence analysis to construct an expert cooperative relationship subnet;

(3) extracting the research directions and personal information of experts from Web pages with a regular expression and a named entity recognition algorithm, respectively, to construct an expert Web subnet;

(4) extracting document-topic and topic-keyword distributions from the abstract field of the knowledge base with the LDA algorithm, extracting the 5 highest-weighted words of the abstract field with the TF-IDF algorithm, and jointly constructing a topic subnet;

2. The expert recommendation method based on the multi-source information fusion technology according to claim 1, wherein the specific steps of constructing the keyword dictionary keywords in step (1) are as follows:

(1.4) defining a loop variable Vij, initialized to 1, that denotes the j-th keyword of document W_Vi, with 1 ≤ Vij ≤ N, where N is the number of keywords of document W_Vi;

(1.7) letting Vij = Vij + 1;

(1.9) letting Vi = Vi + 1 and executing step (1.3);

(1.11) obtaining the keyword table keywords containing all keywords.

3. The expert recommendation method based on the multi-source information fusion technology according to claim 1, wherein the specific steps in step (2) of extracting the author fields of the knowledge base and performing word-frequency co-occurrence analysis to construct the expert cooperative relationship subnet are as follows:

(2.4) letting Vi = Vi + 1 and executing step (2.2);

(2.5) obtaining the separated author relations R of all documents;

(2.7) counting the co-occurrence frequency G = {n, Nap, Naq}, where G denotes that authors Nap and Naq co-occur n times;

4. The expert recommendation method based on the multi-source information fusion technology according to claim 1, wherein the step (3) of extracting the research direction of the Web page expert and the personal information by using the regular expression and the named entity recognition algorithm respectively to construct the expert Web subnet specifically comprises the following steps:

(3.1) acquiring expert Web page information from a knowledge base;

(3.3) obtaining personal information of experts;

(3.4) defining a regular expression rule Ru;

(3.6) obtaining the research direction of an expert;

5. The expert recommendation method based on the multi-source information fusion technology according to claim 1, wherein the specific steps in step (4) of obtaining the document-topic and topic-keyword distributions and the 5 highest-weighted words of the abstract field with the LDA and TF-IDF algorithms are as follows:

(4.4) let Vi be Vi +1 and perform step (4.2);

(4.5) obtaining Abstract text Abstract containing all documents W;

(4.8) obtaining document-subject and subject-keyword of Abstract text Abstract;

(4.9) carrying out weight calculation on Abstract' by using a TF-IDF algorithm;

(4.10) acquiring 5 words with the maximum weight in the Abstract;

6. The expert recommendation method based on the multi-source information fusion technology according to claim 1, wherein the specific steps in step (5) of taking the expert name-organization pair as the constraint condition, combining the three subnets to construct the expert information network, calculating the centrality values of the experts in the network, and taking the experts whose centrality values rank in the top 5 as the recommendation result are as follows:

(5.3) obtaining an expert information network;