CN101075942A - Method and system for processing social network expert information based on expert value propagation algorithm

Publication number: CN101075942A
Application number: CNA2007101177193A
Authority: CN
Inventors: 唐杰; 张静; 李涓子
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2007-06-22
Filing date: 2007-06-22
Publication date: 2007-11-21
Anticipated expiration: 2027-06-22
Also published as: CN100583804C

The method comprises: building a social relation network graph with a social relation network generation server, the graph being described by personal descriptive information, the relations between people, the relation types, and a mapping function from relations to relation types; an expert value calculation server reads this information from the database server and, according to the weights, analyzes the relevance between the keyword and each person's descriptive information and paper information to obtain an initial expert value; a propagation matrix is then constructed from the importance and closeness of the relations between people; the initial expert values and the propagation matrix are iterated to obtain the expert value of every expert; after normalization the values are sorted in descending order and output to the Web server for the user to choose from.

Social network expert information processing system and method based on an expert value propagation algorithm

Technical field

The invention belongs to the field of social network information processing and relates in particular to social network search on the Internet.

Background technology

With the rapid development of the Internet and the transition from Web 1.0 to Web 2.0, social networks have gradually become an extremely important Web application. For example, we can use a social network to look for a job, recruit employees, find friends with common interests, and find business partners.

Web-based social networks provide a large number of data sources on which data mining and knowledge discovery can be carried out, and such research poses great challenges in the Web 2.0 era. Examples include trust chain mining, personalized recommendation, and expert finding.

The goal of expert information processing is to automatically find the experts who have knowledge of a particular domain. By finding experts, we can quickly obtain answers to professional questions from them, which remedies the "The High Cost of Not Finding Information" problem of traditional search engines and greatly improves retrieval efficiency. This technology will bring a large number of users and a high frequency of use to the Internet, and is an important means of realizing the technical and economic value of new Internet technology.

Existing expert finding methods mainly process expert information from Web data or unstructured data. The usual approach treats expert finding as traditional information retrieval: first, the documents that describe each person (such as the personal homepage, e-mails, and published papers) are merged into a single document, which then represents that person; then conventional information retrieval methods rank these documents by their relevance to the query keyword, which yields the corresponding list of experts. However, a survey shows that little work has been done on finding experts based on social networks. In a social network, besides each person's own descriptive information, there are also complex relations between people, and this relational information plays an important role in recommending experts. Traditional methods often ignore the importance of social network relations, whereas we believe that using the relations between people in a social network is of crucial significance for finding experts.

To address the above problems, the present invention proposes a system and method for processing expert information in a social network based on an expert value propagation algorithm. The method comprises two stages. In the first stage, personal descriptive information is used to find candidate experts, and each candidate expert is given an initial expert value. In the second stage, a graph is built from the candidates' initial expert values obtained in the first stage and the relations between them (each node in the graph represents a person, and each edge represents a relation between two people). Expert values are then propagated along the direction of the edges; that is, the expert value of each adjacent node is revised according to a node's expert value and the relation represented by the edge. Each node thus obtains a new expert value that reflects the social network, which yields more accurate social-network-based expert information processing.

Summary of the invention

The object of the present invention is to provide a social network expert information processing system and method based on an expert value propagation algorithm.

The idea behind the proposed system and method is as follows. A general descriptive model of social networks is adopted, and the goal of expert information processing is defined on this model. Guided by this goal, personal descriptive information is first used to retrieve the experts related to a certain domain (for example, data mining) as candidate experts, and an initial expert value is computed for each candidate. This step rests on one assumption: if a person has a great deal of descriptive information about a domain (for example, the person has published many papers on data mining, or data mining is mentioned many times on the person's homepage), then this person is very likely an expert in that domain. Next, the candidates' initial expert values and the relations between them are used to build a social network subgraph for these candidate experts, and on this graph each person's expert value is propagated to his neighbors along the direction of the edges. The propagation is iterated until the algorithm converges, which finally gives a list of experts whose expert values no longer change; the list is sorted by expert value and returned to the user. This step also rests on an assumption: if a person knows many experts in a domain and is recommended by them, then he is probably also an expert in that domain.

Our idea comes from observing how experts are sought in real life. In practice we usually look for an expert in two ways: a) we read a person's introduction and judge whether he meets the standard of an expert; b) we ask experts we already know to recommend other experts. For a machine, however, it is difficult to judge a person's level of expertise directly from a personal introduction, or to determine how strongly other people recommend him (for example, a strong recommendation versus an ordinary recommendation). We therefore reduce a person's introduction to the personal descriptive information that can reflect his level of expertise, comprising basic personal information (such as position, affiliation, research interest, homepage address, telephone, and e-mail address) and information on the papers he has published (such as title, conference name, and co-authors). The degree of recommendation, in turn, is expressed by the recommender's level of expertise and the weight of the relation between the recommender and the recommended person (the calculation of the weight is set forth in step 3); here "recommender" means a candidate expert who has some relation with the recommended person. In short, we judge a person's level of expertise mainly by combining two factors: his personal descriptive information and his relations with other people in the social network.

The method is implemented on an existing social network by carrying out the following steps in order; the block diagram of the steps is shown in Fig. 1.

Step 1: Build the social network.

In the present invention, the social network is described as a graph.

Let the social network be G = (V, E, T, τ), where V is the set of nodes and each node v ∈ V represents a person in the social network; a person can have several kinds of descriptive information, such as basic personal information (position, affiliation, research interest, homepage address, telephone, and e-mail address) and information on the papers he has published (such as title, conference name, and co-authors). E ⊆ V × V is the set of edges, and e_{ij}^t ∈ E indicates that a relation of type t exists between persons v_i and v_j in the social network. T is the set of relation types, and t ∈ T denotes one type of social relation between people (such as "co-author", "advised by", "works on the same project", and "friend"). τ: E → T is a mapping function from edges to relation types, with τ(e_{ij}^t) = t. An edge in the graph can be one-way or two-way; a two-way edge represents a symmetric relation.
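The model G = (V, E, T, τ) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and method names (`SocialNetwork`, `Edge`, `add_relation`, `relations_between`) are hypothetical and do not come from the patent, and a two-way edge is stored here as two directed edges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str    # v_i
    target: str    # v_j
    rel_type: str  # t, the value of tau for this edge


@dataclass
class SocialNetwork:
    people: dict = field(default_factory=dict)    # V: name -> descriptive info
    rel_types: set = field(default_factory=set)   # T
    edges: set = field(default_factory=set)       # E

    def add_person(self, name, **info):
        self.people[name] = info

    def add_relation(self, source, target, rel_type, symmetric=False):
        """Add e_ij^t; a symmetric relation is stored as two directed edges."""
        if source not in self.people or target not in self.people:
            raise KeyError("both endpoints must already be in V")
        self.rel_types.add(rel_type)
        self.edges.add(Edge(source, target, rel_type))
        if symmetric:
            self.edges.add(Edge(target, source, rel_type))

    def tau(self, edge):
        """Mapping function tau: E -> T."""
        return edge.rel_type

    def relations_between(self, source, target):
        """All relation types on edges going from source to target."""
        return {self.tau(e) for e in self.edges
                if e.source == source and e.target == target}


g = SocialNetwork()
g.add_person("Dr. Tang", position="Ph.D")
g.add_person("Prof. Wang", position="Ph.D mentor")
g.add_relation("Dr. Tang", "Prof. Wang", "advised by")               # one-way
g.add_relation("Dr. Tang", "Prof. Wang", "co-author", symmetric=True)
print(g.relations_between("Dr. Tang", "Prof. Wang"))
```

Storing each relation as its own edge lets two people be linked by several relation types at once, which the model explicitly allows.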

Once the social network has been formalized, the task of expert information processing can be defined as follows: given a domain keyword q, search the whole social network G for the set of experts and return them as a ranked list R = {(v_i, s(v_i))}, where s(v_i) denotes the expert value of person v_i.

In the present invention, we tested with keywords from 13 computer science domains: ontology mapping, Semantic Web, data mining, information extraction, Boosting, support vector machines, planning, intelligent agents, machine learning, natural language processing, cryptography, computer vision, and neural networks. The corresponding keywords are Ontology Alignment, Semantic Web, Data Mining, Information Extraction, Boosting, Support Vector Machine, Planning, Intelligent Agent, Machine Learning, Natural Language Processing, Cryptography, Computer Vision, and Neural Networks.

A social network is illustrated below: Fig. 2 shows a real, laboratory-based social network. All relations that appear in Fig. 2 and their weights are listed in Table 1:

Table 1. All relations in Fig. 2 and their weights

Relation	Meaning	Weight
Co-author	Two people are authors of the same paper	2
Advised by	One person is another person's student	4
Works on the same project	Two people work on the same project	3
Friend	Two people are good friends	1

In this figure we can see that, for "Dr. Tang":

1) he has a one-way "advised by" relation pointing to "Prof. Wang";

2) he has four two-way relations, such as the "co-author" relation with "Xiao Hong".

Several relations may exist between two people; for example, both the "co-author" and the "advised by" relation exist between "Dr. Tang" and "Prof. Wang".

Step 1 thus creates the graph of a social network: the nodes of the graph are people, and the edges are the 4 kinds of edges defined in Table 1. The social network is stored in a relational database, and the present invention uses the following database structure to store the social relation network:

1) Relation and weight table, which stores all relations and weights in the social relation network, as shown in Table 2:

Table 2. Relation types and relation weights in the social network graph

Relation ID	Relation name	Explanation	Weight
1	Co-author	Two people are authors of the same paper	2
2	Advised by	One person is another person's student	4
3	Works on the same project	Two people work on the same project	3
4	Friend	Two people are good friends	1

2) Personal basic information table, which stores the basic information of everyone in the social network; in the present invention we store "position", "affiliation", and "research interest", as shown in Table 3:

Table 3. Personal basic information table of the social network graph

Person ID	Name	Position	Affiliation	Research interest
1	Dr. Tang	Ph.D (doctoral student)	Keg, tsinghua (Knowledge Engineering Group, Tsinghua University)	Semantic Web
2	Xiao Hong	Master (master's student)	Keg, tsinghua (Knowledge Engineering Group, Tsinghua University)	Information Extraction
3	Prof. Wang	Ph.D mentor (doctoral advisor)	Keg, tsinghua (Knowledge Engineering Group, Tsinghua University)	Semantic Web
4	Prof. Cai	Master mentor (master's advisor)	Keg, tsinghua (Knowledge Engineering Group, Tsinghua University)	Data Mining
…	…	…	…	…

3) Paper information table, which stores the information on the papers published by everyone in the social network; in the present invention we store "paper title" and "venue", as shown in Table 4:

Table 4. Paper information table of the social network graph

Paper ID	Paper title	Venue
1	A Unified Tagging Approach to Text Normalization	ACL'2007 (Annual Meeting of the Association for Computational Linguistics)
2	Semantic annotation using horizontal and vertical contexts	ASWC'2006 (Asian Semantic Web Conference)
3	Multiple strategies detection in ontology mapping	WWW'2005 (World Wide Web Conference)
…	…	…

4) Person-paper table, which stores the correspondence between people in the social network and the papers they have published, as shown in Table 5:

Table 5. Person-paper table of the social network graph

ID	Person ID	Paper ID
1	1	1
2	1	2
3	1	3
4	2	2
5	3	3
…	…	…

The first row of the table indicates that person 1 (Dr. Tang) is an author of paper 1 (A Unified Tagging Approach to Text Normalization).

5) Node relation table, which records the relations between the people in the social network, as shown in Table 6:

Table 6. Node relation table of the social network

The first row of the table indicates that relation 2 ("advised by") exists between node 2 (Xiao Hong) and node 4 (Prof. Cai); the second row indicates that relation 3 ("works on the same project") exists between node 4 (Prof. Cai) and node 3 (Prof. Wang).
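The five tables can be sketched as a relational schema. The following is a minimal illustration using an in-memory SQLite database. The table and column names are hypothetical, and the column layout of the node relation table (source person, target person, relation ID) is an assumption, since the text only describes what its rows express.

```python
import sqlite3

SCHEMA = """
CREATE TABLE relation_type (      -- Table 2
    relation_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    explanation TEXT,
    weight      REAL NOT NULL
);
CREATE TABLE person (             -- Table 3
    person_id         INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    position          TEXT,
    affiliation       TEXT,
    research_interest TEXT
);
CREATE TABLE paper (              -- Table 4
    paper_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    venue    TEXT
);
CREATE TABLE person_paper (       -- Table 5
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    paper_id  INTEGER NOT NULL REFERENCES paper(paper_id)
);
CREATE TABLE node_relation (      -- Table 6 (assumed column layout)
    id          INTEGER PRIMARY KEY,
    source_id   INTEGER NOT NULL REFERENCES person(person_id),
    target_id   INTEGER NOT NULL REFERENCES person(person_id),
    relation_id INTEGER NOT NULL REFERENCES relation_type(relation_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO relation_type VALUES (?, ?, ?, ?)", [
    (1, "co-author", "authors of the same paper", 2),
    (2, "advised by", "one person is another's student", 4),
    (3, "same project", "work on the same project", 3),
    (4, "friend", "good friends", 1),
])
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", [
    (1, "Dr. Tang", "Ph.D", "Keg, tsinghua", "Semantic Web"),
    (2, "Xiao Hong", "Master", "Keg, tsinghua", "Information Extraction"),
    (3, "Prof. Wang", "Ph.D mentor", "Keg, tsinghua", "Semantic Web"),
    (4, "Prof. Cai", "Master mentor", "Keg, tsinghua", "Data Mining"),
])
# The two rows of the node relation table that the text describes.
conn.executemany(
    "INSERT INTO node_relation (source_id, target_id, relation_id) VALUES (?, ?, ?)",
    [(2, 4, 2), (4, 3, 3)],
)

rows = conn.execute("""
    SELECT s.name, t.name, r.name, r.weight
    FROM node_relation nr
    JOIN person s ON s.person_id = nr.source_id
    JOIN person t ON t.person_id = nr.target_id
    JOIN relation_type r ON r.relation_id = nr.relation_id
    ORDER BY nr.id
""").fetchall()
for row in rows:
    print(row)
```

Keeping the relation weights in their own table means a weight can be changed in one place without touching the stored relations.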

At this point the construction of the social network is complete.

Step 2: Retrieve the candidate experts related to a given domain and obtain a social network subgraph.

In this step, the user enters any domain keyword q (for example, data mining). The experts related to this domain are retrieved from the whole social network as candidate experts, an initial expert value is computed for each candidate, and together they form the social network subgraph for the domain. In this step, only the personal descriptive information described in step 1 (the contents of Tables 3, 4 and 5) is used to retrieve candidate experts.

For each person v_i, his basic personal information is concatenated into one large document d (in the present invention, d is the concatenation of the "position", "affiliation" and "research interest" fields of Table 3; the field boundaries are no longer distinguished, and d is treated as an ordinary document). The information of each of his papers is denoted p_k (in the present invention, p_k is the concatenation of the "paper title" and "venue" fields of Table 4; again the field boundaries are not distinguished, and p_k is treated as an ordinary document).

After the user enters a domain keyword q, a probabilistic model from information retrieval is used to estimate the probability that q occurs in document d, denoted p(q|d); likewise, the probability that q occurs in each paper p_k is estimated and denoted p(q|p_k).

The domain keyword q entered by the user may consist of several words after word segmentation; for example, "data mining" yields the two words "data" and "mining". What we actually estimate in that case is the probability that the word "data" occurs in d and the probability that the word "mining" occurs in d; these two probabilities are then combined (formula (1) shows the method we use) to give the p(q|d) we finally want to estimate. The same applies to p_k. We use t to denote a word of q after segmentation (for example, "data" or "mining" in "data mining"). Without smoothing, p(q|d) and p(q|p_k) can be greater than 0 only if the words t of q also occur in d and p_k respectively; otherwise the probability is 0.

Assuming that the words t are mutually independent in document d, the probabilities p(t|d) of each t occurring in d can be multiplied together, so the probability p(q|d) that the domain keyword q occurs in document d can be expressed as:

p(q | d) = \prod_{t \in q} p(t | d)^{n(t, q)}    (1)

where n(t, q) denotes the number of times t occurs in q (for example, if q is "data mining" and the word t is "data", then the number of occurrences n(t, q) of t in q is 1). In this formula, the probability p(t|d) that word t occurs in document d can be estimated as the number of occurrences of t in d divided by the total number of words in d. Since p(t|d) may be 0, which would make the product in formula (1) equal to 0, p(t|d) needs to be smoothed:

p(t | d) = (1 - λ) p(t | d) + λ p(t)    (2)

where λ takes a value in [0, 1], and p(t) can be estimated as the number of occurrences of word t in the documents d of all people divided by the total number of words in those documents. With the expansion of formula (2), formula (1) can be written as:

p(q | d) = \prod_{t \in q} ((1 - λ) p(t | d) + λ p(t))^{n(t, q)}    (3)

Similarly, another probabilistic model p(q|p_k) is used to estimate the probability that the domain keyword q occurs in each paper p_k of v_i. The model is defined as follows (the symbols and the probability estimates are the same as for p(q|d)):

p(q | p_{k}) = \prod_{t \in q} ((1 - λ) p(t | p_{k}) + λ p(t))^{n(t, q)}    (4)
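Formulas (1) to (4) can be sketched as a short function. This is a minimal illustration, assuming that documents are already tokenized into lists of lower-case words; the function names `term_prob` and `query_likelihood` and the two sample documents are hypothetical.

```python
from collections import Counter


def term_prob(term, tokens):
    """Maximum-likelihood estimate: occurrences of term divided by length."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def query_likelihood(query_tokens, doc_tokens, collection_tokens, lam=0.5):
    """Formulas (3)/(4): product over query terms of the smoothed
    probability, each raised to the power n(t, q)."""
    prob = 1.0
    for term, n_tq in Counter(query_tokens).items():
        smoothed = ((1 - lam) * term_prob(term, doc_tokens)       # p(t|d)
                    + lam * term_prob(term, collection_tokens))   # p(t)
        prob *= smoothed ** n_tq
    return prob


# Two hypothetical person documents; the collection is their concatenation.
docs = [
    "ph.d keg tsinghua semantic web".split(),
    "master keg tsinghua information extraction".split(),
]
collection = [tok for doc in docs for tok in doc]
query = "semantic web".split()

scores = [query_likelihood(query, doc, collection) for doc in docs]
print(scores)
```

The same function serves for p(q|p_k) by passing the tokens of a paper in place of the tokens of d. Because of the λp(t) term, a document that does not contain a query word still receives a small non-zero probability.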

Based on formulas (3) and (4), we can compute the relevance of v_i's basic information and of each of his papers to the domain keyword q, and combine the two parts linearly:

s(v_{i})^{0} = α \cdot p(q | d) + (1 - α) \cdot \sum_{p_{k} \in P} \mathrm{if}(p_{k}) \times p(q | p_{k})    (5)

where α takes a value in [0, 1] (we set α = 0.5 in the experiments); P denotes all papers published by v_i; p_k denotes one paper in P; and if(p_k) denotes the impact factor of the conference or journal in which p_k was published. This impact factor reflects the authority of the conference or journal and is generally determined manually (in the experiments we used the conference impact factors compiled at http://citeseer.ist.psu.edu/impact.html). Finally, s(v_i)^0 is the initial expert value of each person v_i. In the experiments, the 1000 people with the highest initial expert values are selected as the candidate experts for the second stage of the algorithm.
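Selecting the candidates and forming their subgraph can be sketched as follows. This is an illustration only: the function name `select_candidates`, the representation of relations as (source, target) pairs, and the sample values are assumptions, and k is set to 3 instead of 1000 to keep the example small.

```python
import heapq


def select_candidates(initial_scores, edges, k=1000):
    """Keep the k people with the highest initial expert value, plus the
    relations whose two endpoints are both among them."""
    top = heapq.nlargest(k, initial_scores.items(), key=lambda item: item[1])
    candidates = dict(top)
    sub_edges = [(u, v) for (u, v) in edges
                 if u in candidates and v in candidates]
    return candidates, sub_edges


# Hypothetical initial expert values s(v_i)^0 and relations.
scores = {"a": 0.009, "b": 0.004, "c": 0.0001, "d": 0.006}
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("c", "d")]

candidates, sub_edges = select_candidates(scores, edges, k=3)
print(sorted(candidates), sub_edges)
```

Only edges inside the candidate set are kept, so the propagation in the second stage never touches people outside the subgraph.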

The following example computes the initial expert value of "Dr. Tang" for the keyword "Semantic Web":

1) From "Semantic Web" we obtain t_1 = "Semantic" and t_2 = "Web"; suppose p(t_1) = 0.01 and p(t_2) = 0.05.

2) According to Tables 3, 4 and 5, the d of "Dr. Tang" is "ph.D Keg tsinghua Semantic Web", p_1 is "A Unified Tagging Approach to Text Normalization ACL'2007", p_2 is "Semantic annotation using horizontal and vertical contexts ASWC'2006", and p_3 is "Multiple strategies detection in ontology mapping WWW'2005".

3) p(t|d) is estimated as the number of occurrences of t in d divided by the total number of words in d, so p(t_1|d) = 1/5 = 0.2, p(t_2|d) = 1/5 = 0.2; p(t_1|p_1) = 0, p(t_2|p_1) = 0; p(t_1|p_2) = 1/7 = 0.143 (the denominator is 7 because the high-frequency word "and" has been filtered out, leaving only 7 words), p(t_2|p_2) = 0; p(t_1|p_3) = 0, p(t_2|p_3) = 0.

4) Smoothing according to formula (2) with λ = 0.5 gives the new values p(t_1|d) = 0.5*0.2 + 0.5*0.01 = 0.105, p(t_2|d) = 0.5*0.2 + 0.5*0.05 = 0.125; p(t_1|p_1) = 0.5*0 + 0.5*0.01 = 0.005, p(t_2|p_1) = 0.5*0 + 0.5*0.05 = 0.025; p(t_1|p_2) = 0.5*0.143 + 0.5*0.01 = 0.077, p(t_2|p_2) = 0.5*0 + 0.5*0.05 = 0.025; p(t_1|p_3) = 0.5*0 + 0.5*0.01 = 0.005, p(t_2|p_3) = 0.5*0 + 0.5*0.05 = 0.025.

5) According to formulas (3) and (4) (where n(t_1, q) = 1 and n(t_2, q) = 1), p(q|d) = p(t_1|d)^1 * p(t_2|d)^1 = 0.105*0.125 = 0.013, p(q|p_1) = p(t_1|p_1)^1 * p(t_2|p_1)^1 = 0.005*0.025 = 0.000, p(q|p_2) = p(t_1|p_2)^1 * p(t_2|p_2)^1 = 0.077*0.025 = 0.002, and p(q|p_3) = p(t_1|p_3)^1 * p(t_2|p_3)^1 = 0.005*0.025 = 0.000.

6) Finally, according to formula (5) (supposing if(p_1) = 3, if(p_2) = 2, if(p_3) = 1 and α = 0.5), we obtain s(v_i)^0 = 0.5*0.013 + 0.5*(3*0.000 + 2*0.002 + 1*0.000) = 0.009, which is the initial expert value of Dr. Tang.
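The arithmetic of this worked example can be checked with a few lines of code. The sketch below takes the unsmoothed probabilities of step 3 and the supposed values p(t_1), p(t_2) and if(p_k) as given inputs; the variable and function names are hypothetical. Unlike the text, it does not round intermediate results, so the intermediate values differ slightly in the later decimal places, while the final value still rounds to 0.009.

```python
LAMBDA, ALPHA = 0.5, 0.5
p_collection = {"Semantic": 0.01, "Web": 0.05}   # supposed p(t) from step 1

# Unsmoothed p(t|.) from step 3, for d and for the three papers.
p_doc = {"Semantic": 1 / 5, "Web": 1 / 5}
p_papers = [
    {"Semantic": 0.0, "Web": 0.0},       # p_1
    {"Semantic": 1 / 7, "Web": 0.0},     # p_2
    {"Semantic": 0.0, "Web": 0.0},       # p_3
]
impact = [3, 2, 1]                        # supposed if(p_1), if(p_2), if(p_3)


def likelihood(p_term):
    """Formulas (2) to (4) with n(t, q) = 1 for both query words."""
    result = 1.0
    for term, p_t in p_collection.items():
        result *= (1 - LAMBDA) * p_term[term] + LAMBDA * p_t
    return result


p_q_d = likelihood(p_doc)
p_q_papers = [likelihood(p) for p in p_papers]

# Formula (5): linear combination of profile and paper relevance.
s0 = ALPHA * p_q_d + (1 - ALPHA) * sum(
    f * p for f, p in zip(impact, p_q_papers))
print(round(p_q_d, 3), [round(p, 3) for p in p_q_papers], round(s0, 3))
```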

Step 3: Update the expert values of the candidate experts with an iterative algorithm based on the idea of propagation.

The initial expert values of the candidate experts from the first stage and the relations between them form a social network subgraph for domain q (each node represents a candidate expert, and each edge represents a relation between two experts). On this subgraph we propose an iterative algorithm that propagates expert values. In each iteration, the expert value s(v_i)^n of v_i is propagated to the candidate experts who have a relation with him. The update of each candidate's expert value is therefore influenced by two factors: 1) the expert value of the recommender; 2) the weight of the relation between the recommender and the recommended person.

We use a propagation coefficient to indicate the degree to which a node's expert value propagates to its neighbors. M denotes the propagation coefficient matrix, and M_{ij} denotes the propagation coefficient from v_i to v_j (a value between 0 and 1). M_{ij} is defined as follows:

M_{ij} = \sum_{t} c(τ(e_{ij}^{t})) \cdot w(e_{ij}^{t})    (6)

where c(τ(e_{ij}^t)) is the weight of the relation type of e_{ij}^t (at present we set these weights manually), and w(e_{ij}^t) is the closeness of the relation e_{ij}^t between v_i and v_j. (Closeness can be computed in several different ways; for the "co-author" relation, for example, the number of co-authored papers can be used as the closeness.)

In the present invention, because we collected data for only one relation, "co-author", formula (6) simplifies to M_{ij} = w(e_{ij}^{co-author}), where w(e_{ij}^{co-author}) is defined as follows:

w(e_{ij}^{co-author}) = \frac{1}{|U_{i}|}    (7)

where |U_i| denotes the number of "co-author" relations of v_i (we treat "co-author" as a two-way relation).
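Building the propagation coefficient matrix for the co-author-only case can be sketched as follows. The sketch assumes w(e_ij) = 1/|U_i|, which matches the values used in the Fig. 3 example (1/3 for a node with three co-author relations and 1 for a node with one); the function name `propagation_matrix` and the list-of-lists representation are hypothetical.

```python
from collections import defaultdict


def propagation_matrix(nodes, coauthor_pairs):
    """Return M as a list of rows, with M[i][j] = 1 / |U_i| when v_i and
    v_j are co-authors and 0 otherwise (assumed form of the closeness)."""
    neighbours = defaultdict(set)
    for a, b in coauthor_pairs:       # co-author is a two-way relation
        neighbours[a].add(b)
        neighbours[b].add(a)
    index = {v: i for i, v in enumerate(nodes)}
    M = [[0.0] * len(nodes) for _ in nodes]
    for v_i, nbrs in neighbours.items():
        for v_j in nbrs:
            M[index[v_i]][index[v_j]] = 1.0 / len(nbrs)
    return M


# Star-shaped example: node 1 is a co-author of nodes 2, 3 and 4.
M = propagation_matrix([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)])
for row in M:
    print(row)
```

With this definition every non-empty row of M sums to 1, so a person with many co-authors passes a smaller share of his expert value to each of them.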

In our algorithm, for a one-way relation the expert value only needs to be passed from the source node to the target node; for a two-way relation, the expert values are propagated in both directions.

The expert value vector S^{n+1} at iteration n+1 (S^{n+1} is the vector of the expert values of all experts, and its component S_i^{n+1} is the expert value of v_i) is computed as follows:

S^{n+1} = (1 - ω) S^{n} + ω M^{T} S^{n}    (8)

where S denotes the expert value vector (the expert values of all people) and ω takes a value between 0 and 1. ω represents the penalty on propagation distance: the more iterations, the farther a relation has propagated and the lower its credibility (in the experiments we set it to 0.85). After each iteration, the expert values of all people are normalized so that the largest expert value is always 1, computed as follows:

S_{i}^{n+1} = \frac{S_{i}^{n+1}}{\max_{j} S_{j}^{n+1}}    (9)

The algorithm iterates until a termination condition is met. The termination condition currently used is that the change of every expert value in the subgraph stays within a threshold (set to 0.05 in the present invention), or that the algorithm has run for a fixed number of iterations (set to 100 in the present invention).

The outstanding contribution of the proposed social network expert information processing system and method based on an expert value propagation algorithm is that we not only use the relevance between personal descriptive information and the domain keyword to judge a person's level of expertise, but also use the relations between people in the social network to propagate expert values, thereby achieving the effect of experts recommending experts.

One iteration of step 3 is illustrated below with the example shown in Fig. 3.

Suppose the left half of Fig. 3 shows the expert value of each node at iteration n:

S_{1}^{n} = 0.6, S_{2}^{n} = 0.7, S_{3}^{n} = 1.0, S_{4}^{n} = 0.2,

and each edge in the figure indicates that a "co-author" relation exists between two nodes.

From formula (7), w(e_{12}) = w(e_{13}) = w(e_{14}) = 1/3 and w(e_{21}) = w(e_{31}) = w(e_{41}) = 1; then from formula (8):

S_{1}^{n+1} = 0.15 * 0.6 + 0.85 * (1 * 0.7 + 1 * 1.0 + 1 * 0.2) = 1.705;

S_{2}^{n+1} = 0.15 * 0.7 + 0.85 * 1/3 * 0.6 = 0.275;

S_{3}^{n+1} = 0.15 * 1.0 + 0.85 * 1/3 * 0.6 = 0.32;

S_{4}^{n+1} = 0.15 * 0.2 + 0.85 * 1/3 * 0.6 = 0.2.

Here

\max_{i} S_{i}^{n+1} = 1.705,

so formula (9) finally gives:

S_{1}^{n+1} = 1.705 / 1.705 = 1.0;

S_{2}^{n+1} = 0.275 / 1.705 = 0.16;

S_{3}^{n+1} = 0.32 / 1.705 = 0.19;

S_{4}^{n+1} = 0.2 / 1.705 = 0.12.

The right half of Fig. 3 shows the result of iteration n+1.
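The update of formula (8), the normalization of formula (9) and the termination rule can be sketched together and evaluated on the four-node example of Fig. 3. This is a minimal sketch: the function names `update`, `normalize` and `propagate` are hypothetical, M is written out by hand from the closeness values stated in the example, and the defaults ω = 0.85, threshold 0.05 and 100 iterations are the settings given in the text.

```python
def update(S, M, omega=0.85):
    """Formula (8): (1 - omega) * S + omega * M^T S, before normalization."""
    n = len(S)
    return [(1 - omega) * S[j] + omega * sum(M[i][j] * S[i] for i in range(n))
            for j in range(n)]


def normalize(S):
    """Formula (9): divide by the largest value so that the maximum is 1."""
    peak = max(S)
    return [s / peak for s in S] if peak > 0 else list(S)


def propagate(S, M, omega=0.85, threshold=0.05, max_iter=100):
    """Iterate until no value changes by more than the threshold,
    or until max_iter iterations have been run."""
    for _ in range(max_iter):
        S_next = normalize(update(S, M, omega))
        converged = max(abs(a - b) for a, b in zip(S_next, S)) <= threshold
        S = S_next
        if converged:
            break
    return S


third = 1.0 / 3.0
M = [[0.0, third, third, third],   # node 1 has three co-author relations
     [1.0, 0.0, 0.0, 0.0],         # nodes 2, 3 and 4 have one each
     [1.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]
S_n = [0.6, 0.7, 1.0, 0.2]

raw = update(S_n, M)       # values of formula (8) for one iteration
S_next = normalize(raw)    # values after formula (9)
final = propagate(S_n, M)  # repeated until the termination rule fires
print(raw, S_next, final)
```

Because the transpose of M is used, the value that reaches v_j is the sum over all v_i of M_ij times the value of v_i, that is, each recommender's value scaled by the coefficient from the recommender to the recommended person.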

Step 4: After propagation has finished, sort the experts by expert value and output them to the user.

Fig. 4 shows an example of the output for the query "data mining".

The proposed method can be summarized as the following processing flow:

Input: a domain keyword q (for example, data mining) and the constructed social network G = (V, E, T, τ)

(The social network G is built statically; that is, step 1 needs to be carried out only once and is independent of the query.)

Output: a list of experts sorted by expert value

Step 2. Retrieve the candidate experts;

// the academic research network is used as the example here

1. for each person v_i, use the probabilistic model to compute the relevance of his basic personal information d to q;

2. use another probabilistic model to compute the relevance of each paper p_k of v_i to q;

3. combine the relevance values from 1 and 2; the result is the initial expert value of v_i;

4. select the people with the highest initial expert values as candidates, and use the relations between them to build a social network subgraph;

Step 3. Propagate the expert values of the candidates;

5. do {

6.     compute the propagation coefficient matrix M according to formula (6);

7.     for each v_i in the subgraph {

8.         update his expert value s(v_i) according to formula (8);

9.     }

10.    normalize according to formula (9);

11. } while (the termination condition is not satisfied);

Step 4. After propagation has finished, sort the experts by expert value and output them to the user.

The present invention is closely tied to a new feature of current Web data, namely the emergence on the Web of a large number of people-centered social networks. It uses the complex social relations between people in a social network to improve expert information processing on the Web. The novelty of the invention lies in proposing to find experts in a social network. Traditional expert finding judges a person's level of expertise only by the similarity between documents and the query keyword, whereas the proposed method considers not only document similarity but also the relations between people in the social network to improve expert information processing. Its inventiveness lies in proposing a way to use the relations between people in a social network for expert information processing: we formalize the social network as a graph and then propose a propagation algorithm on this graph that propagates and updates expert values. With expert information processing, we can quickly obtain answers to professional questions from experts, which greatly improves retrieval efficiency. We can also use expert information processing to make friends with like-minded people and to find business partners, employees, and consultants. This technology will bring a large number of users and a high frequency of use to the Internet, and is an important means of realizing the technical and economic value of new Internet technology.

Description of drawings

Fig. 1. Overall block diagram of the system;

Fig. 2. Example of personal descriptive information and relations between people in a social network;

The personal descriptive information shown is as follows:

Basic personal information

Position: Master    Affiliation: Knowledge Engineering Group, Department of Computer Science, Tsinghua University

Homepage address: Http://hmc.arnetmianer.org    Phone: 62788788

Email: Hmc@keg.cs.tsinghua.edu.cn    Research interest: information annotation

Paper information

Title: Semantic Annotation Using Horizontal and Vertical Contexts

Venue: ASWC2006    Co-authors: Tang Jie, Li Juanzi

Fig. 3. Example of the iterative propagation of expert values;

Fig. 4. Interface of the expert information processing system based on the propagation algorithm;

Fig. 5. Hardware structure diagram of the expert retrieval system.

Embodiment

Using steps 1 to 4 of the present invention, a social network of researchers was created, and the invention was verified by finding experts in particular fields in this network. All experiments were implemented in Java and run on a server with a dual-core Intel Xeon 3.0 GHz processor and 2 GB of memory.

(1) Modeling the researcher social network

The researcher social network currently contains two kinds of information. Each researcher has his or her own personal description information (position, affiliation, research interests, paper titles, and the conferences where the papers were published). Researchers are linked by a "co-author" relation: if two people have published a paper together, a "co-author" relation is established between them.

(2) Generating the researcher social network

A paper list is first obtained by analyzing the data of a specific scientific paper website (http://www.informatik.uni-trier.de/~ley/db/), and the paper information is added to paper information table 4. The authors of each paper are then obtained, and the following steps are performed for each author: if the person is not yet in personal basic information table 3, information extraction is used to mine Web data for his or her basic information, which is added to table 3; at the same time, the correspondence between this author and the paper is added to person-paper mapping table 5. Finally, for every pair of authors of the paper, a "co-author" relation is inserted into node relationship table 6.
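The pairwise insertion of "co-author" relations described above can be sketched as follows. This is a minimal Python illustration, not the Java implementation used in the experiments: the function name `build_coauthor_network`, the in-memory list and set standing in for tables 5 and 6, and the placeholder author names are all assumptions, and the Web information extraction step for table 3 is omitted.

```python
from itertools import combinations


def build_coauthor_network(papers):
    """Build person-paper links and co-author relations from a paper list.

    `papers` is an iterable of (paper_id, author_names) pairs.
    """
    person_paper = []   # stands in for person-paper mapping table 5
    relations = set()   # stands in for node relationship table 6
    for paper_id, authors in papers:
        unique_authors = sorted(set(authors))
        for author in unique_authors:
            person_paper.append((author, paper_id))
        # One undirected "co-author" relation per pair of authors of this paper.
        for a, b in combinations(unique_authors, 2):
            relations.add(("co-author", a, b))
    return person_paper, relations


# Placeholder data for illustration only.
papers = [
    ("p1", ["author_a", "author_b", "author_c"]),
    ("p2", ["author_a", "author_b"]),
]
person_paper, relations = build_coauthor_network(papers)
```

Storing relations in a set means that two people who co-author several papers are linked only once; whether the original system keeps one relation per pair or one per shared paper is not stated in the text.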

With this construction method, 448,289 researchers in the computer field and 725,655 published papers were collected. The interpersonal relations total 2,413,208, so on average each person has 5.38 relations with other people.
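As a quick check of the reported average, dividing the total number of relations by the number of researchers reproduces the stated figure. This assumes the total counts each relation once per person, which the text does not state explicitly:

\frac{2{,}413{,}208}{448{,}289} \approx 5.38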

(3) Building the test sets

To test the algorithm, we collected test sets for 13 fields from the Web; each test set contains a list of experts in one field. Table 8 lists the statistics and sources of these 13 test sets. "Ontology Mapping" and "Semantic Web" come from the committee member lists of the relevant conferences; "Data mining" comes from the list of data mining people compiled by kmining.com; "Information extraction" comes from the information extraction researchers collected by Dr. Ion Muslea; "Cryptography" comes from the cryptography researchers collected by Kevin McCurley; "Computer vision" comes from the computer vision researchers collected by Dr. Margaret Fleck; "Neural networks" comes from the neural network researchers listed in the Open Directory; "Boosting" and "SVMs" come from their respective official websites; and "Planning", "Intelligent agent", "Machine learning", and "Natural language processing" all come from an artificial intelligence website.

Table 8. Expert test sets for the 13 fields

Field	Field keyword	Number of experts	Source
Ontology Mapping	Ontology Alignment	57	Committee members of conferences on ontology mapping, including EON 2003 & 2004, OAEI 2005 & 2006, and OM workshop 2006
Semantic Web	Semantic Web	412	Committee members of the International Semantic Web Conference, 2001 to 2006
Data mining	Data Mining	351	http://www.kmining.com/info_people.html
Information extraction	Information Extraction	91	http://www.isi.edu/info-agents/RISE/people.html
Boosting	Boosting	57	http://www.boosting.org/people.html
SVMs	Support Vector Machine	111	http://www.svms.org/people-frames.html
Planning	Planning	26	http://aima.cs.berkeley.edu/ai.html#learning
Intelligent agent	Intelligent Agent	35	http://aima.cs.berkeley.edu/ai.html#learning
Machine learning	Machine Learning	76	http://aima.cs.berkeley.edu/ai.html#learning
Natural language processing	Natural Language Processing	54	http://aima.cs.berkeley.edu/ai.html#learning
Cryptography	Cryptography	174	http://www.swcp.com/~mccurley/cryptographers/cryptographers.html
Computer vision	Computer Vision	215	http://www.cs.hmc.edu/~fleck/computer-vision-handbook/vision-people.html
Neural networks	Neural Networks	122	http://dmoz.org/Computers/Artificial_Intelligence/Neural_Networks/People/

(4) Evaluation criteria

We use P@5, P@10, P@20, P@30, R-prec, MAP, and bpref as the evaluation criteria. Precision is defined as the proportion of correct results among the returned experts, where a correct result is an expert who appears in the test set. P@5 measures the precision of the first 5 returned results, and likewise for P@10, P@20, and P@30. R-prec measures the precision of the first R returned results, where R is the total number of experts in the test set. MAP measures the mean of the precision values obtained at each position where a correct result occurs. bpref focuses on how many wrongly found experts are ranked ahead of correct experts, and is defined as follows:

bpref = \frac{1}{R} \sum_{r} \left( 1 - \frac{| n \text{ ranked higher than } r |}{R} \right) \qquad (10)

where R is the total number of experts in the test set, r is a correct expert from the test set, and n is a member of the first R wrongly found experts in the search results.
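The criteria defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the evaluation code used in the experiments: the function names are hypothetical, the average precision is divided by the number of experts in the test set (a common convention that the text does not specify), and in bpref the count of wrong experts ranked above a correct one is capped at R, following the "first R wrongly found experts" wording.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k returned experts that appear in the test set."""
    return sum(1 for x in ranked[:k] if x in relevant) / k


def r_precision(ranked, relevant):
    """Precision of the first R results, R being the test set size."""
    return precision_at_k(ranked, relevant, len(relevant))


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a correct expert occurs."""
    hits, total = 0, 0.0
    for rank, x in enumerate(ranked, start=1):
        if x in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # assumption: normalize by test set size


def bpref(ranked, relevant):
    """bpref as in formula (10): penalize wrong experts ranked above correct ones."""
    R = len(relevant)
    wrong_seen, total = 0, 0.0
    for x in ranked:
        if x in relevant:
            total += 1 - min(wrong_seen, R) / R  # only the first R wrong experts count
        else:
            wrong_seen += 1
    return total / R


# Toy ranking: correct experts at ranks 1, 3, and 5.
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}
scores = (precision_at_k(ranked, relevant, 5), r_precision(ranked, relevant),
          average_precision(ranked, relevant), bpref(ranked, relevant))
```

MAP in Table 9 would then be the mean of `average_precision` over queries; with one field keyword per field, the per-field value is simply the average precision for that keyword.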

(5) Experimental results

The method of the present invention is compared with a baseline method that uses only personal description information and ignores relation information; the baseline closely resembles step 2 of our method. Table 9 lists the evaluation results of the two methods for expert information processing in the 13 fields.

Table 9. Expert finding evaluation results (%) of our method and the comparative method in the 13 fields

Field	Method	P@5	P@10	P@20	P@30	R-prec	MAP	bpref
Ontology Mapping	Comparative method	20.00	20.00	15.00	23.33	15.39	3.32	9.51
Ontology Mapping	Our method	40.00	40.00	35.00	23.33	16.00	5.60	11.73
Semantic Web	Comparative method	80.00	70.00	70.00	60.00	71.43	9.55	15.75
Semantic Web	Our method	80.00	90.00	70.00	66.67	90.91	13.77	20.83
Data mining	Comparative method	80.00	70.00	60.00	56.67	71.42	9.88	15.27
Data mining	Our method	100.00	80.00	75.00	66.67	66.67	12.63	19.16
Information extraction	Comparative method	80.00	70.00	65.00	53.33	71.43	19.52	22.89
Information extraction	Our method	100.00	60.00	60.00	53.33	58.82	19.19	23.15
Boosting	Comparative method	60.00	60.00	45.00	33.33	37.04	19.57	22.01
Boosting	Our method	80.00	50.00	40.00	33.33	35.71	17.39	18.78
Planning	Comparative method	60.00	40.00	30.00	23.33	23.26	23.12	20.56
Planning	Our method	40.00	30.00	30.00	26.67	19.23	14.97	20.71
Intelligent agent	Comparative method	20.00	40.00	25.00	23.33	6.33	8.86	15.59
Intelligent agent	Our method	80.00	50.00	30.00	26.67	19.23	14.30	18.69
SVMs	Comparative method	40.00	20.00	15.00	13.33	3.72	2.70	6.09
SVMs	Our method	20.00	20.00	10.00	13.33	13.70	1.82	6.47
Machine learning	Comparative method	40.00	20.00	15.00	16.67	14.71	4.78	8.59
Machine learning	Our method	40.00	40.00	25.00	16.67	14.93	6.54	9.92
Natural language processing	Comparative method	20.00	10.00	15.00	26.67	17.24	7.50	11.69
Natural language processing	Our method	40.00	20.00	25.00	26.67	29.41	9.99	18.24
Cryptography	Comparative method	80.00	70.00	60.00	50.00	76.92	12.15	15.04
Cryptography	Our method	100.00	70.00	55.00	56.67	58.82	17.71	23.21
Computer vision	Comparative method	20.00	10.00	35.00	36.67	37.04	4.92	11.13
Computer vision	Our method	60.00	50.00	55.00	43.33	62.50	7.93	13.77
Neural networks	Comparative method	0.00	0.00	15.00	10.00	3.85	0.65	3.20
Neural networks	Our method	20.00	30.00	15.00	16.67	5.75	1.50	4.75
Mean	Comparative method	46.15	38.00	35.80	32.82	34.60	9.73	13.64
Mean	Our method	61.54	48.00	40.40	36.15	37.82	11.03	16.11

The experimental results show that our method outperforms the comparative method in most fields, and its mean values are higher on all seven criteria. The experiments indicate that the proposed social network expert information processing system and method based on the expert value propagation algorithm are effective.

This shows that the present invention achieves its intended purpose.

Relationship type

Source node

Destination node


1. A social network expert information processing system based on an expert value propagation algorithm, characterized in that the system consists of a social relation network generation server, a database server, an expert value calculation server, and a Web server connected in series in that order, wherein:

The social relation network generation server constructs a social relation network graph G according to the following steps:

Step (1): the social network is G = (V, E, T, τ),

where V is the node set and v ∈ V; each node v represents a person in the social network, who has the following personal description information:

Basic personal information, including at least: position, affiliation, research interests, and homepage address;

Information on the papers the person has published, including at least: paper title, name of the publishing conference, and co-authors; the relation between a person and a paper is described by a correspondence between the person and his or her publications, comprising the identifier of the person and the identifier of the paper;

E ⊆ V × V is the set of edges in the social relation network graph G; e_{ij}^{t} ∈ E denotes a relation of type t between persons v_i and v_j in G;

T is the set of relation types occurring in the set E; t ∈ T denotes one type of social relation between people and is described by a social relation type table containing at least: the relations, comprising no fewer than four kinds, namely paper co-authorship, supervision of a paper or project, cooperation on a research project, and friendship; the weight of each relation; and the identifier of each relation;

τ: E → T is the mapping function from interpersonal relations to the relation types, written τ(e_{ij}^{t}) = t; when an interpersonal relation is bidirectional, it is a symmetric relation;

In the social relation network graph G, all interpersonal relations form a social relation network node table, comprising: the relation type τ, the source node v_i, and the destination node v_j;

Step (2): the social relation network graph G obtained in step (1) is input to the database server;

Step (3): the expert value calculation server retrieves all candidate experts in the whole social relation network and computes a field-related expert value for each candidate expert:

Step (3.1): for each person v_i, his or her basic personal description information is concatenated into one large document d, and p_k denotes the information of each paper of v_i;

Step (3.2): given a field keyword q, a probabilistic model from information retrieval is used to estimate the relevance p(q|d) between the field keyword q and the basic personal information d, and the relevance p(q|p_k) between the field keyword q and the information of each paper:

p(q \mid d) = \prod_{t \in q} \left( (1 - \lambda)\, p(t \mid d) + \lambda\, p(t) \right)^{n(t, q)}

where: t denotes each term obtained after the field keyword q is segmented into terms;

n(t, q) is the number of times the term t appears in q;

λ is the smoothing coefficient, with a value in [0, 1];

p(t) is estimated as the number of times the term t occurs in the large documents d of all people divided by the total number of words in the large documents d of all people;

p(t|d) is estimated as the number of times the term t occurs in the large document d of v_i divided by the total number of words in the large document d of v_i;

p(q \mid p_{k}) = \prod_{t \in q} \left( (1 - \lambda)\, p(t \mid p_{k}) + \lambda\, p(t) \right)^{n(t, q)}

where: p(t|p_k) is estimated as the number of times the term t occurs in the information p_k of each paper of v_i divided by the total number of words in the paper information p_k after high-frequency words have been filtered out;
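The smoothed relevance estimate above can be illustrated with a short Python sketch. It assumes that documents and the field keyword are already split into lists of terms, and the function name `query_likelihood` and the default `lam=0.5` are hypothetical choices; the claim only requires λ to lie in [0, 1]. The same function serves for p(q|d) and p(q|p_k), depending on which term list is passed as the document.

```python
from collections import Counter


def query_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """Product over query terms of ((1 - lam) * p(t|doc) + lam * p(t)) ** n(t, q)."""
    doc_counts = Counter(doc_terms)
    coll_counts = Counter(collection_terms)
    doc_len = len(doc_terms)
    coll_len = len(collection_terms)
    score = 1.0
    for t, n_tq in Counter(query_terms).items():
        # Relative frequency of t in this document and in the whole collection.
        p_t_doc = doc_counts[t] / doc_len if doc_len else 0.0
        p_t = coll_counts[t] / coll_len if coll_len else 0.0
        score *= ((1 - lam) * p_t_doc + lam * p_t) ** n_tq
    return score


# Toy example: one person's document inside a small collection.
doc = ["semantic", "web", "annotation", "semantic"]
collection = doc + ["data", "mining", "web", "search"]
relevance = query_likelihood(["semantic", "web"], doc, collection, lam=0.5)
```

Because of the λ p(t) term, a query term that is missing from a person's document but present elsewhere in the collection lowers the score without forcing it to zero.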

Step (3.2): using the p(q|d) and p(q|p_k) obtained according to step (3.1), the relevance of each person v_i's basic personal information and of the information of each of his or her papers to the field keyword q is computed, and the two relevance values are combined linearly to obtain the initial expert value s(v_i)^0 of each person v_i, given by:

s(v_{i})^{0} = \alpha \cdot p(q \mid d) + (1 - \alpha) \cdot \sum_{p_{k} \in P} \mathrm{if}(p_{k}) \times p(q \mid p_{k})

where α takes a value in [0, 1],

P denotes all papers published by v_i, and p_k denotes one paper in P;

if(p_k) denotes the impact factor of the conference or journal in which the paper p_k was published, and is a given value;

Step (3.3): the experts whose initial expert values rank in the top N form the candidate expert group, where N is a set value;
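The linear combination and the top-N selection can be sketched as below. This is an illustrative Python fragment with hypothetical function names; the relevance values p(q|d) and p(q|p_k) and the impact factors if(p_k) are taken as given inputs, and `alpha=0.5` is only a placeholder within the permitted range [0, 1].

```python
def initial_expert_value(p_q_d, papers, alpha=0.5):
    """s(v_i)^0 = alpha * p(q|d) + (1 - alpha) * sum(if(p_k) * p(q|p_k)).

    `papers` is a list of (impact_factor, p_q_pk) pairs for one person.
    """
    paper_part = sum(impact * p_q_pk for impact, p_q_pk in papers)
    return alpha * p_q_d + (1 - alpha) * paper_part


def top_n_candidates(initial_values, n):
    """Return the n people with the highest initial expert values."""
    return sorted(initial_values, key=initial_values.get, reverse=True)[:n]


# Toy values for illustration only.
values = {
    "v1": initial_expert_value(0.2, [(1.0, 0.1), (2.0, 0.05)]),
    "v2": initial_expert_value(0.4, []),
    "v3": initial_expert_value(0.1, [(1.0, 0.5)]),
}
candidates = top_n_candidates(values, n=2)
```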

Step (4): the expert value calculation server uses an iterative algorithm based on the idea of propagation to construct a social network subgraph according to the following steps, and updates the expert values of the candidate experts on the basis of this subgraph:

Step (4.1): from the candidate expert group obtained in step (3.3), a social relation network subgraph for the field keyword q is obtained, in which each node represents a candidate expert and each edge represents a relation between the two experts it connects;

Step (4.2): for the social relation network subgraph obtained in step (4.1), a propagation coefficient matrix M is constructed, whose element M_{ij} is the propagation coefficient from v_i to v_j, with M_{ij} in the range [0, 1]; M_{ij} is given by:

M_{ij} = \sum_{t} c(\tau(e_{ij}^{t})) \cdot w(e_{ij}^{t})

where c(τ(e_{ij}^{t})) is the weight of the relation type of e_{ij}^{t} and is a set value,

w(e_{ij}^{t}) is the closeness of the relation e_{ij}^{t} between v_i and v_j; according to the relation with the larger weight in the social relation network graph, the name of the corresponding relation is selected and the corresponding weight w(e_{ij}^{t}) is computed accordingly:

where |U_i| denotes the number of all relations of this kind that v_i has;
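The construction of M can be sketched as follows. The formula for the closeness w is not reproduced in the text, so the sketch takes it as a caller-supplied function; the helper `inverse_count_closeness`, which uses 1 / |U_i|, is purely a hypothetical stand-in chosen because |U_i| is the only quantity the text mentions, and it should not be read as the patent's definition. Function names and the edge tuple layout are likewise assumptions.

```python
from collections import Counter


def propagation_matrix(nodes, edges, type_weight, closeness):
    """M[i][j] = sum over relation types t of c(t) * w(e_ij^t).

    `edges` holds directed (relation_type, source, target) triples;
    a symmetric relation is listed once in each direction.
    """
    index = {v: i for i, v in enumerate(nodes)}
    M = [[0.0] * len(nodes) for _ in nodes]
    for rel_type, src, dst in edges:
        if src in index and dst in index:  # keep only edges inside the subgraph
            M[index[src]][index[dst]] += type_weight[rel_type] * closeness(rel_type, src, dst)
    return M


def inverse_count_closeness(edges):
    """Hypothetical closeness: 1 / |U_i|, with |U_i| the number of relations
    of the same type leaving the source node (an assumption, see lead-in)."""
    counts = Counter((rel_type, src) for rel_type, src, _ in edges)
    return lambda rel_type, src, dst: 1.0 / counts[(rel_type, src)]


edges = [
    ("co-author", "a", "b"), ("co-author", "b", "a"),
    ("co-author", "a", "c"), ("co-author", "c", "a"),
]
M = propagation_matrix(["a", "b", "c"], edges, {"co-author": 0.5},
                       inverse_count_closeness(edges))
```

The sketch does not enforce the stated range [0, 1] for M_{ij}; with several relation types between the same pair, the type weights would need to be chosen so that the sum stays within that range.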

Step (4.3): the expert value vector S^{n+1} at iteration n+1 of the algorithm is computed (S^{n+1} is the vector composed of the expert values of all experts, each component S_{i}^{n} being the expert value of v_i):

S^{n+1} = (1 - \omega)\, S^{n} + \omega\, M^{T} S^{n}

where ω is the penalty coefficient for propagation distance, with a value in [0, 1], and the propagation coefficient matrix M remains fixed throughout the iteration;

Step (4.4): the expert values of all experts from step (4.3) are normalized:

S_{i}^{n+1} = \frac{S_{i}^{n+1}}{\max_{j} S_{j}^{n+1}}

so that the maximum expert value is 1;

Step (4.5): the recommended expert set obtained in step (4.4) is output to the user over the Web.
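Steps (4.3) to (4.5) can be sketched in plain Python as follows. The function names, the default ω of 0.5, and the fixed iteration count are assumptions, since the text gives no stopping criterion. The sketch normalizes after every iteration; because the update is linear in S, normalizing only once at the end yields the same final values up to rounding.

```python
def propagate(initial, M, omega=0.5, iterations=10):
    """Iterate S <- (1 - omega) * S + omega * M^T S, dividing by the maximum each round."""
    n = len(initial)
    S = list(initial)
    for _ in range(iterations):
        # (M^T S)_j = sum_i M[i][j] * S[i]: value flowing from every v_i into v_j.
        received = [sum(M[i][j] * S[i] for i in range(n)) for j in range(n)]
        S = [(1 - omega) * S[j] + omega * received[j] for j in range(n)]
        peak = max(S, default=0.0)
        if peak > 0:
            S = [s / peak for s in S]  # the largest expert value becomes 1
    return S


def rank_experts(names, values):
    """Sort (name, expert value) pairs in descending order of expert value."""
    return sorted(zip(names, values), key=lambda pair: pair[1], reverse=True)


# Toy example: v1 starts as the only expert and is linked to v2 in both directions.
M = [[0.0, 1.0], [1.0, 0.0]]
final = propagate([1.0, 0.0], M, omega=0.5, iterations=1)
ranking = rank_experts(["v1", "v2"], final)
```

In the toy example a single iteration passes half of v1's value to v2, so both end at the normalized maximum; this illustrates how a person with a weak initial value can rise through a relation with a strong expert.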

2. A social network expert information processing method based on an expert value propagation algorithm, characterized in that the method is carried out, according to the following steps, in an expert retrieval system consisting of a social relation network generation server, a database server, an expert value calculation server, and a Web server connected in series in that order:

Step (1): the social network is G = (V, E, T, τ),

p(q \mid d) = \prod_{t \in q} \left( (1 - \lambda)\, p(t \mid d) + \lambda\, p(t) \right)^{n(t, q)}

n(t, q) is the number of times the term t appears in q;

λ is the smoothing coefficient, with a value in [0, 1];

p(q \mid p_{k}) = \prod_{t \in q} \left( (1 - \lambda)\, p(t \mid p_{k}) + \lambda\, p(t) \right)^{n(t, q)}

Step (3.2): using the p(q|d) and p(q|p_k) obtained according to step (3.1), the relevance of each person v_i's basic personal information and of the information of each of his or her papers to the field keyword q is computed, and the two relevance values are combined linearly to obtain the initial expert value s(v_i)^0 of each person v_i, given by:

s(v_{i})^{0} = \alpha \cdot p(q \mid d) + (1 - \alpha) \cdot \sum_{p_{k} \in P} \mathrm{if}(p_{k}) \times p(q \mid p_{k})

where α takes a value in [0, 1],

M_{ij} = \sum_{t} c(\tau(e_{ij}^{t})) \cdot w(e_{ij}^{t})

where |U_i| denotes the number of all relations of this kind that v_i has;

Step (4.3): the expert value vector S^{n+1} at iteration n+1 of the algorithm is computed (S^{n+1} is the vector composed of the expert values of all experts, each component S_{i}^{n} being the expert value of v_i):

S^{n+1} = (1 - \omega)\, S^{n} + \omega\, M^{T} S^{n}

S_{i}^{n+1} = \frac{S_{i}^{n+1}}{\max_{j} S_{j}^{n+1}}

so that the maximum expert value is 1;