CN107832312A

CN107832312A - Text recommendation method based on deep semantic resolution

Info

Publication number: CN107832312A
Application number: CN201710000406.3A
Authority: CN
Inventors: 郐弘智; 陈建辉; 盛文瑾; 闫健卓
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2017-01-03
Filing date: 2017-01-03
Publication date: 2018-03-23
Anticipated expiration: 2037-01-03
Also published as: CN107832312B

Abstract

The invention discloses a text recommendation method based on deep semantic resolution. At the document end, text topics are extracted automatically according to a deep semantic grid model, and the scene semantics of each topic under different text backgrounds are inferred by a topic scene semantic resolution method, yielding a text topic tree fused with scene states; a user text interest portrait is then constructed for every document according to the user's real-time scene state. At the query end, in response to real-time changes in the user's scene state, the text topic tree is screened by scene semantics, the query content is modeled as query interest topics, a second round of latent semantic reasoning is applied to the user's direct interest topics by spreading activation to compute the global activation value of each topic, and a user query interest portrait fused with the current scene semantics is constructed. Documents are scored by a similarity calculation method, and a text recommendation list is generated in descending order of score.

Description

Text recommendation method based on deep semantic resolution

Technical Field

The invention relates to the technical field of recommendation, in particular to a text recommendation method based on deep semantic resolution, and more particularly to a deep semantic grid model constructed on a brain-like "layered-divergent" thinking mode together with a recommendation method based on scene semantic resolution of text topics.

Background

Recommendation systems were first proposed in the 1990s. Early recommendation systems focused mainly on the formal (surface) similarity of search results and neglected the semantic relevance between results and queries, which introduced considerable noise into the recommendations. In recent years, with the explosive growth of paperless data, the effectiveness of information retrieval has attracted extensive attention from researchers, and various semantics-based information retrieval methods have been proposed. Personalized semantic recommendation methods fall mainly into two categories: formal semantics and social semantics.

Social semantic methods take two forms. The first constructs a user portrait by analyzing information such as user logs, user labels, domain popularity and user activity, thereby achieving personalized recommendation. The second relies on user similarity and item similarity: the ratings that the most similar users give an item are used to approximate the target user's rating of that item, as in collaborative filtering. The former improves the interest relevance of retrieval results, but it requires a large amount of user behavior data, which most users cannot supply; moreover, it is essentially a formal matching of interest keywords and lacks the capability for semantic analysis and latent interest mining. The latter is more user-oriented and better at mining documents of latent interest, but its feedback results are complex and diverse, so a great deal of content irrelevant to the query appears. In addition, as the dimensionality of recommendation data keeps growing, data sparsity gives rise to the cold start problem: when a new user or a batch of literature from a new field enters the system, the recommendation quality drops because of insufficient supporting information.

Most formal semantic recommendation systems adopt ontology-based semantic query technology. This approach abstracts document information to a concept layer and connects the concepts through different semantic relations, forming a network structure similar to the way the brain thinks. Because the approach operates on text directly at the concept layer and is mostly applied to retrieval over structured knowledge bases, the semantic relevance of the results improves markedly. However, when these methods are used for text recommendation, the scene semantics of the concepts implicit in the text are not considered, which causes semantic ambiguity when documents are mapped to the ontology. Accordingly, the prior art has yet to be improved and developed.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a text recommendation method based on deep semantic resolution, and aims to solve the problem that the semantic relevance of the existing recommendation method needs to be improved.

In order to solve the technical problems, the technical scheme adopted by the invention specifically comprises the following steps:

Step 1: constructing a deep semantic grid model based on a brain-like "layered-divergent" thinking mode;

Step 2: inferring the grid topic set of a text by combining the "grid topic-synonym bag" model with word matching, connecting the scattered topics through the association-memory function of the grid model, inferring the scene labels of the different activated topics in the current text through the scene semantic resolution function, and finally constructing a text topic tree that integrates the scene semantics and memory connections;

Step 3: pruning the text topic tree according to the user's interest, that is, filtering out the topics and relations that do not match the user's current scene state, thereby constructing a text topic tree based on scene semantic screening;

Step 4: applying the TF-IDF algorithm to all scene-screened text topic trees in the database, calculating the weight of each topic and mapping it to the corresponding grid topic node, thereby constructing a user text interest portrait for each document;

Step 5: extracting, by pseudo-relevance feedback, the documents relevant to the user's query content together with their scene-screened text topic trees, counting the frequency of the topics in the feedback trees and normalizing it to obtain the initial interest topic activation values;

Step 6: calculating, by a spreading activation mechanism, the global dynamic activation values of the initial interest grid topics and of the potential interest grid topics under feedback learning, assigning the results to the corresponding topic nodes in the grid model, and constructing a user query interest portrait fused with the current scene semantics;

Step 7: scoring the deep semantic relevance between the user query interest portrait and each user text interest portrait with a grid-based cosine similarity calculation, and generating a recommendation list from the scores.
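The pruning in step 3 can be illustrated with a minimal sketch. The `TopicNode` class, its field names, and the rule that a rejected topic also removes its whole subtree are assumptions made for illustration; the source only states that topics and relations inconsistent with the user's current scene state are filtered out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicNode:
    """Hypothetical node of a text topic tree (not defined in the source)."""
    topic: str
    scene: str  # scene label inferred for this topic in step 2
    children: List["TopicNode"] = field(default_factory=list)


def prune_by_scene(node: TopicNode, user_scene: str) -> Optional[TopicNode]:
    """Return a pruned copy of the tree, or None if the root itself is rejected.

    Assumption: a topic whose scene label differs from the user's current
    scene is dropped together with everything below it.
    """
    if node.scene != user_scene:
        return None
    kept = []
    for child in node.children:
        pruned = prune_by_scene(child, user_scene)
        if pruned is not None:
            kept.append(pruned)
    return TopicNode(node.topic, node.scene, kept)
```

A softer policy, such as re-attaching the surviving descendants of a rejected topic to its parent, would also fit the description; the source does not say which is intended.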

Further, the deep semantic grid model described in step 1 of the present invention is a construction method based on a brain-like "layered-divergent" thinking mode, and the construction process in step 1 specifically includes:

Step 1-1: selecting a multi-domain classification ontology, performing semantic splitting and lemmatization on the topics in the ontology with the natural language processing tools of Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;

Step 1-2: constructing a "grid topic-synonym bag" semantic mapping model, in which the "topic" is a core topic in the hierarchical grid model and the "bag of words" is the set of synonym terms of that topic extracted from the WordNet dictionary. If a term from a topic's bag of words appears in the text, the topic is activated and the attribute of the corresponding grid node is set to 1, which realizes shallow semantic topic mining of the text;

Step 1-3: traversing the "topic-label-abstract" triples in the DBpedia knowledge base, matching the topic in each triple against the terms in the "topic-bag of words" model, extracting the labels and abstract data of the matched topics from the knowledge base, mapping "grid topic-DBpedia topic-label-abstract" layer by layer, and associating them by semantic relation type;

Step 1-4: taking the "layering-memory" grid model as the framework, realizing a "divergent-deep" semantic grid model that fuses "synonym bag-grid topic-DBpedia topic-label-abstract".
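The activation rule in step 1-2 can be sketched as follows. The function name, the dictionary layout of the synonym bags, and the simple lowercase tokenizer are assumptions; in the described method the bags would come from WordNet and the text would be preprocessed with the Stanford tools, neither of which is called here.

```python
import re
from typing import Dict, Set


def activate_topics(text: str, topic_bags: Dict[str, Set[str]]) -> Dict[str, int]:
    """Set a topic's node attribute to 1 if any term of its synonym bag occurs in the text.

    Simplification: only single-word, lowercase terms are matched.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {topic: int(bool(tokens & bag)) for topic, bag in topic_bags.items()}
```

With hand-written bags such as `{"virus": {"virus", "pathogen"}}`, a sentence mentioning "pathogen" activates the topic "virus" even though the word "virus" never appears, which is the shallow semantic topic mining the step describes.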

Further, in step 2 of the present invention, a subject context semantic analysis method based on a DBpedia knowledge base is adopted, and the subject context semantic analysis method specifically includes:

First, generating the set of context terms $Key_s$ obtained by applying a dynamic-span window around the activated topic s in a document;

Second, generating the term set $T_{m,s}$ of the abstracts under the different scene labels m that correspond to the activated topic s in the DBpedia knowledge base, and counting the number of abstract terms under each scene label, $N_m$;

Third, calculating the topic scene semantic similarity according to the following formula:

where $counter(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$.

Fourth, selecting the scene label corresponding to the abstract with the highest relevance as the scene semantic state of the activated topic s in the document, forming a "text-activated topic-scene label" triple.
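The four steps above can be sketched in code. Because the similarity formula itself is not reproduced in this text, the score used here, the co-occurrence count divided by $N_m$, is only an assumed form built from the two quantities the text defines; the function name and data layout are likewise hypothetical.

```python
from typing import Dict, List, Optional


def resolve_scene(key_terms: List[str], abstracts: Dict[str, List[str]]) -> Optional[str]:
    """Pick the scene label whose abstract best matches the topic's context terms.

    key_terms : context terms around the activated topic (Key_s)
    abstracts : scene label m -> abstract terms for the topic (T_{m,s})
    Assumed score: counter(T_{m,s}, Key_s) / N_m.
    """
    key = set(key_terms)
    best_label, best_score = None, -1.0
    for label, terms in abstracts.items():
        n_m = len(terms)  # N_m: number of abstract terms under this label
        if n_m == 0:
            continue
        co_occurrence = sum(1 for term in terms if term in key)
        score = co_occurrence / n_m
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For a topic such as "virus", context terms like "infection" and "cell" would favor a biology label over a computing label, giving the "text-activated topic-scene label" triple of the fourth step.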

Further, the specific steps of constructing the user text interest topic portrait in step 4 of the invention are as follows:

First, counting the topic frequencies in all text topic trees in the database under the current scene mode;

Second, calculating the topic frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in a document to the total frequency of all activated topics in that document under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents containing the activated topic under the current user interest scene state;

Third, calculating the interest topic semantic weight $C_{w,i}$, which fuses the user's emotion semantic resolution, as follows:

$$C_{w,i} = TF_i \times IDF_i \quad (i = 1, 2, \ldots, n),$$

and mapping the topic semantic weights of each document to the attribute tuple of the grid topics to construct the user text interest topic portrait.
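A minimal sketch of the weight calculation follows, assuming each document has already been reduced to a mapping from activated topic to its frequency in the scene-screened topic tree. The function name and input layout are hypothetical, and the natural logarithm is an assumption because the text does not state the base.

```python
import math
from typing import Dict, List


def topic_weights(docs: List[Dict[str, int]]) -> List[Dict[str, float]]:
    """Compute C_w = TF * IDF for every activated topic of every document.

    TF  = topic frequency / total activated-topic frequency in the document (C_M / R_N)
    IDF = log(S / N), S = number of documents, N = documents containing the topic
    """
    s = len(docs)
    doc_freq: Dict[str, int] = {}
    for counts in docs:
        for topic, count in counts.items():
            if count > 0:
                doc_freq[topic] = doc_freq.get(topic, 0) + 1

    portraits = []
    for counts in docs:
        total = sum(counts.values())
        weights = {}
        for topic, count in counts.items():
            if count > 0:
                weights[topic] = (count / total) * math.log(s / doc_freq[topic])
        portraits.append(weights)
    return portraits
```

Note that a topic present in every document receives weight 0 under this definition, so the portrait emphasizes topics that distinguish a document from the rest of the database.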

Further, the specific steps of querying the interest portrait by the user in step 6 of the present invention are as follows:

First, obtaining the feedback documents and the corresponding document topic trees according to the pseudo-relevance feedback principle;

Second, filtering the topic trees of the original documents according to the current scene state set by the user, removing the topics irrelevant to the user's current scene state and keeping the topic tree of interest to the user; counting the occurrence frequency of each topic in the user interest topic tree, normalizing it to serve as the user's initial interest activation value, and mapping the activation value to the attribute label of the grid topic node;

Third, performing semantic spreading activation from the initially activated interest topics according to the relation types among topic nodes in the grid model, mining the potential interest topic nodes under the user's current scene state, and calculating their global activation values;

The grid diffusion formula is:

$$O_j(t+1) = \sum_{i \in \theta_{ij}} I_i(t) \cdot w_{ij} \cdot (1 - \alpha),$$

where $\theta_{ij}$ is the set of all topic paths in the topic grid model that have topic node j as the destination node and a topic node i related to node j as the source node; $I_i(t)$ is the activation attribute value of each potential topic node in the grid model at time t; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time t+1; $w_{ij}$ is the association value between the active topic and the potential topic in the current scene state; and $\alpha$ is a decay factor, set to 0.75. The association path length is set to 3.

Fourth, mapping the global topic activation values to the attribute tuple of the grid topics to construct the user query interest topic portrait.
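The spreading activation step can be sketched as below, using the update rule from claim 5, $O_j(t+1) = \sum_i I_i(t) \cdot w_{ij} \cdot (1 - \alpha)$, with $\alpha = 0.75$ and at most 3 hops. The graph representation, the function name, and the choice to accumulate the activation received at each hop into one global value per node are assumptions; the source does not spell out how the per-step values are combined.

```python
from typing import Dict


def spread_activation(
    initial: Dict[str, float],
    edges: Dict[str, Dict[str, float]],
    alpha: float = 0.75,
    max_hops: int = 3,
) -> Dict[str, float]:
    """Propagate initial interest activation through the topic grid.

    initial : topic -> normalized initial activation value
    edges   : source topic i -> {destination topic j: association value w_ij}
    Each hop applies O_j(t+1) = sum_i I_i(t) * w_ij * (1 - alpha).
    """
    total = dict(initial)      # accumulated global activation (assumed combination rule)
    frontier = dict(initial)   # I_i(t): activation emitted at the current step
    for _ in range(max_hops):
        received: Dict[str, float] = {}
        for i, activation in frontier.items():
            for j, w_ij in edges.get(i, {}).items():
                received[j] = received.get(j, 0.0) + activation * w_ij * (1 - alpha)
        if not received:
            break
        for j, value in received.items():
            total[j] = total.get(j, 0.0) + value
        frontier = received
    return total
```

With a decay factor of 0.75, each hop passes on at most a quarter of the incoming activation, so topics three hops away contribute very little, which is consistent with capping the association path length at 3.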

Furthermore, in step 7 of the present invention, a cosine similarity formula is used to calculate the correlation between the language "domains" of the user text interest grid portrait and the user query interest grid portrait. The formula is expressed as follows:

$$\mathrm{Sim}(d_{j,D_m}, q) = \frac{\vec{d}_{j,D_m} \cdot \vec{q}}{|\vec{d}_{j,D_m}| \times |\vec{q}|},$$

where $d_{j,D_m}$ is the user text interest grid portrait and $q = (o_1, o_2, \ldots, o_n)$ is the user query interest grid portrait.
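Scoring and ranking can be sketched as follows, treating each portrait as a sparse vector indexed by grid topic. The function names and the dictionary representation are assumptions made for illustration; only the cosine similarity and the descending sort come from the text.

```python
import math
from typing import Dict, List, Tuple


def cosine(d: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity between a document portrait d and a query portrait q."""
    dot = sum(weight * q.get(topic, 0.0) for topic, weight in d.items())
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an empty portrait matches nothing
    return dot / (norm_d * norm_q)


def recommend(
    doc_portraits: Dict[str, Dict[str, float]], query_portrait: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score every document against the query and sort from highest to lowest."""
    scored = [(doc_id, cosine(p, query_portrait)) for doc_id, p in doc_portraits.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because both portraits live on the same grid topic nodes, topics added to the query portrait by spreading activation can match document topics that the query text never mentioned.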

The invention can be applied to all recommendation systems based on text retrieval, and has the following beneficial effects:

1. At the user end, for the query content submitted by the user, the invention adopts a first-pass feedback topic learning method based on scene semantic resolution and a topic expansion method based on second-pass semantic spreading activation, which addresses the semantic relevance of user queries and the mining of latent interests;

2. At the document end, the invention automatically infers the document topics and the scene semantic features of those topics from the deep semantic grid model, realizing automatic extraction of text topics and mining of deep interest semantics.

Drawings

FIG. 1 is a flowchart of a text recommendation method based on deep semantic parsing according to a preferred embodiment of the present invention.

Fig. 2 is a detailed flowchart of step S100 in the method shown in fig. 1.

Fig. 3 is a detailed flowchart of step S102 in the method shown in fig. 1.

Fig. 4 is a detailed flowchart of step S103 in the method shown in fig. 1.

Fig. 5 is a detailed flowchart of step S104 in the method shown in fig. 1.

FIG. 6 is a comparison between a deep semantic recommendation method and a conventional semantic recommendation method in a system Ranking Score (RS) under different user interest content input conditions.

Detailed Description

The invention provides a text recommendation method based on deep semantic resolution, which is further described in detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flowchart of a preferred embodiment of the text recommendation method based on deep semantic resolution according to the present invention. As shown in the figure, the implementation steps are as follows:

S100, constructing a deep semantic grid model based on a brain-like "layered-divergent" thinking mode;

S101, the user inputs the content of interest and sets the current scene state;

S102, performing topic inference and topic scene semantic resolution on the text to construct a text topic tree, and screening the document topic tree by topic semantics according to the current user's scene state, thereby constructing a text topic tree fused with scene semantic screening;

S103, applying the TF-IDF algorithm to all scene-screened text topic trees in the database, calculating the weight of each topic and mapping it to the corresponding grid topic node, so as to construct a user text interest portrait for each document;

S104, extracting, by pseudo-relevance feedback, the documents relevant to the user's query content together with their scene-screened text topic trees, counting the frequency of the topics in the feedback trees and normalizing it to obtain the initial interest topic activation values; calculating, by a spreading activation mechanism, the global dynamic activation values of the initial interest grid topics and of the potential interest grid topics under feedback learning, assigning the results to the corresponding topic nodes in the grid model, and constructing a user query interest portrait fused with the current scene semantics;

S105, calculating and scoring the semantic similarity between the user text interest portrait and the user query interest portrait under the current scene mode with a grid-based cosine similarity algorithm;

S106, sorting by relevance score from high to low to generate a recommendation list, and recommending documents of interest to the user.

Further, as shown in fig. 2, the step S100 specifically includes:

S001, selecting a multi-domain classification ontology, performing semantic splitting and lemmatization on the topics in the ontology with the natural language processing tools of Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;

S002, constructing a "grid topic-synonym bag" semantic mapping model, in which the "topic" is a core topic in the hierarchical grid model and the "bag of words" is the set of synonym terms of that topic extracted from the WordNet dictionary;

S003, traversing the "topic-label-abstract" triples in the DBpedia knowledge base, matching the topic in each triple against the terms in the "topic-bag of words" model, mapping the grid topics to DBpedia topics, and associating them by semantic relation type;

S004, extracting the labels and abstract data of the matched topics from the DBpedia knowledge base, and, taking the "layering-memory" grid model as the framework, realizing a "divergent-deep" semantic grid model that fuses "synonym bag-grid topic-DBpedia topic-label-abstract".

Further, as shown in fig. 3, the step S102 specifically includes:

S201, matching the text terms by keyword using the "topic-bag of words" semantic association in the grid model; if a term in a bag of words appears in the text, the topic is activated and the attribute of the corresponding grid node is set to 1, which realizes shallow semantic topic mining of the text;

S202, connecting the scattered topics into a text topic tree according to the association-memory characteristic of the deep semantic grid model;

S203, performing scene state resolution on the document topics, with the following specific steps:

where $counter(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$;

Further, as shown in fig. 4, the step S103 specifically includes:

S301, counting the topic frequencies in all text topic trees in the database under the current scene mode;

S302, calculating the topic frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in a document to the total frequency of all activated topics in that document under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents containing the activated topic under the current user interest scene state;

S303, calculating the interest topic semantic weight $C_{w,i}$, which fuses the user's emotion semantic resolution, as follows:

$$C_{w,i} = TF_i \times IDF_i \quad (i = 1, 2, \ldots, n).$$

Further, as shown in fig. 5, the step S104 specifically includes:

S401, obtaining the feedback documents and the corresponding document topic trees according to the pseudo-relevance feedback principle;

S402, filtering the topic trees of the original documents according to the current scene state set by the user, removing the topics irrelevant to the user's current scene state and keeping the topic tree of interest to the user; counting the occurrence frequency of each topic in the user interest topic tree, normalizing it to serve as the user's initial interest activation value, and mapping the activation value to the attribute label of the grid topic node;

S403, performing semantic spreading activation from the initially activated interest topics according to the relation types among topic nodes in the grid model, mining the potential interest topic nodes under the user's current scene state, and calculating their global activation values;

The grid diffusion formula is:

$$O_j(t+1) = \sum_{i \in \theta_{ij}} I_i(t) \cdot w_{ij} \cdot (1 - \alpha),$$

S404, mapping the global topic activation values to the attribute tuple of the grid topics to construct the user query interest topic portrait.

Further, in step S105, a cosine similarity formula is used to calculate the correlation between the language "domains" of the user text interest grid portrait and the user query interest grid portrait. The formula is expressed as follows:

$$\mathrm{Sim}(d_{j,D_m}, q) = \frac{\vec{d}_{j,D_m} \cdot \vec{q}}{|\vec{d}_{j,D_m}| \times |\vec{q}|},$$

where $d_{j,D_m}$ is the user text interest grid portrait and $q = (o_1, o_2, \ldots, o_n)$ is the user query interest grid portrait.

By applying scene semantic resolution both to the user's query and to document topic learning, the method improves the relevance of the recommended documents. Documents are recommended more intelligently, the influence of similar but irrelevant documents on the recommendation results is effectively reduced, the semantic relevance of the recommendation system is improved, the user's true personal interests are discovered, and the accuracy and personalized analysis capability of the recommendation system are improved.

To compare the text recommendation method based on deep semantic resolution with the traditional semantic recommendation method, the experimental parameters were chosen as follows. The simulation data set is taken from the 2005 literature data in the PubMed database and contains 26000 abstracts covering various biomedical topics. The deep semantic grid model is constructed from an ontology in the ACM Digital Library full-text database and the DBpedia knowledge base. Text processing uses the open source Java text analysis tools provided by the Stanford University Natural Language Processing Group.

The influence of the method on the ranking accuracy of the recommendation system was verified; the experimental results are as follows:

FIG. 6 compares the system Ranking Score (RS) of the conventional semantic recommendation method and the deep semantic recommendation method under different user interest content inputs. The conventional semantic recommendation method is a shallow semantic recommendation method without scene semantic resolution, and the deep semantic recommendation method is the method provided by the invention. As can be seen from FIG. 6, across the 5 experiments the ranking score of the present invention is always lower than that of the conventional semantic recommendation method. Because a smaller ranking score means that the system tends to rank the items the user prefers nearer the top, the experimental results show that the method provided by the invention achieves a better recommendation effect.

It should be noted that the scope of the present invention includes but is not limited to the examples described above, and any modifications or changes made by those skilled in the art based on the above description should fall within the scope of the present invention.

Claims

1. A text recommendation method based on deep semantic resolution is characterized by comprising the following steps:

Step 2: inferring the grid topic set of a text by combining the "grid topic-synonym bag" model with word matching, connecting the scattered topics through the association-memory function of the grid model, inferring the scene labels of the different activated topics in the current text through the scene semantic resolution function, and finally constructing a text topic tree that integrates the scene semantics and memory connections;

Step 4: applying the TF-IDF algorithm to all scene-screened text topic trees in the database, calculating the weight of each topic, mapping it to the corresponding grid topic node, and constructing a user text interest portrait for each document;

2. The method as claimed in claim 1, wherein the deep semantic mesh model in step 1 is constructed according to a brain-like "hierarchical-divergent" thinking model, and the construction process specifically includes:

first, selecting a multi-domain classification ontology, performing semantic splitting and lemmatization on the topics in the ontology with the natural language processing tools of Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;

second, constructing a "grid topic-synonym bag" semantic mapping model, in which the "topic" is a core topic in the hierarchical grid model and the "bag of words" is the set of synonym terms of that topic extracted from the WordNet dictionary; if a term from a topic's bag of words appears in the text, the topic is activated and the attribute of the corresponding grid node is set to 1, which realizes shallow semantic topic mining of the text;

third, traversing the "topic-label-abstract" triples in the DBpedia knowledge base, matching the topic in each triple against the terms in the "topic-bag of words" model, extracting the labels and abstract data of the matched topics from the knowledge base, mapping "grid topic-DBpedia topic-label-abstract" layer by layer, and associating them by semantic relation type;

fourth, taking the "layering-memory" grid model as the framework, realizing a "divergent-deep" semantic grid model that fuses "synonym bag-grid topic-DBpedia topic-label-abstract".

3. The text recommendation method based on deep semantic resolution as claimed in claim 1, wherein the step 2 adopts a topic scene semantic resolution method based on a DBpedia knowledge base, and the semantic resolution method specifically comprises the following steps:

first, generating the set of context terms $Key_s$ obtained by applying a dynamic-span window around the activated topic s in a document;

second, generating the term set $T_{m,s}$ of the abstracts under the different scene labels m that correspond to the activated topic s in the DBpedia knowledge base, and counting the number of abstract terms under each scene label, $N_m$;

where $counter(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$.

4. The text recommendation method based on deep semantic resolution as claimed in claim 2, wherein the specific steps of constructing the user text interest topic representation in step 4 are as follows:

third, calculating the interest topic semantic weight $C_{w,i}$, which fuses the user's emotion semantic resolution, as follows:

$$C_{w,i} = TF_i \times IDF_i \quad (i = 1, 2, \ldots, n),$$

5. The text recommendation method based on deep semantic resolution according to claim 2, wherein the step 6 of querying the interest representation by the user comprises the following specific steps:

the grid diffusion formula is:

$$O_j(t+1) = \sum_{i \in \theta_{ij}} I_i(t) \cdot w_{ij} \cdot (1 - \alpha),$$

6. The text recommendation method based on deep semantic resolution as claimed in claim 1, wherein a grid-based cosine similarity calculation method is adopted in step 7, the method using a cosine similarity formula to calculate the correlation between the language "domains" of the user text interest grid portrait and the user query interest grid portrait, the formula being expressed as follows:

$$\mathrm{Sim}(d_{j,D_m}, q) = \frac{\vec{d}_{j,D_m} \cdot \vec{q}}{|\vec{d}_{j,D_m}| \times |\vec{q}|},$$

where $d_{j,D_m}$ is the user text interest grid portrait and $q = (o_1, o_2, \ldots, o_n)$ is the user query interest grid portrait.