CN107832312A - Text recommendation method based on deep semantic resolution - Google Patents
Text recommendation method based on deep semantic resolution
- Publication number
- CN107832312A CN201710000406.3A
- Authority
- CN
- China
- Prior art keywords
- theme
- semantic
- user
- grid
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a text recommendation method based on deep semantic resolution. Text topics are extracted automatically according to a deep semantic grid model, and the scene semantics of each topic under different text contexts are inferred by a topic scene semantic resolution method, yielding a text topic tree that fuses scene states; a user text interest portrait is then constructed for every document according to the user's real-time scene state. At the query end, in response to real-time changes in the user's scene state, the text topic tree is screened by scene semantics, the query content is modeled as query interest topics, a secondary potential-semantic inference is performed on the user's direct interest topics by activation diffusion, and the global activation value of each topic is calculated to build a user query interest portrait that fuses the current scene semantics. Finally, each document is scored by a similarity calculation, and a text recommendation list is generated in descending order of score.
Description
Technical Field
The invention relates to the technical field of recommendation, and in particular to a text recommendation method based on deep semantic resolution, namely a deep semantic grid model constructed on a brain-like "layered-divergent" thinking mode together with a recommendation method based on topic scene semantic resolution of text.
Background
Recommendation systems were proposed in the 1990s. Early recommendation systems focused mainly on the formal similarity of search results and neglected the semantic relevance between the results and the query, which made the recommendations very noisy. In recent years, with the explosive growth of electronic data, the effectiveness of information retrieval has attracted wide attention from researchers, and various semantics-based information retrieval methods have been proposed. Personalized semantic recommendation falls mainly into two classes of methods: formal semantics and social semantics.
Social semantic methods take two forms. The first constructs a user profile by analyzing information such as user logs, user tags, domain popularity, and user activity, thereby achieving personalized recommendation. The second relies on user similarity and item similarity: the scores given to an item by the most similar users are used to approximate the target user's score for that item, as in collaborative filtering. The former improves the interest relevance of retrieval results, but it requires analyzing a large amount of user behavior data, which most users' data cannot supply; moreover, it is essentially a formal matching of interest keywords and lacks the capabilities of semantic analysis and potential interest mining. The latter is more user-oriented and better at mining documents of potential interest, but its feedback results are complex and diverse, so a large amount of content irrelevant to the query appears. In addition, as the dimensionality of recommendation data keeps growing, data sparsity causes the cold start problem: in particular, when a new user or a batch of literature from a new field enters the system, recommendation quality drops for lack of supporting information.
Most formal semantic recommendation systems adopt ontology-based semantic query technology. This approach abstracts document information to a concept layer and connects the concepts through different semantic relations to form a network structure similar to the way the brain thinks. Because it operates on text directly at the concept layer, and is mostly applied to retrieval over structured knowledge bases, the semantic relevance of the results improves markedly. However, when these methods are used to recommend text, they do not consider the scene semantics of the concepts implicit in the text, which causes semantic ambiguity when documents are mapped to the ontology. The prior art therefore still needs to be improved and developed.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a text recommendation method based on deep semantic resolution, aiming to improve the semantic relevance of existing recommendation methods.
To solve the above technical problem, the technical scheme adopted by the invention comprises the following steps:
Step 1: construct a deep semantic grid model based on a brain-like "layered-divergent" thinking mode;
Step 2: infer the grid topic set of a text by combining the "grid topic-synonym bag" model with word matching, connect the scattered topics using the association-memory function of the grid model, infer the scene labels of the activated topics under the current text using the scene semantic resolution function, and finally construct a text topic tree that fuses the scene semantics and memory connections;
Step 3: prune the text topic tree according to the user's interest, that is, filter out topics and relations that do not match the user's current scene state, thereby constructing a text topic tree screened by scene semantics;
Step 4: apply the TF-IDF algorithm to all text topic trees in the database that have undergone scene semantic screening, calculate the weight of each topic, and map the weights to the corresponding grid topic nodes, thereby constructing a user text interest portrait for each document;
Step 5: using pseudo-relevance feedback, extract the documents relevant to the user's query content together with their text topic trees after scene semantic screening, count the frequency of the topics in the feedback trees, and normalize the counts to obtain the initial interest topic activation values;
Step 6: using an activation diffusion mechanism, calculate the global dynamic activation values of the initial interest grid topics and of the potential interest grid topics under feedback learning, assign the results to the corresponding topic nodes in the grid model, and construct a user query interest portrait that fuses the current scene semantics;
Step 7: score the deep semantic relevance between the user query interest portrait and the user text interest portrait using a grid-based cosine similarity calculation, and generate a recommendation list for recommendation.
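As an illustration of the pruning in step 3, the following minimal sketch filters a text topic tree by scene label. The `TopicNode` class, its fields, and the example labels are hypothetical, since the text does not specify a data structure; the rule that a removed topic takes its whole subtree with it is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicNode:
    """Hypothetical node of a text topic tree."""
    topic: str
    scene: str  # scene label inferred for this topic in the document
    children: List["TopicNode"] = field(default_factory=list)


def prune(node: TopicNode, user_scene: str) -> Optional[TopicNode]:
    """Return a pruned copy that keeps only topics matching the user's scene.

    Assumption: a topic that does not match is dropped together with its
    subtree, i.e. with all relations hanging off it.
    """
    if node.scene != user_scene:
        return None
    kept = [c for c in (prune(ch, user_scene) for ch in node.children)
            if c is not None]
    return TopicNode(node.topic, node.scene, kept)


# Example tree: "neuron" belongs to another scene, so it and its child are removed.
tree = TopicNode("network", "computing", [
    TopicNode("protocol", "computing"),
    TopicNode("neuron", "biology", [TopicNode("graph", "computing")]),
])
pruned = prune(tree, "computing")
```

Returning a copy rather than mutating the tree keeps the original document tree available for users in other scene states.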
Further, the deep semantic grid model of step 1 is constructed on a brain-like "layered-divergent" thinking mode, and the construction process of step 1 specifically comprises:
Step 1-1: select a classification ontology that fuses multiple domains, perform semantic splitting and part-of-speech reduction (lemmatization) on the topics in the ontology using the natural language processing tools of Stanford University to obtain a core topic set, and connect the core topics into a divergent grid model according to the memory characteristics of the ontology;
Step 1-2: construct a "grid topic-synonym bag" semantic mapping model, in which the "topic" is a core topic in the layered grid model and the "bag of words" is the set of synonym terms of that topic extracted from the WordNet dictionary. If a term in the topic's bag of words appears in the text, the topic is activated and the attribute of the corresponding grid node is set to 1, which realizes shallow semantic topic mining of the text;
Step 1-3: traverse the "topic-label-abstract" triples in the DBpedia knowledge base, match the topic in each triple against the terms in the "topic-bag of words" model, extract the label and abstract data of the matched topics from the knowledge base, map "grid topic-DBpedia topic-label-abstract" layer by layer, and associate them by semantic relation type;
Step 1-4: taking the "layered-memory" grid model as the framework, realize a "divergent-deep" semantic grid model that fuses "synonym bag-grid topic-DBpedia topic-label-abstract".
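The topic activation described in step 1-2 can be sketched as below. This is only an illustration: the `TOPIC_BAGS` mapping is a hand-written stand-in for synonym bags that the text says are drawn from WordNet, and the simple lowercase tokenization is an assumption.

```python
import re
from typing import Dict, Set

# Hypothetical "grid topic -> synonym bag" mapping (hand-written, not WordNet output).
TOPIC_BAGS: Dict[str, Set[str]] = {
    "network": {"network", "net", "web"},
    "protein": {"protein"},
}


def activate_topics(text: str, topic_bags: Dict[str, Set[str]]) -> Dict[str, int]:
    """Set a topic's node attribute to 1 if any term of its bag occurs in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {topic: int(bool(bag & tokens)) for topic, bag in topic_bags.items()}


activation = activate_topics("The Web connects many hosts.", TOPIC_BAGS)
```

Here `activation` marks "network" as activated through its synonym "web", while "protein" stays at 0.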
Further, step 2 adopts a topic scene semantic resolution method based on the DBpedia knowledge base, which specifically comprises:
First, generate the term set $Key_s$ of the context of the activated topic $s$ in a document, obtained by applying a dynamic-span window around the topic;
Second, generate the term set $T_{m,s}$ of the abstract under each scene label $m$ corresponding to the activated topic $s$ in the DBpedia knowledge base, and count the number of abstract terms under that scene label, $N_m$;
Third, calculate the topic scene semantic similarity according to the following formula (the formula is not reproduced in this text):
where $counter(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$.
Fourth, select the scene label of the abstract with the greatest relevance as the scene semantic state of the activated topic $s$ in the document, forming a "text-activated topic-scene label" triple.
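The scene label selection can be sketched as follows. The similarity formula does not appear in the text, so this sketch assumes one plausible form: the number of distinct terms shared by the abstract term set and the context term set, divided by the number of abstract terms. The function names and the sample data are hypothetical.

```python
from typing import Dict, List


def scene_similarity(abstract_terms: List[str], context_terms: List[str]) -> float:
    """Assumed similarity: shared distinct terms / number of abstract terms."""
    n_m = len(abstract_terms)
    if n_m == 0:
        return 0.0
    shared = len(set(abstract_terms) & set(context_terms))
    return shared / n_m


def pick_scene_label(abstracts_by_label: Dict[str, List[str]],
                     context_terms: List[str]) -> str:
    """Choose the scene label whose abstract is most similar to the topic context."""
    return max(abstracts_by_label,
               key=lambda m: scene_similarity(abstracts_by_label[m], context_terms))


# Hypothetical abstracts for the topic "network" under two scene labels.
abstracts = {
    "computing": ["computer", "network", "data", "protocol"],
    "biology": ["neural", "network", "brain", "cell", "neuron"],
}
context = ["data", "protocol", "network", "router"]
label = pick_scene_label(abstracts, context)
```

With these inputs the "computing" abstract shares three of its four terms with the context, against one of five for "biology", so `label` is "computing".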
Further, the user text interest topic portrait of step 4 is constructed as follows:
First, count the topic frequencies in all text topic trees in the database under the current scene mode;
Second, calculate the topic frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in the document to the total frequency of the activated topics in that document under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents containing the activated topic in the current user interest scene state;
Third, calculate the interest topic semantic weight $C_{w,i}$, which fuses the user's emotion semantic resolution, as follows:
$C_{w,i} = TF_i \cdot IDF_i \quad (i = 1, 2, \dots, n)$,
and map the topic semantic weights of each document to the grid topic attribute tuple to construct the user text interest topic portrait.
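A minimal sketch of the TF-IDF weighting described above, under stated assumptions: each document is represented as the list of its activated topics (with repeats) remaining after scene screening, and the logarithm is natural, because the text does not give its base. The function name and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def topic_weights(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute C_w = TF * IDF for every activated topic of every document."""
    total_docs = len(docs)  # S: total number of documents
    # N per topic: number of documents that contain the topic
    doc_freq = Counter(t for topics in docs.values() for t in set(topics))
    weights: Dict[str, Dict[str, float]] = {}
    for doc_id, topics in docs.items():
        counts = Counter(topics)      # C_M per topic
        total = sum(counts.values())  # R_N: total activated-topic frequency
        weights[doc_id] = {
            t: (c / total) * math.log(total_docs / doc_freq[t])
            for t, c in counts.items()
        }
    return weights


docs = {
    "d1": ["network", "network", "protocol"],
    "d2": ["network", "cell"],
}
w = topic_weights(docs)
```

In this example "network" occurs in both documents, so its IDF and therefore its weight are zero, while "protocol" and "cell" receive positive weights.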
Further, the user query interest portrait of step 6 is constructed as follows:
First, obtain the feedback documents and their document topic trees according to the pseudo-relevance feedback principle;
Second, filter the topic trees of the original documents according to the current scene state set by the user, removing topics irrelevant to that state and keeping the topic tree the user is interested in; count the number of occurrences of each topic in the user interest topic tree, normalize the counts to obtain the user's initial interest activation values, and map these values to the attribute labels of the grid topic nodes;
Third, perform semantic activation diffusion from the initially activated interest topics according to the relation types between topic nodes in the grid model, mine the potential interest topic nodes under the user's current scene state, and calculate their global activation values;
The grid diffusion formula is (the formula is not reproduced in this text):
where $\theta_{ij}$ denotes all topic paths in the topic grid model that have topic node $j$ as the destination node and a topic node $i$ related to node $j$ as the source node; $I_i(t)$ is the activation attribute value of each potential topic node in the grid model at time $t$; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time $t+1$; $w_{ij}$ is the association value between the activated topic and the potential topic in the current scene state; and $\alpha$ is a decay factor, set to 0.75. The association path length is set to 3.
Fourth, map the global topic activation values to the grid topic attribute tuple to construct the user query interest topic portrait.
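The activation diffusion step can be illustrated with the sketch below. Because the diffusion formula is missing from the text, the update rule is an assumption: in each hop, every active node i passes alpha * w_ij * I_i(t) to each neighbour j, and the contributions are accumulated into a global activation value. Only the decay factor 0.75 and the path length 3 come from the text; the graph and the function name are hypothetical.

```python
from typing import Dict, Tuple

ALPHA = 0.75   # decay factor stated in the text
MAX_HOPS = 3   # association path length stated in the text


def spread_activation(initial: Dict[str, float],
                      edges: Dict[Tuple[str, str], float],
                      alpha: float = ALPHA,
                      max_hops: int = MAX_HOPS) -> Dict[str, float]:
    """Propagate activation along weighted edges for a fixed number of hops.

    Assumed rule (not from the source): each hop sends alpha * w_ij * I_i(t)
    from node i to node j and adds it to j's global activation.
    """
    total = dict(initial)     # global activation accumulated so far
    frontier = dict(initial)  # activation propagated in the current hop
    for _ in range(max_hops):
        nxt: Dict[str, float] = {}
        for (src, dst), weight in edges.items():
            if src in frontier:
                nxt[dst] = nxt.get(dst, 0.0) + alpha * weight * frontier[src]
        for node, value in nxt.items():
            total[node] = total.get(node, 0.0) + value
        frontier = nxt
    return total


# Hypothetical association values between topic nodes.
edges = {("network", "protocol"): 0.8, ("protocol", "router"): 0.5}
activation = spread_activation({"network": 1.0}, edges)
```

In this example "protocol" receives 0.75 * 0.8 = 0.6 in the first hop and "router" receives 0.75 * 0.5 * 0.6 = 0.225 in the second, so topics farther from the initial interest get smaller values.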
Furthermore, in step 7 a cosine similarity formula is used to calculate the relevance, in the language "domain", between the user text interest grid portrait and the user query interest grid portrait (the formula is not reproduced in this text):
where the document-side vector (its symbol is missing in this text) is the user text interest grid portrait, and $q = \{o_1, o_2, \dots, o_n\}$ is the user query interest grid portrait.
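As an illustration, the standard cosine similarity between two portraits can be sketched as follows. Representing each portrait as a dictionary from grid topic to value is an assumption, as are the function and variable names; topics absent from a portrait are treated as 0.

```python
import math
from typing import Dict


def cosine(doc_portrait: Dict[str, float], query_portrait: Dict[str, float]) -> float:
    """Standard cosine similarity over grid topics (missing topics count as 0)."""
    dot = sum(v * query_portrait.get(t, 0.0) for t, v in doc_portrait.items())
    norm_d = math.sqrt(sum(v * v for v in doc_portrait.values()))
    norm_q = math.sqrt(sum(v * v for v in query_portrait.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an empty portrait is treated as unrelated
    return dot / (norm_d * norm_q)


same_direction = cosine({"network": 1.0, "cell": 0.0}, {"network": 2.0})
no_overlap = cosine({"network": 1.0}, {"cell": 1.0})
```

Portraits that point in the same direction score 1.0 regardless of magnitude, and portraits with no shared topics score 0.0.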
The invention can be applied to any recommendation system based on text retrieval and has the following beneficial effects:
1. At the user end, for the query content submitted by the user, the invention adopts a primary feedback topic learning method based on scene semantic resolution and a topic expansion method based on secondary semantic activation diffusion, which addresses the semantic relevance of user queries and the mining of potential interests;
2. At the document end, the invention automatically infers the document topics and the scene semantic features of those topics from the deep semantic grid model, realizing automatic text topic extraction and deep interest semantic mining.
Drawings
FIG. 1 is a flowchart of a text recommendation method based on deep semantic resolution according to a preferred embodiment of the present invention.
FIG. 2 is a detailed flowchart of step S100 in the method shown in FIG. 1.
FIG. 3 is a detailed flowchart of step S102 in the method shown in FIG. 1.
FIG. 4 is a detailed flowchart of step S103 in the method shown in FIG. 1.
FIG. 5 is a detailed flowchart of step S104 in the method shown in FIG. 1.
FIG. 6 compares the system ranking score (RS) of the deep semantic recommendation method with that of a conventional semantic recommendation method under different user interest content inputs.
Detailed Description
The invention provides a text recommendation method based on deep semantic resolution, which is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flowchart of a preferred embodiment of the text recommendation method based on deep semantic resolution. As shown in the figure, the implementation steps are as follows:
S100, construct a deep semantic grid model based on a brain-like "layered-divergent" thinking mode;
S101, the user inputs the content of interest and sets the current scene state;
S102, perform topic inference and topic scene semantic resolution on the text, construct a text topic tree, and screen the document topic tree by topic semantics according to the current user's scene state, thereby constructing a text topic tree that fuses scene semantic screening;
S103, apply the TF-IDF algorithm to all text topic trees in the database that have undergone scene semantic screening, calculate the weight of each topic, and map the weights to the corresponding grid topic nodes, thereby constructing a user text interest portrait for each document;
S104, using pseudo-relevance feedback, extract the documents relevant to the user's query content together with their text topic trees after scene semantic screening, count the frequency of the topics in the feedback trees, and normalize the counts to obtain the initial interest topic activation values; using an activation diffusion mechanism, calculate the global dynamic activation values of the initial interest grid topics and of the potential interest grid topics under feedback learning, assign the results to the corresponding topic nodes in the grid model, and construct a user query interest portrait that fuses the current scene semantics;
S105, calculate and score the semantic similarity between the user text interest portrait and the user query interest portrait under the current scene mode using a grid-based cosine similarity algorithm;
S106, sort the documents by relevance score in descending order to generate a recommendation list, and recommend documents of interest to the user.
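Steps S105 and S106 end with a sort by score, which can be sketched as follows. The scores are hypothetical placeholders for the similarity values computed in S105, and the `top_k` cut-off is an assumption, since the text does not state a list length.

```python
from typing import Dict, List, Tuple


def recommend(scores: Dict[str, float], top_k: int = 10) -> List[Tuple[str, float]]:
    """Return up to top_k (document id, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical relevance scores for three documents.
scores = {"d1": 0.42, "d2": 0.91, "d3": 0.10}
top_two = recommend(scores, top_k=2)
```

Python's `sorted` is stable, so documents with equal scores keep their original relative order.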
Further, as shown in FIG. 2, step S100 specifically comprises:
S001, select a classification ontology that fuses multiple domains, perform semantic splitting and part-of-speech reduction (lemmatization) on the topics in the ontology using the natural language processing tools of Stanford University to obtain a core topic set, and connect the core topics into a divergent grid model according to the memory characteristics of the ontology;
S002, construct a "grid topic-synonym bag" semantic mapping model, in which the "topic" is a core topic in the layered grid model and the "bag of words" is the set of synonym terms of that topic extracted from the WordNet dictionary;
S003, traverse the "topic-label-abstract" triples in the DBpedia knowledge base, match the topic in each triple against the terms in the "topic-bag of words" model, map the grid topics to the DBpedia topics, and associate them by semantic relation type;
S004, extract the label and abstract data of the matched topics from the DBpedia knowledge base and, taking the "layered-memory" grid model as the framework, realize a "divergent-deep" semantic grid model that fuses "synonym bag-grid topic-DBpedia topic-label-abstract".
Further, as shown in FIG. 3, step S102 specifically comprises:
S201, perform keyword matching on the text terms using the semantic relevance of the "topic-bag of words" in the grid model; if a term in a bag of words appears in the text, the topic is activated and the attribute of the corresponding grid node is set to 1, which realizes shallow semantic topic mining of the text;
S202, construct the scattered topics into a text topic tree according to the association-memory characteristic of the deep semantic grid model;
S203, resolve the scene state of the document topics, specifically as follows:
First, generate the term set $Key_s$ of the context of the activated topic $s$ in a document, obtained by applying a dynamic-span window around the topic;
Second, generate the term set $T_{m,s}$ of the abstract under each scene label $m$ corresponding to the activated topic $s$ in the DBpedia knowledge base, and count the number of abstract terms under that scene label, $N_m$;
Third, calculate the topic scene semantic similarity according to the following formula (the formula is not reproduced in this text):
where $counter(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$;
Fourth, select the scene label of the abstract with the greatest relevance as the scene semantic state of the activated topic $s$ in the document, forming a "text-activated topic-scene label" triple.
Further, as shown in FIG. 4, step S103 specifically comprises:
S301, count the topic frequencies in all text topic trees in the database under the current scene mode;
S302, calculate the topic frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in the document to the total frequency of the activated topics in that document under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents containing the activated topic in the current user interest scene state;
S303, calculate the interest topic semantic weight $C_{w,i}$, which fuses the user's emotion semantic resolution, as follows:
$C_{w,i} = TF_i \cdot IDF_i \quad (i = 1, 2, \dots, n)$,
and map the topic semantic weights of each document to the grid topic attribute tuple to construct the user text interest topic portrait.
Further, as shown in FIG. 5, step S104 specifically comprises:
S401, obtain the feedback documents and their document topic trees according to the pseudo-relevance feedback principle;
S402, filter the topic trees of the original documents according to the current scene state set by the user, removing topics irrelevant to that state and keeping the topic tree the user is interested in; count the number of occurrences of each topic in the user interest topic tree, normalize the counts to obtain the user's initial interest activation values, and map these values to the attribute labels of the grid topic nodes;
S403, perform semantic activation diffusion from the initially activated interest topics according to the relation types between topic nodes in the grid model, mine the potential interest topic nodes under the user's current scene state, and calculate their global activation values;
The grid diffusion formula is (the formula is not reproduced in this text):
where $\theta_{ij}$ denotes all topic paths in the topic grid model that have topic node $j$ as the destination node and a topic node $i$ related to node $j$ as the source node; $I_i(t)$ is the activation attribute value of each potential topic node in the grid model at time $t$; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time $t+1$; $w_{ij}$ is the association value between the activated topic and the potential topic in the current scene state; and $\alpha$ is a decay factor, set to 0.75. The association path length is set to 3.
S404, map the global topic activation values to the grid topic attribute tuple to construct the user query interest topic portrait.
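The initial interest activation values of S402 can be sketched as below. The text says only that the topic counts are normalized; dividing each count by the total number of topic occurrences is one possible normalization and is an assumption here, as are the function name and the sample data.

```python
from collections import Counter
from typing import Dict, List


def initial_activation(feedback_topics: List[List[str]]) -> Dict[str, float]:
    """Count topics over the screened feedback trees and normalize to sum to 1.

    Assumption: normalization divides each count by the total count.
    """
    counts = Counter(t for topics in feedback_topics for t in topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in counts.items()}


# Topics of two hypothetical feedback documents after scene screening.
init = initial_activation([["network", "protocol"], ["network"]])
```

Here "network" appears twice out of three topic occurrences, so it starts with activation 2/3 and "protocol" with 1/3.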
Further, in step S105 a cosine similarity formula is used to calculate the relevance, in the language "domain", between the user text interest grid portrait and the user query interest grid portrait (the formula is not reproduced in this text):
where the document-side vector (its symbol is missing in this text) is the user text interest grid portrait, and $q = \{o_1, o_2, \dots, o_n\}$ is the user query interest grid portrait.
By applying scene semantic resolution to both user query learning and document topic learning, the method improves the relevance of the recommended documents and recommends documents more intelligently. It effectively reduces the influence of similar but irrelevant documents on the recommendation results, improves the semantic relevance of the recommendation system, uncovers the user's true personal interests, and improves the accuracy and personalized analysis capability of the recommendation system.
The text recommendation method based on deep semantic resolution was compared with a conventional semantic recommendation method. The experimental parameters were chosen as follows: the simulation data set is selected from the 2005 document data in the PubMed database and contains 26,000 abstracts covering various biomedical topics. The deep semantic grid model was constructed from an ontology in the ACM Digital Library full-text database and the DBpedia knowledge base. Text processing used the open-source Java text analysis tools provided by the Stanford University Natural Language Processing Group.
The effect of the method on the ranking accuracy of the recommendation system was verified; the experimental results are as follows:
FIG. 6 compares the system ranking score (RS) of a conventional semantic recommendation method with that of the deep semantic recommendation method under different user interest content inputs. The conventional semantic recommendation method denotes a shallow semantic recommendation method without scene semantic resolution, and the deep semantic recommendation method denotes the method proposed by the invention. As FIG. 6 shows, in all 5 experiments the ranking score of the invention is lower than that of the conventional semantic recommendation method. Because a smaller ranking score means that the system tends to rank the items the user prefers nearer the top, the experimental results show that the proposed method gives a better recommendation effect.
It should be noted that the scope of the present invention includes but is not limited to the examples described above, and any modification or change made by those skilled in the art on the basis of the above description falls within the scope of protection of the present invention.
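The text does not define the ranking score. A common definition in the recommender-system literature, assumed here, is the position of each relevant item in the ranked list divided by the list length, averaged over the relevant items, so that lower is better. The sketch below follows that assumed definition; the function name and data are hypothetical.

```python
from typing import List, Set


def ranking_score(ranked: List[str], relevant: Set[str]) -> float:
    """Mean relative rank position of the relevant items (assumed definition)."""
    positions = [i + 1 for i, doc in enumerate(ranked) if doc in relevant]
    if not positions:
        raise ValueError("no relevant document appears in the ranked list")
    return sum(p / len(ranked) for p in positions) / len(positions)


ranked = ["d1", "d2", "d3", "d4"]
best = ranking_score(ranked, {"d1"})    # relevant item ranked first
worst = ranking_score(ranked, {"d4"})   # relevant item ranked last
```

Under this definition a relevant document at the top of a four-item list scores 0.25 and one at the bottom scores 1.0, which matches the statement that a smaller score means preferred items are ranked nearer the front.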
Claims (6)
1. A text recommendation method based on deep semantic resolution, characterized by comprising the following steps:
step 1: constructing a deep semantic grid model based on a brain-like 'layering-diverging' thinking mode;
step 2: inferring the grid theme set of a text by combining a grid theme-synonym bag model with a word matching technology, connecting the scattered themes by means of the association-memory function of the grid model, then inferring the scene labels of the different activated themes in the current text by means of a scene semantic analysis function, and finally constructing a text theme tree that integrates multiple scene semantics and memory connections;
step 3: pruning the text theme tree according to the user interest, namely filtering out the themes and relations that do not accord with the current scene state of the user, thereby constructing a text theme tree based on scene semantic screening;
step 4: counting all text topic trees in the database that have undergone scene semantic screening by using a TF-IDF algorithm, calculating the weight value of each topic, mapping the weight value onto the corresponding grid topic node, and constructing a user text interest portrait for each document;
step 5: extracting, according to a pseudo-relevance feedback method, the documents relevant to the user query content and the corresponding text theme trees after scene semantic screening, counting the frequency of the themes in the feedback trees, and normalizing the frequencies to obtain initial interest theme activation values;
step 6: calculating, by using an activation diffusion mechanism, the global dynamic activation values of the initial interest grid themes and the potential interest grid themes under feedback learning, assigning the calculation results to the corresponding theme nodes in the grid model, and constructing a user query interest portrait fused with the current scene semantics;
step 7: scoring the deep semantic correlation between the user query interest portrait and the user text interest portrait by using a grid-based cosine similarity calculation method, and generating a recommendation list for recommendation.
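The pruning in step 3 can be sketched as a recursive filter over the theme tree. This is a minimal illustration, not the patented implementation: the `TopicNode` class, the `prune` function, and the sample tree are hypothetical, and dropping the whole subtree of a rejected theme is an assumption the claim does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TopicNode:
    """Hypothetical node of a text theme tree."""
    topic: str
    scene: str  # scene label inferred for this theme in the current text
    children: List["TopicNode"] = field(default_factory=list)


def prune(node: TopicNode, user_scenes: Set[str]) -> Optional[TopicNode]:
    """Return a copy of the subtree keeping only themes in the user's scenes."""
    if node.scene not in user_scenes:
        # Assumption: a rejected theme takes its whole subtree with it.
        return None
    kept = []
    for child in node.children:
        pruned_child = prune(child, user_scenes)
        if pruned_child is not None:
            kept.append(pruned_child)
    return TopicNode(node.topic, node.scene, kept)


tree = TopicNode("apple", "technology", [
    TopicNode("iphone", "technology"),
    TopicNode("orchard", "agriculture", [TopicNode("harvest", "agriculture")]),
])
pruned = prune(tree, {"technology"})
```

With the user scene set to `{"technology"}`, only the `iphone` branch survives under the root.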
2. The method as claimed in claim 1, wherein the deep semantic grid model in step 1 is constructed according to a brain-like "layering-diverging" thinking mode, and the construction process specifically comprises:
firstly, selecting a classification ontology with multi-domain fusion, performing semantic splitting and part-of-speech reduction on the topics in the ontology by using the natural language processing tools of Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;
secondly, constructing a 'grid theme-synonym bag' semantic mapping model, wherein the 'theme' represents a core theme in the hierarchical grid model and the 'bag of words' is formed by extracting the set of synonym terms of the theme from the WordNet dictionary; if a term from a theme's bag of words appears in the text, the theme is activated and the corresponding grid node attribute is set to 1, thereby mining the shallow semantic themes of the text;
thirdly, traversing the 'theme-label-abstract' triples in the DBpedia knowledge base, matching the theme in each triple with the terms in the 'theme-bag of words' model, extracting the labels and abstract data corresponding to the matched themes in the knowledge base, mapping 'grid theme-DBpedia theme-label-abstract' layer by layer, and associating them by semantic correlation type;
fourthly, taking the 'layering-memorizing' grid model as a framework to realize a 'divergence-deep layer' semantic grid model fused with 'synonym bag-grid theme-DBpedia theme-label-abstract'.
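The 'grid theme-synonym bag' activation described in the second step of claim 2 can be illustrated as follows. This is only a sketch under stated assumptions: the bags in `SYNONYM_BAGS` are hand-written stand-ins (the claim derives them from WordNet), `activate_topics` is a hypothetical name, and the tokenizer is a plain regular expression with no part-of-speech reduction.

```python
import re
from typing import Dict, Set

# Hypothetical hand-written bags; the claim extracts them from WordNet.
SYNONYM_BAGS: Dict[str, Set[str]] = {
    "car": {"car", "automobile", "auto", "motorcar"},
    "physician": {"physician", "doctor", "medic"},
    "bank": {"bank", "lender"},
}


def activate_topics(text: str, bags: Dict[str, Set[str]]) -> Dict[str, int]:
    """Set a theme's node attribute to 1 if any term of its bag occurs in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {topic: int(bool(tokens & bag)) for topic, bag in bags.items()}


activation = activate_topics("The doctor parked her automobile.", SYNONYM_BAGS)
```

Here `doctor` activates the `physician` theme and `automobile` activates `car`, while `bank` stays at 0.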
3. The text recommendation method based on deep semantic resolution as claimed in claim 1, wherein step 2 adopts a topic scene semantic resolution method based on the DBpedia knowledge base, and the semantic resolution method specifically comprises the following steps:
firstly, generating the term set $Key_s$ of the context of the activated topic s in a document by dynamic-span windowing;
secondly, generating the term set $T_{m,s}$ of the abstract under each scene label m corresponding to the activated topic s in the DBpedia knowledge base, and counting the number $N_m$ of abstract terms under the scene label;
thirdly, calculating the semantic similarity of the topic scenes according to the following formula:
wherein $counter(T_{m,s}, Key_s)$ represents the co-occurrence frequency of terms in the sets $T_{m,s}$ and $Key_s$;
fourthly, selecting the scene label corresponding to the abstract with the maximum correlation degree as the scene semantic state of the activated topic s of the document, forming a 'text-activated topic-scene label' triple.
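The scene selection in claim 3 can be sketched as below. The similarity formula itself does not survive in the text, so the score used here, the co-occurrence count divided by $N_m$, is an assumption based on the symbols the claim defines, not the claimed formula. The names `cooccurrence` and `pick_scene` and the sample abstracts are hypothetical.

```python
from typing import Dict, List, Set


def cooccurrence(abstract_terms: List[str], key_terms: Set[str]) -> int:
    """Count abstract terms that also occur in the topic's context window."""
    return sum(1 for term in abstract_terms if term in key_terms)


def pick_scene(abstracts: Dict[str, List[str]], key_terms: Set[str]) -> str:
    """Return the scene label whose abstract best matches the context terms."""
    scores = {}
    for label, terms in abstracts.items():
        n_m = len(terms)  # N_m: number of abstract terms under this label
        # Assumed score: co-occurrence count normalized by N_m.
        scores[label] = cooccurrence(terms, key_terms) / n_m if n_m else 0.0
    return max(scores, key=scores.get)


key_s = {"fruit", "tree", "juice", "eat"}
abstracts = {
    "apple (fruit)": ["fruit", "tree", "sweet", "eat"],
    "apple (company)": ["company", "iphone", "computer", "software", "fruit"],
}
scene = pick_scene(abstracts, key_s)
```

In this toy case the fruit abstract scores 3/4 and the company abstract 1/5, so the fruit label is chosen.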
4. The text recommendation method based on deep semantic resolution as claimed in claim 2, wherein the specific steps of constructing the user text interest topic portrait in step 4 are as follows:
firstly, counting the topic frequencies in all text topic trees in the database under the current scene mode;
secondly, calculating the topic frequency TF and the inverse document frequency IDF for each document, wherein $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in the document to the total frequency of all activated topics in that document under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents containing the activated topic under the current user interest scene state;
thirdly, calculating the interest topic semantic weight $C_{w,i}$ fused with the user's emotion semantic resolution, calculated as follows:
$$C_{w,i} = TF_i * IDF_i \quad (i = 1, 2, \ldots, n),$$
and mapping the topic semantic weights of each document to a grid topic attribute tuple to construct the user text interest topic portrait.
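The weighting in claim 4 can be illustrated with a short sketch. It assumes each document is already reduced to the list of activated topics left after scene screening; `topic_weights` and the sample corpus are hypothetical, and the natural logarithm is an assumption, since the claim does not state the base.

```python
import math
from collections import Counter
from typing import Dict, List


def topic_weights(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute C_w = TF * IDF for every activated topic of every document."""
    s = len(docs)  # S: total number of documents
    # N per topic: number of documents containing that topic
    doc_freq = Counter(t for topics in docs.values() for t in set(topics))
    weights: Dict[str, Dict[str, float]] = {}
    for doc_id, topics in docs.items():
        counts = Counter(topics)        # C_M per topic
        total = sum(counts.values())    # R_N: all activated topics in the doc
        weights[doc_id] = {
            t: (c / total) * math.log(s / doc_freq[t])
            for t, c in counts.items()
        }
    return weights


docs = {
    "d1": ["apple", "apple", "fruit"],
    "d2": ["apple", "phone"],
    "d3": ["fruit", "tree", "tree"],
}
w = topic_weights(docs)
```

A topic that occurs in every document would receive weight 0 under this definition, which is the usual TF-IDF behaviour.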
5. The text recommendation method based on deep semantic resolution according to claim 2, wherein constructing the user query interest portrait in step 6 comprises the following specific steps:
firstly, obtaining the feedback documents and the corresponding document theme trees according to the pseudo-relevance feedback principle;
secondly, filtering the theme trees of the original documents according to the current scene state set by the user, screening out the themes irrelevant to the current scene state of the user and leaving the theme trees the user is interested in; counting the occurrence frequency of each theme in the user interest theme trees, normalizing the frequencies to serve as the initial interest activation values of the user, and mapping the activation values to the attribute labels of the grid theme nodes;
thirdly, performing semantic activation diffusion on the initially activated interest topics according to the relationship types among the topic nodes in the grid model, mining the potential interest topic nodes in the current scene state of the user, and calculating the global activation values of the potential interest topic nodes;
the grid diffusion formula is:
$$O_j(t+1) = \sum_{i \in \theta_{ij}} I_i(t) * w_{ij} * (1 - \alpha),$$
wherein $\theta_{ij}$ is the set of all topic paths in the topic grid model that have topic node j as the destination node and a topic node i related to node j as the source node; $I_i(t)$ is the activation attribute value of each potential topic node in the grid model at time t; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time t+1; $w_{ij}$ is the associative weight between the activated topic and the potential topic in the current scene state; and $\alpha$ is a decay factor, which is set to 0.75, with the associative path length set to 3;
fourthly, mapping the global theme activation values to a grid theme attribute tuple to construct the user query interest theme portrait.
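The normalization and diffusion steps of claim 5 can be sketched as follows, using the decay factor 0.75 and path length 3 stated in the claim. Several details are assumptions made for illustration: frequencies are normalized by dividing by their sum, contributions from successive hops are accumulated by addition, and the names `initial_activation`, `spread`, and the edge weights are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

ALPHA = 0.75   # decay factor from the claim
MAX_HOPS = 3   # associative path length from the claim


def initial_activation(feedback_topics: List[str]) -> Dict[str, float]:
    """Normalize theme frequencies (assumed: divide by the total count)."""
    counts = Counter(feedback_topics)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def spread(initial: Dict[str, float],
           edges: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Propagate O_j(t+1) = sum_i I_i(t) * w_ij * (1 - alpha) for MAX_HOPS hops."""
    total = defaultdict(float, initial)
    frontier = dict(initial)
    for _ in range(MAX_HOPS):
        nxt: Dict[str, float] = defaultdict(float)
        for (i, j), w_ij in edges.items():
            if i in frontier:
                nxt[j] += frontier[i] * w_ij * (1 - ALPHA)
        for j, value in nxt.items():
            total[j] += value  # assumption: hops accumulate by summation
        frontier = nxt
    return dict(total)


edges = {("apple", "fruit"): 1.0, ("fruit", "tree"): 0.8}
init = initial_activation(["apple", "apple", "apple", "apple"])
act = spread(init, edges)
```

Starting from `apple` at 1.0, the first hop gives `fruit` 0.25 and the second gives `tree` 0.05; the third hop finds no further edges.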
6. The text recommendation method based on deep semantic resolution as claimed in claim 1, wherein a grid-based cosine similarity calculation method is adopted in step 7, which uses a cosine similarity formula to calculate the semantic relevance between the user text interest grid portrait and the user query interest grid portrait, the formula being expressed as follows:
$$Sim(d_{j,D_m}, q) = \frac{d_{j,D_m} \cdot q}{|d_{j,D_m}| \times |q|},$$
wherein $d_{j,D_m}$ is the user text interest grid portrait and $q = (o_1, o_2, \ldots, o_n)$ is the user query interest grid portrait.
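The scoring and ranking in claim 6 can be illustrated by treating each portrait as a sparse vector keyed by grid topic. This is a sketch only: `cosine`, `recommend`, the `top_k` parameter, and the sample portraits are hypothetical, and returning 0 for an empty portrait is an added convention.

```python
import math
from typing import Dict, List


def cosine(d: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity between two sparse grid-topic vectors."""
    dot = sum(v * q.get(t, 0.0) for t, v in d.items())
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # added convention for empty portraits
    return dot / (norm_d * norm_q)


def recommend(portraits: Dict[str, Dict[str, float]],
              q: Dict[str, float], top_k: int = 2) -> List[str]:
    """Rank documents by similarity to the query portrait, best first."""
    ranked = sorted(portraits, key=lambda doc_id: cosine(portraits[doc_id], q),
                    reverse=True)
    return ranked[:top_k]


query = {"apple": 1.0, "fruit": 0.25}
portraits = {
    "d1": {"apple": 0.5, "fruit": 0.1},
    "d2": {"phone": 0.9},
    "d3": {"fruit": 0.4, "tree": 0.4},
}
ranking = recommend(portraits, query, top_k=2)
```

Document `d1` shares both query topics and ranks first, `d3` shares only `fruit`, and `d2` shares nothing and is cut by `top_k`.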
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710000406.3A CN107832312B (en) | 2017-01-03 | 2017-01-03 | Text recommendation method based on deep semantic analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107832312A (en) | 2018-03-23 |
CN107832312B (en) | 2023-10-10 |
Family
ID=61643740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710000406.3A Active CN107832312B (en) | 2017-01-03 | 2017-01-03 | Text recommendation method based on deep semantic analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832312B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108595602A (en) * | 2018-04-20 | 2018-09-28 | 昆明理工大学 | The question sentence file classification method combined with depth model based on shallow Model |
CN110188189A (en) * | 2019-05-21 | 2019-08-30 | 浙江工商大学 | A kind of method that Knowledge based engineering adaptive event index cognitive model extracts documentation summary |
CN111858901A (en) * | 2019-04-30 | 2020-10-30 | 北京智慧星光信息技术有限公司 | Text recommendation method and system based on semantic similarity |
CN112256834A (en) * | 2020-10-28 | 2021-01-22 | 中国科学院声学研究所 | Marine science data recommendation system based on content and literature |
CN112287218A (en) * | 2020-10-26 | 2021-01-29 | 安徽工业大学 | Knowledge graph-based non-coal mine literature association recommendation method |
CN113658714A (en) * | 2021-05-11 | 2021-11-16 | 武汉大学 | Port health quarantine case scene matching method and system for overseas infectious disease input |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080270384A1 (en) * | 2007-04-28 | 2008-10-30 | Raymond Lee Shu Tak | System and method for intelligent ontology based knowledge search engine |
CN103678277A (en) * | 2013-12-04 | 2014-03-26 | 东软集团股份有限公司 | Theme-vocabulary distribution establishing method and system based on document segmenting |
CN103942285A (en) * | 2014-04-09 | 2014-07-23 | 北京搜狗科技发展有限公司 | Recommendation method and system for dynamic page element |
CN104090958A (en) * | 2014-07-04 | 2014-10-08 | 许昌学院 | Semantic information retrieval system and method based on domain ontology |
CN104298732A (en) * | 2014-09-29 | 2015-01-21 | 中国科学院计算技术研究所 | Personalized text sequencing and recommending method for network users |
CN104484431A (en) * | 2014-12-19 | 2015-04-01 | 合肥工业大学 | Multi-source individualized news webpage recommending method based on field body |
US20150310096A1 (en) * | 2014-04-29 | 2015-10-29 | International Business Machines Corporation | Comparing document contents using a constructed topic model |
Non-Patent Citations (4)
Title |
---|
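ANA O. ALVES et al.: "ASAP-II: From the Alignment of Phrases to Text Similarity" * |
GANGGAO ZHU et al.: "Computing Semantic Similarity of Concepts in Knowledge Graphs" * |
张静娴 et al.: "Ontology mapping method based on attribute structure" * |
李兰彬: "Research on a domain knowledge base construction platform for thematic intelligence services" * |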
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108595602A (en) * | 2018-04-20 | 2018-09-28 | 昆明理工大学 | The question sentence file classification method combined with depth model based on shallow Model |
CN111858901A (en) * | 2019-04-30 | 2020-10-30 | 北京智慧星光信息技术有限公司 | Text recommendation method and system based on semantic similarity |
CN110188189A (en) * | 2019-05-21 | 2019-08-30 | 浙江工商大学 | A kind of method that Knowledge based engineering adaptive event index cognitive model extracts documentation summary |
CN110188189B (en) * | 2019-05-21 | 2021-10-08 | 浙江工商大学 | Knowledge-based method for extracting document abstract by adaptive event index cognitive model |
CN112287218A (en) * | 2020-10-26 | 2021-01-29 | 安徽工业大学 | Knowledge graph-based non-coal mine literature association recommendation method |
CN112256834A (en) * | 2020-10-28 | 2021-01-22 | 中国科学院声学研究所 | Marine science data recommendation system based on content and literature |
CN112256834B (en) * | 2020-10-28 | 2021-06-08 | 中国科学院声学研究所 | Marine science data recommendation system based on content and literature |
CN113658714A (en) * | 2021-05-11 | 2021-11-16 | 武汉大学 | Port health quarantine case scene matching method and system for overseas infectious disease input |
CN113658714B (en) * | 2021-05-11 | 2023-08-18 | 武汉大学 | Port health quarantine case scenario matching method and system for inputting foreign infectious diseases |
Also Published As
Publication number | Publication date |
---|---|
CN107832312B (en) | 2023-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Subasic et al. | Affect analysis of text using fuzzy semantic typing | |
Beliga | Keyword extraction: a review of methods and approaches | |
Andhale et al. | An overview of text summarization techniques | |
Kolomiyets et al. | A survey on question answering technology from an information retrieval perspective | |
CN107832312A (en) | A kind of text based on deep semantic discrimination recommends method | |
CN107885749B (en) | Ontology semantic expansion and collaborative filtering weighted fusion process knowledge retrieval method | |
CN106095762A (en) | A kind of news based on ontology model storehouse recommends method and device | |
CN103455487B (en) | The extracting method and device of a kind of search term | |
Belhadi et al. | Exploring pattern mining algorithms for hashtag retrieval problem | |
Sarica et al. | Engineering knowledge graph for keyword discovery in patent search | |
Sanagar et al. | Unsupervised genre-based multidomain sentiment lexicon learning using corpus-generated polarity seed words | |
Hallili | Toward an ontology-based chatbot endowed with natural language processing and generation | |
Anoop et al. | A topic modeling guided approach for semantic knowledge discovery in e-commerce | |
Zhang et al. | Exploring coevolution of emotional contagion and behavior for microblog sentiment analysis: a deep learning architecture | |
Balasubramaniam | Hybrid fuzzy-ontology design using FCA based clustering for information retrieval in semantic web | |
Pan et al. | SPRF: A semantic Pseudo-relevance Feedback enhancement for information retrieval via ConceptNet | |
JP2008243024A (en) | Information acquisition device, program therefor and method | |
Sendi et al. | Possibilistic interest discovery from uncertain information in social networks | |
Orimaye et al. | Performance and trends in recent opinion retrieval techniques | |
Nasution et al. | Semantic information retrieval models | |
Chan | Beyond keyword and cue-phrase matching: A sentence-based abstraction technique for information extraction | |
Malik et al. | Ontology development for agriculture domain | |
Rao et al. | Enhancing multi-document summarization using concepts | |
Lyu et al. | Rule-guided graph neural networks for recommender systems | |
Vicente-López et al. | Personalization of Parliamentary Document Retrieval Using Different User Profiles. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||