
CN114564594A - Knowledge graph user preference entity recall method based on double-tower model - Google Patents


Info

Publication number
CN114564594A
Authority
CN
China
Prior art keywords
user
embedding
representing
item
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210169936.1A
Other languages
Chinese (zh)
Inventor
陆佳炜
吴俚达
程振波
韦航俊
朱昊天
方静雯
徐俊
肖刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202210169936.1A
Publication of CN114564594A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph user preference entity recall method based on a double-tower model. An optimization method is added to the traditional double-tower model so that it better learns the interaction between users and items. The trained double-tower model is then used to recall entities on the knowledge graph that are related to user preferences. First, the entities in the knowledge graph that correspond to items in the user's history records are taken as starting points, and all neighbor entities are searched along the edges. The recalled entities are then screened by the trained, optimized double-tower model. Finally, the recalled entities are taken as new starting points and the operation is repeated. The result is a knowledge graph that represents the user's preferences and potential preferences.

Description

Knowledge graph user preference entity recall method based on double-tower model
Technical Field
The invention relates to the technical fields of knowledge graphs and deep learning, and in particular to a knowledge graph user preference entity recall method based on a double-tower model.
Background
The knowledge graph is a concept proposed by Google in 2012. It is a knowledge base that Google uses to enhance its search engine. In essence, a knowledge graph describes the entities or concepts that exist in the real world and the relationships between them. Together these form a huge semantic network graph whose nodes represent entities or concepts and whose edges represent attributes or relationships. Each piece of knowledge is represented as a triple (h, r, t), where h is the head entity, t is the tail entity, and r is the relationship between the head and tail entities. With its strong semantic processing capability and open organization capability, the knowledge graph plays an important role in recommendation systems, intelligent question answering, information retrieval and other fields. It lays a foundation for knowledge organization and intelligent applications in the Internet era.
Conventional recommendation systems use explicit or implicit information as input for prediction, and they face two main problems. The first is sparsity. In real scenarios, user-item interaction data are often very sparse, and predicting a large amount of unknown information from so few observations greatly increases the risk of overfitting. The second is the cold-start problem. Newly added users or items have no historical information, so they are difficult to model and recommend accurately.
The knowledge graph contains rich semantic associations between entities and provides a potential source of auxiliary information for recommendation systems. It introduces more semantic relations for items, which helps discover users' interests in depth. Items are linked through different relations in the knowledge graph, which increases the diversity of recommendation results. The knowledge graph can also connect a user's history records with the recommendation results. This improves the user's satisfaction with and acceptance of the recommendations and strengthens the user's trust in the system.
Existing knowledge graph recommendation methods fall mainly into two types.

The first type is embedding-based methods. A knowledge graph embedding algorithm learns the entities and relations in the knowledge graph to obtain their vector representations, which are then introduced into a recommendation framework. Examples include DKN (Deep Knowledge-aware Network), a framework based on convolutional neural networks, and CKE (Collaborative Knowledge base Embedding). Embedding-based methods are highly flexible, but they are generally better suited to in-graph link prediction, whereas recommendation scenarios need to mine users' potential interests.

The second type is path-based methods, which explore the various connections between entities in the knowledge graph to provide additional guidance for the recommendation system. Examples include Personalized Entity Recommendation and Meta-Graph Based Recommendation. Path-based methods use the knowledge graph in a more natural and intuitive way, but they rely heavily on manually designed meta-paths, which are difficult to optimize in practice.
Disclosure of Invention
To address the problems in the prior art, the invention provides a knowledge graph user preference entity recall method based on a double-tower model. An optimization method is added to the traditional double-tower model so that it better learns the interaction between users and items, and the trained double-tower model is used to recall entities on the knowledge graph that are relevant to the user's preferences. The method proceeds as follows:
- The entities in the knowledge graph that correspond to items in the user's history records are taken as starting points, and all neighbor entities are searched along the edges.
- The recalled entities are screened by the trained, optimized double-tower model.
- The recalled entities are taken as new starting points and the operation is repeated.

The result is a knowledge graph that represents the user's preferences and potential preferences.
The technical scheme adopted by the invention is as follows:
A knowledge graph user preference entity recall method based on a double-tower model comprises the following steps:
1. define a user feature vector and an item feature vector as the input of the double-tower model;
2. train the double-tower model, and optimize it by combining an in-batch softmax loss function with a frequency estimation method based on a hash sequence;
3. define the entity mapping between the user historical interaction matrix and the knowledge graph;
4. input the entities recalled at each propagation step, together with the user features, into the optimized double-tower model to obtain prediction probabilities. Propagating along the user's preferred entities, screen out the entities with high probability, and finally obtain a knowledge graph that represents the user's preferences and potential preferences.
The process of the step 1 is as follows:
1.1, The user features are the user's interaction behaviors with items, including click records, search records, social data, personal data and sample age. The user feature vector is obtained by converting these data into vectors and concatenating them. Converting raw data into vectors is called vector embedding. It is a common way of representing data features in machine learning: features are extracted from the raw data as low-dimensional vectors obtained by mapping through a neural network;
further, the scheme of 1.1 is as follows:
1.1.1, The embedding of the user click record is a weighted average of the id-type embeddings of the clicked items. An id-type embedding is a vector that maps an item's unique identifier into a space of fixed dimension, and the weight of each id-type embedding is directly proportional to the item's browsing time. The embedding of the user click record is calculated as:
$v_{click} = \frac{\sum_{i=1}^{n} w_i^{click} v_{click,i}}{\sum_{i=1}^{n} w_i^{click}}$
where $v_{click}$ denotes the embedding of the user click record, $w_i^{click}$ denotes the i-th weight, $v_{click,i}$ denotes the id-type embedding of the i-th item in the click record, and n denotes the number of embeddings. The weight $w_i^{click}$ is calculated by the following formula:
[formula image not available]
where $T_i$ denotes the time the user spent browsing item i, N denotes the total number of samples, and k denotes the total number of positive examples;
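The click-record embedding of 1.1.1 can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the patented computation. The weights are assumed to be browsing times normalized to sum to 1, because the exact weight formula (involving $T_i$, N and k) is not reproduced in this copy. The function name `click_embedding` is hypothetical.

```python
import numpy as np


def click_embedding(item_embeddings, browse_times):
    """Weighted average of the id-type embeddings of the clicked items.

    item_embeddings: shape (n, d), one id-type embedding per clicked item.
    browse_times: length-n browsing times. ASSUMPTION: weight is proportional
    to browsing time (the patent's exact weight formula is not shown here).
    """
    emb = np.asarray(item_embeddings, dtype=float)
    times = np.asarray(browse_times, dtype=float)
    if emb.ndim != 2 or times.shape != (emb.shape[0],):
        raise ValueError("need one browsing time per item embedding")
    if times.sum() <= 0:
        raise ValueError("browsing times must have a positive sum")
    weights = times / times.sum()  # hypothetical weighting: normalized browsing time
    return weights @ emb  # (n,) @ (n, d) -> (d,)


v_click = click_embedding([[1.0, 0.0], [0.0, 1.0]], browse_times=[30, 10])
print(v_click)  # [0.75 0.25]
```

Dividing by the total weight keeps the result a true weighted average even if a different (unnormalized) weight formula is plugged in.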
1.1.2, To build the embedding of the user search record, the keywords of historical searches are first segmented into entries. The embedding of each entry is then obtained through a Word2vec model, and the entry embeddings are weighted-averaged to give the embedding of the user search record.
Word segmentation is the technique by which a search engine splits the keyword string submitted by the user into separate entries (tokens).
The Word2vec model converts the content words of a text into vectors in a vector space. The values of a word vector are influenced by its context and capture the correlation between words.
The embedding of the user search record is calculated as:
$v_{search} = \frac{\sum_{i=1}^{n} w_i^{search} v_{search,i}}{\sum_{i=1}^{n} w_i^{search}}$
where $v_{search}$ denotes the embedding of the user search record, $w_i^{search}$ denotes the i-th weight, $v_{search,i}$ denotes the embedding of the i-th entry in the search record, and n denotes the number of embeddings;
The weight of each search-record embedding is calculated as:
[formula image not available]
where search validity is judged by whether the user clicked the item after searching for it;
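The search-record embedding of 1.1.2 might be sketched as follows, under several labeled assumptions:
- Whitespace splitting stands in for real word segmentation.
- A small dict of vectors stands in for a trained Word2vec model.
- The weighting, 1.0 for a search followed by a click and a hypothetical `invalid_weight` otherwise, is only an illustration of "search validity", not the patent's formula.

The function name `search_embedding` is also hypothetical.

```python
import numpy as np


def search_embedding(queries, clicked_after_search, word_vectors, invalid_weight=0.1):
    """Weighted average of entry (token) embeddings over a user's search history.

    queries: list of search keyword strings.
    clicked_after_search: one bool per query (the "search validity" signal).
    word_vectors: dict token -> vector, standing in for a trained Word2vec model.
    invalid_weight: HYPOTHETICAL weight for searches not followed by a click.
    """
    vectors, weights = [], []
    for query, clicked in zip(queries, clicked_after_search):
        # Whitespace splitting is a stand-in for real word segmentation.
        for token in query.split():
            if token in word_vectors:  # out-of-vocabulary entries are skipped
                vectors.append(word_vectors[token])
                weights.append(1.0 if clicked else invalid_weight)
    if not vectors:
        raise ValueError("no in-vocabulary entries in the search history")
    w = np.asarray(weights)
    return w @ np.asarray(vectors, dtype=float) / w.sum()


word_vectors = {"red": np.array([1.0, 0.0]), "shoes": np.array([0.0, 1.0]),
                "hat": np.array([1.0, 1.0])}
v_search = search_embedding(["red shoes", "hat"], [True, False], word_vectors)
```

Here "red" and "shoes" come from a search that led to a click, so they dominate the average, while "hat" contributes with the smaller weight.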
1.1.3, The social data embedding of the user is the weighted average of the embeddings corresponding to favorites, likes and subscriptions. The embeddings for favorites and likes are the id-type embeddings of the items the user has favorited and liked. The embedding for a subscription is the id-type embedding of the person responsible for the item the user subscribes to.
The embedding of the user's social data is calculated as:
$v_{social} = \frac{\sum_{i=1}^{n} w_i^{social} v_{social,i}}{\sum_{i=1}^{n} w_i^{social}}$
where $v_{social}$ denotes the embedding of the user's social data, $w_i^{social}$ denotes the i-th weight, and $v_{social,i}$ denotes the embedding of the i-th social data item;
The weight of the favorite and like embeddings is calculated as:
[formula image not available]
where $T_i$ denotes the time the user spent browsing item i, N denotes the total number of samples, and k denotes the total number of positive examples;
The weight of the subscription embedding is calculated as:
[formula image not available]
where $T_i$ denotes the subscriber's browsing time for the i-th item, N denotes the total number of samples, and k denotes the total number of positive examples;
1.1.4, The personal data of the user comprise gender, age and region. Gender is a simple binary feature, while age and region are continuous features whose values are normalized into real values. The embedding of the user personal data is the vector obtained by concatenating the processed gender, age and region values;
further, the scheme of 1.1.4 is as follows:
1.1.4.1, Compute the binary representation of the user's gender with the following formula:
[formula image not available]
1.1.4.2, Compute the normalized real values of the user's age and region with the following normalization formula:
$z = \frac{X - \mu}{\sigma}$
where X denotes the sample value, μ is the mean of all sample data, and σ is the standard deviation of all sample data;
1.1.4.3, Concatenate the binary gender value and the normalized age and region values from 1.1.4 into a vector:
$v_{personal} = [gender, z_{age}, z_{region}]$
where $v_{personal}$ denotes the embedding of the user personal data, gender denotes the binary gender value, and $z_{age}$ and $z_{region}$ denote the normalized age and region values of the user;
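The personal-data embedding of 1.1.4 can be illustrated with the z-score formula above. This sketch assumes the gender is already encoded as 0 or 1, because the patent's exact mapping is not shown. It also assumes the region is supplied as a numeric code, treated as continuous as 1.1.4 states. Function names are hypothetical, and σ is taken as the population standard deviation of all samples.

```python
import numpy as np


def z_score(value, population):
    """z = (X - mu) / sigma over all sample data (population standard deviation)."""
    pop = np.asarray(population, dtype=float)
    sigma = pop.std()
    return 0.0 if sigma == 0 else float((value - pop.mean()) / sigma)


def personal_embedding(gender_bit, age, region_value, all_ages, all_region_values):
    """v_personal = [gender, z_age, z_region].

    gender_bit: ASSUMED already encoded as 0 or 1 (the mapping is not shown here).
    region_value: ASSUMED numeric region code, treated as a continuous feature.
    """
    if gender_bit not in (0, 1):
        raise ValueError("gender must be binary")
    return np.array([float(gender_bit),
                     z_score(age, all_ages),
                     z_score(region_value, all_region_values)])


v_personal = personal_embedding(1, 40, 2, all_ages=[20, 30, 40], all_region_values=[1, 2, 3])
```

With ages 20, 30 and 40, the user aged 40 sits about 1.22 standard deviations above the mean, and the middle region code maps to 0.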
1.1.5, Concatenate the four embeddings obtained in 1.1 (user click record, user search record, user social data and user personal data) to obtain the user feature vector:
$v_{user} = \mathrm{concatenate}(v_{click}, v_{search}, v_{social}, v_{personal}) = [v_{click}[1], v_{click}[2], \ldots, v_{search}[1], v_{search}[2], \ldots, v_{social}[1], v_{social}[2], \ldots, v_{personal}[1], v_{personal}[2], \ldots]$
where $v_{user}$ denotes the user feature vector, $v_{click}[i]$ denotes the i-th component of the click-record embedding, $v_{search}[i]$ denotes the i-th component of the search-record embedding, $v_{social}[i]$ denotes the i-th component of the social-data embedding, and $v_{personal}[i]$ denotes the i-th component of the personal-data embedding;
1.2, The item features comprise the item id and its context information. The item feature vector is formed by concatenating the id-type embedding of the item with the embedding of its context information;
further, the scheme of 1.2 is as follows:
1.2.1, Give the id-type embedding of the item, which is a vector that maps the item's unique identifier into a space of fixed dimension;
1.2.2, Give the context information embedding of the item, which is a vector obtained by Word2vec;
1.2.3, Concatenate the id-type embedding and the context information embedding from 1.2 to obtain the item feature vector:
$v_{item} = \mathrm{concatenate}(v_{id}, v_{context}) = [v_{id}[1], v_{id}[2], \ldots, v_{context}[1], v_{context}[2], \ldots]$
where $v_{item}$ denotes the item feature vector, $v_{id}$ denotes the id-type embedding of the item, $v_{context}$ denotes the embedding of the item context information, $v_{id}[i]$ denotes the i-th component of the id-type embedding, and $v_{context}[i]$ denotes the i-th component of the context information embedding;
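The two concatenations in 1.1.5 and 1.2.3 are plain vector concatenations in a fixed order. A minimal NumPy sketch follows, with hypothetical function names and arbitrary toy dimensions.

```python
import numpy as np


def user_feature_vector(v_click, v_search, v_social, v_personal):
    """v_user = concatenate(v_click, v_search, v_social, v_personal), in that order."""
    return np.concatenate([v_click, v_search, v_social, v_personal])


def item_feature_vector(v_id, v_context):
    """v_item = concatenate(v_id, v_context)."""
    return np.concatenate([v_id, v_context])


v_user = user_feature_vector(np.ones(4), np.zeros(4), np.full(4, 2.0), np.array([1.0, 0.3, -0.5]))
v_item = item_feature_vector(np.arange(3.0), np.array([9.0, 8.0]))
print(v_user.shape, v_item)  # (15,) [0. 1. 2. 9. 8.]
```

The order of the parts must be identical for every user (and for every item) so that each tower sees the same feature layout.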
In step 2, the double-tower model derives from DSSM (Deep Structured Semantic Models), a deep structured semantic model commonly used to solve semantic similarity problems in natural language processing.

From top to bottom, the double-tower model is divided into an input layer, a representation layer and a matching layer. Viewed horizontally, it has two input layers, two representation layers and one matching layer. The outputs of the two input layers feed the two representation layers, and the outputs of the two representation layers converge at the matching layer, so the whole structure takes a double-tower form.

In the invention, the two inputs of the input layer are the user feature vector and the item feature vector. The two representation layers share the same neural network structure, so after computation both feature vectors become vectors of the same dimension. Finally, the two vectors are L2-normalized and their inner product is taken.

The double-tower model is further optimized with a frequency estimation method based on a hash sequence. This reduces the sampling bias that can arise when negative samples are drawn within each batch, thereby optimizing the loss function.
The flow of the step 2 is as follows:
2.1, Give two parameterized embedding functions:
$u(x, \theta) \in \mathbb{R}^d, \quad v(y, \theta) \in \mathbb{R}^d$
where $\mathbb{R}^d$ denotes the space of d-dimensional real vectors, and $u(x, \theta)$ and $v(y, \theta)$ are the user feature vector and the item feature vector extracted by the deep neural networks. Here $x$ and $y$ are the input values required by the double-tower model.
Further, the scheme of 2.1 is as follows:
2.1.1, The double-tower model contains two deep neural network models. The basic unit of a neural network is the perceptron, which has several inputs and one output. The output is a trained linear function of the inputs, and the final result is obtained by passing this value through a nonlinear activation function;
2.1.2, A deep neural network extends the neural network model with an input layer, an output layer and several hidden layers. The invention uses 3 hidden layers with full connection between layers, meaning each node is connected to all nodes of the previous layer;
2.1.3, Input the user feature vector and the item feature vector into their corresponding neural networks. Because the model must be trained, each output is expressed as a function of the parameters θ to be trained. The outputs are:
$u(x, \theta), \quad v(y, \theta)$
where $x = v_{user}$ and $y = v_{item}$;
2.1.4, Apply L2 normalization to the outputs $u(x, \theta)$ and $v(y, \theta)$. For a vector Y, the L2 normalization formula is:
$\hat{Y} = \frac{Y}{\lVert Y \rVert_2}, \quad \lVert Y \rVert_2 = \sqrt{\sum_i \gamma_i^2}$
where $\gamma_i$ denotes the i-th component of the vector Y;
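The forward pass described in 2.1.1 to 2.1.4 can be sketched as follows: two towers, each with 3 fully connected hidden layers, L2-normalized outputs, and an inner-product score. Several details are assumptions, not values from the patent:
- the ReLU activation;
- the hidden widths (64, 32, 16);
- the output dimension 8;
- the random initialization.

All function names are hypothetical. The two towers share the same structure but have separate parameters, and no training is performed here.

```python
import numpy as np


def init_tower(rng, input_dim, hidden_dims=(64, 32, 16), output_dim=8):
    """Random weights for one tower: 3 fully connected hidden layers plus an output layer."""
    dims = [input_dim, *hidden_dims, output_dim]
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]


def tower_forward(params, x):
    """Map one feature vector x to a d-dimensional representation."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b  # linear part of each perceptron layer
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ASSUMED ReLU activation on hidden layers
    return h


def l2_normalize(y, eps=1e-12):
    """Y / ||Y||_2, guarding against division by zero."""
    return y / max(np.linalg.norm(y), eps)


def score(user_params, item_params, v_user, v_item):
    """Inner product of the L2-normalized tower outputs (a cosine similarity)."""
    u = l2_normalize(tower_forward(user_params, v_user))
    v = l2_normalize(tower_forward(item_params, v_item))
    return float(u @ v)


rng = np.random.default_rng(0)
user_tower = init_tower(rng, input_dim=15)
item_tower = init_tower(rng, input_dim=5)
s = score(user_tower, item_tower, rng.normal(size=15), rng.normal(size=5))
```

Because both outputs are unit vectors, the score always lies in [-1, 1], which keeps the logits of the later softmax on a bounded scale.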
2.2, Use a frequency estimation method based on a hash sequence, i.e., record the sample sequence numbers with a hash sequence. This reduces the sampling bias that can occur when negative samples are drawn in each batch and thus optimizes the loss function. The following steps are repeated in a loop;
further, the scheme of 2.2 is as follows:
2.2.1, Randomly take T samples from the sample set:
$\mathcal{T} = \{(x_i, y_i, r_i)\}_{i=1}^{T}$
where $x_i$ denotes the i-th user feature vector in the sample, $y_i$ denotes the i-th item feature vector in the sample, and $r_i$ denotes the feedback degree of the i-th user in the sample, with $r_i \in [0, 1]$.
2.2.2, Calculate the probability $p_i$ of $y_i$ in each sample using the hash-sequence-based frequency estimation method.
The flow of 2.2.2 is as follows:
2.2.2.1, Set a learning rate α, two arrays A and D of size H, and a hash function h, where h maps each $y_i$ to an integer in the range [0, H).
2.2.2.2, For each step t = 1, 2, …, let $\mathcal{B}$ be the set of all items in one batch of training samples. For each item $y \in \mathcal{B}$:
2.2.2.2.1, $D[h(y)] \leftarrow (1 - \alpha) \cdot D[h(y)] + \alpha \cdot (t - A[h(y)])$
where ← denotes assignment, $A[h(y)]$ denotes the step at which y was last sampled, and $D[h(y)]$ denotes the estimated number of steps between two consecutive samplings of y;
2.2.2.2.2, $A[h(y)] \leftarrow t$
2.2.2.3, For each $y \in \mathcal{B}$, the sampling probability is $\hat{p} = 1 / D[h(y)]$.
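The streaming estimator of 2.2.2 can be sketched in Python as follows. The class name and the default CRC32-based hash are assumptions made for illustration. Clamping D to at least 1, so that the estimated probability never exceeds 1, is a safeguard added here rather than a step from the patent.

```python
import zlib


class HashFrequencyEstimator:
    """Streaming estimate of item sampling frequency using two hashed arrays.

    A[b]: last step at which an item hashed to bucket b was seen.
    D[b]: moving average of the number of steps between two sightings.
    """

    def __init__(self, size_h, alpha, hash_fn=None):
        self.alpha = alpha
        # ASSUMED default hash: CRC32 of the item's string form, modulo H.
        self.hash_fn = hash_fn or (lambda y: zlib.crc32(str(y).encode("utf-8")) % size_h)
        self.A = [0] * size_h
        self.D = [0.0] * size_h

    def update(self, t, batch_items):
        """Steps 2.2.2.2.1 and 2.2.2.2.2 for every distinct item in the batch at step t."""
        for y in set(batch_items):
            b = self.hash_fn(y)
            self.D[b] = (1 - self.alpha) * self.D[b] + self.alpha * (t - self.A[b])
            self.A[b] = t

    def probability(self, y):
        """Step 2.2.2.3: p = 1 / D[h(y)], with D clamped to >= 1 (added safeguard)."""
        return 1.0 / max(self.D[self.hash_fn(y)], 1.0)


# Toy stream: "a" appears at every step, "b" at every 4th step.
est = HashFrequencyEstimator(size_h=1024, alpha=0.5, hash_fn={"a": 0, "b": 1}.__getitem__)
for t in range(1, 41):
    est.update(t, ["a"] + (["b"] if t % 4 == 0 else []))
p_a, p_b = est.probability("a"), est.probability("b")
```

After 40 steps, D for "b" approaches 4 steps between sightings, so its estimated probability approaches 0.25, while "a" approaches 1. In practice, different items that collide in the same bucket share one estimate, which is the accuracy cost of the fixed-size arrays.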
2.2.3, Compute the optimized loss function $\mathcal{L}_B(\theta)$ and give its derivation;
the 2.2.3 flow is as follows:
2.2.3.1, The vector inner product is given by:
$s(x, y) = \langle u(x, \theta), v(y, \theta) \rangle$
2.2.3.2, Given a vector x, the probability of obtaining an item y from M items $\{y_j\}_{j=1}^{M}$ can be calculated with the softmax function:
$\mathcal{P}(y \mid x; \theta) = \frac{e^{s(x, y)}}{\sum_{j \in [M]} e^{s(x, y_j)}}$
where e is the natural constant;
2.2.3.3, The weighted log-likelihood loss function is:
$\mathcal{L}_T(\theta) = -\frac{1}{T} \sum_{i \in [T]} r_i \cdot \log\left(\mathcal{P}(y_i \mid x_i; \theta)\right)$
where T denotes the T samples randomly taken in 2.2.1;
2.2.3.4, The invention computes the loss with a negative sampling algorithm. This algorithm adopts a smoothing strategy, which raises the sampling probability of low-frequency samples. Negative samples are drawn from the same batch. Given a mini-batch $\{(x_i, y_i, r_i)\}_{i=1}^{B}$, the batch softmax function is:
$\mathcal{P}_B(y_i \mid x_i; \theta) = \frac{e^{s(x_i, y_i)}}{\sum_{j \in [B]} e^{s(x_i, y_j)}}$
2.2.3.5, Item popularity follows a power-law distribution. As a result, randomly sampling negatives within each batch makes popular items easy to sample, and the loss function over-penalizes them. The sampling is therefore corrected by frequency:
$s^c(x_i, y_i) = s(x_i, y_i) - \log(p_i)$
where $s^c(x_i, y_i)$ denotes the corrected value of $s(x_i, y_i)$, and $p_i$ comes from the hash-sequence-based frequency estimation algorithm of 2.2.2;
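2.2.3.6, The corrected conditional probability function is therefore:
$\mathcal{P}_B^c(y_i \mid x_i; \theta) = \frac{e^{s^c(x_i, y_i)}}{e^{s^c(x_i, y_i)} + \sum_{j \in [B], j \neq i} e^{s^c(x_i, y_j)}}$
2.2.3.7, The final loss function is:
$\mathcal{L}_B(\theta) = -\frac{1}{B} \sum_{i \in [B]} r_i \cdot \log\left(\mathcal{P}_B^c(y_i \mid x_i; \theta)\right)$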
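The corrected in-batch softmax loss of 2.2.3.4 to 2.2.3.7 can be written compactly in NumPy. This sketch assumes the rows of `U` and `V` are already the L2-normalized tower outputs for a batch of positive pairs, and that `p` holds the estimated sampling probabilities from 2.2.2. The function name is hypothetical. Subtracting the row maximum is a standard numerical-stability step that leaves the softmax unchanged.

```python
import numpy as np


def corrected_in_batch_softmax_loss(U, V, p, r):
    """L_B(theta) with the log-frequency correction.

    U, V: (B, d) user/item embeddings; row i of each forms a positive pair.
    p: (B,) estimated sampling probabilities of the in-batch items.
    r: (B,) feedback degrees r_i in [0, 1].
    """
    S = U @ V.T                                  # S[i, j] = s(x_i, y_j)
    S_c = S - np.log(p)[None, :]                 # s^c(x_i, y_j) = s(x_i, y_j) - log p_j
    S_c = S_c - S_c.max(axis=1, keepdims=True)   # stability shift; softmax unchanged
    log_prob = np.diag(S_c) - np.log(np.exp(S_c).sum(axis=1))  # log P_B^c(y_i | x_i)
    return float(-(r * log_prob).mean())


U = np.eye(2)
V = np.eye(2)
loss_uniform = corrected_in_batch_softmax_loss(U, V, p=np.array([0.5, 0.5]), r=np.array([1.0, 1.0]))
```

When all items have the same estimated probability, the correction shifts every logit in a row by the same constant. The loss then reduces to the uncorrected batch softmax loss, here log(1 + e) - 1. The correction only changes the result when item frequencies differ.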
2.2.4, Update the parameters θ with a gradient descent algorithm so that they approach their optimal values. Gradient descent is an algorithm commonly used in machine learning to solve for model parameters;
the flow of the step 3 is as follows:
3.1, Given a user interaction matrix $Y = \{y_{uv} \mid u \in U, v \in V\}$, where $y_{uv}$ denotes the interaction between the u-th user and the v-th item, and U and V denote the user set and the item set, respectively;
the user interaction matrix Y is expressed as follows:
$y_{uv} = \begin{cases} 1, & \text{if user } u \text{ has interacted with item } v \\ 0, & \text{otherwise} \end{cases}$
3.2, Define $O_{i,j}$ as the interaction of user i with item j, and map the items that user i has interacted with to entities of the knowledge graph $\mathcal{G}$;
further, the 3.2 scheme is as follows:
3.2.1, O is an interaction matrix whose rows are user vectors; $O_{i,j}$ denotes the interaction of user i with item j, where j is the item index;
3.2.2, Define a HashMap that stores all items as key-value pairs: the key is the item index, and the value is the entity corresponding to the item;
3.2.3, Take the row vector of the user interaction matrix O that expresses one user's interactions, and store it in an array E;
3.2.4, Define a temporary set temp_set to store the user's entities. Traverse the array E; for every element whose value is 1, look up the item HashMap by the element's index to obtain the corresponding entity, and store the entity in temp_set;
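Steps 3.2.1 to 3.2.4 amount to a filtered lookup, sketched below with a Python dict playing the role of the item HashMap. The function name and the movie entities are hypothetical. Skipping items that have no entry in the map is an assumption the patent does not address.

```python
def entities_for_user(interaction_row, item_to_entity):
    """Build temp_set for one user.

    interaction_row: the array E, one 0/1 value per item index j.
    item_to_entity: dict mapping item index -> knowledge-graph entity (the item HashMap).
    """
    temp_set = set()
    for j, value in enumerate(interaction_row):
        # ASSUMPTION: items with no mapped entity are skipped.
        if value == 1 and j in item_to_entity:
            temp_set.add(item_to_entity[j])
    return temp_set


O = [[1, 0, 1],
     [0, 1, 0]]
item_to_entity = {0: "Inception", 1: "Titanic", 2: "Interstellar"}
temp_set = entities_for_user(O[0], item_to_entity)
```

For user 0, the interacted items 0 and 2 map to the entities "Inception" and "Interstellar", which become the starting points for step 4.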
the flow of the step 4 is as follows:
4.1, For a given user, obtain the user feature vector $v_{user}$ from the first step;
4.2, Initialize the user's temporary set temp_set as in the third step;
4.3, Define the user preference HashMap user_map, whose key stores the loop count and whose value stores a set of triples;
4.4, Define a set used_set to store the entities that have already been recalled;
4.5, Set the loop count k = 1, 2, 3, … and give a maximum value K. Exit the loop when the number of triples in user_map exceeds K; otherwise execute the following steps:
4.5.1, Traverse the entities in temp_set and remove those already in used_set:
$temp\_set \leftarrow temp\_set - used\_set$
where ← denotes assignment and − denotes the set difference operation;
4.5.2, Find the triple set $S^k$ in the knowledge graph and store it in user_map, defined as follows:
$S^k = \{(h', r', t') \mid (h', r', t') \in \mathcal{G},\ h' \in temp\_set\}$
where (h', r', t') denotes a triple, h' denotes the head entity, r' denotes the relationship, and t' denotes the tail entity;
4.5.3, Because the knowledge graph contains loops, the entities in temp_set are added to used_set to prevent already-recalled entities from being collected again;
4.5.4, Take the tail entities of the triples in $S^k$ and store them in the set $E^k$;
4.5.5, Traverse the entities in $E^k$ and take out the corresponding item feature vectors $v_{item}$;
4.5.6, Input $v_{user}$ and $v_{item}$, as described in the second step, into the double-tower model. This gives the probability between the user and the item corresponding to each entity in $E^k$; then sort the entities by probability;
4.5.7, Give a value τ, 0 < τ ≤ 1, that determines the number of screened entities:
$E^k \leftarrow newSet\left(sort(E^k).getSubVec\left(1, \lceil \tau \cdot |E^k| \rceil\right)\right)$
where $sort(E^k)$ sorts the elements of the set by probability and returns an array, getSubVec(i, j) takes the sub-array consisting of the i-th to j-th elements of the original array, newSet() converts the array into a set, and ⌈·⌉ denotes rounding up;
4.5.8, Screen from $S^k$ the triples whose tail entity belongs to $E^k$:
$\Lambda = \{(h', r', t') \mid (h', r', t') \in S^k,\ t' \in E^k\}$
4.5.9, Store the triple set Λ into user_map: user_map(k, Λ);
4.5.10, Overwrite temp_set with the set $E^k$;
4.6, After 4.5 finishes executing, the user preference knowledge graph is obtained.
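The recall loop of step 4 can be sketched on a toy graph as below. This is a minimal illustration under assumptions:
- `score_fn` stands in for the trained double-tower model's probability for each entity's item.
- `max_hops` is a hypothetical extra stopping rule, not part of the patent's loop, which stops only when user_map exceeds K triples.
- Entities are ranked in descending order of probability, and ties are broken arbitrarily.

The function name and the toy graph are hypothetical.

```python
import math


def recall_preference_graph(seed_entities, triples, score_fn, tau=0.5,
                            max_triples=100, max_hops=10):
    """Expand from seed entities hop by hop, keeping the top ceil(tau * n) tail entities.

    triples: iterable of (head, relation, tail) knowledge-graph triples.
    score_fn: entity -> probability, standing in for the trained double-tower model.
    max_hops: HYPOTHETICAL safety limit, not part of the patent's loop.
    """
    by_head = {}
    for h, r, t in triples:
        by_head.setdefault(h, []).append((h, r, t))

    user_map, used_set, temp_set = {}, set(), set(seed_entities)
    k = 0
    while sum(len(s) for s in user_map.values()) <= max_triples and k < max_hops:
        k += 1
        temp_set = temp_set - used_set                                      # 4.5.1
        hop_triples = {tr for e in temp_set for tr in by_head.get(e, [])}  # 4.5.2
        used_set |= temp_set                                                # 4.5.3
        tails = {t for _, _, t in hop_triples}                              # 4.5.4
        if not tails:
            break  # nothing left to expand
        ranked = sorted(tails, key=score_fn, reverse=True)                  # 4.5.5-4.5.6
        kept = set(ranked[:math.ceil(tau * len(ranked))])                   # 4.5.7
        user_map[k] = {tr for tr in hop_triples if tr[2] in kept}           # 4.5.8-4.5.9
        temp_set = kept                                                     # 4.5.10
    return user_map


kg = [("A", "likes", "B"), ("A", "likes", "C"), ("B", "rel", "D"),
      ("C", "rel", "E"), ("D", "rel", "A")]
scores = {"A": 0.5, "B": 0.9, "C": 0.2, "D": 0.8, "E": 0.7}
result = recall_preference_graph({"A"}, kg, scores.__getitem__, tau=0.5)
```

In this toy run, the low-scoring branch through "C" is pruned at the first hop. The loop back to "A" at hop 3 is recorded as a triple but not expanded again, because "A" is already in used_set.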
The invention has the following beneficial effects. Entities preferred by the user are screened from the knowledge graph by the optimized double-tower model. The double-tower model is optimized with a frequency estimation method based on a hash sequence, so that it adapts better to various item data distributions. Screening the knowledge graph entities not only yields better data but also keeps the recalled entities close to the user's preferences. Screening also benefits deep recall on the knowledge graph. The number of entities recalled at each step grows explosively with the number of entities from the previous step, and without screening this growth would hurt both computing efficiency and the exploration of the user's potential preferences.
Detailed Description
The present invention is further described below with reference to examples.
Example:
The first step: define a user feature vector and an item feature vector as the input of the double-tower model;
The flow of the first step is as follows:
1.1, The user features are the user's interaction behaviors with items, including click records, search records, social data, personal data and sample age. The user feature vector is obtained by converting these data into vectors and concatenating them. Converting raw data into vectors is called vector embedding. It is a common way of representing data features in machine learning: features are extracted from the raw data as low-dimensional vectors obtained by mapping through a neural network;
further, the scheme of 1.1 is as follows
1.1.1, the embedding of the user's click record is the weighted average of the id-class embeddings of all clicked items, where an id-class embedding is a vector that maps an item's unique identifier into a space of common dimension, and each weight is proportional to the time the user spent browsing that item. The embedding of the user's click record is calculated as follows:
Figure BDA0003517220730000111
where v_click denotes the embedding of the user's click record,
Figure BDA0003517220730000112
denotes the ith weight, v_click,i denotes the id-class embedding of the ith item in the click record, and n denotes the number of embeddings; the weight
Figure BDA0003517220730000113
can be calculated by the following formula:
Figure BDA0003517220730000114
wherein
Figure BDA0003517220730000115
denotes the time the user spent browsing item i, N denotes the total number of samples, and k denotes the total number of positive examples;
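A minimal sketch of the weighted average in 1.1.1 follows. Because the patent's exact weight formula survives only as a figure, the sketch assumes the simplest reading of "weight proportional to browsing time": each item's browsing time divided by the total browsing time. The function name and the toy embeddings are hypothetical.

```python
import numpy as np

def time_weighted_click_embedding(item_embeddings: np.ndarray,
                                  browse_times: np.ndarray) -> np.ndarray:
    """Weighted average of the id-class embeddings of clicked items.

    item_embeddings: shape (n, d), one id-class embedding per clicked item.
    browse_times:    shape (n,), browsing time of each clicked item.
    Assumption: weight_i = browse_time_i / sum(browse_times).
    """
    times = np.asarray(browse_times, dtype=float)
    if times.sum() <= 0:
        raise ValueError("browse_times must have a positive sum")
    weights = times / times.sum()        # weights proportional to browsing time
    return weights @ item_embeddings     # (n,) @ (n, d) -> (d,)

# Two clicked items; the second was browsed three times as long as the first.
emb = np.array([[1.0, 0.0], [0.0, 1.0]])
v_click = time_weighted_click_embedding(emb, np.array([10.0, 30.0]))
print(v_click)  # [0.25 0.75]
```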
1.1.2, to embed the user's search record, the keywords of the historical searches are first segmented into terms. The embedding of each term is obtained from a Word2vec model, and the embedding of the search record is the weighted average of these term embeddings.
Word segmentation is the technique by which a search engine splits the keyword string submitted by a user into separate terms (tokens).
The Word2vec model, proposed by Mikolov et al. in 2013, maps the words of a text to vectors in a vector space; the values of each word vector are shaped by the contexts in which the word occurs and therefore capture the relatedness between words.
The formula for embedding the user search records is as follows:
Figure BDA0003517220730000116
where v_search denotes the embedding of the user's search record,
Figure BDA0003517220730000117
denotes the ith weight, v_search,i denotes the embedding of the ith term in the search record, and n denotes the number of embeddings;
The weight of each search-record embedding is calculated as:
Figure BDA0003517220730000121
where the validity of a search is judged by whether the user clicked an item after the search;
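The search-record embedding of 1.1.2 can be sketched as below, under loudly stated assumptions: a small dictionary stands in for a trained Word2vec model, whitespace splitting stands in for the search engine's word segmentation, and search validity is reduced to a hypothetical weight of 1.0 for searches followed by a click and 0.5 otherwise. None of these values come from the patent.

```python
import numpy as np

# Hypothetical stand-in for a trained Word2vec model: term -> vector.
WORD_VECTORS = {
    "running": np.array([1.0, 0.0]),
    "shoes": np.array([0.0, 1.0]),
    "jacket": np.array([1.0, 1.0]),
}

def tokenize(query: str) -> list:
    # Assumption: whitespace splitting stands in for word segmentation.
    return query.lower().split()

def search_record_embedding(searches, clicked_weight=1.0, unclicked_weight=0.5):
    """searches: list of (query, clicked_after_search) pairs.

    Each term embedding inherits the (assumed) validity weight of its search;
    the result is the weighted average of all in-vocabulary term embeddings.
    """
    vectors, weights = [], []
    for query, clicked in searches:
        w = clicked_weight if clicked else unclicked_weight
        for token in tokenize(query):
            if token in WORD_VECTORS:          # skip out-of-vocabulary terms
                vectors.append(WORD_VECTORS[token])
                weights.append(w)
    if not vectors:
        raise ValueError("no in-vocabulary terms in the search record")
    weights = np.array(weights) / np.sum(weights)
    return weights @ np.stack(vectors)

v_search = search_record_embedding([("running shoes", True), ("jacket", False)])
print(v_search)  # [0.6 0.6]
```

The social-data embedding of 1.1.3 follows the same weighted-average pattern, with favorites, likes and subscriptions in place of search terms.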
1.1.3, the embedding of the user's social data is the weighted average of the embeddings corresponding to favorites, likes and subscriptions. The embeddings for favorites and likes are the id-class embeddings of the items the user favorited or liked; the embeddings for subscriptions are the id-class embeddings of the persons in charge of the items the user subscribes to.
The formula for embedding the social data of the user is as follows:
Figure BDA0003517220730000122
where v_social denotes the embedding of the user's social data,
Figure BDA0003517220730000123
denotes the ith weight, and v_social,i denotes the embedding of the ith social-data record;
The weights of the favorite and like embeddings are calculated as:
Figure BDA0003517220730000124
wherein
Figure BDA0003517220730000125
denotes the time the user spent browsing item i, N denotes the total number of samples, and k denotes the total number of positive examples;
The weights of the subscription embeddings are calculated as:
Figure BDA0003517220730000126
wherein
Figure BDA0003517220730000127
denotes the browsing time of the ith subscribed item, N denotes the total number of samples, and k denotes the total number of positive examples;
1.1.4, the user's personal data comprise the user's gender, age and region. Gender is a simple binary feature, whereas age and region are continuous features, which are normalized to real values in the interval [0,1]. The embedding of the user's personal data is the vector obtained by concatenating the processed gender, age and region values;
further, the scheme of 1.1.4 is as follows:
1.1.4.1, calculating the binary representation of the user's gender as follows:
Figure BDA0003517220730000131
1.1.4.2, calculating the normalized real values of the user's age and region with the following normalization formula:
$z = \frac{X - \mu}{\sigma}$
wherein X represents the sample value, μ is the mean of all sample data, and σ is the standard deviation of all sample data;
1.1.4.3, concatenating the binary gender value and the normalized age and region values of 1.1.4 into one vector:
$v_{personal} = [gender, z_{age}, z_{region}]$
where v_personal denotes the embedding of the user's personal data, gender denotes the binary gender value, and z_age and z_region denote the normalized age and region of the user;
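The sketch below walks through 1.1.4 end to end. The gender encoding (male to 1, anything else to 0) is an assumption, since the patent's mapping is only in a figure, and region is assumed to be already encoded as a number. The normalization is the z-score implied by the definitions of X, μ and σ; note that a z-score is not confined to [0,1], so if a strict [0,1] range were required, min-max scaling would be the usual alternative.

```python
import numpy as np

def zscore(x: float, values: np.ndarray) -> float:
    """z = (X - mu) / sigma over all sample values (population standard deviation)."""
    mu, sigma = values.mean(), values.std()
    return 0.0 if sigma == 0 else (x - mu) / sigma

def personal_embedding(gender: str, age: float, region: float,
                       all_ages: np.ndarray, all_regions: np.ndarray) -> np.ndarray:
    # Assumption: "male" -> 1.0, anything else -> 0.0 (the patent's mapping is a figure).
    g = 1.0 if gender == "male" else 0.0
    # v_personal = [gender, z_age, z_region]
    return np.array([g, zscore(age, all_ages), zscore(region, all_regions)])

ages = np.array([20.0, 30.0, 40.0])
regions = np.array([1.0, 2.0, 3.0])   # hypothetical numeric region codes
v_personal = personal_embedding("female", 40.0, 2.0, ages, regions)
```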
1.1.5, concatenating the embeddings obtained in 1.1 (the click-record embedding, the search-record embedding, the social-data embedding and the personal-data embedding) to obtain the user feature vector:
$v_{user} = \mathrm{concatenate}(v_{click}, v_{search}, v_{social}, v_{personal}) = [v_{click}[1], v_{click}[2], \ldots, v_{search}[1], v_{search}[2], \ldots, v_{social}[1], v_{social}[2], \ldots, v_{personal}[1], v_{personal}[2], \ldots]$
where v_user denotes the user feature vector, and v_click[i], v_search[i], v_social[i] and v_personal[i] denote the ith components of the click-record, search-record, social-data and personal-data embeddings, respectively;
1.2, the item features comprise the item's id and its context information, and the item feature vector is the concatenation of the item's id-class embedding and the embedding of its context information;
further, the scheme of 1.2 is as follows:
1.2.1, obtaining the item's id-class embedding, a vector that maps the item's unique identifier into a space of common dimension;
1.2.2, obtaining the embedding of the item's context information, a vector produced by Word2vec;
1.2.3, concatenating the id-class embedding and the context-information embedding of 1.2 to obtain the item feature vector:
$v_{item} = \mathrm{concatenate}(v_{id}, v_{context}) = [v_{id}[1], v_{id}[2], \ldots, v_{context}[1], v_{context}[2], \ldots]$
where v_item denotes the item feature vector, v_id the item's id-class embedding, v_context the embedding of the item's context information, and v_id[i] and v_context[i] their ith components;
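The concatenations in 1.1.5 and 1.2.3 join vectors end to end in a fixed order. A short NumPy sketch with arbitrary toy dimensions (the function names are illustrative only):

```python
import numpy as np

def user_feature_vector(v_click, v_search, v_social, v_personal):
    # v_user = concatenate(v_click, v_search, v_social, v_personal), in this fixed order
    return np.concatenate([v_click, v_search, v_social, v_personal])

def item_feature_vector(v_id, v_context):
    # v_item = concatenate(v_id, v_context)
    return np.concatenate([v_id, v_context])

v_user = user_feature_vector(np.ones(2), np.zeros(2), np.full(2, 0.5),
                             np.array([0.0, 1.2, -0.3]))
v_item = item_feature_vector(np.array([0.1, 0.2]), np.array([0.3, 0.4, 0.5]))
print(v_user.shape, v_item.shape)  # (9,) (5,)
```

The order must be identical for every user (and every item) so that each position always carries the same feature; the user and item vectors may differ in length because each tower maps its own input to the common output dimension.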
The second step: training the double-tower model and optimizing it by combining an in-batch softmax loss function with a hash-sequence-based frequency estimation method;
In the second step, the double-tower model derives from DSSM (Deep Structured Semantic Models), a deep structured semantic model commonly used to solve semantic-similarity problems in natural language processing. From top to bottom, the double-tower model consists of an input layer, a representation layer and a matching layer; viewed horizontally, it has two input layers, two representation layers and one matching layer. The outputs of the two input layers feed the two representation layers, whose outputs converge at the matching layer, so the whole model takes the form of two towers. In the invention, the two inputs of the input layer are the user feature vector and the item feature vector, and the two representation layers share the same neural network structure, so both feature vectors are mapped to vectors of the same dimension. Finally, the two vectors are L2-normalized and their inner product is taken. The double-tower model is further optimized with a hash-sequence-based frequency estimation method, which reduces the sampling bias that in-batch negative sampling may introduce and thereby improves the loss function.
The flow of the second step is as follows:
2.1, defining two parameterized embedding functions:
Figure BDA0003517220730000141
wherein
Figure BDA0003517220730000142
denotes a d-dimensional real vector;
Figure BDA0003517220730000143
are the user and item feature vectors extracted by the deep neural networks, where
Figure BDA0003517220730000144
And
Figure BDA0003517220730000145
are, respectively, the inputs required by the double-tower model.
Further, the procedure of 2.1 is as follows:
2.1.1, the double-tower model contains two deep neural networks. The basic unit of a neural network is the perceptron, which has several inputs and one output; the output is a trained linear function of the inputs, and the final result is obtained by passing this value through a nonlinear activation function;
2.1.2, a deep neural network extends this model with an input layer, an output layer and several hidden layers; the invention uses 3 hidden layers with full connections between layers, meaning that each node is connected to every node of the previous layer;
2.1.3, feeding the user feature vector and the item feature vector into their respective neural networks; because the model must be trained, each output is expressed as a function of the trainable parameters θ, as follows:
Figure BDA0003517220730000151
wherein
Figure BDA0003517220730000152
2.1.4, applying L2 normalization to the outputs
Figure BDA0003517220730000153
with the following L2 normalization formula:
Figure BDA0003517220730000154
wherein
Figure BDA0003517220730000155
γ_i denotes the ith component of the vector Y;
2.2, applying a hash-sequence-based frequency estimation method, i.e. using a hash sequence to record the sequence numbers at which samples occur, to reduce the sampling bias that in-batch negative sampling may introduce and thereby optimize the loss function; the following steps are repeated in a loop;
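A forward pass of the two towers (2.1.1 to 2.1.4) can be sketched in NumPy as below. The hidden widths (64, 32, 16), the output dimension 8, ReLU as the nonlinear activation and the random He-style initialization are assumptions for illustration; the patent fixes only the number of hidden layers (3), full connection between layers and the final L2 normalization. Training is omitted.

```python
import numpy as np

def init_tower(in_dim, hidden=(64, 32, 16), out_dim=8, seed=0):
    """Random weights for a fully connected tower with 3 hidden layers (sizes assumed)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, *hidden, out_dim]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def tower_forward(params, x):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b                  # linear combination of the inputs (perceptron layer)
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)     # ReLU on hidden layers (assumed activation)
    return h

def l2_normalize(y, eps=1e-12):
    # Y / sqrt(sum_i gamma_i^2), computed row by row
    return y / np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)

user_tower = init_tower(in_dim=9, seed=1)   # same structure, different input widths
item_tower = init_tower(in_dim=5, seed=2)
u = l2_normalize(tower_forward(user_tower, np.random.default_rng(3).normal(size=(4, 9))))
v = l2_normalize(tower_forward(item_tower, np.random.default_rng(4).normal(size=(4, 5))))
scores = u @ v.T   # 4x4 inner products of unit vectors (cosine similarities)
```

Because both outputs are unit vectors, their inner product equals their cosine similarity and lies in [-1, 1].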
further, the scheme of 2.2 is as follows:
2.2.1, randomly drawing T samples from the sample set:
Figure BDA0003517220730000156
where x_i denotes the ith user feature vector in the T samples, y_i denotes the ith item feature vector, and r_i ∈ [0,1] denotes the degree of feedback of the ith user.
2.2.2, computing the sampling probability p_i of y_i in each sample with the hash-sequence-based frequency estimation method.
The 2.2.2 flow is as follows:
2.2.2.1, setting a learning rate α, two arrays A and D of size H, and a hash function h,
where the hash function h maps each y_i to an integer in the range [0, H].
2.2.2.2, for each step t = 1, 2, …, taking the set of all items in one batch of training samples
Figure BDA0003517220730000157
and, for each item
Figure BDA0003517220730000158
performing the following:
2.2.2.2.1,
Figure BDA0003517220730000159
where ← denotes assignment,
Figure BDA00035172207300001510
denotes the step at which y_i was last sampled,
Figure BDA00035172207300001511
denotes the number of steps that elapse between two consecutive samplings of y_i;
2.2.2.2.2,
Figure BDA00035172207300001512
2.2.2.3, for each
Figure BDA0003517220730000161
the sampling probability is
Figure BDA0003517220730000162
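Since the update formulas of 2.2.2.2 survive only as figures, the sketch below assumes the streaming frequency estimation used in the sampling-bias-corrected two-tower work of Yi et al. (RecSys 2019), with the patent's array D holding the step at which a hash bucket was last seen and A holding the estimated number of steps between occurrences. The class name, the initial values and the use of Python's built-in `hash` as h are illustrative assumptions (for strings, Python's `hash` is salted per process, so integer ids are used here).

```python
import numpy as np

class HashFrequencyEstimator:
    """Streaming estimate of how often each item appears in training batches.

    Assumed update rule (after Yi et al., RecSys 2019):
        A[h(y)] <- (1 - alpha) * A[h(y)] + alpha * (t - D[h(y)])
        D[h(y)] <- t
        p(y)     = 1 / A[h(y)]
    """

    def __init__(self, size_h: int, alpha: float = 0.01):
        self.H = size_h
        self.alpha = alpha
        self.A = np.ones(size_h)    # estimated steps between two occurrences
        self.D = np.zeros(size_h)   # step at which each bucket was last seen

    def _h(self, item_id: int) -> int:
        # Hypothetical hash function h mapping an item id into [0, H).
        return hash(item_id) % self.H

    def update(self, t: int, batch_item_ids) -> None:
        for y in set(batch_item_ids):           # each distinct item of the batch at step t
            k = self._h(y)
            self.A[k] = (1 - self.alpha) * self.A[k] + self.alpha * (t - self.D[k])
            self.D[k] = t

    def probability(self, item_id: int) -> float:
        return 1.0 / self.A[self._h(item_id)]

# Item 1 appears in every batch, item 2 in every fourth batch.
est = HashFrequencyEstimator(size_h=1000, alpha=0.1)
for t in range(1, 201):
    est.update(t, [1] + ([2] if t % 4 == 0 else []))
p1, p2 = est.probability(1), est.probability(2)
print(round(p1, 2), round(p2, 2))  # 1.0 0.25
```

Hash collisions make items sharing a bucket share one estimate; a larger H trades memory for fewer collisions.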
2.2.3, computing the optimized loss function
Figure BDA0003517220730000163
and giving its derivation;
the 2.2.3 process is as follows:
2.2.3.1, the vector inner-product formula is:
Figure BDA0003517220730000164
2.2.3.2, given a vector x, the probability of obtaining an item
Figure BDA0003517220730000165
out of M items can be computed with the softmax function:
Figure BDA0003517220730000166
where θ after the semicolon in
Figure BDA0003517220730000167
denotes the parameters carried by the model, and e is the natural constant;
2.2.3.3, the weighted log-likelihood loss function is:
Figure BDA0003517220730000168
where T denotes the T samples randomly drawn in 2.2.1;
2.2.3.4, the invention performs this computation with negative sampling; the negative sampling algorithm adopts a smoothing strategy that raises the sampling probability of low-frequency samples. Negative samples are drawn from within the same batch. Given a mini-batch
Figure BDA0003517220730000169
the batch softmax function is:
Figure BDA00035172207300001610
2.2.3.5, within each batch, item popularity follows a power-law distribution, so randomly sampling negatives from the batch makes popular items likely to be sampled and causes the loss function to over-penalize them; the sampling is therefore corrected using the estimated frequency:
$s^c(x_i, y_i) = s(x_i, y_i) - \log(p_i)$
where s^c(x_i, y_i) denotes the corrected value of s(x_i, y_i), and p_i is the sampling probability of y_i obtained with the hash-sequence-based frequency estimation algorithm of 2.2.2;
2.2.3.6, the corrected conditional probability function is therefore:
Figure BDA0003517220730000171
2.2.3.7, the final loss function is:
Figure BDA0003517220730000172
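Steps 2.2.3.4 to 2.2.3.7 can be sketched as one NumPy function. The sketch assumes, as in the sampling-bias-corrected formulation of Yi et al. (RecSys 2019), that the correction s − log p is applied to every in-batch candidate y_j, and that the weighted loss is averaged over the batch size; the patent's exact expressions are figures, so this is an illustration rather than the patented formula.

```python
import numpy as np

def corrected_in_batch_loss(u: np.ndarray, v: np.ndarray,
                            p: np.ndarray, r: np.ndarray) -> float:
    """Reward-weighted in-batch softmax loss with log-frequency correction.

    u: (B, d) L2-normalized user-tower outputs for x_1..x_B
    v: (B, d) L2-normalized item-tower outputs for y_1..y_B
    p: (B,)   estimated sampling probability of each y_j (from 2.2.2)
    r: (B,)   feedback weights r_i in [0, 1]
    """
    s = u @ v.T                                   # s(x_i, y_j) = <u_i, v_j>
    s_c = s - np.log(p)[None, :]                  # s^c(x_i, y_j) = s(x_i, y_j) - log(p_j)
    s_c = s_c - s_c.max(axis=1, keepdims=True)    # numerical stability for the softmax
    log_prob = s_c - np.log(np.exp(s_c).sum(axis=1, keepdims=True))
    diag = np.diag(log_prob)                      # log P_B^c(y_i | x_i): positives on the diagonal
    return float(-(r * diag).sum() / len(r))      # weighted negative log-likelihood

# Toy batch of two orthogonal user/item pairs with uniform frequencies.
loss = corrected_in_batch_loss(np.eye(2), np.eye(2),
                               p=np.array([0.5, 0.5]), r=np.ones(2))
```

Subtracting log p_j lowers the logit of a frequently sampled item relative to rare ones, which offsets how often it appears as an in-batch negative; with uniform p, as in the toy batch, the correction cancels inside the softmax.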
2.2.4, updating the parameters θ with gradient descent so that they approach their optimal values; gradient descent is an algorithm commonly used in machine learning to solve for model parameters;
The third step: defining the mapping between the user's historical interaction matrix and the entities of the knowledge graph;
the third step of the process is as follows:
3.1, given a user interaction matrix
Figure BDA0003517220730000173
Figure BDA0003517220730000174
denotes the interaction between user
Figure BDA0003517220730000175
and item
Figure BDA0003517220730000176
where U and V denote the user set and the item set, respectively;
the expression of the user interaction matrix Y is as follows:
Figure BDA0003517220730000177
3.2, defining O_i,j as the interaction of user i with item j, and mapping the items that user i has interacted with to the corresponding entities in the knowledge graph
Figure BDA0003517220730000178
;
further, the 3.2 scheme is as follows:
3.2.1, O is the interaction matrix, whose rows are the users' interaction vectors; O_i,j denotes the interaction of user i with item j, where j is the item index;
3.2.2, defining a HashMap that stores all items as key-value pairs, where the key is the item index and the value is the entity corresponding to the item;
3.2.3, taking the row vector of the interaction matrix O that represents a user's interactions and storing it in an array E;
3.2.4, defining a temporary set temp_set for storing entities, traversing the array E, and, for each element whose value is 1, looking up the item HashMap by that element's index to obtain the corresponding entity and adding it to temp_set;
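The mapping in 3.2 (the item HashMap, the array E and temp_set) can be sketched as follows; a Python dict plays the role of the HashMap. Skipping items that have no mapped entity is an assumption, and the item-to-entity mapping in the usage line is hypothetical toy data.

```python
def entities_from_interactions(row, item_to_entity):
    """row: one user's row of the interaction matrix O (1 = interacted), i.e. array E.
    item_to_entity: the item HashMap, item index -> knowledge-graph entity.
    Returns temp_set, the entities of all items whose value in E is 1.
    """
    temp_set = set()
    for j, value in enumerate(row):          # j is the item index
        if value == 1 and j in item_to_entity:   # assumption: unmapped items are skipped
            temp_set.add(item_to_entity[j])
    return temp_set

item_to_entity = {0: "Inception", 1: "Interstellar", 2: "Titanic"}  # hypothetical mapping
print(sorted(entities_from_interactions([1, 0, 1], item_to_entity)))  # ['Inception', 'Titanic']
```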
The fourth step: propagating preferred entities through the knowledge graph; at each propagation hop, the recalled entities and the user features are input into the optimized double-tower model to obtain predicted preference probabilities, the entities with high probabilities are kept, and a knowledge graph representing the user's preferences and latent preferences is finally obtained;
the fourth step is as follows:
4.1, for a given user, obtaining the user feature vector v_user from the first step;
4.2, initializing the user's temporary set temp_set as in the third step;
4.3, defining a HashMap of the user's preferences, user_map, whose key stores the loop iteration number and whose value stores a set of triples;
4.4, defining a set used_set for storing the recalled entities;
4.5, for loop iterations k = 1, 2, 3, … with a given maximum value K, executing the following steps and exiting the loop when the number of triples in user_map exceeds K:
4.5.1, traversing the entities in temp_set and removing those already in used_set:
$temp\_set \leftarrow temp\_set - used\_set$
where ← denotes assignment and − denotes set difference;
4.5.2, finding the triple set
Figure BDA0003517220730000181
in the knowledge graph and storing it in user_map; it is defined as follows:
Figure BDA0003517220730000182
where (h', r', t') denotes a triple with head entity h', relation r' and tail entity t';
4.5.3, because the knowledge graph may contain cycles, adding the entities in temp_set to used_set so that already-recalled entities are not collected again;
4.5.4, taking the tail entities of the triples in
Figure BDA0003517220730000183
and storing them in the set
Figure BDA0003517220730000184
4.5.5, traversing the entities in
Figure BDA0003517220730000185
and taking out the item feature vector v_item corresponding to each entity;
4.5.6, inputting v_user and v_item into the double-tower model trained in the second step to obtain the probability that the user prefers the item corresponding to each entity in
Figure BDA00035172207300001812
and sorting the entities by this probability;
4.5.7, given a value τ with 0 < τ ≤ 1 that determines how many entities are kept:
Figure BDA0003517220730000186
wherein
Figure BDA0003517220730000187
denotes sorting the elements of the set by probability and returning them as an array, getSubVec(i, j) returns the sub-array holding the ith through jth elements of the original array, newSet() converts an array into a set, and
Figure BDA0003517220730000188
denotes rounding up;
4.5.8, selecting from
Figure BDA0003517220730000189
the triples whose tail entities belong to the screened set
Figure BDA00035172207300001810
:
Figure BDA00035172207300001811
4.5.9, storing the triple set Λ in user_map under key k: user_map(k, Λ);
4.5.10, overwriting temp_set with the set shown in Figure BDA0003517220730000191;
4.6, once the loop of 4.5 has finished, the knowledge graph of the user's preferences is obtained.
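The loop of 4.5 can be sketched as below. Several assumptions are made, each marked in the code: a toy callback `score` stands in for the trained double-tower model's preference probability; 4.5.2 is read as collecting the triples whose head entity is in temp_set; only the screened triples Λ are stored per iteration; temp_set is overwritten with the kept tail entities; and a `max_hops` bound plus an early exit on an empty hop are added safeguards that the patent does not state.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]   # (h', r', t')

def preference_propagation(kg: List[Triple], seeds: Set[str],
                           score: Callable[[str], float], tau: float = 0.5,
                           max_triples: int = 10, max_hops: int = 10) -> Dict[int, Set[Triple]]:
    user_map: Dict[int, Set[Triple]] = {}
    used_set: Set[str] = set()
    temp_set = set(seeds)                                    # from the third step
    for k in range(1, max_hops + 1):                         # max_hops: added safeguard
        temp_set -= used_set                                 # 4.5.1
        hop = {tr for tr in kg if tr[0] in temp_set}         # 4.5.2 (assumed: head in temp_set)
        if not hop:
            break                                            # added: nothing left to propagate
        used_set |= temp_set                                 # 4.5.3: guard against KG cycles
        tails = {t for _, _, t in hop}                       # 4.5.4
        # 4.5.5-4.5.6: rank tails by (stand-in) preference probability; ties broken by name
        ranked = sorted(tails, key=lambda e: (-score(e), e))
        kept = set(ranked[:math.ceil(tau * len(ranked))])    # 4.5.7: keep ceil(tau * n)
        lam = {tr for tr in hop if tr[2] in kept}            # 4.5.8
        user_map[k] = lam                                    # 4.5.9
        temp_set = kept                                      # 4.5.10 (assumed: kept tails)
        if sum(len(s) for s in user_map.values()) > max_triples:
            break                                            # 4.5: more than K triples
    return user_map

kg = [("Inception", "director", "Nolan"), ("Inception", "genre", "SciFi"),
      ("Nolan", "directed", "Interstellar"), ("SciFi", "includes", "Alien"),
      ("Interstellar", "genre", "SciFi")]
toy_scores = {"Nolan": 0.9, "SciFi": 0.4, "Interstellar": 0.8, "Alien": 0.3}  # hypothetical
result = preference_propagation(kg, {"Inception"}, lambda e: toy_scores.get(e, 0.0))
```

In the toy run, τ = 0.5 keeps only Nolan at the first hop (⌈0.5 × 2⌉ = 1), so the lower-scored SciFi branch is reached only later via Interstellar. A real implementation would index triples by head entity instead of scanning the whole graph at every hop.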

Claims (5)

1. A double-tower-model-based method for recalling user-preferred entities from a knowledge graph, characterized by comprising the following steps:
1) defining a user feature vector and an item feature vector as the inputs of a double-tower model;
2) training the double-tower model and optimizing it by combining an in-batch softmax loss function with a hash-sequence-based frequency estimation method;
3) defining the mapping between a user's historical interaction matrix and the entities of a knowledge graph;
4) propagating preferred entities through the knowledge graph: at each propagation hop, inputting the recalled entities and the user features into the optimized double-tower model to obtain predicted preference probabilities, screening the entities according to these probabilities, and finally obtaining a knowledge graph representing the user's preferences and latent preferences.
2. The double-tower-model-based method for recalling user-preferred knowledge-graph entities according to claim 1, wherein step 1) specifically comprises:
1.1) the user features are the user's interaction behaviors with items, including click records, search records, social data, personal data and sample age; the user feature vector is obtained by converting these interaction data into vectors and concatenating them; converting raw data into a vector is called embedding;
1.1.1) the embedding of the user's click record is the weighted average of the id-class embeddings of the clicked items, where an id-class embedding is a vector that maps an item's unique identifier into a space of common dimension, and each weight is proportional to the item's browsing time; the embedding of the user's click record is calculated as follows:
Figure FDA0003517220720000011
where v_click denotes the embedding of the user's click record,
Figure FDA0003517220720000012
denotes the ith weight, v_click,i denotes the id-class embedding of the ith item in the click record, and n denotes the number of embeddings; the weight
Figure FDA0003517220720000013
can be calculated by the following formula:
Figure FDA0003517220720000014
wherein
Figure FDA0003517220720000015
denotes the time the user spent browsing item i, N denotes the total number of samples, and k denotes the total number of positive examples;
1.1.2) to embed the user's search record, the keywords of the historical searches are segmented into terms; the embedding of each term is obtained from a Word2vec model, and the embedding of the search record is the weighted average of these term embeddings;
the formula for embedding the user search records is as follows:
Figure FDA0003517220720000021
where v_search denotes the embedding of the user's search record,
Figure FDA0003517220720000022
denotes the ith weight, v_search,i denotes the embedding of the ith term in the search record, and n denotes the number of embeddings;
the weight of each search-record embedding is calculated as:
Figure FDA0003517220720000023
where the validity of a search is judged by whether the user clicked an item after the search;
1.1.3) the embedding of the user's social data is the weighted average of the embeddings corresponding to favorites, likes and subscriptions; the embeddings for favorites and likes are the id-class embeddings of the items the user favorited or liked; the embeddings for subscriptions are the id-class embeddings of the persons in charge of the items the user subscribes to;
the formula for embedding the social data of the user is as follows:
Figure FDA0003517220720000024
where v_social denotes the embedding of the user's social data,
Figure FDA0003517220720000025
denotes the ith weight, and v_social,i denotes the embedding of the ith social-data record;
the weights of the favorite and like embeddings are calculated as:
Figure FDA0003517220720000026
wherein
Figure FDA0003517220720000027
denotes the time the user spent browsing item i, N denotes the total number of samples, and k denotes the total number of positive examples;
the weights of the subscription embeddings are calculated as:
Figure FDA0003517220720000028
wherein
Figure FDA0003517220720000029
denotes the browsing time of the ith subscribed item, N denotes the total number of samples, and k denotes the total number of positive examples;
1.1.4) the user's personal data comprise the user's gender, age and region; gender is a simple binary feature, whereas age and region are continuous features, which are normalized to real values in the interval [0,1]; the embedding of the user's personal data is the vector obtained by concatenating the processed gender, age and region values;
1.1.4.1) calculating the binary representation of the user's gender as follows:
Figure FDA0003517220720000031
1.1.4.2) calculating the normalized real value of the age and the region of the user, wherein the normalized formula is as follows:
$z = \frac{X - \mu}{\sigma}$
wherein X represents the sample value, μ is the mean of all sample data, and σ is the standard deviation of all sample data;
1.1.4.3) concatenating the binary gender value and the normalized age and region values of step 1.1.4) into one vector:
$v_{personal} = [gender, z_{age}, z_{region}]$
where v_personal denotes the embedding of the user's personal data, gender denotes the binary gender value, and z_age and z_region denote the normalized age and region of the user;
1.1.5) concatenating the embeddings obtained in step 1.1), namely the click-record, search-record, social-data and personal-data embeddings, to obtain the user feature vector:
$v_{user} = \mathrm{concatenate}(v_{click}, v_{search}, v_{social}, v_{personal}) = [v_{click}[1], v_{click}[2], \ldots, v_{search}[1], v_{search}[2], \ldots, v_{social}[1], v_{social}[2], \ldots, v_{personal}[1], v_{personal}[2], \ldots]$
where v_user denotes the user feature vector, and v_click[i], v_search[i], v_social[i] and v_personal[i] denote the ith components of the click-record, search-record, social-data and personal-data embeddings, respectively;
1.2) the item features comprise the item's id and its context information, and the item feature vector is the concatenation of the item's id-class embedding and the embedding of its context information;
1.2.1) obtaining the item's id-class embedding, a vector that maps the item's unique identifier into a space of common dimension;
1.2.2) obtaining the embedding of the item's context information, a vector produced by Word2vec;
1.2.3) concatenating the id-class embedding and the context-information embedding of step 1.2) to obtain the item feature vector:
$v_{item} = \mathrm{concatenate}(v_{id}, v_{context}) = [v_{id}[1], v_{id}[2], \ldots, v_{context}[1], v_{context}[2], \ldots]$
where v_item denotes the item feature vector, v_id the item's id-class embedding, v_context the embedding of the item's context information, and v_id[i] and v_context[i] their ith components.
3. The double-tower-model-based method for recalling user-preferred knowledge-graph entities according to claim 1, wherein step 2) specifically comprises:
2.1) defining two parameterized embedding functions:
Figure FDA0003517220720000041
wherein
Figure FDA0003517220720000042
Figure FDA0003517220720000043
denotes a d-dimensional real vector;
Figure FDA0003517220720000044
are the user and item feature vectors extracted by the deep neural networks, where
Figure FDA0003517220720000045
And
Figure FDA0003517220720000046
are, respectively, the inputs required by the double-tower model;
2.1.1) the double-tower model contains two deep neural networks; the basic unit of a neural network is the perceptron, which has several inputs and one output; the output is a trained linear function of the inputs, and the final result is obtained by passing this value through a nonlinear activation function;
2.1.2) a deep neural network extends this model with an input layer, an output layer and several hidden layers;
2.1.3) feeding the user feature vector and the item feature vector into their respective neural networks; because the model must be trained, each output is expressed as a function of the trainable parameters θ, as follows:
Figure FDA0003517220720000047
wherein
Figure FDA0003517220720000048
2.1.4) applying L2 normalization to the outputs
Figure FDA0003517220720000049
with the following L2 normalization formula:
Figure FDA00035172207200000410
wherein
Figure FDA00035172207200000411
γ_i denotes the ith component of the vector Y;
2.2) applying a hash-sequence-based frequency estimation method, i.e. using a hash sequence to record the sequence numbers at which samples occur, to reduce the sampling bias that in-batch negative sampling may introduce and thereby optimize the loss function; the following steps are repeated in a loop;
2.2.1) randomly taking T samples from the sample set, expressed as follows:
Figure FDA00035172207200000412
where x_i denotes the ith user feature vector in the T samples, y_i denotes the ith item feature vector, and r_i ∈ [0,1] denotes the degree of feedback of the ith user;
2.2.2) computing the sampling probability p_i of y_i in each sample with the hash-sequence-based frequency estimation method;
2.2.2.1) setting a learning rate α, two arrays A and D of size H, and a hash function h,
where the hash function h maps each y_i to an integer in the range [0, H];
2.2.2.2) for each step t = 1, 2, …, taking the set of all items in one batch of training samples
Figure FDA0003517220720000051
and, for each item
Figure FDA0003517220720000052
performing the following:
2.2.2.2.1)
Figure FDA0003517220720000053
where ← denotes assignment,
Figure FDA0003517220720000054
denotes the step at which y_i was last sampled,
Figure FDA0003517220720000055
denotes the number of steps that elapse between two consecutive samplings of y_i;
2.2.2.2.2)
Figure FDA0003517220720000056
2.2.2.3) for each
Figure FDA0003517220720000057
the sampling probability is
Figure FDA0003517220720000058
2.2.3) computing the optimized loss function
Figure FDA0003517220720000059
and giving its derivation;
2.2.3.1) the vector inner-product formula is:
Figure FDA00035172207200000510
2.2.3.2) given a vector
Figure FDA00035172207200000511
the probability of obtaining an item
Figure FDA00035172207200000512
out of M items can be computed with the softmax function:
Figure FDA00035172207200000513
where e is the natural constant;
2.2.3.3) the weighted log-likelihood loss function is:
Figure FDA00035172207200000514
where T denotes the T samples randomly drawn in step 2.2.1);
2.2.3.4) performing this computation with negative sampling, where the negative sampling algorithm adopts a smoothing strategy that raises the sampling probability of low-frequency samples; negative samples are drawn from within the same batch; given a mini-batch
Figure FDA00035172207200000515
the batch softmax function is:
Figure FDA0003517220720000061
2.2.3.5) within each batch, item popularity follows a power-law distribution, so randomly sampling negatives from the batch makes popular items likely to be sampled and causes the loss function to over-penalize them; the sampling is therefore corrected using the estimated frequency:
$s^c(x_i, y_i) = s(x_i, y_i) - \log(p_i)$
where s^c(x_i, y_i) denotes the corrected value of s(x_i, y_i), and p_i is the sampling probability of y_i;
2.2.3.6) the corrected conditional probability function is therefore:
Figure FDA0003517220720000062
2.2.3.7) the final loss function is:
Figure FDA0003517220720000063
2.2.4) updating the parameters θ with gradient descent so that they approach their optimal values; gradient descent is an algorithm commonly used in machine learning to solve for model parameters.
4. The double-tower-model-based method for recalling user-preferred knowledge-graph entities according to claim 1, wherein step 3) specifically comprises:
3.1) given a user interaction matrix
Figure FDA0003517220720000064
Figure FDA0003517220720000065
denotes the interaction between user
Figure FDA0003517220720000066
and item
Figure FDA0003517220720000067
where U and V denote the user set and the item set, respectively;
The entries of the user interaction matrix Y are defined as follows:
Figure FDA0003517220720000068
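The definition of Y is only available as an image. A common convention, and one consistent with step 3.2) testing array elements for the value 1, is a binary implicit-feedback matrix. The sketch assumes y_uv = 1 when an interaction between user u and item v was observed and 0 otherwise; the input format (a list of (user index, item index) pairs) is likewise an assumption.

```python
from typing import Iterable, List, Tuple

def build_interaction_matrix(num_users: int, num_items: int,
                             interactions: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Assumed binary form: y[u][v] = 1 if user u interacted with item v, else 0."""
    y = [[0] * num_items for _ in range(num_users)]
    for u, v in interactions:
        y[u][v] = 1   # repeated interactions still yield 1
    return y
```

Row u of the result is the per-user row vector that step 3.2) copies into an array and traverses.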
3.2) Define $O_{i,j}$ as the interaction of user i with item j, and map the items that user i has interacted with to the corresponding entities of the knowledge graph
Figure FDA0003517220720000069
as follows:
3.2.1) O is the interaction matrix, composed of the users' row vectors; $O_{i,j}$ denotes the interaction of user i with item j, where j is the item index;
3.2.2) Define a HashMap that stores all items as key-value pairs, where the key is the item index and the value is the entity corresponding to that item;
3.2.3) Take the row vector of the user interaction matrix O that represents the user's interactions and store it in an array, denoted E;
3.2.4) Define a temporary set temp_set for storing the user's entities, and traverse the array E: if an element's value is 1, look up the item HashMap with that element's index, obtain the corresponding entity, and add it to temp_set.
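The HashMap lookup and array traversal in step 3.2) map directly onto a Python dictionary. A minimal sketch, assuming items are indexed 0..|V|−1 in the interaction row and entities are identified by strings; skipping items without an aligned entity is an added assumption the claim does not address.

```python
from typing import Dict, List, Set

def user_entity_set(interaction_row: List[int],
                    item_to_entity: Dict[int, str]) -> Set[str]:
    """Map the items a user interacted with to knowledge-graph entities (temp_set)."""
    temp_set: Set[str] = set()
    for item_index, value in enumerate(interaction_row):   # traverse the array E
        if value == 1:
            entity = item_to_entity.get(item_index)        # HashMap lookup: item index -> entity
            if entity is not None:                         # assumption: skip unaligned items
                temp_set.add(entity)
    return temp_set
```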
5. The knowledge-graph user preference entity recall method based on a double-tower model according to claim 1, wherein step 4) specifically comprises the following process:
4.1) Given a user, take the user feature vector $v_{user}$ obtained in the first step;
4.2) Initialize the user's temporary set temp_set according to the third step;
4.3) Define a user-preference HashMap, user_map, whose key stores the loop count and whose value stores a set of triples;
4.4) Define a set used_set for storing the entities that have already been recalled;
4.5) Let the loop count k = 1, 2, 3, …, with a given maximum value K; exit the loop when the number of triples in user_map exceeds K, and otherwise execute the following steps:
4.5.1) Traverse the entities in temp_set and remove those already in used_set:
temp_set ← temp_set − used_set
wherein ← denotes assignment and − denotes the set-difference operation;
4.5.2) Find the triple set
Figure FDA0003517220720000071
in the knowledge graph and store it in user_map, defined as follows:
Figure FDA0003517220720000072
wherein (h', r', t') denotes a triple, h' denotes the head entity, r' the relation, and t' the tail entity;
4.5.3) Because the knowledge graph may contain cycles, the entities in temp_set are added to used_set so that already-recalled entities are not collected again;
4.5.4) Take the tail entities of the triples in
Figure FDA0003517220720000073
and store them in the set
Figure FDA0003517220720000074
4.5.5) Traverse the entities in
Figure FDA0003517220720000075
and retrieve the item feature vector $v_{item}$ corresponding to each entity;
4.5.6) Input $v_{user}$ and $v_{item}$ into the double-tower model described in the second step to obtain the probability between the user and the item corresponding to each entity in the set
Figure FDA0003517220720000076
and sort the entities by this probability;
4.5.7) Given a value τ, $0 < \tau \le 1$, that determines how many entities are kept after screening:
Figure FDA0003517220720000077
wherein
Figure FDA0003517220720000078
sorts the elements of the set by probability and returns an array, getSubVec(i, j) returns the sub-array consisting of elements i through j of the original array, newSet() converts an array into a set, and
Figure FDA0003517220720000081
denotes rounding up (the ceiling function);
4.5.8) From
Figure FDA0003517220720000082
select the triples whose tail entities belong to the screened set
Figure FDA0003517220720000083
as follows:
Figure FDA0003517220720000084
4.5.9) Store the triple set Λ in user_map: user_map(k, Λ);
4.5.10) Assign the set
Figure FDA0003517220720000085
to temp_set, overwriting its previous contents;
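Steps 4.5.7) and 4.5.8) screen the candidate entities and keep only the triples that point to them. The screening formula is an image, so this sketch assumes it keeps the ⌈τ·n⌉ highest-probability entities out of n, i.e. getSubVec over the first ⌈τ·n⌉ positions of the sorted array followed by newSet(). The entity names and probabilities are hypothetical inputs standing in for the double-tower model's output.

```python
import math
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]   # (head, relation, tail)

def screen_entities(probs: Dict[str, float], tau: float) -> Set[str]:
    """Assumed reading of 4.5.7): keep the ceil(tau * n) most probable of n entities."""
    if not 0 < tau <= 1:
        raise ValueError("tau must satisfy 0 < tau <= 1")
    ranked = sorted(probs, key=probs.get, reverse=True)    # entities sorted by probability
    # round() strips float noise such as 0.07 * 100 == 7.000000000000001 before ceil
    keep = math.ceil(round(tau * len(ranked), 9))
    return set(ranked[:keep])                              # getSubVec + newSet()

def filter_triples(triples: Set[Triple], kept: Set[str]) -> Set[Triple]:
    """4.5.8): keep only the triples whose tail entity survived screening."""
    return {t for t in triples if t[2] in kept}
```

With τ = 1 every candidate survives; smaller τ trades recall breadth for a tighter, more preference-focused subgraph at each hop.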
4.6) The user-preference knowledge graph is finally obtained from step 4.5).
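Step 4) as a whole can be sketched as one loop. Several details are assumptions because the defining formulas are images: the triple set of step 4.5.2) is taken to be the triples whose head entity is in temp_set, the screened tail set is what replaces temp_set in 4.5.10), and a plain `score` callable stands in for the double-tower probability of steps 4.5.5) and 4.5.6). Stopping early when no new entity remains is an added safeguard, not part of the claim.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (head h', relation r', tail t')

def recall_preference_kg(kg: Set[Triple], temp_set: Set[str],
                         score: Callable[[str], float], tau: float,
                         max_triples: int) -> Dict[int, Set[Triple]]:
    """Hop-by-hop recall of a user-preference subgraph (sketch under assumptions)."""
    by_head: Dict[str, Set[Triple]] = defaultdict(set)   # index triples by head entity
    for t in kg:
        by_head[t[0]].add(t)
    user_map: Dict[int, Set[Triple]] = {}   # key: loop count k, value: triple set
    used_set: Set[str] = set()
    k = 1
    # Exit once user_map holds more than max_triples (K) triples.
    while sum(len(s) for s in user_map.values()) <= max_triples:
        temp_set = temp_set - used_set                      # 4.5.1
        if not temp_set:                                    # added safeguard: nothing new
            break
        # 4.5.2 (assumed): triples whose head entity is in temp_set
        triples = set().union(*(by_head[e] for e in temp_set))
        used_set |= temp_set                                # 4.5.3: guard against KG cycles
        tails = {t[2] for t in triples}                     # 4.5.4
        # 4.5.5-4.5.6: score() stands in for the double-tower probability
        ranked = sorted(tails, key=score, reverse=True)
        kept = set(ranked[:math.ceil(round(tau * len(ranked), 9))])  # 4.5.7 (assumed)
        user_map[k] = {t for t in triples if t[2] in kept}  # 4.5.8-4.5.9
        temp_set = kept                                     # 4.5.10 (assumed set)
        k += 1
    return user_map
```

Because entities enter used_set before their tails are expanded, a cycle such as drama → film2 → drama stops after one pass instead of looping forever.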
CN202210169936.1A 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model Pending CN114564594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210169936.1A CN114564594A (en) 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210169936.1A CN114564594A (en) 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model

Publications (1)

Publication Number Publication Date
CN114564594A true CN114564594A (en) 2022-05-31

Family

ID=81714548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210169936.1A Pending CN114564594A (en) 2022-02-23 2022-02-23 Knowledge graph user preference entity recall method based on double-tower model

Country Status (1)

Country Link
CN (1) CN114564594A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150504A (en) * 2023-04-17 2023-05-23 特斯联科技集团有限公司 Recommendation method and device for processing long tail distribution, computer storage medium and terminal
CN118312657A (en) * 2024-06-07 2024-07-09 江苏瑞问科技有限公司 Knowledge base-based intelligent large model analysis recommendation system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150504A (en) * 2023-04-17 2023-05-23 特斯联科技集团有限公司 Recommendation method and device for processing long tail distribution, computer storage medium and terminal
CN118312657A (en) * 2024-06-07 2024-07-09 江苏瑞问科技有限公司 Knowledge base-based intelligent large model analysis recommendation system and method
CN118312657B (en) * 2024-06-07 2024-08-09 江苏瑞问科技有限公司 Knowledge base-based intelligent large model analysis recommendation system and method

Similar Documents

Publication Publication Date Title
CN111523047B (en) Multi-relation collaborative filtering algorithm based on graph neural network
CN108920641B (en) Information fusion personalized recommendation method
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN110674407A (en) Hybrid recommendation method based on graph convolution neural network
CN113807422B (en) Weighted graph convolutional neural network scoring prediction model integrating multi-feature information
CN111382283B (en) Resource category label labeling method and device, computer equipment and storage medium
CN105893609A (en) Mobile APP recommendation method based on weighted mixing
CN109471982B (en) Web service recommendation method based on QoS (quality of service) perception of user and service clustering
CN109933720B (en) Dynamic recommendation method based on user interest adaptive evolution
CN106250545A (en) A kind of multimedia recommendation method and system searching for content based on user
CN113918834B (en) Graph convolution collaborative filtering recommendation method fusing social relations
CN113918832A (en) Graph convolution collaborative filtering recommendation system based on social relationship
CN114693397A (en) Multi-view multi-modal commodity recommendation method based on attention neural network
CN113918833A (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
CN114564594A (en) Knowledge graph user preference entity recall method based on double-tower model
CN111523040A (en) Social contact recommendation method based on heterogeneous information network
CN115982467A (en) Multi-interest recommendation method and device for depolarized user and storage medium
CN116662564A (en) Service recommendation method based on depth matrix decomposition and knowledge graph
CN110909785B (en) Multitask Triplet loss function learning method based on semantic hierarchy
CN110083766B (en) Query recommendation method and device based on meta-path guiding embedding
Zhu et al. Multimodal sparse linear integration for content-based item recommendation
Irfan et al. Optimization of information retrieval using evolutionary computation: A survey
CN117370674B (en) Multitask recommendation algorithm integrating user behaviors and knowledge patterns
CN117892815A (en) Graph comparison recommendation method based on knowledge graph
CN117056609A (en) Session recommendation method based on multi-layer aggregation enhanced contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination