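WO2022204845A1 - Entity popularity generation method, apparatus, storage medium, and electronic device - Google Patents
Entity popularity generation method, apparatus, storage medium, and electronic device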
- Publication number
- WO2022204845A1 (application PCT/CN2021/083497)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- entity
- popularity
- label
- search
- heat
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Definitions
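- The present application relates to the field of computer technology, and in particular to a method, apparatus, storage medium, and electronic device for generating entity popularity (also rendered as entity heat).
- Embodiments of the present application provide a method, apparatus, storage medium, and electronic device for generating entity popularity. The technical solutions are as follows:
- An embodiment of the present application provides a method for generating entity popularity, the method comprising:
- obtaining entity feature data for at least one entity, and determining at least one retrieval sentence corresponding to each of the entities based on the entity feature data;
- obtaining an entity search log, and determining the initial search popularity corresponding to each of the retrieval sentences in the entity search log;
- determining the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- An embodiment of the present application provides an apparatus for generating entity popularity, the apparatus comprising: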
- a retrieval sentence determination module configured to obtain entity feature data for at least one entity, and determine at least one retrieval sentence corresponding to each of the entities based on the entity feature data;
- a search popularity determination module configured to obtain an entity search log, and determine the initial search popularity corresponding to each of the retrieval sentences in the entity search log;
- an entity popularity determination module configured to determine the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- An embodiment of the present application provides a computer storage medium, where the computer storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the above method steps.
- An embodiment of the present application provides an electronic device, which may include a processor and a memory, wherein the memory stores a computer program, and the computer program is adapted to be loaded by the processor to execute the above method steps.
- In the embodiments of the present application, the electronic device obtains entity feature data for at least one entity and determines at least one retrieval sentence corresponding to each of the entities based on the entity feature data; it then obtains an entity search log and determines the initial search popularity corresponding to each of the retrieval sentences in the entity search log; finally, it determines the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities. By combining the resource data of the resource side (such as entity feature data) with the entity search log of the search side (such as a search engine), and relying on the search popularity of users' retrieval sentences in the search scenario, the entity popularity of entities can be determined accurately and the entity popularity of different categories can be quantified, which improves the accuracy of entity popularity generation. The method can also be applied to the cold-start stage of category entities, generating high-accuracy entity popularity even when entity resources are insufficient and user data is scarce.
- FIG. 1 is a schematic flowchart of a method for generating entity popularity according to an embodiment of the present application.
- FIG. 2 is a schematic flowchart of another entity popularity generation method provided by an embodiment of the present application.
- FIG. 3 is a schematic flowchart of another entity popularity generation method provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of an entity popularity generation apparatus provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a retrieval sentence determination module provided by an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a retrieval sentence generation unit provided by an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a search popularity determination module provided by an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a heat determination unit provided by an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of an entity heat determination module provided by an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- "A plurality" means two or more.
- "And/or" describes the association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone.
- The character "/" generally indicates an "or" relationship between the associated objects.
- A method for generating entity popularity is proposed, which can be implemented by a computer program and can run on an entity popularity generation apparatus based on the von Neumann architecture.
- The computer program can be integrated into an application or run as a stand-alone utility application.
- The entity popularity generation apparatus may be an electronic device, including but not limited to: a personal computer, a tablet computer, a handheld device, a vehicle-mounted device, a server, a computing device, or other processing devices connected to a wireless modem.
- Terminal equipment may be called by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, terminal, wireless communication equipment, user agent, cellular phone, cordless phone, or equipment in a 5G network or a future evolved network.
- the entity heat generation method includes:
- S101 Acquire entity feature data for at least one entity, and determine at least one retrieval sentence corresponding to each entity based on the entity feature data.
- An entity can refer to something that exists independently and serves as the basis of its properties; in some implementations it is usually a word or phrase that can be clearly identified from a group of items with similar properties.
- An entity can be understood as an independent individual, such as an individual commodity, song, or TV series.
- The entity feature data can be understood as features that specifically describe the entity, such as external size, color, density, and shape characteristics.
- The entity feature data may be data composed of features such as the entity word and the tag words corresponding to the entity, where the tag words can be understood as attributes of the entity.
- An entity may be anything for which popularity can be calculated; it may be a real-world thing or a concept.
- For example, a company is an entity, and a term is also an entity; an entity may likewise be a store, a song, a movie, an electronic coupon, or another type of entity.
- the retrieval sentence usually refers to a sentence that can be entered in the search box for the purpose of retrieval. For example, inputting "I want to buy a folding TV" in the search box, "I want to buy a folding TV" is the retrieval sentence.
- a retrieval sentence is often called a query in the search field.
- The entity feature data acquired by the electronic device can come from a resource terminal (cp), for example entity feature data provided by a resource terminal that covers multiple entities.
- The electronic device can extract, from each resource site, the entity feature data corresponding to all or some of the entities in that site. A resource site can be a resource website or a resource database. The aforementioned resource terminal can be a site that provides resource data in a specific field (that is, entity feature data corresponding to at least one entity), and such resource terminals provide in-depth information or related services in that field.
- For example, the resource terminal "Douban Movies" provides resource data such as film and television information and user reviews;
- the resource site "Reading" provides resource data such as book details, book lists, and book reviews. This application is not limited to these examples.
- The entity popularity generation method of the present application can be applied to the entity popularity cold-start scenario.
- In this scenario, the entity has usually been in operation or online for only a short time (for example, less than a time threshold), the number of visitors to the entity is relatively low, and the click-through rate is not high.
- entity feature data is usually structured data.
- Processing the entity feature data means extracting from it the entity word corresponding to each entity and the label fields associated with the entity.
- Entity semantic recognition is performed on the cleaned entity feature data to identify the feature information of the entity-related field content, so that entity words related to at least one entity and tag words related to entity attributes can be extracted from the entity feature data.
- A retrieval sentence corresponding to at least one entity is generated from a combination of the entity words and/or tag words corresponding to the entity. For example, if an entity corresponds to one entity word and one label word, the retrieval sentence generation strategy can generate the two retrieval sentences "entity word + label word" and "label word + entity word", and so on.
- the specific retrieval sentence generation strategy may be determined according to the actual application environment, and is not specifically limited here.
- S102 Acquire an entity search log, and determine the initial search popularity corresponding to each retrieval statement in the entity search log.
- A search engine can provide a search interface for the user at the search front end and receive the query sentence for an entity input by the user; the search engine then matches search results containing the query sentence in web pages or network services according to that sentence.
- The search back end can record the search query process of a large number of users over a period of time, thereby generating entity search logs for multiple entities. In this application, the entity search log can reflect the search popularity of the aforementioned retrieval sentences, that is, the number of unique visitors (uv).
- The search popularity of each retrieval sentence can be obtained from the entity search log and used as the initial search popularity (uv) corresponding to that retrieval sentence.
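The step of reading the initial search popularity (uv) out of the entity search log can be sketched as follows. This is a minimal illustration, not the patent's implementation: the log layout (a list of `(user_id, query)` records) and the function name `initial_search_popularity` are assumptions made for the example.

```python
from collections import defaultdict


def initial_search_popularity(search_log, retrieval_sentences):
    """Count unique visitors (uv) per retrieval sentence.

    search_log: iterable of (user_id, query) records (hypothetical layout).
    retrieval_sentences: the sentences generated for the entities.
    """
    visitors = defaultdict(set)
    for user_id, query in search_log:
        # A set keeps each user once, so repeated searches are not double counted.
        visitors[query].add(user_id)
    # Sentences that never appear in the log get a popularity of 0.
    return {s: len(visitors.get(s, ())) for s in retrieval_sentences}


log = [
    ("u1", "x song A version"),
    ("u2", "x song A version"),
    ("u1", "x song A version"),  # repeat search by the same user
    ("u3", "A version x song"),
]
uv = initial_search_popularity(
    log, ["x song A version", "A version x song", "x song"]
)
print(uv)
```

Counting distinct users rather than raw log rows matches the "unique visitors" reading of uv given in the text.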
- S103 Determine the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- An entity can usually be represented as entity word + label word: the entity word is the name of the entity, and the label word can be understood as an attribute of the entity. Attributes of different categories can usually be divided into those strongly correlated and those weakly correlated with the entity.
- The types of tag words can be divided accordingly. For example, the tags for an entity are divided into a first tag and a second tag, where the tag association degree between the first tag and the entity is smaller than the tag association degree between the second tag and the entity.
- The first tag can be an optional tag for an entity, and the second tag can be a mandatory tag for an entity. For example, for the entity "Zuofu", "TV drama" is a mandatory tag word, while version information, such as a high-definition version, is an optional tag word.
- For entity A, the entity determined based on entity feature data may correspond only to an entity word; for entity B, the entity determined based on entity feature data may correspond to an entity word and a first label word (such as an optional label); for entity C, the entity determined based on entity feature data may correspond to an entity word and a second label word (such as a required label); for entity D, the entity determined based on entity feature data may correspond to an entity word, a first label word (such as an optional label), and a second label word (such as a required label), etc.
- entities of different categories can be divided based on entity characteristics determined by the entity, that is, entity categories.
- The totality of "the initial search popularity corresponding to each retrieval sentence" may be used as a search popularity feedback device, which determines the popularity feedback value of a target sentence to be retrieved.
- Based on each entity, the search popularity of the retrieval sentence containing only the entity word can be determined, which can be understood as the single-entity-word popularity (uv-e). This is combined with the search popularity of the entity category sentences corresponding to each entity to measure the comprehensive popularity; that is, popularity is measured from both the single-entity-word search dimension of the entity and the comprehensive entity search dimension of the entity category.
- In this way, the entity popularity corresponding to a certain entity category, that is, the comprehensive popularity, can be determined.
- The entity category corresponding to the entity can also be understood as a vertical search category, so the comprehensive popularity corresponding to the entity can be understood as the entity popularity of a vertical entity.
- When an entity is in the cold-start scenario, the method does not depend on the entity feature data provided by the resource party or on entity-based user feedback data (such as entity word exposure, click volume, and comment volume). Instead, the entity popularity is determined by comprehensively measuring the internal feature data corresponding to the entity (such as entity words and tag words representing attributes).
- In a cold-start scenario, this avoids the problems of the related art, which relies on the entity feature data of multiple resource parties for popularity generation: large deviations caused by differences in the data, and the inability to align the popularity values of entities with the same name under different categories.
- The above defines the entity popularity only in a general sense; for the detailed determination of entity popularity, reference may be made to other embodiments of this application.
- When determining the entity popularity for each entity, for entity categories that have already been launched, the entity popularity of the corresponding entity category can be corrected based on comprehensive search information corresponding to feedback such as users' real-time search and click behavior, the number of visitors, and comment information, so as to obtain entity word popularity information better suited to the product.
- Specifically, comprehensive search information for the entity can be obtained, and hot search features can be extracted from it.
- Hot search features include but are not limited to click volume, number of searches, number of visitors, exposure, and cold-start popularity. The hot search features are input into a pre-trained popularity update model, which outputs a popularity reference amount for the entity; popularity update processing is then performed on the entity popularity of the entity based on the popularity reference amount.
- The popularity update model may be a neural network model.
- The neural network model may be implemented by fitting one or more of the following models: a convolutional neural network (CNN) model, a deep neural network (DNN) model, a recurrent neural network (RNN) model, an embedding model, a gradient boosting decision tree (GBDT) model, and a logistic regression (LR) model.
- An initial popularity update model can be created first; sample hot search features are extracted from a large amount of sample data and input into the initial popularity update model for training. During training, the error between the actual output value and the expected output value of the neural network model is calculated, and the parameters of the neural network model are adjusted based on this error. After training is completed, the popularity update model is obtained.
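The training loop described above (compute the error between actual and expected output, then adjust parameters) can be illustrated with the simplest of the listed model families, logistic regression, trained by stochastic gradient descent. This is only a sketch under stated assumptions: the three features (normalized click volume, search count, visitor count), the sample values, the labels, and all function names are hypothetical, and a production model could equally be one of the neural networks named in the text.

```python
import math


def predict(weights, bias, features):
    """Return a popularity reference amount in (0, 1) for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_popularity_update_model(samples, targets, lr=0.1, epochs=500):
    """Fit a logistic regression model with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, expected in zip(samples, targets):
            # Error between the actual output and the expected output.
            error = predict(weights, bias, features) - expected
            # Adjust the parameters in the direction that reduces the error.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Hypothetical normalized features: [click volume, search count, visitor count].
samples = [[0.9, 0.8, 0.9], [0.1, 0.2, 0.1], [0.8, 0.9, 0.7], [0.2, 0.1, 0.2]]
targets = [1, 0, 1, 0]  # 1 = hot entity, 0 = cold entity (illustrative labels)

model_weights, model_bias = train_popularity_update_model(samples, targets)
hot_score = predict(model_weights, model_bias, [0.9, 0.9, 0.9])
cold_score = predict(model_weights, model_bias, [0.1, 0.1, 0.1])
print(hot_score > cold_score)
```

How the resulting reference amount is then applied to the stored entity popularity is not specified in this passage, so that step is left out of the sketch.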
- In the embodiments of the present application, the electronic device obtains entity feature data for at least one entity, determines at least one retrieval sentence corresponding to each of the entities based on the entity feature data, and then obtains an entity search log. The initial search popularity corresponding to each of the retrieval sentences is determined in the entity search log, and the entity popularity for each of the entities is then determined based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities. By combining the resource data of the resource side (such as entity feature data) with the entity search log of the search side (such as a search engine), and relying on the search popularity of users' retrieval sentences in the search scenario, the entity popularity can be determined accurately and the entity popularity of different categories can be quantified, improving the accuracy of entity popularity generation. In addition, the method can be applied to the cold-start stage of category entities and can generate high-accuracy entity popularity even when entity resources are insufficient and user data is scarce, so the entity popularity generation method is robust.
- FIG. 2 is a schematic flowchart of another embodiment of a method for generating entity popularity according to the present application. Specifically:
- S201 Obtain entity feature data for at least one entity, and determine field content information for each of the entities based on the entity feature data;
- The entity feature data can be understood as features that specifically describe the entity. The electronic device can extract, from each resource site, the entity feature data corresponding to all or some of the entities in that site; a resource site can be a resource website or a resource database.
- The aforementioned resource terminal can be a site that provides resource data in a specific field (that is, entity feature data corresponding to at least one entity), and such resource terminals provide in-depth information or related services in that field.
- For example, the resource terminal "Douban Movies", as a provider of movie resources, provides resource data such as film and television information and user reviews;
- the resource site "Reading" provides resource data such as book details, book lists, and book reviews. This application is not limited to these examples.
- entity feature data is usually structured data.
- In practical applications, the entity feature data provided by the resource side usually contains irrelevant information such as vertical lines and dashes, and the entity word corresponding to each entity and the label fields associated with the entity also need to be extracted from the entity feature data.
- In this application, the acquired field content containing entity words and related tag words is cleaned (for example, by constructing a regular expression to filter out irrelevant information, or constructing a regular expression to extract useful field content) to remove entity-irrelevant information. Entity semantic recognition is then performed on the cleaned entity feature data to identify the feature information of the entity-related field content, so that entity words related to at least one entity and tag words related to entity attributes can be extracted from the entity feature data. These "entity words related to at least one entity and tag words related to entity attributes" are the field content information in this application.
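The regular-expression cleaning mentioned above can be sketched as follows. The record layout (`title` and `tags` keys), the choice of separator characters, and the function names are assumptions for illustration; the entity semantic recognition step is not modeled here.

```python
import re

# Assumed noise characters: ASCII and full-width vertical bars, hyphens,
# en dashes, and em dashes (written as escapes for readability).
_SEPARATORS = re.compile(r"[|\uFF5C\-\u2013\u2014]+")
_WHITESPACE = re.compile(r"\s+")


def clean_field(raw):
    """Replace separator noise with spaces and collapse repeated whitespace."""
    text = _SEPARATORS.sub(" ", raw)
    return _WHITESPACE.sub(" ", text).strip()


def extract_fields(record):
    """Return the cleaned entity word and non-empty tag words of one record."""
    cleaned_tags = [clean_field(tag) for tag in record.get("tags", [])]
    return {
        "entity_word": clean_field(record.get("title", "")),
        "tag_words": [tag for tag in cleaned_tags if tag],
    }


record = {"title": "x song | official --", "tags": ["A version", " -- ", "singer A|"]}
fields = extract_fields(record)
print(fields)
```

Tags that consist only of separator noise become empty after cleaning and are dropped, so they cannot produce empty retrieval sentences later.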
- the electronic device further acquires the entity field feature of the field content information, and generates at least one retrieval statement for each entity according to the entity field feature. For details, refer to the subsequent steps of this embodiment.
- S202 Acquire an entity field feature corresponding to each entity in the field content information, and determine a field type corresponding to the entity field feature, where the entity field feature includes an entity word corresponding to the entity and a label associated with the entity word;
- After the electronic device determines the field content information for each entity based on the entity feature data, it identifies the field content information, mainly to determine the entity words and tag words corresponding to each entity. The entity field feature represents the features of the entity words and label words corresponding to the entity, which differ from entity to entity.
- For the entity field feature of entity A, the entity determined based on entity feature data may correspond only to an entity word; for the entity field feature of entity B, the entity determined based on entity feature data may correspond to an entity word and a first label word (such as an optional label); for the entity field feature of entity C, the entity determined based on entity feature data may correspond to an entity word and a second label word (such as a required label); for the entity field feature of entity D, the entity determined based on entity feature data may correspond to an entity word, a first label word (such as an optional label), and a second label word (such as a required label), etc.
- entities of different categories can be divided based on entity characteristics determined by the entity, that is, entity categories.
- the field types may be divided into a first field type, a second field type, a third field type, and a fourth field type;
- the first field type is the field type when the entity field feature contains only the entity word;
- the second field type is the field type when the entity field feature includes the entity word and the first label associated with the entity;
- the third field type is the field type when the entity field feature includes the entity word and the second label associated with the entity;
- the fourth field type is the field type when the entity field feature includes the entity word and both the first label and the second label associated with the entity;
- the label association degree between the first label and the entity is smaller than the label association degree between the second label and the entity.
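The four field types above amount to a decision on which labels are present alongside the entity word. A minimal sketch, assuming the labels have already been split into first (optional) and second (required) label words; the enum and function names are hypothetical:

```python
from enum import Enum


class FieldType(Enum):
    FIRST = 1   # entity word only
    SECOND = 2  # entity word + first (optional) label
    THIRD = 3   # entity word + second (required) label
    FOURTH = 4  # entity word + first and second labels


def field_type(first_labels, second_labels):
    """Map the labels found for an entity to one of the four field types."""
    if first_labels and second_labels:
        return FieldType.FOURTH
    if second_labels:
        return FieldType.THIRD
    if first_labels:
        return FieldType.SECOND
    return FieldType.FIRST


print(field_type(["A version"], ["singer A"]))
```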
- S203 Based on the field type corresponding to each of the entity field features, generate at least one retrieval statement for each of the entities corresponding to the field type.
- Determining the retrieval sentences for an entity means generating a plurality of retrieval sentences based on the entity field feature corresponding to the entity, so that the popularity of the entity can be measured in multiple dimensions.
- When the field type is the first field type, the retrieval sentence generation strategy can use only the entity word as the retrieval sentence (query) corresponding to the entity.
- When the field type is the second field type, the retrieval sentence generation strategy can be: generate the two retrieval sentences "entity word + first label word" and "first label word + entity word". In some implementations there are multiple first label words under the first label. Taking the entity "x song" as an example, the entity word is the song name and the first label is the version, with multiple versions such as A version, B version, and C version. Multiple retrieval sentences can then be generated: "A version + x song", "B version + x song", "C version + x song", "x song + A version", "x song + B version", and "x song + C version", a total of 6 retrieval sentences.
- When the field type is the third field type, the retrieval sentence generation strategy can be: generate the two retrieval sentences "entity word + second label word" and "second label word + entity word". In some implementations there are multiple second label words under the second label. Taking the entity "x song" as an example, the entity word is the song name and the second label is the singer, with multiple singers such as singer A, singer B, and singer C. Multiple retrieval sentences can then be generated: "singer A + x song", "singer B + x song", "singer C + x song", "x song + singer A", "x song + singer B", and "x song + singer C", a total of 6 retrieval sentences.
- When the field type is the fourth field type, that is, the entity field feature includes the entity word and both the second label (such as a required label) and the first label (such as an optional label) associated with the entity, the retrieval sentence generation strategy can take the arbitrary permutations and combinations of the three elements "entity word", "first label", and "second label" as retrieval sentences, for example "entity word + second label + first label", "entity word + first label + second label", and so on.
- The retrieval sentence generation strategies corresponding to the above field types are only illustrative. Those skilled in the art should understand that the strategies can be customized based on the actual application environment; the examples are given for ease of understanding and do not limit the protection scope of the present application.
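The per-field-type strategies above can be sketched with one function that picks one label word from each available label and emits every ordering of the chosen elements. This is an illustrative reading, not the patent's definitive strategy: the function name is hypothetical, and joining the elements with a single space (rather than the "+" used as notation in the text) is an assumption.

```python
from itertools import permutations, product


def generate_retrieval_sentences(entity_word, first_labels=(), second_labels=()):
    """Generate retrieval sentences for one entity.

    One label word is chosen from each non-empty label, and every ordering
    of the entity word and the chosen label words becomes a sentence.
    """
    groups = [[entity_word]]
    if first_labels:
        groups.append(list(first_labels))
    if second_labels:
        groups.append(list(second_labels))

    sentences = []
    for combo in product(*groups):
        for ordering in permutations(combo):
            sentence = " ".join(ordering)
            if sentence not in sentences:  # keep first occurrence only
                sentences.append(sentence)
    return sentences


only_entity = generate_retrieval_sentences("x song")
with_versions = generate_retrieval_sentences(
    "x song", first_labels=["A version", "B version", "C version"]
)
with_both = generate_retrieval_sentences(
    "x song", first_labels=["A version"], second_labels=["singer A"]
)
print(len(only_entity), len(with_versions), len(with_both))
```

With three version words the function yields the 6 sentences listed in the text for the second field type; with one word under each label it yields the 6 orderings of three elements for the fourth field type.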
- After the retrieval sentences are generated, sentences whose length is greater than a preset value can also be filtered out.
- For example, queries whose length is greater than 50 can be removed, and the entire query can be normalized to remove preset filter characters such as spaces and symbols and to unify letter case. This avoids later failures to obtain entity popularity caused by these preset filter characters and ensures that the queries generated from the same entity word are identical.
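The length filter and normalization described above can be sketched as follows. The threshold of 50 comes from the text; applying the length check before normalization, lowercasing as the case rule, and removing duplicates after normalization are assumptions of this example, as are the names used.

```python
import re

MAX_QUERY_LENGTH = 50  # preset value given as an example in the text

# Assumed "preset filter characters": anything that is not a letter or digit
# (spaces, punctuation, symbols), plus the underscore.
_FILTER_CHARS = re.compile(r"[\W_]+")


def normalize_queries(queries):
    """Drop over-long queries, strip filter characters, and unify case."""
    seen = set()
    result = []
    for query in queries:
        if len(query) > MAX_QUERY_LENGTH:
            continue
        normalized = _FILTER_CHARS.sub("", query).lower()
        # Queries that normalize to the same string are kept only once.
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


normalized = normalize_queries(
    ["X Song + A Version", "x song+a version", "y" * 51]
)
print(normalized)
```

Because `\W` is Unicode-aware for Python `str` patterns, non-Latin letters such as Chinese characters are preserved by this filter.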
- S204 Perform a hot search on the at least one retrieval sentence of each of the entities in the entity search log, and determine the sentence search hotness corresponding to each of the retrieval sentences;
- The electronic device may use the totality of "the initial search popularity corresponding to each retrieval sentence" as a search popularity feedback device, for example by constructing a popularity query uv model system. The search popularity feedback device determines the popularity feedback value of a target sentence to be retrieved: each retrieval sentence corresponding to each entity is input into the search popularity feedback device, which outputs the sentence search popularity of that retrieval sentence, such as the number of clicks or the number of visitors.
- the electronic device may directly use the sentence search popularity as the initial search popularity corresponding to the retrieval sentence.
- The electronic device may aggregate the popularity of a plurality of retrieval sentences that belong to the same entity semantics. For example, in practical applications there may be retrieval sentences in different language versions that belong to the same entity semantics, such as an English-version retrieval sentence, a Chinese-version retrieval sentence, and a Japanese-version retrieval sentence of the same entity semantics.
- The electronic device can determine, from the retrieval sentences, at least one target retrieval sentence belonging to the same entity semantics. One way to do so is to establish a semantic recognition model in advance, input each retrieval sentence into the semantic recognition model, and output the target retrieval sentences that share the same entity semantics.
- the electronic device then aggregates the first search popularity corresponding to each of the target retrieval sentences to obtain a second search popularity; wherein the first search popularity is the search popularity corresponding to the target retrieval sentence.
- the electronic device may add up the first search popularity corresponding to all target retrieval sentences to obtain the second search popularity. Therefore, the second search popularity can be used as the initial search popularity of each target retrieval sentence.
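The aggregation across retrieval sentences of the same entity semantics can be sketched as below. The function name and the `semantic_id` mapping are hypothetical; in the text the grouping comes from a semantic recognition model, which is replaced here by a precomputed dictionary for illustration.

```python
from collections import defaultdict
from typing import Dict


def aggregate_by_semantics(
    first_popularity: Dict[str, float],
    semantic_id: Dict[str, str],
) -> Dict[str, float]:
    """Sum the first search popularity over sentences sharing one semantic id.

    Every sentence in a group receives the group total (the "second search
    popularity") as its initial search popularity.
    """
    totals: Dict[str, float] = defaultdict(float)
    for sentence, popularity in first_popularity.items():
        totals[semantic_id[sentence]] += popularity
    return {s: totals[semantic_id[s]] for s in first_popularity}


# Example: an English and a Chinese query for the same entity are pooled.
first = {"apple": 100, "苹果": 40, "banana": 7}
groups = {"apple": "E1", "苹果": "E1", "banana": "E2"}
pooled = aggregate_by_semantics(first, groups)
```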
- S206 Determine the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- entity hot words can be provided for the target category to recommend in combination with the scene requirements.
- the entity hot word recommendation function can first determine the entity hotness of each entity under the target category.
- The entity popularity generation method described above is used, and the entities are sorted by entity popularity to give the high-popularity entity word data of each dimension; the high-popularity entities of different dimensions are then aligned according to the number of searches for queries under this category. In this way, each entity word is guaranteed to be a hot word across the whole network, and the entity words within this category remain comparable.
- The electronic device obtains entity feature data for at least one entity, determines at least one retrieval statement corresponding to each of the entities based on the entity feature data, and then obtains an entity search log. The initial search popularity corresponding to each retrieval sentence is determined in the search log, and the entity popularity for each entity is then determined based on the initial search popularity corresponding to each retrieval sentence, each entity, and the entity category corresponding to each entity. By combining resource data such as entity feature data with the search logs of a search terminal such as a search engine, the method obtains entity popularity and quantifies the entity popularity of different categories, which greatly improves the accuracy of entity popularity generation. It can also be applied in the cold-start stage of category entities and can generate high-accuracy entity popularity even when entity resources are insufficient or user data is sparse, so the entity popularity generation method is robust.
- FIG. 3 is a schematic flowchart of another embodiment of a method for generating entity popularity according to the present application. Specifically:
- S301 Obtain entity feature data for at least one entity, and determine at least one retrieval statement corresponding to each of the entities based on the entity feature data;
- S302 Obtain an entity search log, and determine the initial search popularity corresponding to each retrieval statement in the entity search log.
- S303 Based on the initial search popularity corresponding to each of the retrieval sentences, determine the first search popularity of the vertical intent entity corresponding to each of the entities and the second search popularity of each of the entities.
- the vertical intent entity can be understood as a combination of entities and (search) categories.
- the vertical intent entity can be understood as providing targeted services for a specific field, a specific group of people or a specific need.
- The search volume of the vertical intent entity corresponding to an entity can be understood as the search popularity of the entity in a certain vertical field, which can be "television and movies", "novels", "pictures", etc.
- the aforementioned "televisions", “pictures” and “novels” can be understood as a vertical field.
- A vertical intent entity can be composed of entity word + (search) category, and the (search) category is usually strongly related to the search intent in the user's actual search scenario.
- The search popularity of the entity in the vertical category is included as a reference for popularity generation.
- The (search) category in the vertical intent entity can be determined based on the entity's classification by subject type (e.g., film and television, pictures, novels, news); in some embodiments, it can be determined based on the vertical search categories provided by the search engine of the electronic device or terminal, such as video vertical search, picture vertical search and so on.
- the vertical search category can also be customized based on the entity word itself corresponding to the entity using expert intervention, and the expert can set the vertical search category.
- the search category is the movie category.
- the first search popularity of the vertical intent entity corresponding to the entity may be marked as the vertical intent search popularity uv-i.
- “Second search popularity of the entity” can be understood as the search popularity of the retrieval sentence composed of the single entity word corresponding to the entity.
- The entity corresponds to an entity word, usually the entity name. The search popularity of the retrieval sentence consisting only of that entity word is the second search popularity in this embodiment, which may be marked as single-entity search popularity uv-e in some embodiments.
- The electronic device may treat the totality of "the initial search popularity corresponding to each retrieval sentence" as a search popularity feedback device, for example by constructing a popularity query uv model system. The search popularity feedback device determines the popularity feedback value of a search object based on the target sentence to be retrieved, so the electronic device can determine, in the search popularity feedback device, the search popularity of the vertical intent entity corresponding to each entity, that is, the first search popularity.
- The electronic device can likewise determine, in the search popularity feedback device, the search popularity of each entity based on the retrieval sentence formed by the single entity word corresponding to that entity, that is, the second search popularity.
- S304 Perform a heat correction process on the first search heat of the entity based on the second search heat of the vertical intent entity corresponding to the entity, to obtain a corrected third search heat for the entity.
- The purpose of this popularity correction processing is to highlight searches whose main intent is the vertical intent and to suppress searches with non-target intents.
- The electronic device can first compare the second search heat of the vertical intent entity with a set target search heat to judge whether the heat of the vertical intent entity is too low. The target search heat is a heat threshold preset for the vertical intent entity and can be a custom heat value, for example 0. If the heat of the vertical intent entity is too low, the search volume of queries in which the user searches with only the entity word needs to be suppressed, that is, the single-entity search count is suppressed. Details are as follows:
- When the heat of the vertical intent entity is judged to be too low against the target search heat, the electronic device can suppress the first search popularity of the entity;
- the electronic device may preset a heat correction threshold, and the heat correction threshold is used to perform heat correction processing on the first search heat of the entity.
- the electronic device takes the product of the first search popularity and the popularity correction threshold as the corrected third search popularity for the entity.
- Here the target search popularity is denoted as c, the first search popularity of an entity (such as a single-entity search popularity) is denoted as uv-e, the heat correction threshold is denoted as a, and the third search popularity of the entity after the heat correction processing is denoted as uv-E.
- Following the product described above, the suppression case of the heat correction process can be expressed as: $\text{uv-E} = a \times \text{uv-e}$
- The electronic device makes a further judgment: if the ratio of the first search popularity of the entity to the second search popularity of the vertical intent entity corresponding to the entity is greater than the intent ratio threshold, the second search popularity is used as the corrected third search popularity of the entity.
- The intent ratio threshold measures and quantifies the proportion between the first search popularity and the search popularity of the vertical intent entity corresponding to the entity. Based on the intent ratio threshold, the main search intent for the entity in the search process is further highlighted: the correction highlights searches with the vertical intent and suppresses searches with non-target intents.
- The above heat correction process can be expressed as $\text{uv-E} = \text{uv-i}$ when $\text{uv-e} / \text{uv-i} > b$, where:
- uv-E is the third search heat
- uv-i is the second search heat of the vertical intent entity
- uv-e is the first search heat of the entity
- b is the intent ratio threshold
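The two correction cases can be combined into one small function, sketched here under stated assumptions. The text does not give the exact comparison used to decide that the vertical intent popularity is "too low", so `uv_i <= c` is an assumption, as is leaving uv-e unchanged when neither case applies. The function name is hypothetical; a, b and c follow the symbols above.

```python
def corrected_search_popularity(
    uv_e: float,  # single-entity search popularity
    uv_i: float,  # vertical intent search popularity
    a: float,     # heat correction threshold (used as a multiplier)
    b: float,     # intent ratio threshold
    c: float,     # target search popularity, assumed >= 0
) -> float:
    """Return the corrected popularity uv-E (illustrative sketch only)."""
    if uv_i <= c:
        # Assumed reading of "vertical intent popularity is too low":
        # suppress the single-entity popularity by the factor a.
        return a * uv_e
    # uv_i > c >= 0 here, so uv_e / uv_i > b is equivalent to uv_e > b * uv_i.
    if uv_e > b * uv_i:
        return uv_i
    # Assumption: no correction is applied otherwise.
    return uv_e
```

For example, with a = 0.5, b = 5 and c = 0, an entity with uv-e = 1000 and uv-i = 100 is corrected to 100, because the single-entity volume is more than five times the vertical intent volume.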
- S305 Using the target popularity calculation strategy corresponding to each of the entity categories, determine the entity popularity for each of the entities from the initial search popularity corresponding to each of the retrieval sentences.
- The present application measures comprehensive popularity by combining the search popularity of the entity category statements corresponding to each entity. That is, the popularity from the single-entity-word search dimension and the popularity from the entity-category search dimension are aggregated, so that the entity popularity that accurately characterizes an entity, namely the comprehensive popularity, can be determined.
- Different popularity calculation strategies are set according to the characteristics of different entity categories, and different calculation methods are used to calculate the popularity of entities of different entity categories.
- the division of entity categories is based on the characteristics of the entity.
- the representation of an entity can usually be based on entity words + tag words.
- The entity word is the name of the entity, and the tag word can be understood as an attribute of the entity. Different categories of attributes are usually strongly or weakly correlated with the entity, so in practical applications the types of tag words can be divided accordingly.
- the tags are divided into a first tag and a second tag, wherein the degree of association between the tags of the first tag and the entity is smaller than the degree of association between the tags of the second tag and the entity. Based on this, entities are divided according to the two dimensions of entity words and tag words corresponding to an entity, as follows:
- The entity category includes a first entity category composed of the entity word of the entity, a second entity category composed of the entity word and the entity's first tag, a third entity category composed of the entity word and the entity's second tag, and a fourth entity category composed of the entity word, the first tag and the second tag; wherein the degree of association between the first tag and the entity is smaller than the degree of association between the second tag and the entity.
- The first tag can be an optional tag for an entity, and the second tag can be a mandatory tag for an entity. For example, for a certain entity "Zuofu", "TV drama" is a mandatory tag word, while version information, such as a high-definition version, is an optional tag word.
- For example, for entity A, the entity determined based on entity feature data may correspond only to an entity word; for entity B, it may correspond to an entity word and a first tag word (such as an optional label); for entity C, it may correspond to an entity word and a second label word (such as a required label); for entity D, it may correspond to an entity word, a first label word (such as an optional label), and a second label word (such as a required label), etc.
- entities of different categories may be divided based on entity characteristics determined by the entity, that is, entity categories.
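The four-way division above depends only on which tags accompany the entity word, which can be sketched as a small classifier. The enum and function names are hypothetical and introduced only for illustration.

```python
from enum import Enum


class EntityCategory(Enum):
    FIRST = 1   # entity word only
    SECOND = 2  # entity word + first (optional) tag
    THIRD = 3   # entity word + second (mandatory) tag
    FOURTH = 4  # entity word + first tag + second tag


def classify_entity(has_first_tag: bool, has_second_tag: bool) -> EntityCategory:
    """Map the presence of the two tag types to an entity category."""
    if has_first_tag and has_second_tag:
        return EntityCategory.FOURTH
    if has_second_tag:
        return EntityCategory.THIRD
    if has_first_tag:
        return EntityCategory.SECOND
    return EntityCategory.FIRST
```

Each category can then be dispatched to its own popularity calculation strategy.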
- the electronic device acquires at least one single-entity statement that belongs to the same entity semantics as the entity;
- the single-entity sentence is relative to the label of the entity. If the retrieval sentence only contains entity word components but does not contain label components, such a retrieval sentence can be called a single-entity sentence.
- Entities with the same entity semantics may correspond to multiple versions of single-entity statements. For example, for the entity "apple", the single-entity statements can be the Chinese word for apple, the English word for apple, and so on; these different versions of single-entity statements all correspond to an entity with the same entity semantics.
- an entity semantic recognition model can be built to perform semantic recognition on each retrieval sentence corresponding to each entity, determine the entity semantics of each retrieval sentence, and cluster retrieval sentences belonging to the same entity semantics. This can obtain the single-entity statement corresponding to an entity belonging to the same entity semantics.
- The electronic device may treat the totality of "the initial search popularity corresponding to each retrieval sentence" as a search popularity feedback device, for example by constructing a popularity query uv model system. The search popularity feedback device determines the popularity feedback value of a target sentence to be retrieved: each of the single-entity sentences is input into the search popularity feedback device, which outputs the search popularity of that single-entity retrieval sentence, that is, the single-entity popularity, such as the number of clicks or the number of visitors.
- The normalization of the heat metric dimension may aggregate the popularity of all single entities: the popularity values of the single entities are summed to obtain a total entity popularity, which is used as the entity popularity of the entity.
- Alternatively, the normalization of the heat metric dimension may be: select the maximum entity popularity, that is, the largest value among the popularity values of the single entities; divide the popularity of each single entity by the maximum entity popularity to obtain an entity quotient; multiply the entity quotient by a heat dimension value; and use the product as the entity popularity of the entity corresponding to the single-entity statement. The heat dimension value is set for the first type of entity and is used to bring entity popularity to one order of magnitude, so that popularity is expressed on a unified scale.
- the thermal dimension value can be determined in advance based on the actual application environment, which is not specifically limited here.
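The maximum-based normalization can be sketched as follows. The function name and the handling of empty or all-zero input are assumptions; the text only specifies dividing by the maximum and multiplying by a heat dimension value chosen for the application.

```python
from typing import List


def normalize_by_max(popularities: List[float], dimension_value: float) -> List[float]:
    """Scale each popularity to (value / max) * dimension_value."""
    if not popularities:
        return []
    peak = max(popularities)
    if peak == 0:
        # Assumption: with no search volume at all, every entity gets 0.
        return [0.0 for _ in popularities]
    return [p / peak * dimension_value for p in popularities]


scaled = normalize_by_max([50, 100, 25], 1000)
```

Here the most popular single entity maps to the dimension value itself and the others scale proportionally.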
- Strategy 2 When the entity category is a second entity category composed of the entity word and the entity's first tag, obtain, from the initial search popularity corresponding to each of the retrieval sentences, the first tag popularity of the first tag corresponding to the entity.
- an entity of a second entity category is usually composed of an entity word and a first label word under the first label.
- The user can search with a single entity word alone as the retrieval sentence, or with a retrieval sentence composed of the entity word and a first tag word under the first tag. In this application, the retrieval scenario corresponding to the "first tag corresponding to the entity" is the latter; that is, the first label popularity can be understood as the search popularity of the retrieval sentence formed by "entity word and the first label word under the first label".
- The popularity of the first tag can be obtained by taking "entity word and the first tag word under the first tag" as the object of the popularity search and determining how often users search with such a retrieval sentence.
- "entity word and first tag word under the first tag” can be directly used as a popularity search sentence, and input into the search popularity feedback device to obtain the first tag popularity.
- The electronic device can further refine the popularity calculation for entities of the second category. That is, from the initial search popularity corresponding to each retrieval sentence, the label search popularity obtained with "entity word and first label word" as the popularity retrieval sentence is taken as the initial label popularity. Then, following the processing used to determine the third search popularity, the popularity values are aligned, aggregated, and unified to the corresponding order of magnitude, so that the entity popularity is calculated more accurately.
- The electronic device obtains the initial label heat of the first label corresponding to the entity and performs heat alignment processing on the initial label heat with respect to the third search heat of the entity, to obtain the initial label heat after heat alignment processing.
- The heat alignment process follows the heat correction used when "determining the third search heat" in the foregoing embodiments: the same or a similar heat correction is applied to the initial label heat, so as to obtain the initial label heat after heat alignment processing.
- The aforementioned "determining the third search popularity" usually covers two cases. 1. The product of the first search popularity and the heat correction threshold is used as the corrected third search popularity for the entity; that is, when determining the third search popularity, the correction object "first search popularity" is multiplied by a "heat correction threshold". For alignment, the same handling is applied here: the "initial label popularity" is multiplied by the same "heat correction threshold" to obtain the initial label popularity after heat alignment processing. 2. The second search popularity is used as the corrected third search popularity of the entity; here the same processing is applied to the "initial label popularity", that is, the "second search popularity" is taken as the initial label popularity after heat alignment processing.
- the electronic device may directly use the initial label heat as the first label heat.
- the electronic device may further perform thermal dimension normalization processing on the initial label heat to obtain the first label heat.
- The normalization of the heat metric dimension here may aggregate all initial label heats of the entity: the multiple label heats corresponding to each entity are added to obtain a total entity heat, which is used as the entity heat of the entity.
- A plurality of first label words may be included under the first label of the entity. For example, if the first label is "singer", the first label words may be a plurality of specific singers, such as singer a, singer b, singer c, singer d and so on.
- A combination of "entity word + first label word" can correspond to one initial label popularity, and multiple combinations of "entity word + first label word" usually correspond to multiple initial label popularity values for one entity. For example, for the entity abc with first label X, where X can take the label words X1, X2 and X3: the search sentence for "entity word abc + first label word X1" corresponds to initial label heat 1, the search sentence for "entity word abc + first label word X2" corresponds to initial label heat 2, and the search sentence for "entity word abc + first label word X3" corresponds to initial label heat 3;
- The normalization of the heat metric may be: select the maximum entity label heat, that is, the largest value among the initial label heats of the entity, and divide each initial label heat by the maximum entity label heat to obtain the entity label quotient; or take the average N of the first n heat values among the initial label heats of the entity, and divide each initial label heat by N to obtain the entity label quotient;
- The heat dimension value of the label can be determined in advance based on the actual application environment and is not specifically limited here. After heat dimension normalization, the result can be used as the first label heat uv-nnt.
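The second normalization option, dividing by the average of the first n heat values, can be sketched as below. Two points are assumptions rather than statements from the text: "first n" is read as the n largest values, and the quotient is multiplied by the label's heat dimension value in the same way as in the maximum-based variant. The function name is hypothetical.

```python
from typing import List


def normalize_by_top_n_mean(
    label_heats: List[float], n: int, dimension_value: float
) -> List[float]:
    """Divide each initial label heat by the mean of the n largest heats."""
    if not label_heats or n <= 0:
        return []
    top = sorted(label_heats, reverse=True)[:n]
    mean_top = sum(top) / len(top)
    if mean_top == 0:
        return [0.0 for _ in label_heats]
    return [h / mean_top * dimension_value for h in label_heats]


label_scaled = normalize_by_top_n_mean([6, 2, 1], n=2, dimension_value=100)
```

Compared with dividing by the single maximum, averaging over several top values makes the scale less sensitive to one outlier label word.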
- the first label popularity and the third search popularity are respectively multiplied by a weight value, and then the two weight results are added together, and the summed result is used as the entity popularity for the entity;
- The entity popularity obtained in the aforementioned manner can be understood as the search popularity for a specific entity category. For example, if the entity word of an entity A is abc, the first label is X, and the first label word of entity A is X1, then what is calculated is the search popularity of the entity category corresponding to "entity word abc + first label word X1". As another example, if the entity word of entity A is abc, the first label is X, and the first label word is X2, then what is calculated is the search popularity of a different entity label version, namely the entity category corresponding to "entity word abc + first label word X2". In the search field involved in this application, entities are usually distinguished by entity words, but if they belong
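The weighting step of this strategy, in which the first label popularity and the third search popularity are each multiplied by a weight and then summed, can be sketched as follows. The function name and the default weights of 0.5 are hypothetical; the text does not specify the weight values.

```python
def weighted_entity_popularity(
    first_label_popularity: float,
    third_search_popularity: float,
    w_label: float = 0.5,   # assumed weight, not given in the text
    w_search: float = 0.5,  # assumed weight, not given in the text
) -> float:
    """Weighted sum used as the entity popularity of a second-category entity."""
    return w_label * first_label_popularity + w_search * third_search_popularity


entity_popularity = weighted_entity_popularity(200, 100, w_label=0.25, w_search=0.75)
```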
- Strategy 3 When the entity category is a third entity category consisting of the entity word and the entity's second label, obtain, from the initial search popularity corresponding to each search sentence, the second label heat of the second label corresponding to the entity, and perform heat dimension normalization processing on the second label heat to obtain the reference second label heat;
- an entity of a third entity category is usually composed of an entity word and a second label word under the second label.
- The user can search with a single entity word alone as the retrieval sentence, or with a retrieval sentence formed by the entity word and a second label word under the second label. In this application, the retrieval scenario corresponding to the "second label corresponding to the entity" is the latter; that is, the search popularity of the second label can be understood as the search popularity of the retrieval sentence formed by "entity word and the second label word under the second label".
- the search popularity of the second tag can be based on the "entity word and the second tag word under the second tag" as the object of popularity search, to determine the search popularity of the user of this retrieval sentence, and then the second tag can be obtained. Search popularity.
- "entity word and the second tag word under the second tag” can be directly used as a popularity search sentence, and input into the search popularity feedback device to obtain the search popularity of the second tag.
- The electronic device can further refine the heat calculation for entities of the third category. That is, from the initial search heat corresponding to each of the search sentences, the label search heat obtained with "entity word and second label word" as the heat retrieval sentence is taken as the initial heat result; a label correction weight for the second label heat is then obtained and used for parameter correction, wherein the label correction weight is determined based on the search popularity corresponding to the different second tag words of the entity's second label.
- The electronic device may obtain a label correction weight for the second label popularity and perform heat correction processing on the third search popularity of the entity based on the label correction weight, to obtain a reference third search heat after the heat correction processing.
- A plurality of second label words may be included under the second label of the entity. For example, the second label words may be a plurality of versions of different styles, such as version a, version b, version c, version d, etc.
- A combination of "entity word + second label word" can correspond to one second label popularity, and multiple combinations of "entity word + second label word" usually correspond to multiple second label popularity values, one for each label version of the entity. For example, for the entity abc with second label Y, where Y can take the label words Y1, Y2 and Y3: the search sentence for "entity word abc + second label word Y1" corresponds to second label heat 1, which is the label heat of one label version of the entity; the search sentence for "entity word abc + second label word Y2" corresponds to second label heat 2, which is the label heat of another label version of the entity; and the search sentence for "entity word abc + second label word Y3" corresponds to second label heat 3, which is the label heat of yet another label version of the entity;
- The label correction weight is determined from the ratio of "the label heat of the entity corresponding to the current label version of the entity" (that is, the current second label heat uv-nt of the entity) to the total label heat of all second label version entities of the entity. The label correction weight v can be calculated by the following formula:
- $v = \text{uv-nt} / \sum_i \text{uv-nt}_i$
- uv-nt is the current second label popularity corresponding to the entity category of the entity
- $\sum_i \text{uv-nt}_i$ is the sum of the search popularity of each second label version entity (search sentence query) corresponding to the entity (query).
- The electronic device can then perform heat correction processing on the entity's third search popularity uv-e based on the label correction weight v, to obtain the reference third search heat after heat correction processing;
- What the label correction weight actually reflects is the proportion of the popularity of the current second label version of the entity within the popularity of all second label versions of the entity. This proportion quantifies the degree of influence of the current second tag version on searches for the entity as a whole. Therefore, multiplying the tag correction weight by the entity's third search popularity uv-e gives the corrected reference third search popularity.
- The reference third search heat uv-E can be calculated by the following formula: $\text{uv-E} = v \times \text{uv-e}$
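The label correction weight and its application to the third search popularity can be sketched as follows. The function names and the zero-total fallback are assumptions added for illustration; the formulas themselves follow the definitions of v, uv-nt and uv-e above.

```python
from typing import Sequence


def label_correction_weight(uv_nt: float, all_version_heats: Sequence[float]) -> float:
    """v = uv-nt divided by the sum of uv-nt_i over all second label versions."""
    total = sum(all_version_heats)
    if total == 0:
        # Assumption: with no searches for any version, use a weight of 0.
        return 0.0
    return uv_nt / total


def reference_third_search_heat(uv_e: float, v: float) -> float:
    """uv-E = v * uv-e."""
    return v * uv_e


# Example: the current version accounts for a quarter of all version searches.
v = label_correction_weight(25, [25, 50, 25])
uv_E = reference_third_search_heat(400, v)
```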
- the electronic device further performs a heat weighting process on the reference second label heat and the reference third search heat, so as to obtain the first version of the entity heat for the entity.
- the reference second label popularity and the reference third search popularity are respectively multiplied by a weight value, and then the two weight results are added up, and the summed result is used as the first version of the entity popularity for the entity;
- The entity popularity obtained in the aforementioned manner can be understood as the search popularity for a specific entity category. For example, if the entity word of an entity B is abcd, the second label is Q, and the second label word of entity B is Q1, then what is calculated is the search popularity of the entity category corresponding to "entity word abcd + second tag word Q1". As another example, if the entity word of entity B is abcd, the second tag is Q, and the second label word is Q2, then what is calculated is the initial search popularity of a different entity label version, namely the entity category corresponding to "entity word abcd + second label word Q2"; in the search field involved in this application,
- the distinction of entities is usually divided by entity words, but if they belong
- When each entity is in an entity cold-start scenario, user search intent differs across the second-tag versions of the same entity, so some second tags are unpopular versions and some are popular versions. The first-version entity popularity of an unpopular version may be as small as 0, and because user search popularity is low in the cold-start scenario, the difference between the first-version entity popularity corresponding to a popular-version second label and that corresponding to an unpopular-version second label is small. In this application the two can be further distinguished by setting a popularity threshold: depending on whether the initial entity popularity meets the popularity threshold or is less than it, a heat quantification factor a can be added to distinguish the two situations, that is, the heat quantification factor a is added to the value of the initial entity heat to give the first-version entity heat.
- After the first-version entity popularity of the label version entity corresponding to the entity is determined, the first-version entity popularity can be used directly as the entity popularity of the aforementioned entity.
- Alternatively, the entity search heat of all entities with different second label values under the second label can be added to obtain the comprehensive label heat. For example, if the second label is a singer, the search popularity of all the songs under that singer is summed to obtain the comprehensive label popularity; after the comprehensive label popularity is normalized, it is added to the first-version entity popularity to obtain the final entity popularity. That is, the electronic device in the present application acquires the comprehensive label popularity corresponding to the entity's second label and determines the entity popularity for the entity based on the first-version entity popularity and the comprehensive label popularity.
- Strategy 4: When the entity category is the fourth entity category, consisting of the entity word, the first label and the second label: 1. Determine the first label popularity of the first label corresponding to the entity and the second label popularity of the second label corresponding to the entity, obtain a label correction weight for the second label popularity, and perform popularity correction on the third search popularity of the entity based on the label correction weight to obtain the corrected third search popularity.
- An entity of the fourth entity category usually consists of an entity word, a first label word under the first label, and a second label word under the second label.
- Retrieval may be performed with a single entity word as the retrieval sentence, or with a retrieval sentence formed by combining any one or more of the entity word, the first label word and the second label word;
- the electronic device obtains same-version entities, of at least one version type, that share the same entity semantics as the entity word, together with the third search popularity of each same-version entity, and determines the target popularity difference from the numerical ordering of those third search popularities.
- "Same entity semantics" here means that the entities have the same entity word. An entity usually has labels, such as the first label and the second label, and the label values under a label differ; entities with the same entity word but different label words are treated as different same-version entities. That is, two same-version entities share the entity word and differ in the label words under the label;
- the electronic device sorts the same-version entities by third search popularity, and the difference between two popularity values in the sequence can be taken as the target popularity difference.
- For example, the third search popularity ranked first (TOP1) can be selected,
- and the difference between it and the third search popularity ranked second (TOP2) is used as the target popularity difference.
- The first-version entity popularity for the entity is then obtained from the target popularity difference, the first label popularity, and a preset popularity quantification factor.
- If the first label popularity is greater than the target popularity difference, the electronic device updates the first label popularity to the target popularity difference; otherwise, no processing is performed.
- Comparing the first label popularity with the target popularity difference serves to down-weight the label popularity: when the first label popularity is greater than the target popularity difference, the popularity subsequently determined for a lower-ranked same-version entity could exceed that of a higher-ranked same-version entity.
- To avoid this situation, down-weighting is needed, that is, the first label popularity is updated to the target popularity difference.
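The TOP1/TOP2 difference and the down-weighting step can be illustrated with a short sketch. The function names are hypothetical, and returning `None` when fewer than two same-version entities exist is an assumption, since the text does not cover that case.

```python
def target_popularity_difference(third_search_popularities):
    """Difference between the highest (TOP1) and second-highest (TOP2) popularity."""
    ranked = sorted(third_search_popularities, reverse=True)
    if len(ranked) < 2:
        return None  # assumption: no difference is defined for a single version
    return ranked[0] - ranked[1]


def down_weight_first_label(first_label_popularity, target_difference):
    """Cap the first label popularity at the target popularity difference."""
    if target_difference is not None and first_label_popularity > target_difference:
        return target_difference
    return first_label_popularity


difference = target_popularity_difference([120, 300, 250])  # TOP1=300, TOP2=250
capped = down_weight_first_label(80, difference)
unchanged = down_weight_first_label(30, difference)
```

Capping the label popularity at the TOP1/TOP2 gap keeps a label bonus from lifting the second-ranked version above the first-ranked one.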
- The electronic device can then perform popularity weighting on the first label popularity and the third search popularity to obtain a reference popularity;
- when obtaining the reference popularity, the electronic device may first normalize the first label popularity to a uniform dimension to obtain the processed first label popularity uv-nnt, and then compute a weighted sum of uv-nnt, the third search popularity uv_e and the second-label search popularity uv_nt to obtain a combined popularity value, that is, the reference popularity;
- finally, the electronic device adds the reference popularity and the popularity quantification factor to obtain the initial search popularity for the entity.
- The aforementioned popularity quantification factor a may also be added to the value of the reference popularity, to further distinguish unpopular versions from popular versions.
- Where the factor is not added, the reference popularity itself is used as the initial search popularity.
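As a rough sketch of the weighted sum, assuming illustrative weights and an illustrative value of the factor a (neither is given in the text), the reference popularity and the initial search popularity could be computed like this:

```python
def reference_popularity(uv_nnt, uv_e, uv_nt, weights=(0.25, 0.5, 0.25)):
    """Weighted sum of the normalized first label popularity (uv-nnt in the text),
    the third search popularity (uv_e) and the second-label search popularity (uv_nt).

    The weights are hypothetical placeholders, not values from the source.
    """
    w_nnt, w_e, w_nt = weights
    return w_nnt * uv_nnt + w_e * uv_e + w_nt * uv_nt


def initial_search_popularity(reference, a=1.0):
    """Add the popularity quantification factor a to the reference popularity."""
    return reference + a


reference = reference_popularity(uv_nnt=4, uv_e=10, uv_nt=8)
initial = initial_search_popularity(reference, a=1.0)
```

The inputs are assumed to be on a common scale already; the normalization of the first label popularity is left to the caller because the text does not specify it.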
- To refine the entity popularity further and improve the accuracy of its calculation, for categories that must be queried with a required tag, the initial entity popularity of all same-version entities with different second-label values under the second label (such as the required label) can be summed to obtain the overall label popularity; the overall label popularity is normalized and added to the first-version entity popularity to obtain the final entity popularity. That is, the electronic device in the present application obtains the overall label popularity of the second label corresponding to the entity, and determines the entity popularity for the entity based on the first-version entity popularity and the overall label popularity.
- In the embodiments of the present application, the electronic device obtains entity feature data for at least one entity, determines at least one retrieval sentence corresponding to each of the entities based on the entity feature data, and then obtains an entity search log.
- The initial search popularity corresponding to each of the retrieval sentences is determined in the entity search log, and the entity popularity for each of the entities is then determined based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- By fusing the resource provider's resource data (such as entity feature data) with the entity search log of the search end (such as a search engine), and based on the search popularity of users' retrieval sentences in the search scenario, the entity popularity can be determined scientifically and effectively and quantified across categories, greatly improving the accuracy of entity popularity generation. The method can also be applied to the cold start stage of category entities and can generate accurate entity popularity even when entity resources are sparse and user data is small, so the entity popularity generation method is robust.
- FIG. 4 shows a schematic structural diagram of an entity popularity generation apparatus according to an exemplary embodiment of the present application.
- The entity popularity generation apparatus may be implemented as all or part of a terminal through software, hardware, or a combination of the two.
- The apparatus 1 includes a retrieval sentence determination module 11, a search popularity determination module 12 and an entity popularity determination module 13.
- the retrieval sentence determination module 11 is configured to obtain entity feature data for at least one entity, and determine at least one retrieval sentence corresponding to each of the entities based on the entity feature data;
- the search popularity determination module 12 is configured to obtain an entity search log, and determine the initial search popularity corresponding to each of the retrieval sentences in the entity search log;
- the entity popularity determination module 13 is configured to determine the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- the retrieval sentence determination module 11 includes:
- a field content determination unit 111 configured to acquire entity feature data for at least one entity from the entity resource end, and determine field content information for each of the entities based on the entity feature data;
- the retrieval sentence generating unit 112 is configured to acquire entity field features of the field content information, and generate at least one retrieval sentence for each entity according to the entity field features.
- the retrieval sentence generating unit 112 includes:
- the field type acquisition subunit 1121 is configured to acquire the entity field feature corresponding to each entity in the field content information, and determine the field type corresponding to the entity field feature, where the entity field feature includes the entity word corresponding to the entity and the label associated with the entity word;
- the retrieval sentence determination subunit 1122 is configured to generate at least one retrieval sentence for each entity corresponding to the field type based on the field type corresponding to the field feature of each entity.
- the field type includes a first field type, a second field type, a third field type and a fourth field type;
- the first field type is the field type when the entity field feature contains only the entity word;
- the second field type is the field type when the entity field feature contains the entity word and the first label associated with the entity word;
- the third field type is the field type when the entity field feature contains the entity word and the second label associated with the entity word;
- the fourth field type is the field type when the entity field feature contains the entity word and both the first label and the second label associated with the entity word;
- the label association degree between the first label and the entity word is smaller than the label association degree between the second label and the entity word.
- the search popularity determination module 12 includes:
- a hotness search unit 121 configured to perform a hotness search on the at least one retrieval statement of each of the entities in the entity search log, and determine the statement search hotness corresponding to each of the retrieval statements;
- the hotness determining unit 122 is configured to use the sentence search hotness as the initial search hotness corresponding to the retrieval sentence.
- the heat determination unit 122 includes:
- a sentence determination subunit 1221 configured to determine at least one target retrieval sentence belonging to the same entity semantics from each of the retrieval sentences
- the heat aggregation subunit 1222 is configured to perform aggregation processing on the first search heat corresponding to each target retrieval sentence to obtain the second search heat;
- the popularity determination subunit 1223 is configured to use the second search popularity as the initial search popularity of each target retrieval sentence.
- the entity heat determination module 13 includes:
- the search popularity calculation unit 131 is configured to determine, based on the initial search popularity corresponding to each of the retrieval sentences, the first search popularity of the vertical intent entity corresponding to each of the entities and the second search popularity of each of the entities, and to perform popularity correction on the first search popularity of the entity based on the second search popularity of the vertical intent entity corresponding to the entity, obtaining the corrected third search popularity for the entity;
- the entity popularity determination unit 132 is configured to adopt the target popularity calculation strategy corresponding to each entity category, and determine the entity popularity for each of the entities in the initial search popularity corresponding to each of the retrieval sentences.
- the search heat calculation unit 131 is specifically used for:
- the second search popularity is used as the corrected third search popularity for the entity.
- the entity popularity determination unit 132 is specifically configured to:
- the heat metric is normalized to obtain the entity heat for the entity.
- the entity popularity determination unit 132 is specifically configured to:
- the first label popularity of the first label corresponding to the entity is obtained, and popularity weighting is performed on the first label popularity and the third search popularity of the entity to obtain the entity popularity for the entity.
- the entity heat determination unit 132 is specifically configured to:
- the second label popularity of the second label corresponding to the entity is obtained, and the second label popularity is normalized to a common popularity measurement dimension to obtain the reference second label popularity;
- the entity heat determination unit 132 is specifically configured to:
- the reference popularity and the heat quantification factor are added to obtain the initial search popularity for the entity.
- the entity category includes a first entity category, a second entity category, a third entity category, and a fourth entity category; wherein,
- the first entity category is an entity category composed of entity words of the entity
- the second entity category is an entity category composed of the entity word and the first label of the entity
- the third entity category is an entity category composed of the entity word and the second label of the entity
- the fourth entity category is an entity category composed of the entity word and the first label and the second label;
- the label association degree between the first label and the entity is smaller than the label association degree between the second label and the entity.
- the device 1 is specifically used for:
- It should be noted that when the entity popularity generation apparatus provided in the above embodiment executes the entity popularity generation method, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
- the entity heat generating apparatus and the entity heat generating method embodiments provided in the above embodiments belong to the same concept, and the implementation process thereof is detailed in the method embodiments, which will not be repeated here.
- An embodiment of the present application further provides a computer storage medium. The computer storage medium can store multiple instructions, and the instructions are suitable for being loaded by a processor to execute the entity popularity generation method of the embodiments shown in FIG. 1 to FIG. 3.
- For the specific execution process, reference may be made to the description of the embodiments shown in FIG. 1 to FIG. 3, which will not be repeated here.
- The present application also provides a computer program product. The computer program product stores at least one instruction, and the at least one instruction is loaded by a processor to execute the entity popularity generation method of the embodiments shown in FIG. 1 to FIG. 3 above.
- the electronic device 1000 may include: at least one processor 1001 , at least one network interface 1004 , user interface 1003 , memory 1005 , and at least one communication bus 1002 .
- the communication bus 1002 is used to realize the connection and communication between these components.
- the user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may optionally include a standard wired interface and a wireless interface (eg, a WI-FI interface).
- the processor 1001 may include one or more processing cores.
- the processor 1001 uses various interfaces and lines to connect the parts of the entire server 1000, and performs the various functions of the server 1000 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 1005 and calling the data stored in the memory 1005.
- the processor 1001 may be implemented in hardware using at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
- the processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
- the CPU mainly handles the operating system, user interface, and application programs; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1001 and may instead be implemented as a separate chip.
- the memory 1005 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory).
- the memory 1005 includes a non-transitory computer-readable storage medium.
- Memory 1005 may be used to store instructions, programs, codes, sets of codes, or sets of instructions.
- the memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), Instructions and the like used to implement the above method embodiments; the storage data area may store the data and the like involved in the above method embodiments.
- the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001 .
- the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an entity popularity generation application program.
- the user interface 1003 is mainly used to provide an input interface for the user and obtain the data input by the user; the processor 1001 can be used to call the entity popularity generation application stored in the memory 1005 and specifically perform the following operations: acquiring entity feature data for at least one entity, and determining at least one retrieval sentence corresponding to each of the entities based on the entity feature data;
- obtaining an entity search log, and determining the initial search popularity corresponding to each of the retrieval sentences in the entity search log; and determining the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
- when acquiring entity feature data for at least one entity and determining at least one retrieval sentence corresponding to each entity based on the entity feature data, the processor 1001 specifically performs the following operations: obtaining entity feature data for at least one entity from the entity resource end, and determining field content information for each of the entities based on the entity feature data;
- acquiring the entity field features of the field content information, and generating at least one retrieval sentence for each entity according to the entity field features.
- when acquiring the entity field features of the field content information and generating at least one retrieval sentence for each entity according to the entity field features, the processor 1001 specifically performs the following operations: obtaining the entity field feature corresponding to each entity in the field content information, and determining the field type corresponding to the entity field feature, where the entity field feature includes the entity word corresponding to the entity and the label associated with the entity word;
- generating, based on the field type corresponding to each entity field feature, at least one retrieval sentence for each of the entities corresponding to the field type.
- in the operations executed by the processor 1001, the field type includes a first field type, a second field type, a third field type and a fourth field type;
- the first field type is the field type when the entity field feature contains only the entity word;
- the second field type is the field type when the entity field feature contains the entity word and the first label associated with the entity word;
- the third field type is the field type when the entity field feature contains the entity word and the second label associated with the entity word;
- the fourth field type is the field type when the entity field feature contains the entity word and both the first label and the second label associated with the entity word;
- the label association degree between the first label and the entity word is smaller than the label association degree between the second label and the entity word.
- when obtaining the entity search log and determining the initial search popularity corresponding to each of the retrieval sentences in the entity search log, the processor 1001 specifically performs the following operations: performing a popularity search in the entity search log on the at least one retrieval sentence of each of the entities, and determining the sentence search popularity corresponding to each of the retrieval sentences;
- using the sentence search popularity as the initial search popularity corresponding to the retrieval sentence.
- the processor 1001 may execute all or part of the processes in the methods of any of the foregoing embodiments. For the specific processes, reference may be made to any of the foregoing embodiments, and details are not repeated here.
- the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program, and the program can be stored in a computer-readable storage medium. During execution, the processes of the embodiments of the above-mentioned methods may be included.
- the storage medium can be a magnetic disk, an optical disk, a read-only storage memory, or a random storage memory, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
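The embodiments of the present application disclose an entity popularity generation method and apparatus, a storage medium, and an electronic device. Entity feature data for at least one entity is obtained, and at least one retrieval sentence corresponding to each entity is determined based on the entity feature data; an entity search log is obtained, and the initial search popularity corresponding to each retrieval sentence is determined in the entity search log; the entity popularity for each entity is then determined based on the initial search popularity corresponding to each retrieval sentence, each entity, and the entity category corresponding to each entity. This can improve the accuracy of entity popularity generation.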
Description
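The present application relates to the field of computer technology, and in particular to an entity popularity generation method and apparatus, a storage medium, and an electronic device.
With the rapid development of Internet technology, entity resources are updated ever faster, and searching for entity content matters more and more to users, for example searching for films and TV series, music and songs, novels, and names of people. In entity content search scenarios, entity popularity is very important, since it reflects how well received an entity is at the current stage.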
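Summary of the Invention
The embodiments of the present application provide an entity popularity generation method and apparatus, a storage medium, and an electronic device. The technical solutions are as follows:
In a first aspect, an embodiment of the present application provides an entity popularity generation method, the method comprising:
obtaining entity feature data for at least one entity, and determining at least one retrieval sentence corresponding to each of the entities based on the entity feature data;
obtaining an entity search log, and determining the initial search popularity corresponding to each of the retrieval sentences in the entity search log;
determining the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
In a second aspect, an embodiment of the present application provides an entity popularity generation apparatus, the apparatus comprising:
a retrieval sentence determination module, configured to obtain entity feature data for at least one entity and determine at least one retrieval sentence corresponding to each of the entities based on the entity feature data;
a search popularity determination module, configured to obtain an entity search log and determine the initial search popularity corresponding to each of the retrieval sentences in the entity search log;
an entity popularity determination module, configured to determine the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
In a third aspect, an embodiment of the present application provides a computer storage medium that stores multiple instructions, the instructions being suitable for being loaded by a processor to execute the above method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, which may include a processor and a memory, wherein the memory stores a computer program suitable for being loaded by the processor to execute the above method steps.
In one or more embodiments of the present application, the electronic device obtains entity feature data for at least one entity, determines at least one retrieval sentence corresponding to each of the entities based on the entity feature data, then obtains an entity search log and determines the initial search popularity corresponding to each of the retrieval sentences in it, and then determines the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities. By fusing the resource provider's resource data (such as entity feature data) with the entity search log of the search end (such as a search engine), and based on the search popularity of users' retrieval sentences in the search scenario, the entity popularity can be determined scientifically and effectively and quantified across categories, greatly improving the accuracy of entity popularity generation. The method can also be applied to the cold start stage of category entities and can generate accurate entity popularity even when entity resources are sparse and user data is small, so the entity popularity generation method is robust.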
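To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an entity popularity generation method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another entity popularity generation method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of another entity popularity generation method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an entity popularity generation apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a retrieval sentence determination module provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a retrieval sentence generating unit provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a search popularity determination module provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a popularity determination unit provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an entity popularity determination module provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.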
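The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
In the description of the present application, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and should not be understood as indicating or implying relative importance. It should also be noted that, unless otherwise expressly specified and limited, "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product or device. A person of ordinary skill in the art can understand the specific meaning of these terms in the present application according to the specific situation. In addition, in the description of the present application, unless otherwise specified, "multiple" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The present application is described in detail below with reference to specific embodiments.
In one embodiment, as shown in FIG. 1, an entity popularity generation method is proposed. The method can be implemented by a computer program and can run on an entity popularity generation apparatus based on the von Neumann architecture. The computer program can be integrated into an application or run as an independent tool application. The entity popularity generation apparatus may be an electronic device, including but not limited to a personal computer, tablet computer, handheld device, in-vehicle device, server, computing device, or other processing device connected to a wireless modem. Terminal devices may be called different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, terminal, wireless communication device, user agent or user apparatus, cellular phone, cordless phone, or a device in a 5G network or a future evolved network.
Specifically, the entity popularity generation method includes: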
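S101: Obtain entity feature data for at least one entity, and determine at least one retrieval sentence corresponding to each of the entities based on the entity feature data.
The entity may refer to something that can exist independently and serves as the basis of all attributes and the origin of all things; it is usually a word, a phrase or the like that can be clearly identified from a group of items with similar attributes. In some embodiments, the entity can be understood as an independent individual, such as an individual product, an individual song, or an individual TV series.
The entity feature data can be understood as features that describe the entity concretely, such as external dimensions, color, density, and shape characteristics. In some embodiments, the entity feature data may be data composed of features such as the entity word and the label words corresponding to the entity, where a label word can be understood as an attribute of the entity.
Further, in this application, an entity may be any entity for which popularity can be calculated; it may be a real-world thing or a concept. For example, a company is an entity and a term is also an entity, as are shops, songs, films, electronic coupons, and other types of entities.
A retrieval sentence usually refers to a sentence entered in a search box for the purpose of retrieval. For example, if "I want to buy a foldable TV" is entered in the search box, then "I want to buy a foldable TV" is the retrieval sentence. In the search field a retrieval sentence is often called a query.
Specifically, the entity feature data obtained by the electronic device may come from a resource end (cp), for example entity feature data covering multiple entities provided by the resource end. Further, the electronic device may extract, from each resource site, the entity feature data corresponding to all or some of the entities in the resource site; a resource site may be a resource website or a resource database. The resource end mentioned above may be a site that provides resource data in certain specific fields (that is, entity feature data corresponding to at least one entity), and such resource ends provide in-depth information or related services for the field. For example, the resource end "豆瓣电影" (Douban Movies), as a provider of film resources, provides resource data such as film and TV information and user reviews; the resource site "读书" (Reading) provides resource data such as book details, book rankings, and book reviews. The present application is not limited to these.
In some embodiments, the entity popularity generation method of this application is applicable to entity popularity cold start scenarios, in which the entity has typically been operating or online for a short time (for example, less than a time threshold), has few visitors, and has a low click-through rate.
Specifically, after obtaining entity feature data for at least one entity from the resource end, the electronic device processes the entity feature data to determine at least one retrieval sentence corresponding to each of the entities. In practice, entity feature data is usually structured data, and processing it means extracting the fields that contain the entity word of the entity and the labels related to the entity. The obtained field content containing entity words and related label words is cleaned to remove interfering data unrelated to the entity; entity semantic recognition is then performed on the cleaned entity feature data to identify the feature information of the entity-related field content, so that the entity word related to at least one entity and the label words related to the entity's attributes can be extracted from the entity feature data. Then, according to the corresponding retrieval sentence generation strategy, the entity word and/or label words of the entity are combined to generate the retrieval sentences for at least one entity. For example, if an entity has an entity word and a label word, the retrieval sentence generation strategy may be to generate the two retrieval sentences "entity word + label word" and "label word + entity word", and so on. The specific retrieval sentence generation strategy can be determined according to the actual application environment and is not specifically limited here.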
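S102: Obtain an entity search log, and determine the initial search popularity corresponding to each of the retrieval sentences in the entity search log.
Large numbers of users use search engines to search for information on the Internet. Specifically, a search engine can provide a search interface at the search front end, receive the key query sentence a user enters for an entity, and then match search results containing that key query sentence in web pages or network services. Throughout this entity search process, the search back end can record the search queries of a large number of users over a period of time, generating an entity search log covering multiple entities. Further, in this application the entity search log can reflect the search popularity of each of the aforementioned retrieval sentences, that is, the number of unique visitors (uv).
Specifically, since the entity search log derives from the entity search data of the entity search end (such as a search engine), after the entity search log has been obtained from the entity search end, the search popularity of each of the previously determined retrieval sentences is looked up in the entity search log and used as the initial search popularity (uv) of that retrieval sentence.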
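As an illustration of this lookup, the following sketch counts unique visitors (uv) per retrieval sentence. The log format, a list of `(user_id, query)` records, is a hypothetical simplification; real search logs will look different.

```python
from collections import defaultdict


def query_uv(log_records, queries):
    """Count unique visitors per retrieval sentence in a search log.

    log_records: iterable of (user_id, query) pairs (assumed format).
    queries: the retrieval sentences generated for the entities.
    """
    wanted = set(queries)
    visitors = defaultdict(set)
    for user_id, query in log_records:
        if query in wanted:
            visitors[query].add(user_id)
    # Queries that never appear in the log get a popularity of 0.
    return {query: len(visitors[query]) for query in queries}


log = [
    ("u1", "x song"),
    ("u2", "x song"),
    ("u1", "x song"),        # repeat visit by the same user, counted once
    ("u3", "version a x song"),
]
uv = query_uv(log, ["x song", "version a x song", "y song"])
```

Counting distinct users rather than raw searches matches the text's reading of search popularity as unique visitors.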
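S103: Determine the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities.
An entity can usually be characterized by an entity word plus label words: the entity word is the name of the entity, and a label word can be understood as an attribute of the entity. Attributes of different kinds can usually be divided into those strongly related and those weakly related to the entity, so in practice label words can be divided by type, for example by dividing the labels of an entity into a first label and a second label, where the label association degree between the first label and the entity is smaller than that between the second label and the entity. In practice, the first label may be an optional label for the entity and the second label a required label. For example, for the entity "赘婿", "TV series" is the required label word, and version information such as "HD version" is an optional label word; for the entity "爸爸去哪儿", "variety show" is the required label word, and "distributor: xx" is an optional label word.
Further, on this basis: for entity A, the entity as determined from the entity feature data may have only an entity word; for entity B, the entity may have an entity word and a first label word (such as an optional label); for entity C, the entity may have an entity word and a second label word (such as a required label); for entity D, the entity may have an entity word, a first label word and a second label word (such as a required label); and so on. In practice, entities can be divided into different categories, that is, entity categories, according to the entity features determined for them.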
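The four cases (entities A to D) map directly onto four entity categories. A minimal sketch, with a hypothetical `EntityCategory` enum and `classify_entity` helper that are not part of the original text:

```python
from enum import Enum


class EntityCategory(Enum):
    FIRST = "entity word only"
    SECOND = "entity word + first (optional) label"
    THIRD = "entity word + second (required) label"
    FOURTH = "entity word + first label + second label"


def classify_entity(first_label_words, second_label_words):
    """Pick the entity category from which kinds of label words are present."""
    has_first = bool(first_label_words)
    has_second = bool(second_label_words)
    if has_first and has_second:
        return EntityCategory.FOURTH
    if has_second:
        return EntityCategory.THIRD
    if has_first:
        return EntityCategory.SECOND
    return EntityCategory.FIRST


category_a = classify_entity([], [])
category_d = classify_entity(["HD version"], ["TV series"])
```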
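In some embodiments, after the initial search popularity corresponding to each of the retrieval sentences has been determined, the totality of these initial search popularities can serve as a search popularity feedback device, which determines a popularity feedback value for a target sentence to be retrieved. Further, in this application, a search containing only the entity word can be determined for each entity, which can be understood as the single-entity-word popularity (uv-e). This is combined with the popularity of the entity-category sentences of each entity, that is, the search popularity of the entity-category sentences, for an overall popularity measurement. In other words, popularity is aggregated across the single-entity-word search dimension of the entity and the combined entity search dimension of the entity category, so that the entity popularity that accurately characterizes an entity category, that is, the overall popularity, can be determined. In addition, in this application the overall popularity of an entity can be understood as the entity popularity of the entity's entity category, which can also be understood as the entity popularity of a vertical entity within a vertical search category.
In addition, if the entities are in an entity cold start scenario, entity popularity need not rely on entity feature data provided by resource providers together with user popularity feedback data for the entity, such as entity word exposure, click counts, and comment counts; instead, the entity popularity of an entity can be measured from the entity's internal feature data (such as the entity word and the label words that characterize its attributes). In scenarios such as cold start, this avoids the problems of the related art, which must rely on entity feature data from multiple resource ends for popularity generation, leading to large data-induced deviations, and which cannot align the popularity values of same-name entities across categories.
This embodiment explains the determination of entity popularity only in outline; for details, refer to the other embodiments of this application.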
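In a specific implementation scenario, once the entity popularity for each of the entities has been determined, for entity categories that are already online the entity popularity of the category can be corrected based on combined search information such as users' real-time search and click behavior, visitor counts, and feedback such as comments, yielding entity word popularity information better suited to the product. In a specific implementation, the combined search information for the entity is obtained and popularity search features are extracted from it, including but not limited to click count, search count, visitor count, exposure, and cold start popularity; the popularity search features are input to a pre-trained popularity update model, which outputs a popularity reference value for the entity; and the entity popularity of the entity is updated based on the popularity reference value.
The popularity update model may be a neural network model, which may be implemented by fitting one or more of a convolutional neural network (Convolutional Neural Network, CNN) model, a deep neural network (Deep Neural Network, DNN) model, a recurrent neural network (Recurrent Neural Networks, RNN) model, an embedding model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, a logistic regression (Logistic Regression, LR) model, and the like. In practice, an initial popularity update model is created first; sample popularity search features are extracted from a large amount of sample data and input to the initial model for training. During training, the expected error between the actual output and the expected output of the neural network model is calculated, and the parameters of the model are adjusted based on that error; when training is complete, the popularity update model is generated.
In the embodiments of the present application, the electronic device obtains entity feature data for at least one entity, determines at least one retrieval sentence corresponding to each of the entities based on the entity feature data, then obtains an entity search log and determines the initial search popularity corresponding to each of the retrieval sentences in it, and then determines the entity popularity for each of the entities based on the initial search popularity corresponding to each of the retrieval sentences, each of the entities, and the entity category corresponding to each of the entities. By fusing the resource provider's resource data (such as entity feature data) with the entity search log of the search end (such as a search engine), and based on the search popularity of users' retrieval sentences in the search scenario, the entity popularity can be determined scientifically and effectively and quantified across categories, greatly improving the accuracy of entity popularity generation. The method can also be applied to the cold start stage of category entities and can generate accurate entity popularity even when entity resources are sparse and user data is small, so the entity popularity generation method is robust.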
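Refer to FIG. 2, which is a schematic flowchart of another embodiment of the entity popularity generation method proposed in this application. Specifically:
S201: Obtain entity feature data for at least one entity, and determine field content information for each of the entities based on the entity feature data;
According to some embodiments, entity feature data can be understood as features that describe the entity concretely. The electronic device may extract, from each resource site, the entity feature data corresponding to all or some of the entities in the resource site; a resource site may be a resource website or a resource database. The resource end mentioned above may be a site that provides resource data in certain specific fields (that is, entity feature data corresponding to at least one entity), and such resource ends provide in-depth information or related services for the field. For example, the resource end "豆瓣电影" (Douban Movies), as a provider of film resources, provides resource data such as film and TV information and user reviews; the resource site "读书" (Reading) provides resource data such as book details, book rankings, and book reviews. The present application is not limited to these. In practice, entity feature data is usually structured data. Entity feature data provided by a resource end often contains irrelevant information such as vertical bars and hyphens, and the fields containing the entity word of the entity and the entity-related labels still have to be extracted from it. In this application, the obtained field content containing entity words and related label words is cleaned (for example, by constructing regular expressions to filter out irrelevant information and to extract the useful field content), removing interfering data unrelated to the entity; entity semantic recognition is then performed on the cleaned entity feature data to identify the feature information of the entity-related field content, so that the entity word related to at least one entity and the label words related to the entity's attributes can be extracted from the entity feature data. "The entity word related to at least one entity and the label words related to the entity's attributes" are the field content information of this application.
Further, the electronic device then acquires the entity field features of the field content information, and generates at least one retrieval sentence for each of the entities according to the entity field features. See the subsequent steps of this embodiment for details.
S202: Acquire the entity field feature corresponding to each entity in the field content information, and determine the field type corresponding to the entity field feature, where the entity field feature includes the entity word corresponding to the entity and the label associated with the entity word;
Specifically, after determining the field content information for each of the entities based on the entity feature data, the electronic device recognizes the field content information, mainly to determine the entity word and label words of each entity. Entities differ in their "entity word and label words", and the entity field feature is the feature that characterizes the entity word and label words of an entity. According to some embodiments, for example: for the entity field feature of entity A, the entity as determined from the entity feature data may have only an entity word; for the entity field feature of entity B, the entity may have an entity word and a first label word (such as an optional label); for the entity field feature of entity C, the entity may have an entity word and a second label word (such as a required label); for the entity field feature of entity D, the entity may have an entity word, a first label word and a second label word (such as a required label); and so on. In practice, entities can be divided into different categories, that is, entity categories, according to the entity features determined for them.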
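In a feasible implementation, the field type can be divided into a first field type, a second field type, a third field type and a fourth field type;
the first field type is the field type when the entity field feature contains only the entity word;
the second field type is the field type when the entity field feature contains the entity word and the first label associated with the entity;
the third field type is the field type when the entity field feature contains the entity word and the second label associated with the entity;
the fourth field type is the field type when the entity field feature contains the entity word and both the first label and the second label associated with the entity;
where the label association degree between the first label and the entity is smaller than the label association degree between the second label and the entity.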
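S203: Based on the field type corresponding to each entity field feature, generate at least one retrieval sentence for each of the entities corresponding to the field type.
Specifically, after the field type of each entity's entity field feature has been determined, at least one retrieval sentence for each of the entities is generated for the corresponding field type based on a preset retrieval sentence generation strategy. In this application, the point of determining retrieval sentences for an entity is to generate multiple retrieval sentences from the entity's entity field feature, so that the popularity of an entity can be measured along multiple dimensions.
In a specific implementation scenario, if the field type is the first field type, that is, the field type when the entity field feature contains only the entity word, then according to the resource end's entity feature data the entity has only an entity word and no label, and the retrieval sentence generation strategy may simply use the entity word as the entity's retrieval sentence (query);
If the field type is the second field type, that is, the field type when the entity field feature contains the entity word and the first label associated with the entity, then according to the resource end's entity feature data the entity has both an entity word and a first label (such as a non-required label), and the retrieval sentence generation strategy may be to generate the two retrieval sentences "entity word + first label word" and "first label word + entity word". In some embodiments there are multiple first label words under the first label. Taking the entity "x song" as an example, the entity word is the song title and the first label is the version, and there may be multiple versions under the first label, such as version A, version B and version C. The above strategy then produces multiple retrieval sentences, such as "version A + x song", "version C + x song", "version B + x song", "x song + version A", "x song + version B" and "x song + version C", six queries in total.
If the field type is the third field type, that is, the field type when the entity field feature contains the entity word and the second label associated with the entity, then according to the resource end's entity feature data the entity has both an entity word and a second label (such as a required label), and the retrieval sentence generation strategy may be to generate the two retrieval sentences "entity word + second label word" and "second label word + entity word". In some embodiments there are multiple second label words under the second label. Taking the entity "x song" as an example, the entity word is the song title and the second label is the singer, and there may be multiple singers under the second label, such as singer A, singer B and singer C. The above strategy then produces multiple retrieval sentences, such as "singer A + x song", "singer C + x song", "singer B + x song", "x song + singer A", "x song + singer B" and "x song + singer C", six queries in total.
If the field type is the fourth field type, that is, the field type when the entity field feature contains the entity word and both the second label and the first label associated with the entity, then according to the resource end's entity feature data the entity has an entity word, a second label (such as a required label) and a first label (such as a non-required label), and the retrieval sentence generation strategy may be: retrieval sentences (queries) obtained by arranging and combining the three elements "entity word", "first label" and "second label" in any order, for example "entity word + second label + first label", "entity word + first label + second label", and so on.
It should be noted that the retrieval sentence generation strategies for the field types above are given only for illustration. Those skilled in the art should understand that the strategy can be customized for the actual application environment; the above are examples for better understanding and do not limit the scope of protection of this application.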
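The generation strategies for the four field types can be sketched as below. This is only one possible reading: the `generate_queries` helper is hypothetical, the "+" of the text is rendered as a space-separated join, and for the fourth field type every ordering of the three elements is emitted, which is an assumption about what "arranging and combining in any order" covers.

```python
from itertools import permutations


def generate_queries(entity_word, first_labels=(), second_labels=()):
    """Generate retrieval sentences (queries) for one entity by field type."""
    # First field type: entity word only.
    if not first_labels and not second_labels:
        return [entity_word]
    # Fourth field type: all orderings of entity word, first label, second label.
    if first_labels and second_labels:
        queries = []
        for first in first_labels:
            for second in second_labels:
                for combo in permutations((entity_word, first, second)):
                    queries.append(" ".join(combo))
        return queries
    # Second or third field type: both orders of entity word and label word.
    queries = []
    for label in first_labels or second_labels:
        queries.append(f"{entity_word} {label}")
        queries.append(f"{label} {entity_word}")
    return queries


version_queries = generate_queries("x song", first_labels=["version A", "version B", "version C"])
```

For the three versions of "x song" this yields the six queries listed in the text, in a different order.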
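In one feasible implementation, queries can be filtered and brought into a uniform format while they are generated. One approach applies regular-expression processing to the entity word. The normalization rules include converting uppercase to lowercase, Traditional Chinese to Simplified Chinese and full-width characters to half-width characters, and removing symbols, redundant spaces, leading and trailing spaces, and spaces between Chinese characters. Another approach filters a query out if the entity word is an empty string after normalization, or if the tag word of the second tag (the required tag word) is, after normalization, an empty string, a number, a single English letter or another preset filter pattern. The regular expressions are built according to the needs of the actual application environment and are not specifically limited here.
In one feasible implementation, queries whose length exceeds a preset value can also be filtered out, for example queries with a length greater than 50. The whole query is then normalized to remove spaces and symbols and to unify letter case. This prevents such characters from making the later popularity lookup fail and ensures that the same entity word always produces the same query.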
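The normalization and filtering rules above can be sketched with the Python standard library. The sketch is an assumption-laden illustration: `normalize`, `keep_query` and `MAX_QUERY_LEN` are hypothetical names, and Traditional-to-Simplified conversion is omitted because it needs a mapping table that the standard library does not provide.

```python
import re
import unicodedata

MAX_QUERY_LEN = 50  # example limit mentioned in the text

def normalize(text):
    """Lowercase, fold full-width forms, strip symbols and extra spaces."""
    # NFKC folds full-width letters, digits and punctuation to half-width.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace symbols (anything that is not a word character or space).
    text = re.sub(r"[^\w\s]", " ", text).replace("_", " ")
    # Collapse whitespace runs and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove a space that sits between two CJK characters.
    return re.sub(r"(?<=[\u4e00-\u9fff]) (?=[\u4e00-\u9fff])", "", text)

def keep_query(query, entity_word, required_tag_word=None):
    """Return False for queries that the filtering rules would drop."""
    if not normalize(entity_word):
        return False  # entity word is empty after normalization
    if required_tag_word is not None:
        tag = normalize(required_tag_word)
        # Empty, purely numeric or single-letter required tag words are dropped.
        if tag == "" or tag.isdigit() or re.fullmatch(r"[a-z]", tag):
            return False
    return len(query) <= MAX_QUERY_LEN
```

Normalizing before the popularity lookup means that two spellings of the same query, such as a full-width and a half-width form, map to one key.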
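S204: Perform a popularity search in the entity search log for the at least one search query of each entity, and determine the query search popularity of each search query;
In one specific implementation, the electronic device can treat the popularity values determined for all search queries, taken as a whole, as a search-popularity feedback service, for example by building a "popular query uv" model system. Given a target query, the service returns the popularity value of that query: each search query of each entity is fed in, and the query search popularity of that query, such as its number of clicks or number of visitors, is returned.
S205: Take the query search popularity as the initial search popularity of the search query.
In one feasible implementation, the electronic device can use the query search popularity directly as the initial search popularity of the search query.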
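In one feasible implementation, the electronic device can aggregate the popularity of several search queries that share the same entity semantics. In practice, search queries with the same entity semantics may exist in different languages, for example an English version, a Chinese version and a Japanese version of a query for the same entity.
In practice, the electronic device can determine, among the search queries, at least one target search query belonging to the same entity semantics. One way is to build a semantic recognition model in advance, feed each search query into it, and output the target search queries that share the same entity semantics;
The electronic device then aggregates the first search popularity of the target search queries to obtain a second search popularity, where the first search popularity is the search popularity of a target search query. Specifically, the electronic device can sum the first search popularity of all target search queries to obtain the second search popularity, which is then used as the initial search popularity of every target search query.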
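The aggregation step can be sketched as below. The semantic recognition model is replaced by a caller-supplied `semantic_key` function, which is an assumption for the example; the text does not specify how the model is built.

```python
from collections import defaultdict

def aggregate_by_semantics(query_uv, semantic_key):
    """Sum popularity over queries with the same entity semantics and
    assign the total back to every query in the group.

    query_uv: dict mapping query -> popularity from the search log.
    semantic_key: hypothetical stand-in for the semantic recognition model.
    """
    totals = defaultdict(int)
    for query, uv in query_uv.items():
        totals[semantic_key(query)] += uv
    return {query: totals[semantic_key(query)] for query in query_uv}
```

For example, if "apple" and "苹果" map to the same key, both queries receive the sum of their individual popularity values as their initial search popularity.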
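S206: Determine the entity popularity of each entity based on the initial search popularity of each search query, the entities, and the entity category to which each entity belongs.
For details, refer to S101-S103 or other embodiments of this application; they are not repeated here.
In one specific scenario, a hot-entity-word recommendation feature can be provided for a target category as the scenario requires. The feature first determines the entity popularity of each entity in the target category using the entity popularity generation method of this application, ranks the entities by popularity to obtain the high-popularity entity words in each dimension, and then levels the high-popularity entities of different dimensions according to the number of searches for queries in that category. This ensures both that the entity words are popular across the whole network and that the entity words within the category are comparable.
In this embodiment of the application, the electronic device obtains entity feature data for at least one entity, determines at least one search query for each entity from that data, obtains an entity search log, determines the initial search popularity of each search query in the log, and then determines the entity popularity of each entity from the initial search popularity of each search query, the entities, and the entity categories to which they belong. By fusing the resource provider's data (such as entity feature data) with the search side's entity search log (such as that of a search engine), and by relying on the search popularity of users' queries in search scenarios, the entity popularity can be determined effectively and quantified across categories, which improves the accuracy of entity popularity generation. The method also suits the cold-start phase of a category's entities: it can still produce reasonably accurate entity popularity when entity resources are sparse or user data is scarce, so the method is robust.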
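Referring to FIG. 3, FIG. 3 is a schematic flowchart of another embodiment of the entity popularity generation method proposed in this application. Specifically:
S301: Obtain entity feature data for at least one entity, and determine at least one search query for each entity based on the entity feature data;
For details, refer to other embodiments of this application; they are not repeated here.
S302: Obtain an entity search log, and determine in the entity search log the initial search popularity of each search query.
For details, refer to other embodiments of this application; they are not repeated here.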
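S303: Based on the initial search popularity of each search query, determine the second search popularity (uv-i) of the vertical-intent entity corresponding to each entity and the first search popularity (uv-e) of each entity.
A vertical-intent entity can be understood as the combination of an entity and a (search) category. In practice it corresponds to a dedicated search for information about the entity in a particular domain, for a particular group of users or for a particular need. In short, the search volume of an entity's vertical-intent entity is the entity's search popularity in one vertical domain. For the entity "赘婿", the vertical-intent entities could be "赘婿 film and TV", "赘婿 novel", "赘婿 pictures" and so on, where "film and TV", "pictures" and "novel" are each a vertical domain. In a specific implementation, a vertical-intent entity is formed from the entity word plus a (search) category. The category is usually strongly related to the user's search intent in real search scenarios, and this application takes the entity's search popularity in vertical domains into account when generating popularity. In some implementations, the category is determined by the subject type of the entity (such as film and TV, pictures, novels or news). In some implementations, it is determined by the vertical search categories offered by the search engine on the electronic device, such as vertical video search or vertical image search. In some implementations, it can be customized through expert intervention based on the entity word itself: for the entity "庆余年", an expert may set the vertical search category to film and TV.
In some implementations, the second search popularity of the entity's vertical-intent entity is denoted the vertical-intent search popularity uv-i.
The "first search popularity of the entity" is the search popularity of the search query formed by the entity word alone. An entity corresponds to an entity word, usually the entity's name; the search popularity of the query consisting only of that word is the first search popularity of this embodiment, denoted in some implementations the single-entity search popularity uv-e.
According to some embodiments, after determining the initial search popularity of each search query, the electronic device can treat these values as a whole as a search-popularity feedback service, for example by building a "popular query uv" model system that returns the popularity value of a given target query. The electronic device looks up each entity's vertical-intent entity in the service to obtain the vertical-intent search popularity, that is, the second search popularity; it looks up the query formed by each entity's entity word alone to obtain the entity's search popularity, that is, the first search popularity.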
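S304: Apply popularity correction to the first search popularity of the entity based on the second search popularity of its vertical-intent entity, obtaining the corrected third search popularity of the entity.
Users of a search service expect to obtain precise, currently popular information from the query they enter. To bring out the main search intent for an entity, the popularity correction highlights searches with the main vertical intent and suppresses searches without it.
In a specific implementation, the electronic device first compares the second search popularity of the vertical-intent entity with a configured target search popularity to decide whether the vertical-intent entity's popularity is too low. The target search popularity is a preset popularity threshold for vertical-intent entities; in some implementations it is a user-defined value such as 0. When the vertical-intent popularity is this low, the search volume from users who search with the entity word alone must be suppressed; that is, the single-entity search count is suppressed. The details are as follows: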
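1. If the second search popularity of the vertical-intent entity equals the target search popularity (for example, both are 0), the electronic device suppresses the first search popularity of the entity. In some implementations, the electronic device presets a popularity correction threshold, which is used to correct the first search popularity. Specifically, the product of the first search popularity and the popularity correction threshold is taken as the corrected third search popularity of the entity.
Illustratively, let the target search popularity be c, the first search popularity of the entity (the single-entity search popularity) be uv-e, the second search popularity of the vertical-intent entity be uv-i, the popularity correction threshold be a, and the corrected third search popularity of the entity be uv-E. The correction can then be expressed as:
$\text{uv-E} = \text{uv-e} \times a, \quad \text{if } \text{uv-i} = c$
2. The electronic device then makes a further decision: if the ratio of the first search popularity to the second search popularity of the entity's vertical-intent entity exceeds the intent-ratio threshold, the second search popularity is taken as the corrected third search popularity of the entity.
The intent-ratio threshold quantifies how the first search popularity compares with the popularity of the entity's vertical-intent entity. It further brings out the main search intent for the entity: searches with the main vertical intent are highlighted and searches without it are suppressed. This correction can be expressed as:
$\text{uv-E} = \text{uv-i}, \quad \text{if } \text{uv-e} / \text{uv-i} > b$
Here uv-E is the third search popularity, uv-i is the second search popularity of the vertical-intent entity, uv-e is the first search popularity of the entity, and b is the intent-ratio threshold.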
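The two correction rules of S304 can be sketched as a single function. The default values of `a`, `b` and `c` are placeholders chosen for the example, and returning uv-e unchanged when neither rule fires is an assumption, since the text does not state that case.

```python
def corrected_popularity(uv_e, uv_i, a=0.5, b=10.0, c=0):
    """Return the third search popularity uv-E (illustrative sketch).

    uv_e: single-entity search popularity (first search popularity).
    uv_i: vertical-intent search popularity (second search popularity).
    a: popularity correction threshold; b: intent-ratio threshold;
    c: target search popularity. All three defaults are hypothetical.
    """
    if uv_i == c:
        # Rule 1: vertical intent is at the target value, suppress uv-e.
        return uv_e * a
    if uv_i > 0 and uv_e / uv_i > b:
        # Rule 2: single-entity searches dwarf vertical-intent searches.
        return uv_i
    # Assumption: otherwise the single-entity popularity is kept as is.
    return uv_e
```

For instance, `corrected_popularity(1000, 10)` has a ratio of 100, which exceeds `b`, so the vertical-intent value 10 is returned.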
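S305: Using the target popularity calculation strategy of each entity category, determine the entity popularity of each entity from the initial search popularity of the search queries.
According to some embodiments, this application measures popularity comprehensively by also taking into account the search popularity of the queries associated with each entity's category. Popularity is aggregated from two dimensions, the search on the entity word alone and the combined searches defined by the entity category, so that an entity popularity (a composite popularity) that accurately characterizes the entity can be determined. In a specific implementation, different popularity calculation strategies are set for the features of different entity categories. Entity categories are defined by the features of the entity. In some embodiments an entity is characterized by an entity word plus tag words: the entity word is the entity's name and a tag word is an attribute of the entity. Attributes can be strongly or weakly related to the entity, so tag words can be divided by type, for example into a first tag and a second tag, where the tag relevance between the first tag and the entity is lower than that between the second tag and the entity. On this basis, entities are divided along the two dimensions of entity word and tag word, as follows:
The entity categories include a first entity category formed by the entity word of the entity, a second entity category formed by the entity word and the first tag of the entity, a third entity category formed by the entity word and the second tag of the entity, and a fourth entity category formed by the entity word together with the first tag and the second tag, where the tag relevance between the first tag and the entity is lower than that between the second tag and the entity. In practice, the first tag can be an optional tag for the entity and the second tag a required tag. For the entity "赘婿", "TV series" is a required tag word and version information such as "HD version" is an optional tag word; for the entity "爸爸去哪儿", "variety show" is a required tag word and the publisher "xx" is an optional tag word.
According to some embodiments: entity A may have only an entity word; entity B may have an entity word and a first tag word (for example, an optional tag); entity C may have an entity word and a second tag word (for example, a required tag); entity D may have an entity word, a first tag word and a second tag word; and so on. In practice, entities can be divided into categories according to these features; these are the entity categories.
The entity popularity calculation is explained below: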
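Strategy 1:
1. When the entity category is the first entity category, formed by the entity word of the entity, the electronic device obtains at least one single-entity query that has the same entity semantics as the entity;
A single-entity query is defined relative to the entity's tags: a search query that contains only the entity word and no tag component is called a single-entity query.
In practice, an entity with given semantics may correspond to several versions of a single-entity query. For the entity "apple", the single-entity queries may be the Chinese word for apple, the English word for apple, and so on; these versions all refer to the same entity. An entity semantic recognition model can be built to recognize the semantics of each search query of each entity and to cluster the queries that share the same entity semantics. This yields the single-entity queries that share the entity semantics of a given entity.
2. From the initial search popularity of the search queries, determine the single-entity popularity of each single-entity query;
According to some embodiments, after determining the initial search popularity of each search query, the electronic device can treat these values as a whole as a search-popularity feedback service, for example by building a "popular query uv" model system that returns the popularity value of a given target query. Each single-entity query is fed into the service, which returns its query search popularity, that is, the single-entity popularity, such as the number of clicks or number of visitors.
3. Apply popularity scale normalization to the single-entity popularity values to obtain the entity popularity of the entity.
In one feasible implementation, popularity scale normalization aggregates all the single-entity popularity values: they are summed, and the total is taken as the entity popularity of the entity.
In one feasible implementation, popularity scale normalization selects the largest of the single-entity popularity values, divides each single-entity popularity by this maximum to obtain an entity quotient, and multiplies the quotient by a popularity scale value. The product is taken as the entity popularity of the entity corresponding to the single-entity query. The popularity scale value is set for entities of the first category and brings entity popularity to one order of magnitude, so that popularity is expressed on a uniform scale. It can be determined in advance for the actual application environment and is not specifically limited here.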
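Both normalization variants of Strategy 1 fit in a few lines. The function names and the default scale value of 100 are assumptions for illustration; the text leaves the scale value to the application environment.

```python
def sum_popularity(single_entity_uvs):
    """Variant 1: the entity popularity is the sum of all
    single-entity popularity values."""
    return sum(single_entity_uvs)

def normalize_to_scale(single_entity_uvs, scale=100.0):
    """Variant 2: divide each value by the maximum, then multiply by a
    popularity scale value (hypothetical default of 100)."""
    peak = max(single_entity_uvs, default=0)
    if peak == 0:
        # No searches at all: avoid dividing by zero.
        return [0.0 for _ in single_entity_uvs]
    return [uv / peak * scale for uv in single_entity_uvs]
```

With `scale=100.0`, the values 50, 100 and 25 map to 50.0, 100.0 and 25.0, so the most popular query always sits at the scale value.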
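Strategy 2:
When the entity category is the second entity category, formed by the entity word and the first tag of the entity, obtain the first tag popularity of the entity's first tag from the initial search popularity of the search queries;
In practice, an entity of the second entity category consists of an entity word and a first tag word under the first tag. A user searching for the entity may use the entity word alone as the query, or may use a query formed from the entity word and a first tag word. In this application, "the first tag corresponding to the entity" refers to the latter scenario, in which the user searches with a query formed from the entity word and a first tag word. The first tag popularity is therefore the search popularity of the query "entity word + first tag word": that query is used as the lookup object, and the users' search popularity for it is the first tag popularity.
In one feasible implementation, "entity word + first tag word" is fed directly into the search-popularity feedback service as the lookup query to obtain the first tag popularity.
In one feasible implementation, the electronic device can refine the popularity calculation for entities of the second category. The tag search popularity obtained from the initial search popularity of the search queries, using "entity word + first tag word" as the lookup query, is taken as the initial tag popularity. It is then aligned in the same way the third search popularity was determined, and scale-normalized. This aligns and aggregates the effect of the tag dimension on the entity popularity and brings it to a comparable magnitude, so that the entity popularity is computed more precisely.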
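1. In a specific implementation, the electronic device obtains the initial tag popularity of the entity's first tag and aligns it with the entity's third search popularity, obtaining the aligned initial tag popularity;
In this application, popularity alignment refers back to the correction applied when the third search popularity was determined in the preceding steps: the same or a similar correction is applied to the initial tag popularity, giving the aligned initial tag popularity.
Specifically, when the third search popularity was determined and the second search popularity of the vertical-intent entity equalled the target search popularity, the first search popularity was multiplied by the popularity correction threshold to give the corrected third search popularity. When the initial tag popularity is aligned, it is handled in the same way: it is multiplied by the same popularity correction threshold to give the aligned initial tag popularity.
Optionally, if the ratio of the first search popularity to the second search popularity of the entity's vertical-intent entity exceeded the intent-ratio threshold, so that the second search popularity was taken as the corrected third search popularity, the initial tag popularity is treated in the same way: the second search popularity is taken as the aligned initial tag popularity.
Optionally, in some implementations, the electronic device can use the initial tag popularity directly as the first tag popularity.
2. In a specific implementation, the electronic device can also apply popularity scale normalization to the initial tag popularity to obtain the first tag popularity.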
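In one feasible implementation, the scale normalization here aggregates all the initial tag popularity values of the entity: the tag popularity values of the entity are summed, and the total is taken as the tag popularity for the entity. Illustratively, the first tag of an entity may contain several first tag words. If the first tag is "singer", the first tag words may be specific singers such as singer a, singer b, singer c and singer d.
Each "entity word + first tag word" combination corresponds to one initial tag popularity, so several combinations give an entity several initial tag popularity values. For example, for the entity abc with first tag X and first tag words X1, X2 and X3, the query "entity word abc + first tag word X1" has initial tag popularity 1, the query "entity word abc + first tag word X2" has initial tag popularity 2, and the query "entity word abc + first tag word X3" has initial tag popularity 3;
In a specific implementation, the scale normalization here can select the largest of the entity's initial tag popularity values and divide each initial tag popularity by this maximum to obtain an entity tag quotient. Alternatively, it can take the mean N of the top n initial tag popularity values of the entity and divide each initial tag popularity by N to obtain the entity tag quotient;
The entity tag quotient is then multiplied by a tag popularity scale value, and the product is taken as the normalized tag popularity for the entity. The tag popularity scale value is set for the corresponding entity category and brings tag popularity to one order of magnitude, so that tag popularity is expressed on a uniform scale. It can be determined in advance for the actual application environment and is not specifically limited here. After this scale normalization, the result is used as the first tag popularity uv-nnt.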
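3. In a specific implementation, the first tag popularity and the entity's third search popularity are combined by popularity weighting to obtain the entity popularity of the entity.
That is, the first tag popularity and the third search popularity are each multiplied by a weight, the two weighted results are added, and the sum is taken as the entity popularity of the entity.
In some implementations, when the entity belongs to the second category (it has an entity word and a first tag) and the first tag contains several first tag words, the entity popularity obtained above is the search popularity of one specific tag version. For example, if entity A has the entity word abc and the first tag X, and its first tag word is X1, the result is the search popularity for "entity word abc + first tag word X1"; if its first tag word is X2, the result is the search popularity for "entity word abc + first tag word X2", a different tag version under the same entity category. In the search domain of this application, entities are usually distinguished by entity word; entities that share an entity word but differ in tag word are different tag-version individuals under the entity category, and the values computed are the entity popularity of these tag-version entities.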
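Steps 2 and 3 of Strategy 2 can be sketched as follows, assuming the initial tag popularity values have already been aligned. The function names, the default scale of 100 and the equal weights are hypothetical; the text does not give concrete weights.

```python
def first_tag_popularity(initial_tag_uvs, index, scale=100.0, top_n=None):
    """Normalize one initial tag popularity into uv-nnt.

    The base is the maximum value, or the mean of the top_n values when
    top_n is given. initial_tag_uvs must be non-empty.
    """
    ranked = sorted(initial_tag_uvs, reverse=True)
    if top_n is None:
        base = ranked[0]
    else:
        top = ranked[:top_n]
        base = sum(top) / len(top)
    if base == 0:
        return 0.0
    return initial_tag_uvs[index] / base * scale

def entity_popularity_category2(uv_nnt, uv_E, w_tag=0.5, w_entity=0.5):
    """Weighted sum of first tag popularity and third search popularity."""
    return w_tag * uv_nnt + w_entity * uv_E
```

For tag popularity values 30, 60 and 10, the first value normalizes to 50.0 against the maximum; combined with a third search popularity of 200 under equal weights, the entity popularity for that tag version is 125.0.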
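Strategy 3: When the entity category is the third entity category, formed by the entity word and the second tag of the entity, obtain the second tag popularity of the entity's second tag from the initial search popularity of the search queries, and apply popularity scale normalization to it to obtain the reference second tag popularity;
In practice, an entity of the third entity category consists of an entity word and a second tag word under the second tag. A user searching for the entity may use the entity word alone as the query, or may use a query formed from the entity word and a second tag word. In this application, "the second tag corresponding to the entity" refers to the latter scenario, in which the user searches with a query formed from the entity word and a second tag word. The second tag search popularity is therefore the search popularity of the query "entity word + second tag word": that query is used as the lookup object, and the users' search popularity for it is the second tag search popularity.
In one feasible implementation, "entity word + second tag word" is fed directly into the search-popularity feedback service as the lookup query to obtain the second tag search popularity.
In one feasible implementation, the electronic device can refine the popularity calculation for entities of the third category. The tag search popularity obtained from the initial search popularity of the search queries, using "entity word + second tag word" as the lookup query, is taken as the initial popularity result. A tag correction weight for the second tag popularity is then obtained and used for correction. The tag correction weight is determined from the search popularity of the different second tag words under the entity's second tag.
In a specific implementation, the electronic device obtains the tag correction weight for the second tag popularity and uses it to correct the entity's third search popularity, obtaining the reference third search popularity.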
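Specifically, the tag correction weight is determined as follows:
Illustratively, the second tag of an entity may contain several second tag words. If the second tag is "music version", the second tag words may be versions in different styles, such as version a, version b, version c and version d.
Each "entity word + second tag word" combination corresponds to one second tag popularity, so several combinations correspond to the second tag popularity values of the different tag-version entities of one entity. For example, for the entity abc with second tag Y and second tag words Y1, Y2 and Y3: the query "entity word abc + second tag word Y1" has second tag popularity 1, which is the tag popularity of one tag-version entity of the entity; the query "entity word abc + second tag word Y2" has second tag popularity 2, the tag popularity of another tag-version entity; and the query "entity word abc + second tag word Y3" has second tag popularity 3, the tag popularity of a further tag-version entity;
Further, the tag correction weight is determined from the ratio of the tag popularity of the entity's current tag-version entity (the entity's current second tag popularity uv-nt) to the total tag popularity of all second-tag-version entities of the entity. The tag correction weight v is computed as:
$v = \dfrac{\text{uv-nt}}{\sum_i \text{uv-nt}_i}$
Here uv-nt is the current second tag popularity for the entity's category, and $\sum_i \text{uv-nt}_i$ is the sum of the search popularity of the queries of all second-tag-version entities of the entity.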
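Further, once the tag correction weight for the second tag popularity has been computed with the formula above, the electronic device uses the tag correction weight v to correct the entity's third search popularity, obtaining the reference third search popularity. In this step the symbol uv-e denotes the third search popularity being corrected, and uv-E denotes the resulting reference third search popularity;
Specifically, the tag correction weight reflects the share of the current second-tag version's popularity in the popularity of all second-tag versions of the entity. This share quantifies how strongly the current second-tag version influences searches for the entity as a whole. The reference third search popularity uv-E is therefore obtained by multiplying the entity's third search popularity uv-e by the tag correction weight:
$\text{uv-E} = \text{uv-e} \times v$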
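A small sketch of the tag correction weight and its use follows. The function names are assumptions, and returning 0.0 when no version has any searches is a choice made for the example rather than something the text specifies.

```python
def tag_correction_weight(uv_nt, all_version_uvs):
    """v = uv-nt / sum of uv-nt over all second-tag versions."""
    total = sum(all_version_uvs)
    # Assumption: a weight of 0.0 when no version was searched at all.
    return uv_nt / total if total else 0.0

def reference_third_popularity(uv_e, v):
    """uv-E = uv-e * v, the corrected (reference) third search popularity."""
    return uv_e * v
```

If three versions have second tag popularity 30, 50 and 20, the first version gets a weight of 0.3, so a third search popularity of 200 is corrected to about 60 for that version.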
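Further, the electronic device combines the reference second tag popularity and the reference third search popularity by popularity weighting to obtain the preliminary entity popularity of the entity.
That is, the reference second tag popularity and the reference third search popularity are each multiplied by a weight, the two weighted results are added, and the sum is taken as the preliminary entity popularity of the entity;
In some implementations, when the entity belongs to the third category (it has an entity word and a second tag) and the second tag contains several second tag words, the entity popularity obtained above is the search popularity of one specific tag version. For example, if entity B has the entity word abcd and the second tag Q, and its second tag word is Q1, the result is the search popularity for "entity word abcd + second tag word Q1"; if its second tag word is Q2, the result is the preliminary search popularity for "entity word abcd + second tag word Q2", a different tag version under the same entity category. In the search domain of this application, entities are usually distinguished by entity word; entities that share an entity word but differ in tag word are different tag-version individuals under the entity category, and the values computed are the preliminary entity popularity of these tag-version entities.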
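Optionally, in an entity cold-start scenario, users' search intent differs between second-tag versions of the same entity, so some versions are unpopular and others are popular. The preliminary entity popularity of an unpopular version may be small, for example 0. Because overall search volume is also low during cold start, the preliminary entity popularity of a popular version differs little from that of an unpopular version. This application can separate the two further by setting a popularity threshold: when the preliminary entity popularity is less than or equal to the threshold, it is left unchanged; when it exceeds the threshold, a popularity quantization factor a (a constant separate from the popularity correction threshold used in S304) is added to it, and the sum is used as the preliminary entity popularity.
In one feasible implementation, once the preliminary entity popularity of the entity's tag-version entity has been determined, it can be used directly as the entity popularity of the entity.
In one feasible implementation, the second tag usually has a high tag relevance to the entity and a strong influence on searches for it. To refine the entity popularity and make its calculation more accurate, for categories whose queries carry a required tag, the entity search popularity of all entities with different second tag values under the second tag (the required tag) can be summed to give an aggregate tag popularity. For example, if the second tag is a certain singer, the search popularity of all songs by that singer is summed to give the aggregate tag popularity. The aggregate tag popularity is normalized and added to the preliminary entity popularity to give the final entity popularity. In other words, the electronic device obtains the aggregate tag popularity for the entity and the second tag, and determines the entity popularity of the entity from the preliminary entity popularity and the aggregate tag popularity.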
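Strategy 4: When the entity category is the fourth entity category, formed by the entity word together with the first tag and the second tag: 1. Determine the first tag popularity of the entity's first tag and the second tag popularity of the entity's second tag, obtain the tag correction weight for the second tag popularity, and use it to correct the entity's third search popularity, obtaining the corrected third search popularity.
The steps for determining the first tag popularity of the entity's first tag and the second tag popularity of the entity's second tag are explained above and are not repeated here.
In practice, an entity of the fourth entity category consists of an entity word, a first tag word under the first tag and a second tag word under the second tag. A user searching for the entity may use the entity word alone as the query, or may use a query formed from any combination of one or more of the entity word, the first tag word and the second tag word;
Specifically, the tag correction weight for the second tag popularity in the fourth entity category is obtained as in the third entity category, and the correction can likewise be computed with uv-E = uv-e × v. That is, the product of the third search popularity and the tag correction weight is taken as the corrected third search popularity.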
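2. The electronic device obtains the same-version entities, of at least one version type, that have the same entity semantics as the entity word, together with the third search popularity of each same-version entity, and determines the target popularity difference from the numerical ranking of these third search popularity values;
"The same entity semantics" here means that the entities have the same entity word. Entities usually have tags, such as the first tag and the second tag; entities with the same entity word but different tag words are treated as different same-version entities. Any two same-version entities share the entity word and differ in tag word;
The electronic device sorts the same-version entities by their third search popularity and takes the difference between two values in the sequence as the target popularity difference. In practice, the difference between the highest (TOP1) and the second-highest (TOP2) third search popularity can be used.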
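3. Obtain the preliminary entity popularity of the entity from the third search popularity, the target popularity difference, the first tag popularity and a preset popularity quantization factor;
Specifically, when the first tag popularity exceeds the target popularity difference, the electronic device sets the first tag popularity to the target popularity difference; otherwise it is left unchanged. The comparison down-weights the tag popularity. If the first tag popularity were larger than the difference, a same-version entity that ranks lower could end up with a higher final popularity than one that ranks higher; capping the first tag popularity at the target popularity difference avoids this.
The electronic device then combines the first tag popularity and the third search popularity by popularity weighting to obtain a reference popularity;
In one feasible implementation, the electronic device first applies popularity scale normalization to the first tag popularity to obtain the normalized first tag popularity uv-nnt, and then forms a weighted sum of uv-nnt, the third search popularity uv-e and the second tag search popularity uv-nt. The result is a composite popularity value, the reference popularity;
Finally, the electronic device adds the popularity quantization factor to the reference popularity to obtain the preliminary entity popularity of the entity.
Specifically, adding the popularity quantization factor a described above to the reference popularity further separates unpopular versions from popular ones. The result is taken as the preliminary entity popularity.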
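4. Obtain the aggregate tag popularity for the entity and the second tag, and determine the entity popularity of the entity from the preliminary entity popularity and the aggregate tag popularity.
Specifically, the second tag usually has a high tag relevance to the entity and a strong influence on searches for it. To refine the entity popularity and make its calculation more accurate, for categories whose queries carry a required tag, the preliminary entity popularity of all same-version entities with different second tag values under the second tag (the required tag) can be summed to give the aggregate tag popularity. The aggregate tag popularity is normalized and added to the preliminary entity popularity to give the final entity popularity. In other words, the electronic device obtains the aggregate tag popularity for the entity and the second tag, and determines the entity popularity of the entity from the preliminary entity popularity and the aggregate tag popularity.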
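Steps 2 and 3 of Strategy 4 can be sketched as below. This is an illustration under stated assumptions: the function names and equal weights are hypothetical, the quantization factor is added unconditionally as in step 3, and the weighted sum uses only the first tag popularity and the third search popularity.

```python
def target_gap(version_third_uvs):
    """Difference between the TOP1 and TOP2 third search popularity of
    the same-version entities; None when fewer than two versions exist."""
    ranked = sorted(version_third_uvs, reverse=True)
    return ranked[0] - ranked[1] if len(ranked) >= 2 else None

def preliminary_popularity(uv_third, uv_first_tag, gap, factor=0.0,
                           w_tag=0.5, w_entity=0.5):
    """Cap the first tag popularity at the gap, weight it with the third
    search popularity, then add the popularity quantization factor."""
    if gap is not None and uv_first_tag > gap:
        # Down-weight so a lower-ranked version cannot overtake a
        # higher-ranked one through its tag popularity alone.
        uv_first_tag = gap
    reference = w_tag * uv_first_tag + w_entity * uv_third
    return reference + factor
```

With third search popularity values 100, 80 and 10 the gap is 20, so a first tag popularity of 50 is capped at 20 before weighting.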
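In this embodiment of the application, the electronic device obtains entity feature data for at least one entity, determines at least one search query for each entity from that data, obtains an entity search log, determines the initial search popularity of each search query in the log, and then determines the entity popularity of each entity from the initial search popularity of each search query, the entities, and the entity categories to which they belong. By fusing the resource provider's data (such as entity feature data) with the search side's entity search log (such as that of a search engine), and by relying on the search popularity of users' queries in search scenarios, the entity popularity can be determined effectively and quantified across categories, which improves the accuracy of entity popularity generation. The method also suits the cold-start phase of a category's entities: it can still produce reasonably accurate entity popularity when entity resources are sparse or user data is scarce, so the method is robust.
The following are apparatus embodiments of this application, which can be used to carry out the method embodiments of this application. For details not disclosed in the apparatus embodiments, refer to the method embodiments.
Referring to FIG. 4, which shows a schematic structural diagram of the entity popularity generation apparatus provided by an exemplary embodiment of this application. The apparatus can be implemented as all or part of a terminal by software, hardware or a combination of the two. The apparatus 1 includes a query determination module 11, a search popularity determination module 12 and an entity popularity determination module 13.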
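The query determination module 11 is configured to obtain entity feature data for at least one entity and determine at least one search query for each entity based on the entity feature data;
The search popularity determination module 12 is configured to obtain an entity search log and determine in the entity search log the initial search popularity of each search query;
The entity popularity determination module 13 is configured to determine the entity popularity of each entity based on the initial search popularity of each search query, the entities, and the entity category to which each entity belongs.
Optionally, as shown in FIG. 5, the query determination module 11 includes:
A field content determination unit 111, configured to obtain entity feature data for at least one entity from the entity resource side and determine the field content information of each entity based on the entity feature data;
A query generation unit 112, configured to obtain the entity field features of the field content information and generate at least one search query for each entity from the entity field features.
Optionally, as shown in FIG. 6, the query generation unit 112 includes:
A field type acquisition subunit 1121, configured to obtain the entity field feature of each entity in the field content information and determine the field type of that feature, where an entity field feature comprises the entity word of the entity and the tags associated with the entity word;
A query determination subunit 1122, configured to generate, based on the field type of each entity field feature, at least one search query for each entity according to that field type.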
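Optionally, the field types include a first field type, a second field type, a third field type and a fourth field type;
The first field type applies when the entity field feature contains only the entity word; the second field type applies when the entity field feature contains the entity word and a first tag associated with the entity word;
The third field type applies when the entity field feature contains the entity word and a second tag associated with the entity word;
The fourth field type applies when the entity field feature contains the entity word and both the first tag and the second tag associated with the entity word;
Here, the tag relevance between the first tag and the entity word is lower than the tag relevance between the second tag and the entity word.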
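Optionally, as shown in FIG. 7, the search popularity determination module 12 includes:
A popularity search unit 121, configured to perform a popularity search in the entity search log for the at least one search query of each entity and determine the query search popularity of each search query;
A popularity determination unit 122, configured to take the query search popularity as the initial search popularity of the search query.
Optionally, as shown in FIG. 8, the popularity determination unit 122 includes:
A query determination subunit 1221, configured to determine, among the search queries, at least one target search query belonging to the same entity semantics;
A popularity aggregation subunit 1222, configured to aggregate the first search popularity of the target search queries to obtain a second search popularity;
A popularity determination subunit 1223, configured to take the second search popularity as the initial search popularity of each target search query.
Optionally, as shown in FIG. 9, the entity popularity determination module 13 includes:
A search popularity calculation unit 131, configured to determine, based on the initial search popularity of each search query, the second search popularity of the vertical-intent entity corresponding to each entity and the first search popularity of each entity, and to correct the first search popularity of the entity based on the second search popularity of its vertical-intent entity, obtaining the corrected third search popularity of the entity;
An entity popularity determination unit 132, configured to determine the entity popularity of each entity from the initial search popularity of the search queries, using the target popularity calculation strategy of each entity category.
Optionally, the search popularity calculation unit 131 is specifically configured to:
If the second search popularity of the vertical-intent entity equals the target search popularity, take the product of the first search popularity and the popularity correction threshold as the corrected third search popularity of the entity;
If the ratio of the first search popularity to the second search popularity of the entity's vertical-intent entity exceeds the intent-ratio threshold, take the second search popularity as the corrected third search popularity of the entity.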
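Optionally, when the entity category is the first entity category, formed by the entity word of the entity, the entity popularity determination unit 132 is specifically configured to:
Obtain at least one single-entity query that has the same entity semantics as the entity, determine the single-entity popularity of each single-entity query from the initial search popularity of the search queries, and apply popularity scale normalization to the single-entity popularity values to obtain the entity popularity of the entity.
Optionally, when the entity category is the second entity category, formed by the entity word and the first tag of the entity, the entity popularity determination unit 132 is specifically configured to:
Obtain the first tag popularity of the entity's first tag from the initial search popularity of the search queries, and combine the first tag popularity and the entity's third search popularity by popularity weighting to obtain the entity popularity of the entity.
Optionally, the entity popularity determination unit 132 is specifically configured to:
Obtain the initial tag popularity of the entity's first tag and align it with the entity's third search popularity, obtaining the aligned initial tag popularity;
Apply popularity scale normalization to the initial tag popularity to obtain the first tag popularity.
Optionally, the entity popularity determination unit 132 is specifically configured to:
Obtain the second tag popularity of the entity's second tag from the initial search popularity of the search queries, and apply popularity scale normalization to it to obtain the reference second tag popularity;
Obtain the tag correction weight for the second tag popularity and use it to correct the entity's third search popularity, obtaining the reference third search popularity;
Combine the reference second tag popularity and the reference third search popularity by popularity weighting to obtain the preliminary entity popularity of the entity;
Obtain the aggregate tag popularity for the second tag, and determine the entity popularity of the entity from the preliminary entity popularity and the aggregate tag popularity.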
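Optionally, the entity popularity determination unit 132 is specifically configured to:
Determine the first tag popularity of the entity's first tag and the second tag popularity of the entity's second tag, obtain the tag correction weight for the second tag popularity, and use it to correct the entity's third search popularity, obtaining the corrected third search popularity;
Obtain the same-version entities, of at least one version type, that have the same entity semantics as the entity word, together with the third search popularity of each same-version entity, and determine the target popularity difference from the numerical ranking of these values; obtain the preliminary entity popularity of the entity from the third search popularity, the target popularity difference, the first tag popularity and a preset popularity quantization factor;
Obtain the aggregate tag popularity for the entity and the second tag, and determine the entity popularity of the entity from the preliminary entity popularity and the aggregate tag popularity.
Optionally, the entity popularity determination unit 132 is specifically configured to:
Obtain the reference second tag popularity of the entities with different second tag words under the entity's second tag, sum these values to obtain the aggregate second tag popularity, and take the ratio of the second tag popularity to the aggregate second tag popularity as the tag correction weight for the second tag popularity.
Optionally, the entity popularity determination unit 132 is specifically configured to:
When the first tag popularity exceeds the target popularity difference, set the first tag popularity to the target popularity difference;
Combine the first tag popularity and the third search popularity by popularity weighting to obtain a reference popularity;
Add the popularity quantization factor to the reference popularity to obtain the preliminary entity popularity of the entity.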
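Optionally, the entity categories include a first entity category, a second entity category, a third entity category and a fourth entity category, where:
The first entity category is the entity category formed by the entity word of the entity;
The second entity category is the entity category formed by the entity word and the first tag of the entity;
The third entity category is the entity category formed by the entity word and the second tag of the entity;
The fourth entity category is the entity category formed by the entity word together with the first tag and the second tag;
The tag relevance between the first tag and the entity is lower than the tag relevance between the second tag and the entity.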
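Optionally, the apparatus 1 is specifically configured to:
Obtain comprehensive search information for the entity and extract the popularity search features from it;
Feed the popularity search features into a popularity update model, which outputs a popularity reference value for the entity;
Update the entity popularity of the entity based on the popularity reference value.
Note that when the entity popularity generation apparatus of the above embodiments performs the entity popularity generation method, the division into the functional modules described above is only an example. In practice, the functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to perform all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments share the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
The serial numbers of the embodiments of this application are for description only and do not indicate the relative merit of the embodiments.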
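An embodiment of this application further provides a computer storage medium. The computer storage medium can store a plurality of instructions suitable for being loaded by a processor to execute the entity popularity generation method of the embodiments shown in FIG. 1 to FIG. 3. For the specific execution process, see the description of those embodiments; it is not repeated here.
This application further provides a computer program product storing at least one instruction, which is loaded by the processor to execute the entity popularity generation method of the embodiments shown in FIG. 1 to FIG. 3. For the specific execution process, see the description of those embodiments; it is not repeated here.
Referring to FIG. 10, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in FIG. 10, the electronic device 1000 may include at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005 and at least one communication bus 1002.
The communication bus 1002 provides the connections and communication between these components.
The user interface 1003 may include a display and a camera; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).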
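The processor 1001 may include one or more processing cores. The processor 1001 uses various interfaces and lines to connect the parts of the electronic device 1000, and performs the functions of the electronic device 1000 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 1005 and by calling the data stored in the memory 1005. Optionally, the processor 1001 can be implemented in at least one hardware form among digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface and applications; the GPU renders and draws the content to be displayed; the modem handles wireless communication. The modem may also be implemented as a separate chip rather than being integrated into the processor 1001.
The memory 1005 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 can store instructions, programs, code, code sets or instruction sets. The memory 1005 may include a program storage area and a data storage area. The program storage area can store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function or an image playback function) and instructions for implementing the method embodiments above; the data storage area can store the data involved in the method embodiments above. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 10, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an entity popularity generation application.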
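In the electronic device 1000 shown in FIG. 10, the user interface 1003 mainly provides an input interface for the user and obtains the data the user enters. The processor 1001 can call the entity popularity generation application stored in the memory 1005 and perform the following operations: obtain entity feature data for at least one entity, and determine at least one search query for each entity based on the entity feature data;
Obtain an entity search log, and determine in the entity search log the initial search popularity of each search query;
Determine the entity popularity of each entity based on the initial search popularity of each search query, the entities, and the entity category to which each entity belongs.
In one embodiment, when obtaining entity feature data for at least one entity and determining at least one search query for each entity based on that data, the processor 1001 specifically performs the following operations: obtain entity feature data for at least one entity from the entity resource side, and determine the field content information of each entity based on the entity feature data;
Obtain the entity field features of the field content information, and generate at least one search query for each entity from the entity field features.
In one embodiment, when obtaining the entity field features of the field content information and generating at least one search query for each entity from them, the processor 1001 specifically performs the following operations: obtain the entity field feature of each entity in the field content information and determine the field type of that feature, where an entity field feature comprises the entity word of the entity and the tags associated with the entity word;
Based on the field type of each entity field feature, generate at least one search query for each entity according to that field type.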
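In one embodiment, the field types used by the processor 1001 include a first field type, a second field type, a third field type and a fourth field type;
The first field type applies when the entity field feature contains only the entity word; the second field type applies when the entity field feature contains the entity word and a first tag associated with the entity word;
The third field type applies when the entity field feature contains the entity word and a second tag associated with the entity word;
The fourth field type applies when the entity field feature contains the entity word and both the first tag and the second tag associated with the entity word;
Here, the tag relevance between the first tag and the entity word is lower than the tag relevance between the second tag and the entity word.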
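In one embodiment, when obtaining the entity search log and determining in it the initial search popularity of each search query, the processor 1001 specifically performs the following operations: perform a popularity search in the entity search log for the at least one search query of each entity, and determine the query search popularity of each search query;
Take the query search popularity as the initial search popularity of the search query.
In one or more embodiments, the processor 1001 can execute all or part of the process of any of the foregoing method embodiments; for the specific process, refer to those embodiments, which are not repeated here.
Those of ordinary skill in the art will understand that all or part of the processes of the method embodiments above can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments above. The storage medium can be a magnetic disk, an optical disc, a read-only memory, a random access memory or the like.
What is disclosed above is only the preferred embodiments of this application and does not limit the scope of its claims; equivalent changes made in accordance with the claims of this application therefore remain within its scope.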
Claims (20)
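- A method for generating entity popularity, characterized in that the method comprises: obtaining entity feature data for at least one entity, and determining, based on the entity feature data, at least one search query corresponding to each entity; obtaining an entity search log, and determining in the entity search log the initial search popularity corresponding to each search query; and determining the entity popularity of each entity based on the initial search popularity corresponding to each search query, the entities, and the entity category corresponding to each entity.
- The method according to claim 1, characterized in that obtaining entity feature data for at least one entity and determining, based on the entity feature data, at least one search query corresponding to each entity comprises: obtaining entity feature data for at least one entity from an entity resource side, and determining field content information for each entity based on the entity feature data; and obtaining entity field features of the field content information, and generating at least one search query for each entity according to the entity field features.
- The method according to claim 2, characterized in that obtaining the entity field features of the field content information and generating at least one search query for each entity according to the entity field features comprises: obtaining the entity field feature corresponding to each entity in the field content information, and determining the field type corresponding to the entity field feature, the entity field feature comprising the entity word corresponding to the entity and the tags associated with the entity word; and generating, based on the field type corresponding to each entity field feature, at least one search query for each entity corresponding to that field type.
- The method according to claim 3, characterized in that the field types comprise a first field type, a second field type, a third field type and a fourth field type; the first field type is the field type when the entity field feature contains only the entity word; the second field type is the field type when the entity field feature contains the entity word and a first tag associated with the entity word; the third field type is the field type when the entity field feature contains the entity word and a second tag associated with the entity word; the fourth field type is the field type when the entity field feature contains the entity word and both the first tag and the second tag associated with the entity word; wherein the tag relevance between the first tag and the entity word is lower than the tag relevance between the second tag and the entity word.
- The method according to claim 1, characterized in that obtaining the entity search log and determining in the entity search log the initial search popularity corresponding to each search query comprises: performing a popularity search in the entity search log for the at least one search query of each entity, and determining the query search popularity corresponding to each search query; and taking the query search popularity as the initial search popularity corresponding to the search query.
- The method according to claim 5, characterized in that taking the query search popularity as the initial search popularity corresponding to the search query comprises: determining, among the search queries, at least one target search query belonging to the same entity semantics; aggregating the first search popularity corresponding to each target search query to obtain a second search popularity; and taking the second search popularity as the initial search popularity of each target search query.
- The method according to claim 1, characterized in that determining the entity popularity of each entity based on the initial search popularity corresponding to each search query, the entities and the entity categories comprises: determining, based on the initial search popularity corresponding to each search query, the second search popularity of the vertical-intent entity corresponding to each entity and the first search popularity of each entity, and applying popularity correction to the first search popularity of the entity based on the second search popularity of the vertical-intent entity corresponding to the entity, to obtain a corrected third search popularity for the entity; and determining the entity popularity of each entity from the initial search popularity corresponding to each search query, using the target popularity calculation strategy corresponding to each entity category.
- The method according to claim 7, characterized in that applying popularity correction to the first search popularity of the entity based on the second search popularity of the vertical-intent entity corresponding to the entity, to obtain the corrected third search popularity for the entity, comprises: if the second search popularity of the vertical-intent entity equals a target search popularity, taking the product of the first search popularity and a popularity correction threshold as the corrected third search popularity for the entity; and if the ratio of the first search popularity to the second search popularity of the vertical-intent entity corresponding to the entity exceeds an intent-ratio threshold, taking the second search popularity as the corrected third search popularity for the entity.
- The method according to claim 7, characterized in that, when the entity category is a first entity category formed by the entity word of the entity, determining the entity popularity of each entity from the initial search popularity corresponding to each search query comprises: obtaining at least one single-entity query having the same entity semantics as the entity, determining the single-entity popularity corresponding to each single-entity query from the initial search popularity corresponding to each search query, and applying popularity scale normalization to the single-entity popularity values to obtain the entity popularity of the entity.
- The method according to claim 8, characterized in that, when the entity category is a second entity category formed by the entity word and the first tag of the entity, determining the entity popularity of each entity from the initial search popularity corresponding to each search query comprises: obtaining, from the initial search popularity corresponding to each search query, the first tag popularity of the first tag corresponding to the entity, and combining the first tag popularity and the third search popularity of the entity by popularity weighting to obtain the entity popularity of the entity.
- The method according to claim 10, characterized in that obtaining the first tag popularity of the first tag corresponding to the entity comprises: obtaining the initial tag popularity of the first tag corresponding to the entity, and aligning the initial tag popularity with the third search popularity of the entity to obtain the aligned initial tag popularity; and applying popularity scale normalization to the initial tag popularity to obtain the first tag popularity.
- The method according to claim 8, characterized in that, when the entity category is a third entity category formed by the entity word and the second tag of the entity, determining the entity popularity of each entity from the initial search popularity corresponding to each search query comprises: obtaining, from the initial search popularity corresponding to each search query, the second tag popularity of the second tag corresponding to the entity, and applying popularity scale normalization to the second tag popularity to obtain a reference second tag popularity; obtaining a tag correction weight for the second tag popularity, and correcting the third search popularity of the entity based on the tag correction weight to obtain a reference third search popularity; combining the reference second tag popularity and the reference third search popularity by popularity weighting to obtain a preliminary entity popularity for the entity; and obtaining an aggregate tag popularity corresponding to the second tag, and determining the entity popularity of the entity based on the preliminary entity popularity and the aggregate tag popularity.
- The method according to claim 8, characterized in that, when the entity category is a fourth entity category formed by the entity word together with the first tag and the second tag, determining the entity popularity of each entity from the initial search popularity corresponding to each search query, using the target popularity calculation strategy corresponding to each entity category, comprises: determining the first tag popularity of the first tag corresponding to the entity and the second tag popularity of the second tag corresponding to the entity, obtaining a tag correction weight for the second tag popularity, and correcting the third search popularity of the entity based on the tag correction weight to obtain the corrected third search popularity; obtaining same-version entities, of at least one version type, having the same entity semantics as the entity word, together with the third search popularity corresponding to each same-version entity, and determining a target popularity difference based on the numerical ranking of the third search popularity corresponding to each same-version entity; obtaining a preliminary entity popularity for the entity based on the third search popularity, the target popularity difference, the first tag popularity and a preset popularity quantization factor; and obtaining an aggregate tag popularity corresponding to the entity and the second tag, and determining the entity popularity of the entity based on the preliminary entity popularity and the aggregate tag popularity.
- The method according to claim 12 or 13, characterized in that obtaining the tag correction weight for the second tag popularity comprises: obtaining the reference second tag popularity of the entities with different second tag words under the second tag corresponding to the entity, summing the reference second tag popularity values to obtain an aggregate second tag popularity, obtaining the ratio of the second tag popularity to the aggregate second tag popularity, and taking the ratio as the tag correction weight for the second tag popularity.
- The method according to claim 13, characterized in that obtaining the preliminary entity popularity for the entity based on the third search popularity, the target popularity difference, the first tag popularity and the preset popularity quantization factor comprises: when the first tag popularity exceeds the target popularity difference, updating the first tag popularity to the target popularity difference; combining the first tag popularity and the third search popularity by popularity weighting to obtain a reference popularity; and adding the popularity quantization factor to the reference popularity to obtain the preliminary entity popularity for the entity.
- The method according to claim 1, characterized in that the entity categories comprise a first entity category, a second entity category, a third entity category and a fourth entity category; wherein the first entity category is the entity category formed by the entity word of the entity; the second entity category is the entity category formed by the entity word and the first tag of the entity; the third entity category is the entity category formed by the entity word and the second tag of the entity; the fourth entity category is the entity category formed by the entity word together with the first tag and the second tag; and the tag relevance between the first tag and the entity is lower than the tag relevance between the second tag and the entity.
- The method according to claim 1, characterized in that, after determining the entity popularity of each entity, the method further comprises: obtaining comprehensive search information for the entity, and extracting popularity search features from the comprehensive search information; feeding the popularity search features into a popularity update model, which outputs a popularity reference value for the entity; and updating the entity popularity of the entity based on the popularity reference value.
- An apparatus for generating entity popularity, characterized in that the apparatus comprises: a query determination module, configured to obtain entity feature data for at least one entity and determine, based on the entity feature data, at least one search query corresponding to each entity; a search popularity determination module, configured to obtain an entity search log and determine in the entity search log the initial search popularity corresponding to each search query; and an entity popularity determination module, configured to determine the entity popularity of each entity based on the initial search popularity corresponding to each search query, the entities, and the entity category corresponding to each entity.
- A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the method steps of any one of claims 1 to 17.
- An electronic device, characterized by comprising a processor and a memory, wherein the memory stores a computer program, the computer program being suitable for being loaded by the processor to execute the method steps of any one of claims 1 to 17.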
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/083497 WO2022204845A1 (zh) | 2021-03-29 | 2021-03-29 | 实体热度生成方法、装置、存储介质及电子设备 |
CN202180094814.4A CN116888590A (zh) | 2021-03-29 | 2021-03-29 | 实体热度生成方法、装置、存储介质及电子设备 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/083497 WO2022204845A1 (zh) | 2021-03-29 | 2021-03-29 | 实体热度生成方法、装置、存储介质及电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022204845A1 true WO2022204845A1 (zh) | 2022-10-06 |
Family
ID=83456925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/083497 WO2022204845A1 (zh) | 2021-03-29 | 2021-03-29 | 实体热度生成方法、装置、存储介质及电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116888590A (zh) |
WO (1) | WO2022204845A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150310016A1 (en) * | 2014-04-29 | 2015-10-29 | Yahoo! Inc. | Method and system for entity recognition in a query |
CN105095433A (zh) * | 2015-07-22 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | 实体推荐方法及装置 |
CN106844603A (zh) * | 2017-01-16 | 2017-06-13 | 竹间智能科技(上海)有限公司 | 实体热门度的计算方法及装置、应用方法及装置 |
CN108492150A (zh) * | 2018-04-11 | 2018-09-04 | 口碑(上海)信息技术有限公司 | 实体热度的确定方法及系统 |
CN110309189A (zh) * | 2018-03-13 | 2019-10-08 | 深圳市腾讯计算机系统有限公司 | 实体词的热度获取方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
CN116888590A (zh) | 2023-10-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21933532 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180094814.4 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21933532 Country of ref document: EP Kind code of ref document: A1 |