CN104731867B - A kind of method and apparatus that object is clustered - Google Patents
- Publication number
- CN104731867B (application CN201510090184.XA)
- Authority
- CN
- China
- Prior art keywords
- objects
- transition
- transfer
- information
- keywords
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method for clustering objects in a computer device. The method includes: acquiring transition situation information of a plurality of objects, the transition situation information indicating how users transition among the plurality of objects based on object information acquisition behavior; and clustering the plurality of objects according to the transition situation information to obtain a clustering result for the plurality of objects. Because the invention clusters objects by analyzing users' transition situation information among the objects, the resulting object classification is more objective and accurate.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for clustering objects.
Background
In the prior art, objects are generally classified by performing natural language analysis on description texts of the objects. In particular, when the object is related to commercial use, such as when the object is a brand, the brand may be classified in combination with data from the perspective of the object, such as the industry and region to which the brand belongs, the sales of the brand, and market needs, in addition to natural language analysis of the brand name.
Disclosure of Invention
The invention aims to provide a method and a device for clustering objects.
According to an aspect of the invention, there is provided a method for clustering objects in a computer device, wherein the method comprises:
acquiring transition situation information of a plurality of objects, wherein the transition situation information indicates how users transition among the plurality of objects based on object information acquisition behavior; and
clustering the plurality of objects according to the transition situation information to obtain a clustering result for the plurality of objects.
According to another aspect of the present invention, there is also provided an apparatus for clustering objects in a computer device, wherein the apparatus comprises:
means for acquiring transition situation information of a plurality of objects, wherein the transition situation information indicates how users transition among the plurality of objects based on object information acquisition behavior; and
means for clustering the plurality of objects according to the transition situation information to obtain a clustering result for the plurality of objects.
Compared with the prior art, the invention has the following advantages: 1) the scheme of the invention overcomes a prejudice in the field by clustering objects through analysis of users' transition situation information among the objects; 2) compared with data from the perspective of the object, analyzing how users transition among the plurality of objects is closer to the perspective of the user and reflects the user's understanding of the objects more directly, so the object classification determined by the scheme of the invention is more objective and accurate; 3) even among data from the perspective of the user, the transition situation information of the invention is not commonly used data: when data from the perspective of the user is mentioned, a person skilled in the art would more readily think of direct evaluations from the user (such as scores or comment text).
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flow chart of a method for clustering objects according to a preferred embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a clustering device for clustering objects according to a preferred embodiment of the present invention;
FIG. 3 is a diagram illustrating users' transition paths among a plurality of objects in a preferred embodiment;
FIG. 4 is a diagram illustrating users' transition paths among a plurality of keywords in a preferred embodiment;
FIG. 5 shows a specific example of converting a mesh-structured transition path of keywords into a mesh-structured transition path of objects;
FIG. 6 is a diagram illustrating transitions from one object to a plurality of objects in a preferred embodiment;
FIG. 7 shows a specific example of FIG. 6.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
FIG. 1 is a schematic flow chart of a method for clustering objects according to a preferred embodiment of the present invention. The method of this embodiment is mainly implemented by a computer device, which includes network devices and user devices. A network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud based on cloud computing and consisting of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer consisting of a collection of loosely coupled computers. The network in which the network device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like. A user device includes, but is not limited to, a PC, a tablet, a smart phone, a PDA, an IPTV, and the like.
It should be noted that the computer devices and networks are only examples, and other existing or future computing devices or networks may be suitable for the present invention, and are included within the scope of the present invention and are incorporated by reference.
The method according to the present embodiment includes step S1 and step S2.
In step S1, the computer device acquires transition situation information of a plurality of objects.
The object may be any object capable of being clustered. Preferably, the object is of a commercial nature. More preferably, the object is a brand.
The transition situation information indicates how users transition among the plurality of objects based on object information acquisition behavior. An object information acquisition behavior is any behavior that can be used to obtain information about an object; for example, acquiring object information by searching for a keyword related to the object, or acquiring object information by clicking on and browsing content related to the object. "Based on object information acquisition behavior" means that the transition situation reflects transitions that users make in the course of such behavior; preferably, the transition situation is determined from the object information acquisition behavior. For example, the transition situation information of users among the objects is determined by counting how a plurality of users change the object searched for during search behavior, or by counting how a plurality of users change the object-related search keywords during search behavior.
Preferably, the transition situation information of the plurality of objects includes, but is not limited to, at least one of:
1) Transition path information of the user among the plurality of objects.
The transition path information indicates the transition paths of users among the plurality of objects. For example, there are three objects Object1, Object2, and Object3, and the transition path information indicates that the transition paths of a plurality of users among the three objects include: from Object1 to Object2 and from Object1 to Object3.
2) Information on the number of transitions of the user between the respective objects.
The transition number information indicates the number of transitions of users between the respective objects. For example, there are three objects Object1, Object2, and Object3, and the transition number information indicates that the transitions of a plurality of users among the three objects include: five transitions from Object1 to Object2, and eight transitions from Object1 to Object3.
3) Transition probability information of the user between the respective objects.
The transition probability information indicates the transition probabilities of users between the respective objects. For example, there are three objects Object1, Object2, and Object3, and the transition probability information indicates that the transition probabilities of a plurality of users among the three objects include: a probability of 38.46% of transitioning from Object1 to Object2, and a probability of 61.54% of transitioning from Object1 to Object3.
It should be noted that there may be no transition path between some objects in the multiple objects (i.e., the user has not performed transition between some objects in the object information obtaining action), and the number of transitions between some objects and the transition probability are both zero. Furthermore, there may be situations where a transition from an object to the object itself occurs; for example, a user may search for information of the same object using different search keywords several times in succession in a search behavior, thereby creating a situation in which a transition from one object to the object itself occurs.
Preferably, the transfer situation information can be stored in various ways.
For example, the transition situation information is stored as a table, and transition paths of the user among a plurality of objects, and the number of transitions and transition probabilities of the user among the respective objects are recorded in the table, as shown in table 1 below.
Transfer path | Number of transfers | Transition probability |
Object1→Object2 | 5 | 38.46% |
Object1→Object3 | 8 | 61.54% |
TABLE 1
For another example, the transition situation information includes transition paths stored as a mesh structure, together with the number of transitions and/or transition probabilities between nodes (i.e., between objects) in the mesh structure. For the 9 objects Object1 to Object9, for instance, the transition situation information includes the transition paths shown in FIG. 3 and the number of transitions and/or transition probabilities between the nodes connected by arrows in FIG. 3 (e.g., from Object1 to Object2, from Object1 to Object3, from Object1 to Object4, etc.).
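The two storage forms described above can be sketched as follows. This is only an illustrative sketch: the names `TransitionRow`, `to_graph`, and the nested-dictionary layout used for the mesh structure are assumptions made here, not part of the original disclosure. The values are those of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionRow:
    """One row of a transition table (hypothetical layout)."""
    source: str
    target: str
    count: int
    probability: float


# Table form: one row per transition path, using the values of Table 1.
TABLE_1 = [
    TransitionRow("Object1", "Object2", 5, 0.3846),
    TransitionRow("Object1", "Object3", 8, 0.6154),
]


def to_graph(rows):
    """Mesh form: source node -> target node -> edge attributes."""
    graph = {}
    for row in rows:
        graph.setdefault(row.source, {})[row.target] = {
            "count": row.count,
            "probability": row.probability,
        }
    return graph


GRAPH = to_graph(TABLE_1)
```

In this layout a missing edge simply has no entry, which corresponds to a number of transitions and a transition probability of zero.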
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that any transfer situation information for indicating the transfer situation of the user in the plurality of objects based on the object information obtaining behavior should be included in the scope of the present invention.
Specifically, the manner in which the computer device obtains the transfer situation information of the plurality of objects includes, but is not limited to:
1) the computer device directly acquires predetermined transfer situation information of the plurality of objects.
For example, the computer device reads predetermined transfer case information for the plurality of objects from a local or other device.
2) Step S1 further includes step S11 and step S12.
In step S11, the computer device acquires transition situation information of a plurality of keywords.
Wherein the transition situation information of the plurality of keywords is used for indicating the transition situation of the user in the plurality of keywords based on the object information acquisition behavior. Preferably, the plurality of keywords are associated with objects in the object information acquisition behavior; for example, if the object information obtaining behavior is an object search behavior, the keyword may be a search keyword or the like input or selected by the user in the search behavior.
Preferably, the transition situation information of the plurality of keywords comprises at least one of:
a) Transition path information of the user among the plurality of keywords.
The transition path information indicates the transition paths of users among the plurality of keywords. For example, there are three keywords Query1, Query2, and Query3, and the transition path information indicates that the transition paths of a plurality of users among the three keywords include: from Query1 to Query2, and from Query1 to Query3.
b) Information on the number of transitions of the user between the respective keywords.
The transition number information indicates the number of transitions of users between the respective keywords. For example, there are three keywords Query1, Query2, and Query3, and the transition number information indicates that the transitions of a plurality of users among the three keywords include: five transitions from Query1 to Query2, and eight transitions from Query1 to Query3.
It should be noted that, there may be no transition path between some keywords in the keywords (i.e., the user does not make a transition between some keywords in the object information obtaining behavior), and the number of transitions between some keywords is zero.
Preferably, the transfer situation information of the plurality of keywords can be stored in a plurality of ways.
For example, the transition situation information is stored as a table, and transition paths of the user among a plurality of keywords and the number of transitions of the user among the respective keywords are recorded in the table, as shown in table 2 below.
Transfer path | Number of transfers |
Query1→Query2 | 5 |
Query1→Query3 | 8 |
TABLE 2
For another example, the transition situation information includes transition paths stored as a mesh structure, together with the number of transitions between nodes (i.e., between keywords) in the mesh structure. For the 9 keywords Query1 to Query9, for instance, the transition situation information of the 9 keywords includes the transition paths shown in FIG. 4 and the number of transitions between the nodes connected by arrows in FIG. 4.
It should be noted that the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any transition situation information for indicating the transition situation of the user in the plurality of keywords based on the object information obtaining behavior should be included in the scope of the present invention.
Specifically, the implementation manner of the computer device acquiring the transfer condition information of the plurality of keywords includes but is not limited to:
a) the computer equipment directly acquires the predetermined transfer condition information of the keywords.
For example, the computer device reads predetermined transfer case information of the plurality of keywords from a local or other device.
b) The computer device acquires a keyword attention record of at least one user and determines the transition situation information of the plurality of keywords according to the keyword attention record.
The keyword attention record contains the keywords that the user paid attention to during object information acquisition behavior, together with the time at which each keyword received attention. Preferably, the object information acquisition behavior comprises search behavior, and the keywords that received attention comprise the keywords searched for; preferably, the object information acquisition behavior comprises browsing behavior, and the keywords that received attention comprise the keywords clicked on in order to browse object content.
Preferably, for the keyword attention record of each user, the computer device determines that user's transition paths among the keywords and the number of transitions between the keywords according to the attention times contained in the record; the computer device then merges the transition paths and numbers of transitions of all users to determine the transition situation information of the plurality of keywords.
For example, the computer device obtains the keyword attention records of user A and user B, which are shown in Tables 3 and 4 below, respectively:
Keywords of interest | Time when keyword is focused on |
Query1 | 2014-12-13-10:40 |
Query3 | 2014-12-13-10:36 |
TABLE 3
Keywords of interest | Time when keyword is focused on |
Query1 | 2014-11-10-00:14 |
Query2 | 2014-11-10-00:23 |
TABLE 4
Then, for the keyword attention record of user A, the computer device determines that the transition path of user A among the keywords includes "Query1 → Query3", and that the number of transitions of "Query1 → Query3" is 1; similarly, the computer device determines that the transition path of user B among the keywords includes "Query1 → Query2", and that the number of transitions of "Query1 → Query2" is 1. Next, the computer device merges the transition paths of users A and B among the keywords and the numbers of transitions between the respective keywords, and determines the transition situation information of the plurality of keywords as shown in Table 5 below.
Transfer path | Number of transfers |
Query1→Query2 | 1 |
Query1→Query3 | 1 |
TABLE 5
It should be noted that, the above examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any implementation manner for obtaining the transfer situation information of multiple keywords should be included in the scope of the present invention.
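The per-user derivation and the merging step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `user_transitions` and `merge_transitions` are hypothetical, a transition is assumed to be any pair of keywords that are consecutive in time for the same user, and the two records use illustrative timestamps chosen so that Query1 precedes the other keyword.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d-%H:%M"  # matches strings such as "2014-11-10-00:14"


def user_transitions(record):
    """Count keyword-to-keyword transitions for one user.

    `record` is a list of (keyword, time string) pairs; consecutive
    keywords in time order are treated as one transition (an assumption).
    """
    ordered = sorted(record, key=lambda item: datetime.strptime(item[1], TIME_FORMAT))
    keywords = [keyword for keyword, _ in ordered]
    return Counter(zip(keywords, keywords[1:]))


def merge_transitions(records):
    """Sum the per-user transition counts over all users."""
    total = Counter()
    for record in records:
        total.update(user_transitions(record))
    return total


# Hypothetical attention records for two users.
USER_X = [("Query1", "2014-12-13-10:36"), ("Query3", "2014-12-13-10:40")]
USER_Y = [("Query1", "2014-11-10-00:14"), ("Query2", "2014-11-10-00:23")]
MERGED = merge_transitions([USER_X, USER_Y])
```

Here `MERGED` maps each (source keyword, target keyword) pair to its number of transitions, which is the same shape as Table 5.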
In step S12, the computer device determines the transition situation information of the plurality of objects according to the objects with which the plurality of keywords are respectively associated and the transition situation information of the plurality of keywords.
Specifically, the computer device may determine the transition path information of users among the plurality of objects from the transition path information of users among the plurality of keywords and the objects with which the keywords are respectively associated, and may determine the number of transitions and/or the transition probabilities of users between the objects from the transition number information between the keywords and the objects with which the keywords are respectively associated.
For example, the transition situation information of the keywords is as shown in Table 2, and Query1, Query2 and Query3 are associated with Object1, Object2 and Object3, respectively. According to the transition situation information of the keywords and this association, the computer device determines that the transition paths of users among Object1, Object2 and Object3 include "Object1 → Object2" and "Object1 → Object3", with 5 and 8 transitions respectively. The computer device then calculates from these counts that the transition probability of "Object1 → Object2" is 5/(5+8) ≈ 38.46% and that of "Object1 → Object3" is 8/(5+8) ≈ 61.54%; that is, the computer device obtains the transition situation information shown in Table 1.
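The conversion from keyword transitions to object transitions can be sketched as follows, using the counts of Table 2 and the association given in the example. The function name `object_transitions` and the dictionary-based association are assumptions made for illustration; the source does not prescribe a data layout.

```python
from collections import Counter


def object_transitions(keyword_counts, keyword_to_object):
    """Aggregate keyword transition counts into object transition counts.

    Keywords associated with the same object collapse onto one node, so
    two different keywords of one object yield a self-transition.
    """
    counts = Counter()
    for (source_kw, target_kw), n in keyword_counts.items():
        pair = (keyword_to_object[source_kw], keyword_to_object[target_kw])
        counts[pair] += n
    return counts


# Table 2 and the keyword-to-object association from the example.
KEYWORD_COUNTS = {("Query1", "Query2"): 5, ("Query1", "Query3"): 8}
KEYWORD_TO_OBJECT = {"Query1": "Object1", "Query2": "Object2", "Query3": "Object3"}
OBJECT_COUNTS = object_transitions(KEYWORD_COUNTS, KEYWORD_TO_OBJECT)
```

Summing into a `Counter` keeps the sketch correct when several keyword pairs map to the same object pair.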
It should be noted that, because a user may pay attention to different keywords related to the same object across several object information acquisition behaviors (for example, using different search keywords corresponding to the same object in several searches), there may be a transition probability of transitioning from an object to the object itself, such as $p_{00}$ shown in FIG. 6. A specific example of FIG. 6 can be seen in FIG. 7, where the probability of transitioning from "gold treasure" to "gold treasure" itself may be as high as 71.86%.
It should be noted that, preferably, the computer device can calculate the transition probability $p_{ij}$ from object i to object j based on the following formula:
$$p_{ij} = \frac{a_{ij}}{\sum_{k} a_{ik}}$$
where $a_{ij}$ represents the number of transitions from object i to object j, and $\sum_{k} a_{ik}$ represents the number of transitions from object i to all objects.
For example, as shown in FIG. 6, Object0 can transition to itself and to the other objects Object1 through Object13. Taking the transition probability from Object0 to Object8 as an example, $p_{08} = a_{08} / \sum_{k=0}^{13} a_{0k}$, where $\sum_{k=0}^{13} a_{0k}$ represents the total number of transitions from Object0 to Object0 itself and to Object1 through Object13.
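The normalization described above, in which each count is divided by the total number of transitions leaving the same object (self-transitions included), can be sketched as follows. The function name `transition_probabilities` and the pair-keyed dictionary are illustrative assumptions.

```python
def transition_probabilities(counts):
    """Turn transition counts into transition probabilities.

    `counts` maps (source, target) pairs to numbers of transitions; each
    probability is the count divided by the total leaving the same source.
    """
    row_totals = {}
    for (source, _target), n in counts.items():
        row_totals[source] = row_totals.get(source, 0) + n
    return {
        (source, target): n / row_totals[source]
        for (source, target), n in counts.items()
        if row_totals[source] > 0
    }


# Counts from Table 1: 5/(5+8) and 8/(5+8).
PROBS = transition_probabilities({("Object1", "Object2"): 5, ("Object1", "Object3"): 8})

# Hypothetical counts with a self-transition, in the spirit of FIG. 6.
SELF_PROBS = transition_probabilities({("A", "A"): 3, ("A", "B"): 1})
```

With these inputs the probabilities leaving each source object sum to 1, and a self-transition is treated like any other target.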
It should be noted that the objects to which the plurality of keywords are respectively associated may be determined in advance, and fig. 5 shows an example of transition from the transition path of the mesh structure of the keywords to the transition path of the mesh structure of the objects. In fig. 5, each node in the upper mesh structure is a keyword, and each node in the lower mesh structure is an object corresponding to the keyword in the corresponding node in the upper mesh structure.
It should be noted that, the above examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that any implementation manner for obtaining the transfer situation information of multiple objects should be included in the scope of the present invention.
In step S2, the computer device clusters the plurality of objects according to the transition situation information of the plurality of objects obtained in step S1, obtaining a clustering result of the plurality of objects.
Wherein the clustering result of the plurality of objects can be expressed in a plurality of forms; for example, the clustering result includes a plurality of sets, and each set includes objects belonging to one class; for another example, the clustering result includes: the object ID and the category ID corresponding to the object ID, the category to which the object belongs can be determined by the category ID corresponding to each object ID.
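The two result forms mentioned above (a list of sets, and object IDs paired with category IDs) carry the same information, as the following sketch illustrates. The helper names `sets_to_labels` and `labels_to_sets`, the use of integers as category IDs, and the sample objects are assumptions made here for illustration.

```python
def sets_to_labels(clusters):
    """Convert a list of sets into an object ID -> category ID mapping."""
    return {obj: category_id for category_id, members in enumerate(clusters) for obj in members}


def labels_to_sets(labels):
    """Convert an object ID -> category ID mapping back into a list of sets."""
    groups = {}
    for obj, category_id in labels.items():
        groups.setdefault(category_id, set()).add(obj)
    return [groups[category_id] for category_id in sorted(groups)]


# Hypothetical clustering result with two classes.
CLUSTERS = [{"Object1", "Object3"}, {"Object2"}]
LABELS = sets_to_labels(CLUSTERS)
```

Two objects belong to the same class exactly when their category IDs in `LABELS` are equal.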
Specifically, the implementation manner of clustering the multiple objects by the computer device according to the transfer condition information of the multiple objects to obtain the clustering results of the multiple objects includes but is not limited to:
1) the computer equipment clusters the objects directly according to the transfer condition information of the objects to obtain the clustering result of the objects. Wherein the higher the transition probability or the number of transitions between two objects, the higher the probability that the two objects are grouped into one class.
For example, the transition case information obtained by the computer device in step S1 is as shown in the foregoing table 1, then the computer device determines that the Object1 and the Object2 cannot be grouped into one type according to the transition probability 38.46% between the Object1 and the Object2 not exceeding the predetermined threshold 60%, and the computer device determines that the Object1 and the Object3 are grouped into one type according to the transition probability 61.54% between the Object1 and the Object3 exceeding the predetermined threshold 60%. The computer device obtains the clustering results [ Object1, Object3], [ Object2] as two sets; wherein the two sets indicate that Object1 and Object3 belong to the same category, and Object2 belongs to a category alone.
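The threshold example above can be sketched as follows. This is one possible reading, not the method of the disclosure: the function name `cluster_by_threshold` is hypothetical, and the use of a union-find structure (which merges groups transitively when several pairs exceed the threshold) is an assumption added here.

```python
def cluster_by_threshold(objects, probabilities, threshold):
    """Group objects whose pairwise transition probability exceeds `threshold`."""
    parent = {obj: obj for obj in objects}

    def find(obj):
        # Follow parent links to the representative, compressing the path.
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]
            obj = parent[obj]
        return obj

    for (source, target), p in probabilities.items():
        if p > threshold:
            parent[find(source)] = find(target)

    groups = {}
    for obj in objects:
        groups.setdefault(find(obj), []).append(obj)
    return sorted(groups.values())


# Transition probabilities of Table 1 with the 60% threshold of the example.
OBJECTS = ["Object1", "Object2", "Object3"]
PROBS = {("Object1", "Object2"): 0.3846, ("Object1", "Object3"): 0.6154}
RESULT = cluster_by_threshold(OBJECTS, PROBS, 0.60)
```

For the Table 1 values this yields the two sets of the example, with Object1 and Object3 together and Object2 alone.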
It should be noted that, where some of the plurality of objects have already been grouped into one class (for example, by manual operation or by a computer device), it may be determined whether such a group can be grouped into one class with one or more other objects. In that case: the higher the transition probability or number of transitions between a single object and one or more objects of an existing group, the more likely that object is to be grouped into one class with the group; likewise, the higher the transition probability or number of transitions between one or more objects of one existing group and one or more objects of another existing group, the more likely the two groups are to be grouped into one class.
2) The computer device clusters the plurality of objects by obtaining transition distances between the objects based on the transition situation information, and obtains a clustering result of the plurality of objects.
Specifically, the computer device may obtain transfer distances between all the objects, and then cluster the plurality of objects according to the transfer distances to obtain a clustering result of the plurality of objects; alternatively, the computer device may perform a plurality of clustering operations to obtain a clustering result for the plurality of objects, such as selecting a portion of the objects from the plurality of objects in each clustering operation and determining a desired transition distance between the portion of the objects to perform the clustering operation on the portion of the objects.
Preferably, the transfer distance between the objects includes, but is not limited to, at least one of:
a) a transfer distance between one of the plurality of objects and another of the plurality of objects. Wherein the smaller the transfer distance, the greater the likelihood that one of the plurality of objects is grouped into a class with another of the plurality of objects.
Wherein, the transition distance between the two objects can be determined by the transition times information and/or the transition probability information in the transition situation information.
For example, the transfer distance between two objects can be determined by the following formula:
where $d_{ij}$ represents the transition distance between object i and object j, $p_{ij}$ represents the transition probability from object i to object j, $p_{ji}$ represents the transition probability from object j to object i, and $r$ represents a parameter that can be set manually.
It should be noted that the above formula can be adjusted as needed, for example by replacing $(p_{ij}+p_{ji})/2$ in the formula with another expression, and the like.
b) A transition distance between one object and a group of several other objects among the plurality of objects. The smaller the transition distance, the more likely the one object is to be grouped into one class with the group. Preferably, the objects of the group have already been grouped into one class.
The transition distance between the one object and the group may be determined from the transition distances between the one object and one or more objects of the group, or from the numbers of transitions/transition probabilities between the one object and one or more objects of the group.
For example, there are 9 objects in total, Object1 to Object9, of which Object4, Object7 and Object8 have been grouped into one class. The transition distance between Object1 and this group can be determined in any one of the following manners:
First: the minimum of the transition distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is taken as the transition distance between Object1 and the group.
Second: the maximum of the transition distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is taken as the transition distance between Object1 and the group.
Third: a calculation, such as averaging, is performed on the three transition distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and the result is taken as the transition distance between Object1 and the group.
Fourth: the largest number of transitions/transition probability among those between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is determined, and the transition distance derived from it is taken as the transition distance between Object1 and the group.
Fifth: the smallest number of transitions/transition probability among those between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is determined, and the transition distance derived from it is taken as the transition distance between Object1 and the group.
Sixth: a calculation, such as averaging, is performed on the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and the transition distance derived from the result is taken as the transition distance between Object1 and the group.
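The first three manners above (minimum, maximum, and average of the pairwise transition distances) can be sketched as follows. The function name `object_to_group_distance`, the `mode` parameter, and the pairwise distance values are hypothetical; the pairwise distance is passed in as a function because its formula is not assumed here. The fourth to sixth manners, which aggregate counts or probabilities before deriving a distance, are not covered by this sketch.

```python
from statistics import mean


def object_to_group_distance(obj, group, pair_distance, mode="min"):
    """Distance between one object and a group of already-clustered objects.

    `pair_distance(a, b)` returns the transition distance between two
    objects; `mode` selects the minimum, maximum, or mean over the group.
    """
    distances = [pair_distance(obj, member) for member in group]
    if mode == "min":
        return min(distances)
    if mode == "max":
        return max(distances)
    if mode == "mean":
        return mean(distances)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical pairwise transition distances, for illustration only.
PAIR = {
    ("Object1", "Object4"): 0.2,
    ("Object1", "Object7"): 0.5,
    ("Object1", "Object8"): 0.8,
}


def lookup(a, b):
    return PAIR[(a, b)]


GROUP = ["Object4", "Object7", "Object8"]
NEAREST = object_to_group_distance("Object1", GROUP, lookup, mode="min")
FARTHEST = object_to_group_distance("Object1", GROUP, lookup, mode="max")
AVERAGE = object_to_group_distance("Object1", GROUP, lookup, mode="mean")
```

The choice of mode changes how readily an object joins a group: the minimum only needs one close member, while the maximum requires every member to be close.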
c) A transition distance between one group of several objects and another group of several objects among the plurality of objects. The smaller the transition distance, the more likely the two groups are to be grouped into one class.
The transition distance between the two groups may be determined from the transition distances between one or more objects of the first group and one or more objects of the second group, or from the numbers of transitions/transition probabilities between one or more objects of the first group and one or more objects of the second group.
For example, there are 9 objects, Object1 to Object9, wherein the transfer distance between the two objects Object1 and Object3, which are grouped into one class, and the two objects Object4 and Object8, which are grouped into one class, can be determined in any one of the following ways:
First way: the smallest of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is taken as the transfer distance between Object1 and Object3 on the one hand and Object4 and Object8 on the other.
Second way: the largest of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is taken as the transfer distance between Object1 and Object3 on the one hand and Object4 and Object8 on the other.
Third way: a calculation, such as taking the average, is performed on the four transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, and the calculation result is taken as the transfer distance between Object1 and Object3 on the one hand and Object4 and Object8 on the other.
Fourth way: the largest number of transitions/transition probability among those between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is determined, and the transfer distance derived from that largest value is taken as the transfer distance between Object1 and Object3 on the one hand and Object4 and Object8 on the other.
Fifth way: the smallest number of transitions/transition probability among those between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is determined, and the transfer distance derived from that smallest value is taken as the transfer distance between Object1 and Object3 on the one hand and Object4 and Object8 on the other.
Sixth way: a calculation, such as taking the average, is performed on the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, and the transfer distance derived from the calculation result is taken as the transfer distance between Object1 and Object3 on the one hand and Object4 and Object8 on the other.
It should be noted that the above examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that the transfer distance between any objects should be included in the scope of the present invention.
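The first three ways of deriving a transfer distance between two groups of objects can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `group_distance`, the `mode` parameter, and the numeric distances are assumptions made for the example, and the fourth to sixth ways are omitted because they depend on how a transfer distance is derived from a number of transitions or a transition probability.

```python
from itertools import product
from statistics import mean


def group_distance(group_a, group_b, dist, mode="min"):
    """Combine pairwise transfer distances into one group-to-group distance.

    `dist` maps an unordered pair of object names to a transfer distance.
    `mode` selects the smallest, largest, or average pairwise distance.
    """
    pairwise = [dist[frozenset((a, b))] for a, b in product(group_a, group_b)]
    reducers = {"min": min, "max": max, "mean": mean}
    return reducers[mode](pairwise)


# Hypothetical pairwise transfer distances (illustrative values only).
dist = {
    frozenset(("Object1", "Object4")): 0.8,
    frozenset(("Object1", "Object8")): 0.5,
    frozenset(("Object3", "Object4")): 0.2,
    frozenset(("Object3", "Object8")): 0.9,
}

left, right = ["Object1", "Object3"], ["Object4", "Object8"]
d_min = group_distance(left, right, dist, "min")    # first way
d_max = group_distance(left, right, dist, "max")    # second way
d_mean = group_distance(left, right, dist, "mean")  # third way (average)
```

The same function also covers the one-object-to-many case when `group_a` contains a single object.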
As one of preferable solutions of the implementation 2) of the step S2, the step S2 further includes a step S21, a step S22, a step S23, a step S24, and a step S25.
In step S21, the computer device selects a first partial object and a second partial object among the plurality of objects.
The first partial object may be one or more of the plurality of objects, and the second partial object may be one or more of the plurality of objects that differ from the first partial object. Preferably, when the first partial object or the second partial object contains multiple objects, those objects already belong to one class.
In step S22, the computer device acquires a transfer distance between the first partial object and the second partial object determined based on the transfer case information about the first partial object and the second partial object.
It should be noted that, before step S22, the transfer distance between the first partial object and the second partial object may already exist; for example, the transfer distance between the first partial object and the second partial object may have been determined by the computer device in a previous step, etc.
Preferably, when the transfer distance between the first partial object and the second partial object already exists, the computer device reads it directly. For example, the computer device directly reads a locally stored transfer distance between the first partial object and the second partial object.
When the transfer distance between the first partial object and the second partial object does not yet exist, the computer device determines it according to the transfer distances between one or more objects in the first partial object and one or more objects in the second partial object, which are in turn determined based on the transfer situation information between the first partial object and the second partial object. How to determine the transfer distance between two objects, between one object and multiple objects, and between multiple objects and multiple other objects is described in detail in the foregoing description of "transfer distances between objects" and is not repeated here. If the transfer distance between one or more objects in the first partial object and one or more objects in the second partial object already exists before this step is executed, it may be read directly; if it has not yet been obtained when this step is executed, it needs to be determined based on the transfer situation information between the first partial object and the second partial object.
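The read-if-present, otherwise-compute behaviour of step S22 can be sketched as a small cache lookup. The names `pair_key`, `get_transfer_distance`, and `fake_compute` are hypothetical, and the sketch assumes that the transfer distance is symmetric, so that the order of the two partial objects does not matter.

```python
def pair_key(first, second):
    """Order-independent key for two groups of objects (assumes symmetry)."""
    return frozenset((frozenset(first), frozenset(second)))


def get_transfer_distance(first, second, cache, compute):
    """Return a stored distance if one exists; otherwise compute and store it."""
    key = pair_key(first, second)
    if key not in cache:  # the distance was not determined in an earlier step
        cache[key] = compute(first, second)
    return cache[key]


calls = []


def fake_compute(first, second):
    # Stand-in for deriving the distance from the transfer situation information.
    calls.append((tuple(first), tuple(second)))
    return 0.4


cache = {}
d1 = get_transfer_distance(["Object1", "Object2"], ["Object3"], cache, fake_compute)
d2 = get_transfer_distance(["Object3"], ["Object2", "Object1"], cache, fake_compute)
```

The second call finds the stored value, so `fake_compute` runs only once.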
In step S23, the computer device determines whether the first partial object and the second partial object are grouped into one type according to the transfer distance between the first partial object and the second partial object.
The smaller the transfer distance, the more likely the first partial object and the second partial object are grouped into one class; the larger the transfer distance, the less likely they are grouped into one class.
In step S24, the computer device reselects the first partial object and the second partial object, wherein no clustering operation has been performed between the reselected first partial object and the second partial object.
In step S25, the computer device repeats step S22, step S23, and step S24 until a clustering result of the plurality of objects is obtained. Preferably, the computer device may determine in a variety of ways whether the clustering result has been obtained; for example, by checking whether the number of repetitions has exceeded a predetermined repetition threshold, or whether no first partial object and second partial object remain between which no clustering operation has been performed.
The following is an example to better illustrate the preferred embodiment:
For example, there are 5 objects: Object1, Object2, Object3, Object4, and Object5.
In step S21, the computer device selects Object1 as the first partial object and Object2 as the second partial object. Next, in step S22, the computer device determines the transfer distance between Object1 and Object2 based on the transfer situation information between Object1 and Object2. Next, in step S23, the computer device determines, based on the transfer distance between Object1 and Object2, that Object1 and Object2 are grouped into one type. Next, in step S24, the computer device selects Object1 and Object2, which have been grouped into one type, as the first partial object, and selects Object3 as the second partial object.
Next, the computer apparatus repeats steps S22 through S23, determines that the Object1 and the Object2 cannot be grouped into one type with the Object3, and repeats step S24, selects the Object1 and the Object2 that have been grouped into one type as the first partial Object, and selects the Object4 as the second partial Object.
Next, the computer apparatus repeats steps S22 through S23, determines that the Object1 and the Object2 cannot be grouped into one type with the Object4, and repeats step S24, selects the Object1 and the Object2 that have been grouped into one type as the first partial Object, and selects the Object5 as the second partial Object.
Next, the computer device repeats steps S22 through S23, determines that Object1 and Object2 cannot be grouped together with Object5, and repeats step S24, selects Object3 as the first partial Object, and selects Object4 as the second partial Object.
Next, the computer device repeats steps S22 through S23, determines that Object3 and Object4 are grouped into one type, and repeats step S24, selecting Object3 and Object4, which have been grouped into one type, as the first partial object, and selecting Object5 as the second partial object.
Next, the computer apparatus repeats steps S22 through S23, determines that Object3 and Object4 are grouped into one type with Object5, and repeats step S24, selects Object1 and Object2 that have been grouped into one type as the first partial Object, and selects Object3, Object4, and Object5 that have been grouped into one type as the second partial Object.
Next, the computer device repeats steps S22 through S23 and determines that Object1 and Object2 cannot be grouped into one type with Object3, Object4, and Object5. The computer device then judges that no first partial object and second partial object remain between which no clustering operation has been performed, and stops the clustering operation. The clustering result of Object1, Object2, Object3, Object4, and Object5 is therefore: [Object1, Object2], [Object3, Object4, Object5].
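The loop of steps S21 to S25 can be sketched as follows. This is an illustration under stated assumptions rather than the claimed procedure: the distance between two groups is taken to be the smallest pairwise distance, two groups are merged when that distance is below a hypothetical `threshold`, and the pairwise distances are made-up values chosen so that the result matches the example above. The order in which pairs are visited may differ from the walkthrough.

```python
from itertools import combinations


def cluster_distance(a, b, dist):
    # Assumed rule: the smallest pairwise transfer distance between members.
    return min(dist[frozenset((x, y))] for x in a for y in b)


def sequential_cluster(objects, dist, threshold):
    """Repeatedly pick two groups that have not been compared yet (S21/S24),
    get their distance (S22), and merge them if it is small enough (S23)."""
    clusters = [frozenset([o]) for o in objects]
    tried = set()
    while True:
        pair = next(
            ((a, b) for a, b in combinations(clusters, 2)
             if frozenset((a, b)) not in tried),
            None,
        )
        if pair is None:  # S25: no uncompared pair remains, so stop
            return [sorted(c) for c in clusters]
        a, b = pair
        tried.add(frozenset((a, b)))
        if cluster_distance(a, b, dist) < threshold:
            clusters[clusters.index(a)] = a | b
            clusters.remove(b)


names = ["Object1", "Object2", "Object3", "Object4", "Object5"]
# Hypothetical distances: far apart by default, with three close pairs.
dist = {frozenset(p): 0.9 for p in combinations(names, 2)}
dist[frozenset(("Object1", "Object2"))] = 0.2
dist[frozenset(("Object3", "Object4"))] = 0.3
dist[frozenset(("Object4", "Object5"))] = 0.25

result = sequential_cluster(names, dist, threshold=0.5)
```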
In the prior art, objects are generally classified by performing natural language analysis on their description texts. In particular, when the object involves commercial use, such as when the object is a brand, the classification is subject to manual supervision and, in addition to natural language analysis of the object name, draws on data from the perspective of the object, such as the industry and region to which the object belongs, the sales of the object, and market demand. That is, when classifying objects involving commercial use, those skilled in the art hold a prejudice: the objects are classified according to business data from the perspective of the objects.
The scheme of the invention overcomes this prejudice and clusters objects by analyzing the transfer situation information of users among the objects. Compared with data from the perspective of the objects, the transfer situation of users among the objects is closer to the perspective of the user and reflects the user's cognition of the objects more intuitively, so the object classification determined by the scheme of the invention is more objective and accurate. In addition, even among data from the perspective of the user, the transfer situation information of the invention is not commonly used data; in fact, when data from the perspective of the user is mentioned, a person skilled in the art would more readily think of direct evaluations from the user (such as scores, comment texts, etc.).
Fig. 2 is a schematic structural diagram of a clustering apparatus for clustering objects according to a preferred embodiment of the present invention. The clustering apparatus can be installed in a computer device and comprises: a device for acquiring the transfer situation information of a plurality of objects (hereinafter referred to as the acquisition apparatus 1), and a device for clustering the plurality of objects according to the transfer situation information to obtain the clustering result of the plurality of objects (hereinafter referred to as the sub-clustering device 2).
The acquisition apparatus 1 acquires transfer situation information of a plurality of objects.
Wherein the object may comprise any object capable of being clustered. Preferably, the object is of a commercial nature. More preferably, the object comprises a brand.
The transfer situation information indicates the transfer situation of users among the plurality of objects based on object information acquisition behavior. The object information acquisition behavior comprises any behavior that can be used to obtain information about an object; for example, acquiring object information by searching for a keyword related to the object, or acquiring object information by clicking and browsing content related to the object. "Based on object information acquisition behavior" means that the transfer situation reflects transfers made by users during the object information acquisition behavior; preferably, the transfer situation is determined from that behavior. For example, the transfer situation information of users among the objects is determined by counting the changes of search object made by a plurality of users during search behavior, or by counting the changes of object-related search keyword made by a plurality of users during search behavior.
Preferably, the transfer situation information of the plurality of objects includes, but is not limited to, at least one of:
1) Transfer path information of the user among the plurality of objects.
The transfer path information indicates the transfer paths of users among the plurality of objects. For example, there are three objects Object1, Object2, and Object3, and the transfer path information indicates that the transfer paths of a plurality of users among the three objects include: from Object1 to Object2, and from Object1 to Object3.
2) Information on the number of transitions of the user between the respective objects.
The transition number information indicates the number of transitions of users between the respective objects. For example, there are three objects Object1, Object2, and Object3, and the transition number information indicates that the numbers of transitions of a plurality of users among the three objects include: five transitions from Object1 to Object2, and eight transitions from Object1 to Object3.
3) Transition probability information of the user between the respective objects.
The transition probability information indicates the transition probabilities of users between the respective objects. For example, there are three objects Object1, Object2, and Object3, and the transition probability information indicates that the transition probabilities of a plurality of users among the three objects include: a probability of 38.46% of transitioning from Object1 to Object2, and a probability of 61.54% of transitioning from Object1 to Object3.
It should be noted that there may be no transition path between some objects in the multiple objects (i.e., the user has not performed transition between some objects in the object information obtaining action), and the number of transitions between some objects and the transition probability are both zero. Furthermore, there may be situations where a transition from an object to the object itself occurs; for example, a user may search for information of the same object using different search keywords several times in succession in a search behavior, thereby creating a situation in which a transition from one object to the object itself occurs.
Preferably, the transfer situation information can be stored in various ways.
For example, the transition situation information is stored as a table, and transition paths of the user among a plurality of objects, and the number of transitions and transition probabilities of the user among the respective objects are recorded in the table, as shown in table 1 described above.
For another example, the transfer situation information includes transfer paths stored as a mesh structure, together with the numbers of transitions and/or transition probabilities between nodes (i.e., between objects) in the mesh structure. For the 9 objects Object1 to Object9, for instance, the transfer situation information includes the transfer paths shown in fig. 3 and the numbers of transitions and/or transition probabilities between the nodes connected by arrows in fig. 3 (e.g., from Object1 to Object2, from Object1 to Object3, from Object1 to Object4, etc.).
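One possible in-memory form of the mesh structure is a directed graph stored as nested dictionaries, from which the table form can be produced on demand. The class `TransferGraph` and its method names are hypothetical; the counts of five and eight come from the example above, and the self-transition count of two is made up to show that a transition from an object to itself can be stored as well.

```python
from collections import defaultdict


class TransferGraph:
    """Directed graph of transfer paths with a transition count on each edge."""

    def __init__(self):
        self._counts = defaultdict(lambda: defaultdict(int))

    def add(self, src, dst, times=1):
        """Record `times` transitions from `src` to `dst`."""
        self._counts[src][dst] += times

    def count(self, src, dst):
        """Number of transitions from `src` to `dst`; zero when no path exists."""
        return self._counts.get(src, {}).get(dst, 0)

    def rows(self):
        """Flatten the graph into table rows (source, destination, count)."""
        return [(s, d, n)
                for s, targets in self._counts.items()
                for d, n in targets.items()]


graph = TransferGraph()
graph.add("Object1", "Object2", 5)
graph.add("Object1", "Object3", 8)
graph.add("Object1", "Object1", 2)  # hypothetical self-transition
table = graph.rows()
```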
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that any transfer situation information for indicating the transfer situation of the user in the plurality of objects based on the object information obtaining behavior should be included in the scope of the present invention.
Specifically, the manner in which the acquisition apparatus 1 acquires the transfer situation information of the plurality of objects includes, but is not limited to:
1) the acquisition device 1 directly acquires predetermined transfer situation information of the plurality of objects.
For example, the acquisition apparatus 1 reads predetermined transfer situation information of the plurality of objects from a local or other device.
2) The acquisition apparatus 1 further includes means for acquiring transition situation information of a plurality of keywords (hereinafter referred to as the "first sub-acquisition means", not shown) and means for determining the transfer situation information of the plurality of objects based on the objects with which the plurality of keywords are respectively associated and the transition situation information of the plurality of keywords (hereinafter referred to as the "first determination means", not shown).
The first sub-acquisition means acquires transition situation information of a plurality of keywords.
Wherein the transition situation information of the plurality of keywords is used for indicating the transition situation of the user in the plurality of keywords based on the object information acquisition behavior. Preferably, the plurality of keywords are associated with objects in the object information acquisition behavior; for example, if the object information obtaining behavior is an object search behavior, the keyword may be a search keyword or the like input or selected by the user in the search behavior.
Preferably, the transfer situation information of the plurality of keywords comprises at least one of:
a) Transfer path information of the user among the plurality of keywords.
The transfer path information indicates the transfer paths of users among the plurality of keywords. For example, there are three keywords Query1, Query2, and Query3, and the transfer path information indicates that the transfer paths of a plurality of users among the three keywords include: from Query1 to Query2, and from Query1 to Query3.
b) Information of the number of transitions of the user between the respective keywords.
The transition number information indicates the number of transitions of users between the respective keywords. For example, there are three keywords Query1, Query2, and Query3, and the transition number information indicates that the numbers of transitions of a plurality of users among the three keywords include: five transitions from Query1 to Query2, and eight transitions from Query1 to Query3.
It should be noted that, there may be no transition path between some keywords in the keywords (i.e., the user does not make a transition between some keywords in the object information obtaining behavior), and the number of transitions between some keywords is zero.
Preferably, the transfer situation information of the plurality of keywords can be stored in a plurality of ways.
For example, the transition situation information is stored as a table, and transition paths of the user among a plurality of keywords and the number of transitions of the user among the keywords are recorded in the table, as shown in the foregoing table 2.
For another example, the transition situation information includes transfer paths stored as a mesh structure, together with the numbers of transitions between nodes (i.e., between keywords) in the mesh structure. For the 9 keywords Query1 to Query9, for instance, the transition situation information of the 9 keywords includes the transfer paths shown in fig. 4 and the numbers of transitions between the nodes connected by arrows in fig. 4.
It should be noted that the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any transition situation information for indicating the transition situation of the user in the plurality of keywords based on the object information obtaining behavior should be included in the scope of the present invention.
Specifically, the implementation manner of the first sub-acquisition device acquiring the transfer condition information of the plurality of keywords includes but is not limited to:
a) the first sub-acquisition means directly acquires predetermined transfer situation information of the plurality of keywords.
For example, the first sub-acquisition means reads predetermined transfer situation information of the plurality of keywords from a local or other device.
b) The first sub-acquisition device acquires a keyword attention record of at least one user and determines transfer condition information of the keywords according to the keyword attention record.
The keyword attention record comprises the keywords to which the user paid attention during the object information acquisition behavior and the time at which each keyword received attention. Preferably, the object information acquisition behavior comprises search behavior, and the keywords receiving attention comprise the searched keywords; preferably, the object information acquisition behavior comprises browsing behavior, and the keywords receiving attention comprise keywords clicked in order to browse object content.
Preferably, for the keyword attention record of each user, the first sub-acquisition device determines the transfer path of the user in the keyword and the transfer times among the keywords according to the time information of the keyword attention contained in the keyword attention record; and the first sub-acquisition device determines the transfer condition information of the plurality of keywords by merging the transfer paths of the respective users in the keywords and the transfer times among the respective keywords.
For example, the first sub-acquisition means acquires the keyword attention records of user A and user B, which are shown in the foregoing table 3 and table 4, respectively.
For the keyword attention record of user A, the first sub-acquisition means determines that the transfer path of user A among the keywords includes "Query1 → Query3" and that the number of transitions of "Query1 → Query3" is 1; similarly, the first sub-acquisition means determines that the transfer path of user B among the keywords includes "Query1 → Query2" and that the number of transitions of "Query1 → Query2" is 1. Next, the first sub-acquisition means merges the transfer paths of users A and B among the keywords and the numbers of transitions between the respective keywords, and determines the transition situation information of the plurality of keywords as shown in table 5.
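Deriving and merging per-user keyword transitions can be sketched as follows. The record layout (a list of `(time, keyword)` pairs), the function names, and the timestamps are assumptions made for illustration; tables 3 to 5 are not reproduced here, so only the transitions named in the example are modelled.

```python
from collections import Counter


def transitions_for_user(records):
    """Count keyword-to-keyword transitions in one user's attention record.

    `records` is a list of (time, keyword) pairs; consecutive keywords in
    time order form one transition.
    """
    ordered = [keyword for _, keyword in sorted(records)]
    return Counter(zip(ordered, ordered[1:]))


def merge_transitions(per_user):
    """Add up the transition counts of all users."""
    total = Counter()
    for counts in per_user:
        total.update(counts)
    return total


# Hypothetical records; the times are arbitrary integers.
user_a = [(2, "Query3"), (1, "Query1")]
user_b = [(1, "Query1"), (2, "Query2")]

merged = merge_transitions(
    [transitions_for_user(user_a), transitions_for_user(user_b)]
)
```

Sorting by time first means the records do not have to arrive in chronological order.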
It should be noted that, the above examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any implementation manner for obtaining the transfer situation information of multiple keywords should be included in the scope of the present invention.
The first determining means determines transition situation information of a plurality of objects based on an object to which the plurality of keywords are respectively associated and transition situation information of the plurality of keywords.
Specifically, the first determination means may determine the transition path information of the user among the plurality of objects according to the transition path information of the user among the plurality of keywords and the objects to which the plurality of keywords are respectively associated, and the first determination means determines the transition number and/or transition probability information of the user among the respective objects according to the transition number information of the user among the respective keywords and the objects to which the plurality of keywords are respectively associated.
For example, the transition situation information of the keywords is as shown in table 2, and Query1, Query2, and Query3 are associated with Object1, Object2, and Object3, respectively. The first determination means determines, according to the transition situation information of the keywords and the aforementioned association, that the transfer paths of the user among Object1, Object2, and Object3 include "Object1 → Object2" and "Object1 → Object3", with 5 and 8 transitions, respectively. Next, the first determination means calculates from these numbers of transitions that the transition probability of "Object1 → Object2" is 5/(5+8) ≈ 38.46% and that the transition probability of "Object1 → Object3" is 8/(5+8) ≈ 61.54%; that is, the first determination means obtains the transfer situation information shown in the foregoing table 1.
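The mapping from keyword transitions to object transitions, followed by normalization into transition probabilities, can be sketched as follows. The function names and dictionary layout are illustrative assumptions; the counts and the keyword-to-object association are taken from the example above.

```python
from collections import defaultdict


def object_transitions(keyword_counts, keyword_to_object):
    """Re-key keyword transition counts by the objects the keywords belong to."""
    counts = defaultdict(int)
    for (src_kw, dst_kw), n in keyword_counts.items():
        counts[(keyword_to_object[src_kw], keyword_to_object[dst_kw])] += n
    return dict(counts)


def transition_probabilities(counts):
    """Divide each count by the total number of transitions leaving its source."""
    totals = defaultdict(int)
    for (src, _), n in counts.items():
        totals[src] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


keyword_counts = {("Query1", "Query2"): 5, ("Query1", "Query3"): 8}
keyword_to_object = {"Query1": "Object1", "Query2": "Object2", "Query3": "Object3"}

counts = object_transitions(keyword_counts, keyword_to_object)
probs = transition_probabilities(counts)
```

Because counts are summed after re-keying, several keywords associated with the same object contribute to the same object-level transition, including a transition from an object to itself.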
It should be noted that, since a user may pay attention to different keywords related to the same object in a plurality of object information acquisition behaviors (for example, different search keywords corresponding to the same object are used in a plurality of searches), there may be a transition probability of transitioning from an object to the object itself, such as $p_{00}$ shown in fig. 6. A specific example of fig. 6 can be seen in fig. 7. As shown in fig. 7, the probability of transferring from "gold treasure" to "gold treasure" itself may be as high as 71.86%.
It should be noted that, preferably, the first determination means may calculate the transition probability $p_{ij}$ from object i to object j based on the following formula:
$$p_{ij} = \frac{a_{ij}}{\sum_{k} a_{ik}}$$
where $a_{ij}$ indicates the number of transitions from object i to object j, and $\sum_{k} a_{ik}$ indicates the number of transitions from object i to all objects.
For example, as shown in fig. 6, Object0 can transfer to itself and to the other objects Object1 through Object13; taking the transition probability from Object0 to Object8 as an example, $p_{08} = a_{08} / \sum_{k=0}^{13} a_{0k}$, where $\sum_{k=0}^{13} a_{0k}$ indicates the total number of transitions from Object0 to Object0 itself and to Object1 through Object13.
It should be noted that the objects to which the plurality of keywords are respectively associated may be determined in advance, and fig. 5 shows an example of transition from the transition path of the mesh structure of the keywords to the transition path of the mesh structure of the objects. In fig. 5, each node in the upper mesh structure is a keyword, and each node in the lower mesh structure is an object corresponding to the keyword in the corresponding node in the upper mesh structure.
It should be noted that, the above examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that any implementation manner for obtaining the transfer situation information of multiple objects should be included in the scope of the present invention.
The sub-clustering means 2 clusters the plurality of objects according to the transfer condition information of the plurality of objects obtained by the obtaining means 1, and obtains a clustering result of the plurality of objects.
Wherein the clustering result of the plurality of objects can be expressed in a plurality of forms; for example, the clustering result includes a plurality of sets, and each set includes objects belonging to one class; for another example, the clustering result includes: the object ID and the category ID corresponding to the object ID, the category to which the object belongs can be determined by the category ID corresponding to each object ID.
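The two forms of clustering result mentioned above carry the same information and can be converted into each other. The following sketch uses hypothetical function names and assigns category IDs as consecutive integers, which is an assumption made only for this example.

```python
def to_category_ids(clusters):
    """Map each object ID to the integer ID of the set it belongs to."""
    return {obj: cid for cid, members in enumerate(clusters) for obj in members}


def to_sets(category_ids):
    """Rebuild the list-of-sets form from an object ID -> category ID mapping."""
    groups = {}
    for obj, cid in category_ids.items():
        groups.setdefault(cid, set()).add(obj)
    return [groups[cid] for cid in sorted(groups)]


clusters = [{"Object1", "Object3"}, {"Object2"}]
category_ids = to_category_ids(clusters)
round_trip = to_sets(category_ids)
```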
Specifically, the implementation manner of clustering the multiple objects by the sub-clustering device 2 according to the transfer condition information of the multiple objects to obtain the clustering results of the multiple objects includes but is not limited to:
1) the sub-clustering device 2 directly clusters the plurality of objects according to the transfer condition information of the plurality of objects to obtain a clustering result of the plurality of objects. Wherein the higher the transition probability or the number of transitions between two objects, the higher the probability that the two objects are grouped into one class.
For example, if the transfer situation information obtained by the acquisition apparatus 1 is as shown in the foregoing table 1, the sub-clustering device 2 determines that Object1 and Object2 cannot be grouped into one class because the transition probability of 38.46% between Object1 and Object2 does not exceed the predetermined threshold of 60%, and determines that Object1 and Object3 are grouped into one class because the transition probability of 61.54% between Object1 and Object3 exceeds the predetermined threshold of 60%. The sub-clustering device 2 thus obtains the clustering result [Object1, Object3], [Object2], expressed as two sets, which indicate that Object1 and Object3 belong to the same class and that Object2 forms a class by itself.
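Threshold-based clustering on transition probabilities can be sketched with a small union-find structure. The function name and data layout are hypothetical, and merging groups transitively (if A joins B and B joins C, all three share a class) is an assumption of this sketch that the text does not spell out.

```python
def threshold_cluster(objects, probs, threshold):
    """Group objects whose transition probability exceeds `threshold`."""
    parent = {o: o for o in objects}

    def find(o):
        # Follow parent links to the representative, compressing the path.
        while parent[o] != o:
            parent[o] = parent[parent[o]]
            o = parent[o]
        return o

    for (a, b), p in probs.items():
        if p > threshold:
            parent[find(b)] = find(a)

    groups = {}
    for o in objects:
        groups.setdefault(find(o), []).append(o)
    return list(groups.values())


probs = {("Object1", "Object2"): 0.3846, ("Object1", "Object3"): 0.6154}
result = threshold_cluster(["Object1", "Object2", "Object3"], probs, 0.60)
```

With the values from table 1 as quoted in the example, this yields the two sets [Object1, Object3] and [Object2].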
It should be noted that, where some of the plurality of objects have already been grouped into one class (for example, by manual operation or by an operation of the clustering device), it may further be determined whether those already-grouped objects can be grouped into one class with one or more other objects. In that case: the higher the transition probability or the number of transitions between a single object and one or more of the objects already grouped into one class, the higher the likelihood that the single object is grouped into one class with them; likewise, the higher the transition probability or the number of transitions between one or more objects of one already-formed class and one or more objects of another already-formed class, the higher the likelihood that the two classes are grouped into one class.
2) The sub-clustering means 2 clusters the plurality of objects by obtaining the transition distances between the objects based on the transition situation information, obtaining the clustering results of the plurality of objects.
Specifically, the sub-clustering device 2 may obtain the transfer distances between all the objects, and then cluster the plurality of objects according to the transfer distances to obtain the clustering results of the plurality of objects; alternatively, the sub-clustering means 2 may perform a plurality of clustering operations to obtain a clustering result of a plurality of objects, such as selecting a part of objects from the plurality of objects in each clustering operation and determining a required transition distance between the part of objects, thereby performing the clustering operation on the part of objects.
Preferably, the transfer distance between the objects includes, but is not limited to, at least one of:
a) a transfer distance between one of the plurality of objects and another of the plurality of objects. Wherein the smaller the transfer distance, the greater the likelihood that one of the plurality of objects is grouped into a class with another of the plurality of objects.
Wherein, the transition distance between the two objects can be determined by the transition times information and/or the transition probability information in the transition situation information.
For example, the transfer distance between two objects can be determined by the following formula:
where $d_{ij}$ represents the transfer distance between object i and object j, $p_{ij}$ represents the transition probability from object i to object j, $p_{ji}$ represents the transition probability from object j to object i, and $r$ represents a parameter that can be set manually.
It should be noted that the above formula can be adjusted as needed; for example, the term $(p_{ij} + p_{ji})/2$ in the formula may be adjusted to another expression, and the like.
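The formula itself is not reproduced in this text, so the following sketch does not implement it. It only illustrates the stated ingredients: the two directed transition probabilities are averaged as $(p_{ij} + p_{ji})/2$, and a manually set parameter $r$ shapes the result. The mapping `(1 - mean) ** r` is a hypothetical choice made here solely because it yields a smaller distance for a larger symmetric transition probability; the function name is likewise an assumption.

```python
def transfer_distance(p_ij: float, p_ji: float, r: float = 1.0) -> float:
    """Hypothetical transfer distance between object i and object j.

    NOT the patent's formula (which is not available here): it averages
    the two directed transition probabilities and maps the average to a
    distance that shrinks as the probability grows.
    """
    for p in (p_ij, p_ji):
        if not 0.0 <= p <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
    if r <= 0:
        raise ValueError("r must be positive")
    mean_probability = (p_ij + p_ji) / 2  # the term named in the text
    return (1.0 - mean_probability) ** r  # assumed mapping to a distance


# A pair with frequent transitions ends up closer than a pair with few.
print(transfer_distance(0.8, 0.6) < transfer_distance(0.2, 0.4))  # True
```

Swapping the averaging step for another symmetrization corresponds to the adjustment the text mentions.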
b) A transfer distance between one of the plurality of objects and multiple objects of the plurality of objects. The smaller this transfer distance, the greater the likelihood that the one object is grouped into one class with those multiple objects. Preferably, those multiple objects have already been grouped into one class.
The transition distance between one of the objects and the plurality of the objects may be determined according to the transition distance between the one object and one or more of the plurality of the objects, or may be determined according to the number of transitions/transition probability between the one object and one or more of the plurality of the objects.
For example, there are 9 objects in total, Object1 to Object9, and the transfer distance between the single object Object1 and the three objects Object4, Object7 and Object8, which have been grouped into one class, can be determined in any one of the following ways:
First way: the minimum of the transfer distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is taken as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Second way: the maximum of the transfer distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is taken as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Third way: a calculation, such as taking the average, is performed on the three transfer distances between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and the result is taken as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Fourth way: the maximum among the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is determined, and the transfer distance obtained from that maximum is taken as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Fifth way: the minimum among the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8 is determined, and the transfer distance obtained from that minimum is taken as the transfer distance between Object1 and the group Object4, Object7 and Object8.
Sixth way: a calculation, such as taking the average, is performed on the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object7, and between Object1 and Object8, and the transfer distance obtained from the result is taken as the transfer distance between Object1 and the group Object4, Object7 and Object8.
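The first three ways (minimum, maximum, or average of the pairwise transfer distances) can be sketched as below. The function name, the `mode` argument, and the pairwise distance values are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Dict, FrozenSet, Iterable


def object_to_group_distance(
    obj: str,
    group: Iterable[str],
    pair_distance: Dict[FrozenSet[str], float],
    mode: str = "min",
) -> float:
    """Combine the pairwise distances from `obj` to every member of `group`.

    mode="min"  -> first way (smallest pairwise distance)
    mode="max"  -> second way (largest pairwise distance)
    mode="mean" -> third way (average of the pairwise distances)
    """
    reducers = {"min": min, "max": max, "mean": mean}
    distances = [pair_distance[frozenset((obj, member))] for member in group]
    return reducers[mode](distances)


# Hypothetical pairwise transfer distances for illustration only.
pair_distance = {
    frozenset(("Object1", "Object4")): 0.25,
    frozenset(("Object1", "Object7")): 0.5,
    frozenset(("Object1", "Object8")): 0.75,
}
group = ["Object4", "Object7", "Object8"]
for mode in ("min", "max", "mean"):
    print(mode, object_to_group_distance("Object1", group, pair_distance, mode))
```

The minimum favors merging as soon as any one member is close, the maximum requires every member to be close, and the average sits between the two.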
c) A transfer distance between multiple objects of the plurality of objects and multiple other objects of the plurality of objects. The smaller this transfer distance, the greater the likelihood that the two sets of objects are grouped into one class.
The transition distance between the plurality of objects and the other plurality of objects in the plurality of objects may be determined according to the transition distance between one or more objects in the plurality of objects and one or more objects in the other plurality of objects, or may be determined according to the number of transitions/transition probability between one or more objects in the plurality of objects and one or more objects in the other plurality of objects.
For example, there are 9 objects in total, Object1 to Object9, and the transfer distance between the two objects Object1 and Object3, which have been grouped into one class, and the two objects Object4 and Object8, which have been grouped into another class, can be determined in any one of the following ways:
First way: the minimum of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is taken as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Second way: the maximum of the transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is taken as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Third way: a calculation, such as taking the average, is performed on the four transfer distances between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, and the result is taken as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Fourth way: the maximum among the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is determined, and the transfer distance obtained from that maximum is taken as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Fifth way: the minimum among the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8 is determined, and the transfer distance obtained from that minimum is taken as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
Sixth way: a calculation, such as taking the average, is performed on the numbers of transitions/transition probabilities between Object1 and Object4, between Object1 and Object8, between Object3 and Object4, and between Object3 and Object8, and the transfer distance obtained from the result is taken as the transfer distance between the group Object1, Object3 and the group Object4, Object8.
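The fourth to sixth ways aggregate transition probabilities first and only then convert the aggregate into a distance. A minimal sketch under stated assumptions: the probabilities are treated as symmetric per pair, the values are invented, and the conversion `1 - p` is a placeholder for whatever probability-to-distance mapping is actually used.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, FrozenSet, Iterable


def group_to_group_distance(
    group_a: Iterable[str],
    group_b: Iterable[str],
    pair_probability: Dict[FrozenSet[str], float],
    mode: str = "max",
    to_distance: Callable[[float], float] = lambda p: 1.0 - p,
) -> float:
    """Aggregate cross-group transition probabilities, then convert.

    mode="max"  -> fourth way, mode="min" -> fifth way,
    mode="mean" -> sixth way. `to_distance` is an assumed placeholder.
    """
    reducers = {"max": max, "min": min, "mean": mean}
    probabilities = [
        pair_probability[frozenset((a, b))]
        for a, b in product(list(group_a), list(group_b))
    ]
    return to_distance(reducers[mode](probabilities))


# Hypothetical symmetric transition probabilities for illustration only.
pair_probability = {
    frozenset(("Object1", "Object4")): 0.5,
    frozenset(("Object1", "Object8")): 0.25,
    frozenset(("Object3", "Object4")): 0.75,
    frozenset(("Object3", "Object8")): 0.5,
}
left, right = ["Object1", "Object3"], ["Object4", "Object8"]
for mode in ("max", "min", "mean"):
    print(mode, group_to_group_distance(left, right, pair_probability, mode))
```

Because a larger probability means a smaller distance, taking the maximum probability plays the same role here as taking the minimum distance does in the first way.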
It should be noted that the above examples are only intended to better illustrate the technical solutions of the present invention and not to limit the present invention; those skilled in the art should understand that any manner of determining the transfer distance between objects falls within the scope of the present invention.
As one preferred solution of implementation 2) above, the sub-clustering means 2 further comprises: means for selecting a first part of objects and a second part of objects among the plurality of objects (hereinafter "first selection means", not shown); means for acquiring a transfer distance between the first part of objects and the second part of objects, determined based on the transfer situation information relating to the first part of objects and the second part of objects (hereinafter "second sub-acquisition means", not shown); means for determining, according to that transfer distance, whether the first part of objects and the second part of objects are grouped into one class (hereinafter "second determining means", not shown); means for reselecting the first part of objects and the second part of objects (hereinafter "second selection means", not shown); and means for triggering the second sub-acquisition means, the second determining means and the second selection means to repeat their operations until the clustering result of the plurality of objects is obtained (hereinafter "triggering means", not shown).
The first selection means selects a first partial object and a second partial object among the plurality of objects.
The first part of objects may be one or more of the plurality of objects, and the second part of objects may be one or more of the plurality of objects other than the first part of objects. Preferably, when the first part of objects or the second part of objects contains multiple objects, those objects belong to one class.
The second sub-acquisition means acquires a transfer distance between the first partial object and the second partial object determined based on the transfer situation information on the first partial object and the second partial object.
It should be noted that, before the second sub-acquiring device performs the operation, the transfer distance between the first partial object and the second partial object may already exist; for example, the transition distance between the first partial object and the second partial object may have been determined by the clustering means in a previous operation, etc.
Preferably, when the transfer distance between the first part of objects and the second part of objects already exists, the second sub-acquisition means directly reads the transfer distance between the first part of objects and the second part of objects. For example, the second sub-acquisition means directly reads the transfer distance between the first partial object and the second partial object that already exist locally.
When the transfer distance between the first part of objects and the second part of objects does not yet exist, the second sub-acquisition means determines it from the transfer distances between one or more objects in the first part of objects and one or more objects in the second part of objects, which are in turn determined based on the transfer situation information between the first part of objects and the second part of objects. How to determine the transfer distance between two objects, between one object and multiple objects, and between multiple objects and multiple other objects is described in detail in the foregoing description of "transfer distances between objects" and is not repeated here. In addition, if the transfer distance between one or more objects in the first part of objects and one or more objects in the second part of objects already exists before the second sub-acquisition means performs its operation, the second sub-acquisition means may read it directly; if it has not yet been obtained at that time, it needs to be determined based on the transfer situation information between the first part of objects and the second part of objects.
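The read-if-present, otherwise-compute behaviour described above amounts to a cache in front of the distance calculation. The sketch below is one possible arrangement; the class name `DistanceCache`, its `computed` counter, and the stand-in distance function are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class DistanceCache:
    """Return a stored group-to-group distance when one exists;
    otherwise compute it once and store it for later reads."""

    def __init__(self, compute: Callable[[FrozenSet[str], FrozenSet[str]], float]):
        self._compute = compute
        self._store: Dict[FrozenSet[FrozenSet[str]], float] = {}
        self.computed = 0  # how many distances had to be calculated

    def get(self, first: Iterable[str], second: Iterable[str]) -> float:
        a, b = frozenset(first), frozenset(second)
        key = frozenset((a, b))  # order of the two parts does not matter
        if key not in self._store:
            self._store[key] = self._compute(a, b)
            self.computed += 1
        return self._store[key]


def stand_in_distance(first: FrozenSet[str], second: FrozenSet[str]) -> float:
    # Placeholder only: a real implementation would derive the value
    # from the transfer situation information.
    return 1.0 / (len(first) + len(second))


cache = DistanceCache(stand_in_distance)
cache.get(["Object1"], ["Object2"])
cache.get(["Object2"], ["Object1"])  # read from the cache, not recomputed
print(cache.computed)  # 1
```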
The second determining means determines whether the first part of objects and the second part of objects are grouped into one class according to a transfer distance between the first part of objects and the second part of objects.
Wherein, the smaller the transfer distance, the higher the probability that the first part of objects and the second part of objects are grouped into one class; the larger the transfer distance, the less likely the first part of objects will be grouped with the second part of objects.
The second selection means reselects the first partial object and the second partial object, wherein no clustering operation has been performed between the reselected first partial object and the second partial object.
The triggering means triggers the second sub-acquisition means, the second determining means and the second selection means to repeat their operations until the clustering result of the plurality of objects is obtained. Preferably, the triggering means may determine in various ways whether the clustering result of the plurality of objects has been obtained; for example, by checking whether the number of repetitions has exceeded a predetermined repetition threshold, or whether there no longer remain any first part of objects and second part of objects between which no clustering operation has been performed.
The following is an example to better illustrate the preferred embodiment:
For example, there are 5 objects in total: Object1, Object2, Object3, Object4 and Object5.
The first selection means selects Object1 as the first part of objects and Object2 as the second part of objects. The second sub-acquisition means then determines the transfer distance between Object1 and Object2 from the transfer situation information between Object1 and Object2. The second determining means then determines, according to that transfer distance, that Object1 and Object2 are grouped into one class. Next, the second selection means selects Object1 and Object2, now grouped into one class, as the first part of objects, and selects Object3 as the second part of objects.
Then, the triggering means triggers the second sub-acquisition means and the second determining means to repeat their operations, which determines that Object1 and Object2 cannot be grouped into one class with Object3; the triggering means then triggers the second selection means to repeat its operation, selecting Object1 and Object2, grouped into one class, as the first part of objects and Object4 as the second part of objects.
Then, the triggering means triggers the second sub-acquisition means and the second determining means to repeat their operations, which determines that Object1 and Object2 cannot be grouped into one class with Object4; the triggering means then triggers the second selection means to repeat its operation, selecting Object1 and Object2, grouped into one class, as the first part of objects and Object5 as the second part of objects.
Then, the triggering means triggers the second sub-acquisition means and the second determining means to repeat their operations, which determines that Object1 and Object2 cannot be grouped into one class with Object5; the triggering means then triggers the second selection means to repeat its operation, selecting Object3 as the first part of objects and Object4 as the second part of objects.
Then, the triggering means triggers the second sub-acquisition means and the second determining means to repeat their operations, which determines that Object3 and Object4 are grouped into one class; the triggering means then triggers the second selection means to repeat its operation, selecting Object3 and Object4, now grouped into one class, as the first part of objects and Object5 as the second part of objects.
Then, the triggering means triggers the second sub-acquisition means and the second determining means to repeat their operations, which determines that Object3 and Object4 are grouped into one class with Object5; the triggering means then triggers the second selection means to repeat its operation, selecting Object1 and Object2, grouped into one class, as the first part of objects and Object3, Object4 and Object5, grouped into one class, as the second part of objects.
Then, the triggering means triggers the second sub-acquisition means and the second determining means to repeat their operations, which determines that Object1 and Object2 cannot be grouped into one class with Object3, Object4 and Object5. The triggering means judges that there no longer remain any first part of objects and second part of objects between which no clustering operation has been performed, and stops the clustering operation. The clustering result of Object1, Object2, Object3, Object4 and Object5 is therefore: [Object1, Object2], [Object3, Object4, Object5].
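The select, measure, decide, reselect loop of this example can be sketched as follows. Several details are assumptions rather than statements from the text: the distance values are invented so that the outcome matches the example, groups are compared by average pairwise distance, a merge happens when the distance does not exceed a threshold, and the order in which untested pairs are picked is simply the iteration order (so intermediate steps may differ from the walkthrough while the final result is the same).

```python
from itertools import combinations, product
from statistics import mean
from typing import Callable, FrozenSet, List, Sequence

Group = FrozenSet[str]


def iterative_cluster(
    objects: Sequence[str],
    group_distance: Callable[[Group, Group], float],
    threshold: float,
    max_rounds: int = 1000,
) -> List[List[str]]:
    clusters: List[Group] = [frozenset([obj]) for obj in objects]
    compared = set()  # pairs of groups that have already been tested
    for _ in range(max_rounds):  # repetition threshold
        pending = next(
            ((a, b) for a, b in combinations(clusters, 2)
             if frozenset((a, b)) not in compared),
            None,
        )
        if pending is None:  # no untested first/second part remains
            break
        first, second = pending
        compared.add(frozenset((first, second)))
        if group_distance(first, second) <= threshold:
            merged = first | second
            clusters = [merged if c == first else c
                        for c in clusters if c != second]
    return [sorted(c) for c in clusters]


# Hypothetical distances: small within each intended class, large across.
LEFT = {"Object1", "Object2"}


def pair_distance(a: str, b: str) -> float:
    return 0.2 if (a in LEFT) == (b in LEFT) else 0.9


def average_linkage(first: Group, second: Group) -> float:
    return mean(pair_distance(a, b) for a, b in product(first, second))


names = ["Object1", "Object2", "Object3", "Object4", "Object5"]
print(iterative_cluster(names, average_linkage, threshold=0.5))
# [['Object1', 'Object2'], ['Object3', 'Object4', 'Object5']]
```

Both stopping conditions mentioned for the triggering means appear here: `max_rounds` caps the number of repetitions, and the loop also ends once every pair of current groups has been tested.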
In the prior art, objects are generally classified by performing natural language analysis on the description texts of the objects. In particular, when the objects involve commercial use, for example when the objects are brands, the classification is subject to human supervision, and in addition to natural language analysis of the object names, the objects are classified in combination with data taken from the perspective of the objects, such as the industry and region to which an object belongs, its sales, and market demand. That is, when classifying objects that involve commercial use, those skilled in the art hold a prejudice: objects are to be classified according to business data taken from the perspective of the objects.
The solution of the present invention overcomes this prejudice and clusters objects by analyzing the transfer situation information of users among the objects. Compared with data taken from the perspective of the objects, analyzing how users transfer among the objects is closer to the perspective of the users and reflects more directly how users perceive the objects, so the object classification determined by this solution is more objective and accurate. Moreover, even among data taken from the perspective of users, the transfer situation information of the present invention is not commonly used data; in fact, when data from the perspective of users is explicitly mentioned, a person skilled in the art would more readily think of direct evaluations from users (such as scores, comment texts, and the like).
It is noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, the various means of the invention may be implemented using Application Specific Integrated Circuits (ASICs) or any other similar hardware devices. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Claims (16)
1. A method for clustering objects in a computer device, wherein the method comprises:
acquiring transfer situation information of a plurality of objects, wherein the transfer situation information is used for indicating the transfer situations of users in the plurality of objects based on object information acquisition behaviors;
clustering the plurality of objects according to the transfer condition information to obtain clustering results of the plurality of objects;
wherein the step of obtaining the transfer situation information comprises:
acquiring transition situation information of a plurality of keywords, wherein the transition situation information of the plurality of keywords is used for indicating transition situations of a user in the plurality of keywords based on object information acquisition behaviors;
determining transfer condition information of a user in the objects according to the objects to which the keywords are respectively associated and the transfer condition information of the keywords;
wherein the step of obtaining the transfer condition information of the plurality of keywords comprises:
acquiring a keyword attention record of at least one user, wherein the keyword attention record comprises keywords which are attended by the users in object information acquisition behaviors and time information of the keywords which are attended;
and determining the transfer condition information of the keywords according to the keyword attention records.
2. The method of claim 1, wherein the step of clustering comprises:
clustering the plurality of objects by obtaining transition distances between the objects based on the transition situation information, to obtain a clustering result of the plurality of objects.
3. The method of claim 2, wherein the step of clustering comprises:
selecting a first partial object and a second partial object among the plurality of objects;
acquiring a transfer distance between the first part of objects and the second part of objects determined based on transfer condition information related to the first part of objects and the second part of objects;
determining whether the first part of objects and the second part of objects are gathered into one class according to the transfer distance between the first part of objects and the second part of objects;
reselecting the first part of objects and the second part of objects, wherein clustering operation is not performed between the reselected first part of objects and the second part of objects;
repeating the steps of obtaining the transfer distance between the first part of objects and the second part of objects, determining whether the first part of objects and the second part of objects are gathered into one class, and reselecting the first part of objects and the second part of objects until the clustering result of the plurality of objects is obtained.
4. The method of claim 3, wherein the step of obtaining a transfer distance between the first portion of objects and the second portion of objects comprises:
when the transfer distance between the first part of objects and the second part of objects already exists, directly reading the transfer distance between the first part of objects and the second part of objects;
when the transfer distance between the first part of objects and the second part of objects does not exist, the transfer distance between the first part of objects and the second part of objects is determined according to the transfer distance between one or more objects in the first part of objects and one or more objects in the second part of objects, which is determined based on the transfer condition information between the first part of objects and the second part of objects.
5. The method of any of claims 2 to 4, wherein the transition distance between the objects comprises at least one of:
-a transfer distance between one object of the plurality of objects and another object of the plurality of objects;
-transition distances between one of the plurality of objects and a plurality of the plurality of objects;
-transfer distances between a plurality of objects of the plurality of objects and other plurality of objects of the plurality of objects.
6. The method of claim 1, wherein the transition situation information for the plurality of keywords comprises at least one of:
-transfer path information of the user among the plurality of keywords;
-information of the number of transitions of the user between the respective keywords.
7. The method of any of claims 1-4, wherein the transfer case information of the plurality of objects comprises at least one of:
-transfer path information of the user in the plurality of objects;
-information of the number of transitions of the user between the respective objects;
-transition probability information of the user between the respective objects.
8. The method of any of claims 1-4, wherein the object comprises a brand.
9. An apparatus for clustering objects in a computer device, wherein the apparatus comprises:
means for acquiring transition situation information of a plurality of objects, the transition situation information indicating a transition situation of a user among the plurality of objects based on an object information acquisition behavior;
means for clustering the plurality of objects according to the transfer condition information to obtain a clustering result of the plurality of objects;
wherein the means for obtaining the transfer situation information comprises:
means for acquiring transition situation information of a plurality of keywords, wherein the transition situation information of the plurality of keywords is used for indicating transition situations of users in the plurality of keywords based on object information acquisition behaviors;
means for determining transfer situation information of a user among a plurality of objects to which the plurality of keywords are respectively associated, according to the plurality of objects and the transfer situation information of the plurality of keywords;
wherein the device for acquiring the transfer situation information of the plurality of keywords comprises:
means for acquiring a keyword attention record of at least one user, the keyword attention record including keywords which the plurality of users have paid attention to in the object information acquisition behavior and time information at which the keywords are paid attention to;
means for determining the transfer condition information of the plurality of keywords according to the keyword attention records.
10. The apparatus of claim 9, wherein the means for clustering comprises:
means for clustering the plurality of objects by obtaining transition distances between objects based on the transition situation information, to obtain a clustering result of the plurality of objects.
11. The apparatus of claim 10, wherein the means for clustering comprises:
means for selecting a first partial object and a second partial object in the plurality of objects;
means for obtaining a transfer distance between the first part of objects and the second part of objects determined based on transfer case information on the first part of objects and the second part of objects;
means for determining whether the first portion of objects and the second portion of objects are clustered into a class based on a transition distance between the first portion of objects and the second portion of objects;
means for reselecting the first part of objects and the second part of objects, wherein no clustering operation has been performed between the reselected first part of objects and the second part of objects;
means for triggering the means for obtaining a transfer distance between the first part of objects and the second part of objects, the means for determining whether the first part of objects and the second part of objects are clustered into a class, and the means for reselecting the first part of objects and the second part of objects to repeatedly execute their operations until a clustering result of the plurality of objects is obtained.
12. The apparatus of claim 11, wherein the means for obtaining a transfer distance between the first portion of objects and the second portion of objects comprises:
means for directly reading a transfer distance between the first portion of objects and a second portion of objects when the transfer distance between the first portion of objects and the second portion of objects already exists;
means for determining a transfer distance between the first part of objects and the second part of objects according to the transfer distance between one or more objects in the first part of objects and one or more objects in the second part of objects determined based on the transfer situation information between the first part of objects and the second part of objects when the transfer distance between the first part of objects and the second part of objects does not exist.
13. The apparatus of any of claims 10 to 12, wherein the transition distance between the objects comprises at least one of:
-a transfer distance between one object of the plurality of objects and another object of the plurality of objects;
-transition distances between one of the plurality of objects and a plurality of the plurality of objects;
-transfer distances between a plurality of objects of the plurality of objects and other plurality of objects of the plurality of objects.
14. The apparatus of claim 9, wherein the transition situation information for the plurality of keywords comprises at least one of:
-transfer path information of the user among the plurality of keywords;
-information of the number of transitions of the user between said respective keywords.
15. The apparatus according to any one of claims 9 to 12, wherein the transfer case information of the plurality of objects includes at least one of:
-transfer path information of the user in the plurality of objects;
-information of the number of transitions of the user between said respective objects;
-transition probability information of the user between said respective objects.
16. The apparatus of any of claims 9-12, wherein the object comprises a brand.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510090184.XA CN104731867B (en) | 2015-02-27 | 2015-02-27 | A kind of method and apparatus that object is clustered |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104731867A CN104731867A (en) | 2015-06-24 |
CN104731867B true CN104731867B (en) | 2018-09-07 |
Family
ID=53455754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510090184.XA Active CN104731867B (en) | 2015-02-27 | 2015-02-27 | A kind of method and apparatus that object is clustered |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104731867B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069542B (en) * | 2017-09-26 | 2021-06-29 | 北京国双科技有限公司 | Keyword evaluation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5504887A (en) * | 1993-09-10 | 1996-04-02 | International Business Machines Corporation | Storage clustering and packing of objects on the basis of query workload ranking |
CN101527000A (en) * | 2009-04-03 | 2009-09-09 | 南京航空航天大学 | Fast movable object orbit clustering method based on sampling |
CN104142950A (en) * | 2013-05-10 | 2014-11-12 | 中国人民大学 | Microblog User Classification Method Based on Keyword Extraction and Gini Coefficient |
CN104199969A (en) * | 2014-09-22 | 2014-12-10 | 北京国双科技有限公司 | Webpage data analysis method and device |
Also Published As
Publication number | Publication date |
---|---|
CN104731867A (en) | 2015-06-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106980692B (en) | Influence calculation method based on microblog specific events | |
US9589208B2 (en) | Retrieval of similar images to a query image | |
US9317613B2 (en) | Large scale entity-specific resource classification | |
JP5092165B2 (en) | Data construction method and system | |
US20120191694A1 (en) | Generation of topic-based language models for an app search engine | |
US10152478B2 (en) | Apparatus, system and method for string disambiguation and entity ranking | |
EP3918758A1 (en) | Real-time event detection on social data streams | |
US20080270549A1 (en) | Extracting link spam using random walks and spam seeds | |
WO2009085815A1 (en) | Expanding a query to include terms associated through visual content | |
US10135723B2 (en) | System and method for supervised network clustering | |
Bykau et al. | Fine-grained controversy detection in Wikipedia | |
KR101638535B1 (en) | Method of detecting issue patten associated with user search word, server performing the same and storage medium storing the same | |
JP4714710B2 (en) | Automatic tagging device, automatic tagging method, automatic tagging program, and recording medium recording the program | |
CN109446393B (en) | Network community topic classification method and device | |
US9020962B2 (en) | Interest expansion using a taxonomy | |
CN103262079A (en) | Search device, search method, search program, and computer-readable memory medium for recording search program | |
Cheng et al. | Context-based page unit recommendation for web-based sensemaking tasks | |
JP5321258B2 (en) | Information collecting system, information collecting method and program thereof | |
CN104731867B (en) | A kind of method and apparatus that object is clustered | |
CN114491232B (en) | Information query method and device, electronic equipment and storage medium | |
CN109145261B (en) | Method and device for generating label | |
JP2017219929A (en) | Generation device, generation method and generation program | |
CN104809148B (en) | A kind of method and apparatus for determining mark post object | |
JP5810937B2 (en) | Management program and device | |
CN113868481A (en) | Component acquisition method, device, electronic device and storage medium |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |