Disclosure of Invention
The present application provides an information association method and device for accurately associating a user's information while reducing device load and resource consumption. Specifically, the application provides an information association method comprising the following steps:
acquiring original data containing an identification identifier and attribute information that have an association relationship in different service scenes;
associating, as associated information, the attribute information that occurs most frequently across the service scenes in the original data with the identification identifier, and generating an associated information string;
and sequentially selecting the types of the attribute information to be associated according to a preset association sequence, and, for each selected type, associating the attribute information with the maximum judgment value among the attribute information of that type in the service scenes of the original data, as associated information, with the associated information string with the latest current generation time, thereby generating a new associated information string to realize information association.
Optionally, after obtaining the original data including the identification and the attribute information that have an association relationship in each of the different service scenarios, the method further includes:
classifying and integrating the obtained original data according to different service scenes to generate an information recording list; the information recording table comprises the name and content of attribute information associated with the identification identifier, and the name, time and times of a service scene when the identification identifier is associated with the attribute information.
Optionally, the associating the attribute information with the most frequent occurrence frequency in each service scene in the original data with the identification identifier to generate an associated information string, specifically including:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
Optionally, the method includes sequentially selecting types of attribute information to be associated according to a preset association sequence, associating attribute information with a maximum judgment value in attribute information corresponding to the types in each service scene in the original data with an associated information string with a latest current generation time as associated information, and generating an associated information string to implement information association, where the method specifically includes:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Optionally, the associated information string with the latest current generation time includes N attribute information;
calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene specifically includes the following steps:
step A, judging whether the determined attribute information exists at the same time as the N attribute information in the associated information string with the latest current generation time in each service scene;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists simultaneously with the N attribute information in the associated information string with the latest current generation time in each service scene;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene is calculated.
The present application further provides an information associating device, including:
the acquisition module is used for acquiring original data containing identification marks and attribute information which have incidence relations under different service scenes;
the first generation module is used for associating attribute information with the largest frequency of occurrence in each service scene in the original data with the identification mark as associated information to generate an associated information string;
and the second generation module is used for sequentially selecting the types of the attribute information to be associated according to a preset association sequence, associating the attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data as association information with the association information string with the latest current generation time, and generating the association information string to realize information association.
Optionally, the apparatus further comprises:
the processing module is used for carrying out classification integration processing on the acquired original data according to different service scenes to generate an information recording list; the information recording table comprises the name and content of attribute information associated with the identification identifier, and the name, time and times of a service scene when the identification identifier is associated with the attribute information.
Optionally, the first generating module is specifically configured to:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
Optionally, the second generating module is specifically configured to:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Optionally, the associated information string with the latest current generation time includes N attribute information;
the second generation module calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene specifically includes the following steps:
step A, judging whether the determined attribute information exists at the same time as the N attribute information in the associated information string with the latest current generation time in each service scene;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists simultaneously with the N attribute information in the associated information string with the latest current generation time in each service scene;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene is calculated.
Compared with the prior art, the method and the device have the advantages that original data containing the identification marks and the attribute information which have incidence relations under different service scenes are obtained; taking attribute information which appears most frequently in each service scene in the original data as associated information to be associated with the identification mark, and generating an associated information string; and sequentially selecting the types of the attribute information to be associated according to a preset association sequence, associating the attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data as association information with the association information string with the latest current generation time, and generating the association information string to realize information association. The method and the device realize accurate association of the user information on the premise of reducing equipment load and resource consumption.
Detailed Description
To overcome the defects of the prior art, an embodiment of the present application discloses an information association method that accurately associates user information while reducing device load and resource consumption. As shown in fig. 1, the method includes the following steps:
step 101, acquiring original data containing an identification identifier and attribute information that have an association relationship in different service scenes.
In a specific embodiment, take the database of a shopping website as an example. The service scenes may include login, authentication, logistics and so on, and because the various information in the database of the shopping website is carried by an account, the account can be used as the identification identifier. The identification identifier is not limited to this: for example, if the database of a mobile operator is obtained and the data in that database is carried by mobile communication numbers, the mobile communication number may be used as the identification identifier. The original data refers to data associated with the same identification identifier; for example, if the identification identifier is account 1, the acquired original data includes account 1 and the attribute information associated with it in the various service scenes.
After the original data is acquired, in order to facilitate subsequent data extraction, classification and integration processing may be performed on the acquired data, specifically:
classifying and integrating the obtained original data according to different service scenes to generate an information recording list; the information recording table comprises the name and content of attribute information associated with the identification identifier, and the name, time and times of a service scene when the identification identifier is associated with the attribute information.
Specifically, the payment account a001 is used as an identification identifier, and the generated information record table for the specific identification identifier may be as shown in table 1:
TABLE 1
In a specific embodiment, if the original data of a plurality of identification marks needs to be processed, for example, in addition to the original data of the payment account a001, the original data of the payment account a002 is also processed, and the generated information record table may be expanded longitudinally, as shown in table 2:
TABLE 2
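The classification and integration step can be pictured as grouping raw events into one row per identification identifier, service scene and attribute. The following is a minimal Python sketch under stated assumptions: the input tuple layout, the field names, the choice to keep only the most recent time, and the sample events (including their dates) are all hypothetical and are not taken from tables 1 and 2.

```python
from collections import defaultdict

def build_record_table(raw_events):
    """Group raw events into rows keyed by (identifier, scene, attribute name, content)."""
    rows = defaultdict(lambda: {"times": 0, "last_time": None})
    for identifier, scene, attr_name, attr_content, timestamp in raw_events:
        key = (identifier, scene, attr_name, attr_content)
        rows[key]["times"] += 1
        # Assumption: only the most recent association time is kept per row.
        if rows[key]["last_time"] is None or timestamp > rows[key]["last_time"]:
            rows[key]["last_time"] = timestamp
    return [
        {"identifier": k[0], "scenario": k[1], "attr_name": k[2],
         "attr_content": k[3], "last_time": v["last_time"], "times": v["times"]}
        for k, v in rows.items()
    ]

# Hypothetical sample events: (identifier, scene, attribute name, content, time)
raw_events = [
    ("a001", "login", "mobile", "188****8254", "2016-01-01"),
    ("a001", "login", "mobile", "188****8254", "2016-01-03"),
    ("a001", "authentication", "name", "Zhang San", "2016-01-02"),
]
record_table = build_record_table(raw_events)
```

Adding the events of a second identification identifier such as a002 to `raw_events` simply appends more rows, which corresponds to the longitudinal expansion described above.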
Step 102, associating the attribute information that occurs most frequently across the service scenes in the original data, as associated information, with the identification identifier to generate an associated information string.
Specifically, this step includes:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as the initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
In the specific processing, take the payment account b001 as the identification identifier, and suppose that the original data corresponding to b001 contains two service scenes, a login scene and an authentication scene. In the login scene, the attribute information that appears is the mobile phone number 188****8254 (300 occurrences). In the authentication scene, the attribute information that appears is the identity card number 3301081975****7598 (10 occurrences), the name Zhang San (20 occurrences), and the bank card number 40065****5874153 (51 occurrences).
In this case, the most frequent attribute information in the login scene is the mobile phone number 188****8254 (300 occurrences), and the most frequent in the authentication scene is the bank card number 40065****5874153 (51 occurrences). Comparing the two, the mobile phone number 188****8254 occurs most frequently in the original data corresponding to b001. The mobile phone number 188****8254 is therefore associated with the identification identifier as the associated information, and the generated associated information string may be as shown in table 3.
TABLE 3
Payment account number | Mobile phone number
b001 | 188****8254
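The two-stage comparison of step 102 can be sketched in Python as follows. This is an illustration only: the nested-dictionary layout, the function name `initial_association`, and the key names are assumptions, and ties are resolved by whichever entry is encountered first, which the text does not specify. The masked values stand in for the numbers of the b001 example.

```python
def initial_association(identifier, counts):
    """counts: {scene: {(attr_type, attr_value): occurrences}}"""
    # Stage 1: the most frequent attribute information within each service scene.
    per_scene_best = {
        scene: max(attrs.items(), key=lambda kv: kv[1])
        for scene, attrs in counts.items() if attrs
    }
    # Stage 2: the most frequent among the per-scene winners.
    (attr_type, attr_value), _ = max(per_scene_best.values(), key=lambda kv: kv[1])
    # The associated information string is modelled here as a dictionary.
    return {"payment_account": identifier, attr_type: attr_value}

counts = {
    "login": {("mobile", "188****8254"): 300},
    "authentication": {
        ("id_card", "3301081975****7598"): 10,
        ("name", "Zhang San"): 20,
        ("bank_card", "40065****5874153"): 51,
    },
}
info_string = initial_association("b001", counts)
print(info_string)  # {'payment_account': 'b001', 'mobile': '188****8254'}
```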
Step 103, sequentially selecting the types of the attribute information to be associated according to a preset association sequence, associating the attribute information with the maximum judgment value among the attribute information of the corresponding type under each service scene in the original data, as associated information, with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
The specific process comprises the following steps: sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Continuing the foregoing example, the associated information string shown in table 3 has been generated. The types of attribute information to be associated are then selected in turn according to a preset association sequence, for example name, bank card, identity card; the specific order and the number of attribute types to be associated may be set as needed. The following description takes the name as the type of attribute information to be associated.
The attribute information corresponding to the type "name" is determined. As shown in table 1, only Zhang San corresponds to this type. Because there is only this one piece of attribute information, Zhang San is considered sufficiently credible and can be associated directly with the associated information string shown in table 3.
If the name type corresponds not only to Zhang San but also to Li Si, further processing is required. A conditional probability is calculated first, using the formula P(A|B) = P(AB)/P(B), where P(A|B) represents the probability that event A occurs given that another event B has occurred.
In this specific embodiment, assume that there are two service scenes, a login scene and an authentication scene. Taking the login scene as an example, P(A|B) represents the probability that the name Zhang San appears in the login scene given that the payment account b001 and the mobile phone number 188****8254 appear together in the login scene;
P(AB) represents the probability that the payment account b001, the mobile phone number 188****8254 and the name Zhang San appear together in the login scene;
P(B) represents the probability that the payment account b001 and the mobile phone number 188****8254 appear together in the login scene.
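One way to estimate P(A|B) from data is to count records within a single service scene. The sketch below assumes that each record is a set of attribute values observed together; that representation, the function name `conditional_probability`, and the five sample login records are hypothetical and chosen only so that the result matches the 0.2 used in the example.

```python
def conditional_probability(records, candidate, context):
    """Estimate P(A|B) = P(AB) / P(B) from the records of one service scene.

    records:   list of sets, each holding attribute values seen together
    candidate: attribute value A
    context:   attribute values B already in the associated information string
    """
    context = set(context)
    with_b = [r for r in records if context <= r]
    if not with_b:
        return None  # B never occurs, so P(A|B) is undefined in this scene
    with_ab = [r for r in with_b if candidate in r]
    # P(AB)/P(B) reduces to count(AB)/count(B): both share the same denominator.
    return len(with_ab) / len(with_b)

# Hypothetical login records for payment account b001
login_records = [
    {"b001", "188****8254", "Zhang San"},
    {"b001", "188****8254", "Li Si"},
    {"b001", "188****8254"},
    {"b001", "188****8254"},
    {"b001", "188****8254"},
]
p = conditional_probability(login_records, "Zhang San", {"b001", "188****8254"})
print(p)  # 0.2
```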
Suppose that P(A|B) = 0.2 is calculated in the login scene, and that the authentication scene, processed in the same way, gives P(A|B) = 0.3. Different service scenes correspond to different weights; for example, the weight of the login scene is 0.6 and the weight of the authentication scene is 0.5. The two judgment sub-values of Zhang San are then 0.2 × 0.6 = 0.12 and 0.3 × 0.5 = 0.15, and the final judgment value is 0.12 + 0.15 = 0.27.
The same operation is performed for the name Li Si. Assuming that its final judgment value is 0.26, since 0.26 is less than 0.27, the name Zhang San is taken as the associated information and associated with the associated information string shown in table 3, and the generated associated information string is shown in table 4.
TABLE 4
Payment account number | Mobile phone number | Name
b001 | 188****8254 | Zhang San
If other attribute information, such as a bank card, needs to be associated, the above operation may be performed based on the newly generated association information string shown in table 4 until all the attribute information needing to be associated in the preset association sequence is associated. The information association is realized by associating the information strings.
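The weighted scoring used to choose between Zhang San and Li Si can be sketched as follows. The weights and Zhang San's conditional probabilities come from the example; Li Si's per-scene probabilities are not given in the text and are hypothetical values chosen only so that the total equals the stated 0.26. The function name `judgment_value` is likewise an assumption.

```python
def judgment_value(cond_probs, weights):
    """Sum over service scenes of conditional probability x preset scene weight."""
    return sum(p * weights[scene] for scene, p in cond_probs.items())

weights = {"login": 0.6, "authentication": 0.5}  # preset weights from the example
candidates = {
    "Zhang San": {"login": 0.2, "authentication": 0.3},
    # Hypothetical probabilities, chosen only to reproduce the total of 0.26.
    "Li Si": {"login": 0.1, "authentication": 0.4},
}
scores = {name: judgment_value(p, weights) for name, p in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # Zhang San 0.27
```

Repeating the same scoring for the next type in the preset association sequence, with the string of table 4 as the new context, continues the process described above.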
In addition, assume that the associated information string with the latest current generation time includes N attribute information. The specific process of calculating the conditional probability in step 103 is then as follows:
step A, judging whether the determined attribute information exists at the same time as the N attribute information in the associated information string with the latest current generation time in each service scene;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists simultaneously with the N attribute information in the associated information string with the latest current generation time in each service scene;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene is calculated.
Specifically, assume that the associated information string with the latest current generation time includes 6 attribute information, and that the type of attribute information to be associated is the bank card number, to which bank card number 1 and bank card number 2 correspond. For bank card number 1, it is judged whether bank card number 1 exists together with all 6 attribute information in the same service scene. If so, the conditional probability is calculated on that basis, in the manner described in step 103.
If not, it is judged whether bank card number 1 exists together with any 5 of the 6 attribute information in the same service scene, and if so, the conditional probability is calculated on that basis. If not, it is judged whether bank card number 1 exists together with any 4 of the 6 attribute information in the same service scene, and so on until the conditional probability can be calculated.
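Steps A to C can be sketched as a loop that shrinks the number of context attributes until co-occurrence is found. This is a sketch under assumptions: records are modelled as sets of attribute values within one service scene, the function name `probability_with_fallback` is hypothetical, and when several subsets of the same size co-occur with the candidate the largest conditional probability is kept, a choice the text leaves open.

```python
from itertools import combinations

def probability_with_fallback(records, candidate, context):
    """Return (k, P(candidate | subset)) for the largest k with co-occurrence."""
    context = list(context)
    for k in range(len(context), 0, -1):           # step C: N becomes N-1
        best = None
        for subset in combinations(context, k):    # any k of the N attributes
            subset = set(subset)
            with_b = [r for r in records if subset <= r]
            with_ab = [r for r in with_b if candidate in r]
            if with_ab:                             # step A: co-occurrence exists
                p = len(with_ab) / len(with_b)      # step B: conditional probability
                # Assumption: keep the largest probability among size-k subsets.
                best = p if best is None else max(best, p)
        if best is not None:
            return k, best
    return 0, None  # the candidate never co-occurs with any context attribute

# Hypothetical records of one service scene
records = [
    {"acct", "phone", "card1"},
    {"acct", "phone"},
    {"acct", "name", "card2"},
]
context = ["acct", "phone", "name"]
print(probability_with_fallback(records, "card1", context))  # (2, 0.5)
```

Returning k together with the probability records the basis on which the probability was calculated.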
Because the conditional probabilities may be calculated on different bases, the judgment sub-values and judgment values in step 103 are calculated on the premise that the conditional probabilities involved are based on the same case.
For example, for bank card number 1, if conditional probability 1 in judgment sub-value 1 is calculated on the basis that bank card number 1 exists together with 6 attribute information in service scene 1, while conditional probability 2 in judgment sub-value 2 is calculated on the basis that bank card number 1 exists together with 5 attribute information in service scene 2, then judgment sub-value 1 and judgment sub-value 2 cannot be combined to generate a judgment value.
In addition, suppose that bank card 1 has two judgment sub-values whose conditional probabilities are both calculated on the basis of coexistence with 6 attribute information in the same service scene, and that bank card 2 has 3 judgment sub-values whose conditional probabilities are all calculated on the basis of coexistence with 5 attribute information in the same service scene. In this case there is no need to compare the resulting judgment values, and bank card 1 can be associated directly, as the associated information, with the associated information string with the latest current generation time. More generally, if the conditional probability in judgment value a is calculated on the basis of coexistence with N attribute information in the same service scene, and the conditional probability in judgment value b is calculated on the basis of coexistence with n attribute information in the same service scene, where N > n, the specific values need not be compared: judgment value a is regarded as higher than judgment value b, and the attribute information corresponding to judgment value a is associated, as the associated information, with the associated information string with the latest current generation time.
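The rule that a larger coexistence basis outranks any judgment value computed on a smaller basis amounts to comparing pairs in lexicographic order. A minimal Python sketch follows, assuming each candidate has already been reduced to a pair (k, judgment value), where k is the number of attribute information on which its conditional probabilities were based; the function name `pick_candidate` and the numeric values are hypothetical.

```python
def pick_candidate(judgments):
    """judgments: {candidate: (k, judgment_value)}.

    Tuples compare element by element, so a larger k wins outright and the
    judgment value only decides between candidates with the same k.
    """
    return max(judgments, key=lambda c: judgments[c])

judgments = {
    "bank card 1": (6, 0.10),  # hypothetical values
    "bank card 2": (5, 0.90),
}
print(pick_candidate(judgments))  # bank card 1
```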
To further illustrate the technical idea of the present invention, an embodiment of the present application further discloses an information association device which, as shown in fig. 2, includes:
an obtaining module 201, configured to obtain original data including an identification and attribute information that have an association relationship in different service scenarios;
a first association module 202, configured to associate attribute information that appears most frequently in each service scene in the original data as association information with the identification identifier, and generate an association information string;
the second association module 203 is configured to sequentially select types of attribute information to be associated according to a preset association sequence, associate attribute information with a maximum judgment value in each attribute information corresponding to the type in each service scene in the original data as association information with an association information string with a latest current generation time, and generate an association information string to implement information association.
Specifically, the apparatus further includes:
the processing module is used for carrying out classification integration processing on the acquired original data according to different service scenes to generate an information recording list; the information recording table comprises the name and content of attribute information associated with the identification identifier, and the name, time and times of a service scene when the identification identifier is associated with the attribute information.
The first association module 202 is specifically configured to:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
The second association module 203 is specifically configured to:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Specifically, the most recently generated associated information string includes N pieces of attribute information;
the second association module 203 calculates, for each service scene, the conditional probability that each determined piece of attribute information appears together with the attribute information in the most recently generated associated information string, which specifically includes the following steps:
step A, judging whether the determined attribute information appears together with the N pieces of attribute information in the most recently generated associated information string in each service scene;
step B, if the judgment result is yes, calculating, for each service scene, the conditional probability that each determined piece of attribute information appears together with the N pieces of attribute information in the most recently generated associated information string;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information appears together with the attribute information in the most recently generated associated information string has been calculated for each service scene.
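Steps A to C describe a back-off scheme, which the following Python sketch illustrates for a single service scene. Several points are assumptions not fixed by the text: each record is modelled as a set of attributes observed together, reducing N drops the most recently associated attribute, and 0.0 is returned if the candidate never co-occurs even with the first attribute. The name `backoff_conditional_probability` is hypothetical.

```python
from typing import FrozenSet, List, Sequence


def backoff_conditional_probability(
    candidate: str,
    chain: Sequence[str],
    records: List[FrozenSet[str]],
) -> float:
    """Estimate P(candidate | first n attributes of chain) in one service scene.

    chain holds the N attributes of the latest association string, in the
    order they were associated. records are the co-occurrence records of
    the scene. n starts at N and is reduced until a co-occurrence is found.
    """
    n = len(chain)
    while n > 0:
        context = set(chain[:n])
        # Records of this scene that contain all n context attributes.
        with_context = [record for record in records if context <= record]
        # Step A: does the candidate appear together with the n attributes?
        joint = sum(1 for record in with_context if candidate in record)
        if joint > 0:
            # Step B: conditional probability from co-occurrence counts.
            return joint / len(with_context)
        # Step C: shorten the context (N = N - 1) and try again.
        n -= 1
    # Assumed fallback when no co-occurrence exists at any context length.
    return 0.0


if __name__ == "__main__":
    # Hypothetical records of one service scene.
    scene_records = [
        frozenset({"phone", "email", "addr_A"}),
        frozenset({"phone", "addr_A"}),
        frozenset({"phone", "email"}),
        frozenset({"phone", "addr_B"}),
    ]
    latest_chain = ["phone", "email"]
    for cand in ("addr_A", "addr_B"):
        print(cand, backoff_conditional_probability(cand, latest_chain, scene_records))
```

In this sample, `addr_A` co-occurs with both chain attributes, so its probability is computed at N = 2, whereas `addr_B` only co-occurs with `phone`, so the sketch backs off to N = 1. Calling the function once per service scene would yield the per-scene probabilities that feed the judgment sub-values.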
In summary, the present application acquires original data containing identification marks and attribute information that have association relationships in different service scenes; takes the attribute information that appears most frequently in each service scene in the original data as associated information, associates it with the identification mark, and generates an associated information string; and sequentially selects the types of attribute information to be associated according to a preset association sequence, takes the attribute information with the maximum judgment value among the attribute information corresponding to each type in each service scene in the original data as associated information, associates it with the most recently generated associated information string, and generates a new associated information string to realize information association. The user's information is thereby accurately associated while equipment load and resource consumption are reduced.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by hardware, or by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present invention.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers used above are for description only and do not represent the relative merits of the implementation scenarios.
The above discloses only a few specific implementation scenarios of the present invention. The present invention is not limited thereto, however, and any variation that can be conceived by those skilled in the art is intended to fall within the scope of the present invention.