
CN106933829B - Information association method and device - Google Patents

Info

Publication number
CN106933829B
Authority
CN
China
Prior art keywords
attribute information
information
association
string
service scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201511017699.3A
Other languages
Chinese (zh)
Other versions
CN106933829A (en)
Inventor
杜玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201511017699.3A
Publication of CN106933829A
Application granted
Publication of CN106933829B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2425Iterative querying; Query formulation based on the results of a preceding query

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an information association method and device. The method comprises: acquiring original data containing an identifier and attribute information that are associated with each other in different service scenarios; associating the attribute information that appears most frequently across the service scenarios in the original data with the identifier as associated information, generating an associated information string; and sequentially selecting the types of attribute information to be associated according to a preset association order and, for each selected type, associating the attribute information of that type that has the maximum judgment value across the service scenarios in the original data, as associated information, with the most recently generated associated information string, generating a new associated information string and thereby realizing information association. The method and device associate user information accurately while reducing device load and resource consumption.

Description

Information association method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular to an information association method. The application also relates to an information association device.
Background
Information aggregation is a method for establishing a one-to-one correspondence between a specified account and an identifiable identity attribute together with its main behavior information. With the continuous development of internet technology, people move through different service scenarios and leave behind a great deal of fragmented personal information, so aggregating the multiple pieces of personal information that belong to the same user has become necessary.
However, when different information is aggregated and spliced for a large-scale information carrier (such as users' payment accounts), the information storage tables are often very large. If such large tables are spliced horizontally in the existing manner, the load exceeds the processing capacity of current devices, and the operation either cannot be completed or runs very inefficiently.
In addition, current information aggregation processes and judges each piece of information separately, so the correlation between different pieces of information is already lost by the time they are spliced. As a result, existing aggregation may splice together information that never appeared together, which lowers accuracy.
Therefore, how to associate user information accurately while reducing device load and resource consumption has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides an information association method and device for associating user information accurately while reducing device load and resource consumption. Specifically, the application provides an information association method, which comprises the following steps:
acquiring original data containing an identifier and attribute information that are associated with each other in different service scenarios;
associating the attribute information that appears most frequently across the service scenarios in the original data with the identifier as associated information, and generating an associated information string;
and sequentially selecting the types of attribute information to be associated according to a preset association order, associating the attribute information of the selected type that has the maximum judgment value across the service scenarios in the original data, as associated information, with the most recently generated associated information string, and generating an associated information string to realize information association.
Optionally, after obtaining the original data including the identification and the attribute information that have an association relationship in each of the different service scenarios, the method further includes:
classifying and integrating the acquired original data by service scenario to generate an information record table; the information record table includes the name and content of each piece of attribute information associated with the identifier, and the name of the service scenario, the time, and the number of occurrences for which the identifier was associated with that attribute information.
Optionally, the associating the attribute information with the most frequent occurrence frequency in each service scene in the original data with the identification identifier to generate an associated information string, specifically including:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
Optionally, the method includes sequentially selecting types of attribute information to be associated according to a preset association sequence, associating attribute information with a maximum judgment value in attribute information corresponding to the types in each service scene in the original data with an associated information string with a latest current generation time as associated information, and generating an associated information string to implement information association, where the method specifically includes:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Optionally, the associated information string with the latest current generation time includes N attribute information;
the calculating of the conditional probability that each attribute information determined by the calculation appears at the same time with the attribute information in the associated information string with the latest current generation time in each service scene specifically includes the following steps:
step A, judging whether the determined attribute information exists at the same time with N attribute information in the associated information string with the latest current generation time in each service scene;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists simultaneously with N attribute information in the associated information string with the latest current generation time in each service scene;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene is calculated.
The present application further provides an information associating device, including:
the acquisition module is used for acquiring original data containing identification marks and attribute information which have incidence relations under different service scenes;
the first generation module is used for associating attribute information with the largest frequency of occurrence in each service scene in the original data with the identification mark as associated information to generate an associated information string;
and the second generation module is used for sequentially selecting the types of the attribute information to be associated according to a preset association sequence, associating the attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data as association information with the association information string with the latest current generation time, and generating the association information string to realize information association.
Optionally, the apparatus further comprises:
the processing module is used for classifying and integrating the acquired original data by service scenario to generate an information record table; the information record table includes the name and content of each piece of attribute information associated with the identifier, and the name of the service scenario, the time, and the number of occurrences for which the identifier was associated with that attribute information.
Optionally, the first generating module is specifically configured to:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
Optionally, the second generating module is specifically configured to:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Optionally, the associated information string with the latest current generation time includes N attribute information;
the second generation module calculates conditional probabilities that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene, and specifically includes the following steps:
step A, judging whether the determined attribute information exists at the same time with N attribute information in the associated information string with the latest current generation time in each service scene;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists simultaneously with N attribute information in the associated information string with the latest current generation time in each service scene;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene is calculated.
Compared with the prior art, the method and the device have the advantages that original data containing the identification marks and the attribute information which have incidence relations under different service scenes are obtained; taking attribute information which appears most frequently in each service scene in the original data as associated information to be associated with the identification mark, and generating an associated information string; and sequentially selecting the types of the attribute information to be associated according to a preset association sequence, associating the attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data as association information with the association information string with the latest current generation time, and generating the association information string to realize information association. The method and the device realize accurate association of the user information on the premise of reducing equipment load and resource consumption.
Drawings
Fig. 1 is a schematic flowchart of an information association method disclosed in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an information association device disclosed in an embodiment of the present application.
Detailed Description
To overcome the defects in the prior art, an embodiment of the present application discloses an information association method that associates user information accurately while reducing device load and resource consumption. As shown in Fig. 1, the method includes the following steps:
Step 101, acquiring original data containing an identifier and attribute information that are associated with each other in different service scenarios.
In a specific embodiment, take the database of a shopping website as an example. The service scenarios may include login, authentication, logistics, and so on, and since all information in the shopping website's database is carried by an account, the account can be used as the identifier. The identifier is not limited to this: for example, if the database of a mobile operator is obtained and its data is carried by mobile communication numbers, the mobile communication number can serve as the identifier. The original data refers to data associated with the same identifier; for example, if the identifier is account 1, the acquired original data includes account 1 and the attribute information associated with it in the various service scenarios.
After the original data is acquired, the acquired data may be classified and integrated to make subsequent data extraction easier, specifically:
classifying and integrating the acquired original data by service scenario to generate an information record table; the information record table includes the name and content of each piece of attribute information associated with the identifier, and the name of the service scenario, the time, and the number of occurrences for which the identifier was associated with that attribute information.
Specifically, taking payment account a001 as the identifier, the information record table generated for that identifier may be as shown in Table 1:
TABLE 1
[Table 1 appears as an image in the original publication: Figure BDA0000894373200000071]
In a specific embodiment, if the original data of several identifiers needs to be processed, for example the original data of payment account a002 in addition to that of payment account a001, the generated information record table can be extended vertically, as shown in Table 2:
TABLE 2
[Table 2 appears as an image in the original publication: Figure BDA0000894373200000081]
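As a rough illustration of the classification and integration step, the sketch below groups raw observations into rows of an information record table. The `RawEvent` type and its field names are hypothetical and do not come from the original text, and treating the "time" column as the most recent observation time is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class RawEvent(NamedTuple):
    """One observation of an attribute value with an identifier (hypothetical shape)."""
    identifier: str   # e.g. payment account "a001"
    scenario: str     # e.g. "login", "authentication"
    attr_name: str    # e.g. "mobile phone number"
    attr_value: str
    timestamp: int    # assumed: any sortable time value


def build_record_table(events):
    """Group events into rows keyed by (identifier, scenario, attr_name, attr_value).

    Each row keeps the occurrence count and the latest timestamp seen.
    """
    rows = defaultdict(lambda: {"count": 0, "last_seen": None})
    for e in events:
        key = (e.identifier, e.scenario, e.attr_name, e.attr_value)
        row = rows[key]
        row["count"] += 1
        if row["last_seen"] is None or e.timestamp > row["last_seen"]:
            row["last_seen"] = e.timestamp
    return dict(rows)
```

Because the key begins with the identifier, rows for further accounts such as a002 simply extend the same table vertically.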
Step 102, associating the attribute information that appears most frequently across the service scenarios in the original data with the identifier as associated information, to generate an associated information string.
Specifically, this step includes:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as the initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
In a specific processing example, take payment account b001 as the identifier, and suppose the original data corresponding to b001 contains two service scenarios: a login scenario and an authentication scenario. In the login scenario, the attribute information that appears is the mobile phone number 188****8254 (300 occurrences). In the authentication scenario, the attribute information that appears is the identity card number 3301081975****7598 (10 occurrences), the name Zhang San (20 occurrences), and the bank card number 40065****5874153 (51 occurrences).
In this case, the most frequent attribute information in the login scenario is the mobile phone number 188****8254 (300 occurrences), and the most frequent in the authentication scenario is the bank card number 40065****5874153 (51 occurrences). Comparing these two, the mobile phone number 188****8254 occurs most frequently in the original data corresponding to b001, so it is associated with the identifier as associated information, and the generated associated information string may be as shown in Table 3.
TABLE 3
Payment account number | Mobile phone number
b001 | 188****8254
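The two-stage comparison in step 102 can be sketched as follows, using the occurrence counts from the b001 example. The function name and the nested-dictionary input shape are assumptions made for illustration. The text does not say how ties are resolved; this sketch simply keeps the first maximum encountered.

```python
def pick_initial_attribute(counts_by_scenario):
    """Return (scenario, attribute, count) for the overall most frequent attribute.

    counts_by_scenario maps scenario -> {attribute: occurrence count}.
    """
    # Stage 1: the most frequent attribute within each scenario.
    per_scenario_best = [
        (scenario, *max(counts.items(), key=lambda kv: kv[1]))
        for scenario, counts in counts_by_scenario.items()
        if counts
    ]
    # Stage 2: the most frequent among those per-scenario winners.
    return max(per_scenario_best, key=lambda t: t[2])


b001_counts = {
    "login": {("mobile phone number", "188****8254"): 300},
    "authentication": {
        ("identity card", "3301081975****7598"): 10,
        ("name", "Zhang San"): 20,
        ("bank card number", "40065****5874153"): 51,
    },
}
scenario, attribute, count = pick_initial_attribute(b001_counts)
print(scenario, attribute, count)
```

With these counts the mobile phone number from the login scenario is selected, matching Table 3.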
Step 103, sequentially selecting the types of attribute information to be associated according to a preset association order, associating the attribute information of the selected type that has the maximum judgment value across the service scenarios in the original data, as associated information, with the most recently generated associated information string, and generating an associated information string to realize information association.
The specific process comprises the following steps: sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
Continuing the foregoing example, the associated information string shown in Table 3 has been generated. The types of attribute information to be associated are now selected in turn according to a preset association order, for example name, bank card, identity card; the specific order and the number of attribute types to be associated can be set as needed. The following description takes the name as the type of attribute information to be associated.
First determine the attribute information corresponding to the name type. If, as shown in Table 1, only Zhang San corresponds to the name type, the fact that it is the only such attribute information is sufficient evidence of its credibility, so Zhang San can be directly associated with the associated information string shown in Table 3.
If the name type corresponds not only to Zhang San but also to Li Si, further processing is required. A conditional probability is calculated first, using the formula $P(A \mid B) = P(AB) / P(B)$, where $P(A \mid B)$ is the probability that event A occurs given that event B has occurred.
In this specific embodiment, assume two service scenarios occur, a login scenario and an authentication scenario. Taking the login scenario as an example, $P(A \mid B)$ represents the probability that the name Zhang San appears in the login scenario given that the payment account b001 and the mobile phone number 188****8254 appear together in the login scenario;
$P(AB)$ represents the probability that the payment account b001, the mobile phone number 188****8254, and the name Zhang San appear together in the login scenario;
$P(B)$ represents the probability that the payment account b001 and the mobile phone number 188****8254 appear together in the login scenario.
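The text does not state how these probabilities are estimated. One plausible reading, sketched below, is a relative frequency over the records of a single service scenario: P(B) is the share of records containing every attribute already in the string, and P(AB) is the share that also contain the candidate. The function name and the list-of-sets record representation are assumptions.

```python
def conditional_probability(records, candidate, context):
    """Estimate P(candidate | context) within one service scenario.

    records: list of sets, each holding the attribute values observed
             together in one record of that scenario.
    candidate: the attribute value being evaluated (event A).
    context: attribute values already in the associated information
             string (event B).
    Returns None when the context never occurs, since P(B) would be 0.
    """
    context = set(context)
    with_context = [r for r in records if context <= r]
    if not with_context:
        return None
    with_both = sum(1 for r in with_context if candidate in r)
    # P(AB) / P(B) reduces to a ratio of counts over the same record set.
    return with_both / len(with_context)


# Hypothetical login records: 5 contain the account and phone, 1 also the name.
login_records = [{"b001", "188****8254", "Zhang San"}] + [{"b001", "188****8254"}] * 4
p = conditional_probability(login_records, "Zhang San", {"b001", "188****8254"})
```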
Suppose $P(A \mid B) = 0.2$ is calculated in the login scenario, and the authentication scenario, processed in the same way, gives $P(A \mid B) = 0.3$. Different service scenarios carry different weights; for example, the login scenario has a weight of 0.6 and the authentication scenario a weight of 0.5. The two judgment sub-values of Zhang San are then $0.2 \times 0.6 = 0.12$ and $0.3 \times 0.5 = 0.15$, and the final judgment value is $0.12 + 0.15 = 0.27$.
The same operation is performed for the name Li Si. Assuming its final judgment value is 0.26, then since 0.26 is less than 0.27, the name Zhang San is associated, as associated information, with the associated information string shown in Table 3, and the generated associated information string is shown in Table 4.
TABLE 4
Payment account number | Mobile phone number | Name
b001 | 188****8254 | Zhang San
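The arithmetic behind Table 4 can be reproduced with a short sketch. The weights and conditional probabilities are the ones given in the example; the function name is an assumption, and Li Si's value of 0.26 is entered directly because the text assumes it without giving its inputs.

```python
def judgment_value(cond_prob_by_scenario, weight_by_scenario):
    """Sum of (conditional probability x scenario weight) over all scenarios."""
    return sum(
        p * weight_by_scenario[scenario]
        for scenario, p in cond_prob_by_scenario.items()
    )


weights = {"login": 0.6, "authentication": 0.5}
candidates = {
    "Zhang San": judgment_value({"login": 0.2, "authentication": 0.3}, weights),
    "Li Si": 0.26,  # assumed in the text; its sub-values are not given
}
best = max(candidates, key=candidates.get)
print(best, round(candidates[best], 2))  # Zhang San 0.27
```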
If other attribute information, such as a bank card, needs to be associated, the above operation is repeated on the newly generated associated information string shown in Table 4, until all attribute information in the preset association order has been associated. Information association is thus realized through the associated information string.
In addition, it is assumed that the association information string whose current generation time is latest includes N pieces of attribute information; the specific process of calculating the conditional probability in step 103 is thus as follows:
step A, judging whether the determined attribute information exists at the same time with N attribute information in the associated information string with the latest current generation time in each service scene;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists simultaneously with N attribute information in the associated information string with the latest current generation time in each service scene;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene is calculated.
Specifically, assume the most recently generated associated information string includes 6 pieces of attribute information. First determine the candidate attribute information for the type to be associated; suppose the type is bank card number and the candidates are bank card number 1 and bank card number 2. For bank card number 1, judge whether it appears together with all 6 pieces of attribute information in the same service scenario; if so, calculate the conditional probability on that basis, in the manner described in step 103.
If not, judge whether bank card number 1 appears together with any 5 of the 6 pieces of attribute information in the same service scenario; if so, calculate the conditional probability on that basis. If not, judge whether it appears together with any 4 of the 6, and so on, until the conditional probability can be calculated.
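Steps A to C amount to a back-off over subset sizes, which the following sketch illustrates for the records of one service scenario. The function name and the list-of-sets record representation are assumptions. The text also does not say which subset to use when several subsets of the same size co-occur with the candidate; taking the highest resulting probability is an assumption made here.

```python
from itertools import combinations


def backoff_conditional_probability(records, candidate, context):
    """Back off from N context attributes to N-1, N-2, ... until the
    candidate co-occurs with some subset of that size.

    Returns (level, probability), where level is the number of context
    attributes used, or (0, None) if no subset co-occurs with the candidate.
    """
    context = list(context)
    for level in range(len(context), 0, -1):
        best = None
        for subset in combinations(context, level):
            subset = set(subset)
            with_subset = [r for r in records if subset <= r]
            with_both = sum(1 for r in with_subset if candidate in r)
            if with_both:  # step A answered "yes" for this subset
                p = with_both / len(with_subset)
                best = p if best is None else max(best, p)
        if best is not None:
            return level, best  # step B
        # step C: fall through to the next smaller level
    return 0, None
```

Returning the level alongside the probability keeps track of the basis on which each conditional probability was calculated.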
Because conditional probabilities may be calculated on different bases, the judgment sub-values and the judgment value in step 103 are calculated only from conditional probabilities that share the same basis.
For example, for bank card number 1, if conditional probability 1 in judgment sub-value 1 is calculated on the basis that bank card number 1 exists at the same time as 6 attribute information in service scene 1, while conditional probability 2 in judgment sub-value 2 is calculated on the basis that bank card number 1 exists at the same time as 5 attribute information in service scene 2, then judgment sub-value 1 and judgment sub-value 2 cannot be combined to generate the judgment value.
In addition, if bank card number 1 has two judgment sub-values whose conditional probabilities are both calculated on the basis of simultaneous existence with 6 attribute information in the same service scene, and bank card number 2 has 3 judgment sub-values whose conditional probabilities are all calculated on the basis of simultaneous existence with 5 attribute information in the same service scene, then it is not necessary to compare the finally obtained judgment values, and bank card number 1 can be directly associated, as association information, with the associated information string with the latest current generation time. Similarly, if the conditional probability in judgment value a is calculated on the basis of simultaneous existence with N attribute information in the same service scene, and the conditional probability in judgment value b is calculated on the basis of simultaneous existence with n attribute information in the same service scene, where N > n, then judgment value a is regarded as higher than judgment value b without comparing the specific values, and the attribute information corresponding to judgment value a is associated, as association information, with the associated information string with the latest current generation time.
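The priority rule above, under which a judgment value based on more attribute information always wins and the numeric value matters only between equal bases, can be sketched as a lexicographic comparison. The function name and the (k, judgment_value) pair layout are hypothetical and serve only as an illustration.

```python
def pick_association(candidates):
    """Pick the attribute value to associate next.

    candidates is assumed to map each attribute value to a pair
    (k, judgment_value), where k is the number of attribute information
    the underlying conditional probabilities are based on.
    Tuples compare element by element, so k is compared first and the
    judgment value is only consulted when both k are equal.
    """
    return max(candidates, key=lambda name: candidates[name])

# Bank card number 1 is based on 6 attributes, bank card number 2 on 5:
# card1 is chosen even though its judgment value is smaller.
print(pick_association({"card1": (6, 0.2), "card2": (5, 0.9)}))  # card1
```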
To further illustrate the technical idea of the present invention, an embodiment of the present application further discloses an information association device which, as shown in Fig. 2, includes:
an obtaining module 201, configured to obtain original data containing identification identifiers and attribute information that have an association relationship in different service scenes;
a first association module 202, configured to associate the attribute information that appears most frequently in each service scene in the original data, as association information, with the identification identifier, and to generate an associated information string;
a second association module 203, configured to sequentially select types of attribute information to be associated according to a preset association sequence, to associate the attribute information with the maximum judgment value among the attribute information corresponding to the selected type in each service scene in the original data, as association information, with the associated information string with the latest current generation time, and to generate an associated information string to implement information association.
Specifically, the apparatus further includes:
a processing module, configured to classify and integrate the acquired original data according to the different service scenes to generate an information recording table; the information recording table includes the name and content of the attribute information associated with the identification identifier, and the name of the service scene, the time, and the number of times the identification identifier was associated with the attribute information.
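One possible shape of such an information recording table is sketched below, purely as an illustration. The field names, the tuple layout of the raw events, and the decision to keep the latest association time per row are assumptions; the text only lists which pieces of information the table contains.

```python
from dataclasses import dataclass

@dataclass
class InfoRecord:
    identifier: str      # identification identifier
    attr_name: str       # name of the attribute information, e.g. "bank_card"
    attr_content: str    # content of the attribute information
    scene: str           # name of the service scene
    last_time: str       # time of the association (latest one kept here)
    times: int           # number of times the association was observed

def build_record_table(raw_events):
    """Group raw (identifier, attr_name, attr_content, scene, time) events."""
    table = {}
    for identifier, attr_name, attr_content, scene, time in raw_events:
        key = (identifier, attr_name, attr_content, scene)
        rec = table.get(key)
        if rec is None:
            table[key] = InfoRecord(identifier, attr_name, attr_content, scene, time, 1)
        else:
            rec.times += 1
            rec.last_time = max(rec.last_time, time)  # ISO dates sort as text
    return list(table.values())

events = [
    ("user1", "phone", "138...", "payment", "2015-01-02"),
    ("user1", "phone", "138...", "payment", "2015-03-04"),
    ("user1", "phone", "138...", "login", "2015-02-01"),
]
table = build_record_table(events)
```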
The first association module 202 is specifically configured to:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
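The four operations of the first association module can be sketched as a per-scene frequency count followed by a comparison of the per-scene winners. The function name and the (scene, attribute value) input layout are hypothetical, and how ties are broken is not specified in the text.

```python
from collections import Counter

def initial_association(identifier, observations):
    """Build the initial associated information string.

    observations is assumed to be an iterable of (scene, attribute_value)
    pairs taken from the original data for one identification identifier.
    """
    # Occurrence frequency of each attribute information in each scene.
    per_scene = {}
    for scene, value in observations:
        per_scene.setdefault(scene, Counter())[value] += 1
    # Most frequent attribute information within each scene: (value, count).
    scene_winners = [counts.most_common(1)[0] for counts in per_scene.values()]
    # Compare the per-scene winners and keep the overall maximum.
    best_value, _ = max(scene_winners, key=lambda vc: vc[1])
    return [identifier, best_value]

obs = [
    ("payment", "phone1"), ("payment", "phone1"), ("payment", "card1"),
    ("login", "email1"), ("login", "email1"), ("login", "email1"),
]
print(initial_association("user1", obs))  # ['user1', 'email1']
```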
The second association module 203 is specifically configured to:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
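The weighting and selection performed by the second association module for one selected type can be sketched as follows. This is an illustration under assumptions: the nested-dictionary inputs and function names are hypothetical, "summarizing" the judgment sub-values is read here as summing them, and the conditional probabilities are taken as already calculated.

```python
def judgment_values(cond_prob, scene_weight):
    """cond_prob: {attribute: {scene: probability}}; scene_weight: {scene: weight}.

    Each judgment sub-value is probability * preset scene weight; the
    judgment value of an attribute is taken as the sum of its sub-values.
    """
    return {
        attr: sum(p * scene_weight[scene] for scene, p in by_scene.items())
        for attr, by_scene in cond_prob.items()
    }

def extend_string(info_string, cond_prob, scene_weight):
    """Append the chosen attribute of the current type to the string."""
    if len(cond_prob) == 1:                 # only one candidate: take it directly
        chosen = next(iter(cond_prob))
    else:
        values = judgment_values(cond_prob, scene_weight)
        chosen = max(values, key=values.get)
    return info_string + [chosen]

probs = {
    "card1": {"payment": 0.5, "login": 0.25},
    "card2": {"payment": 0.25, "login": 0.5},
}
weights = {"payment": 2.0, "login": 1.0}
print(judgment_values(probs, weights))  # {'card1': 1.25, 'card2': 1.0}
print(extend_string(["user1", "phone1"], probs, weights))
```

The returned list plays the role of the newly generated associated information string, which then becomes the string with the latest generation time for the next type in the association sequence.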
Specifically, the associated information string with the latest current generation time includes N attribute information;
the second association module 203 calculates the conditional probability that each determined attribute information appears simultaneously with the attribute information in the association information string with the latest current generation time in each service scene, and specifically includes the following steps:
step A, judging whether each determined attribute information exists at the same time, in each service scene, with the N attribute information in the associated information string with the latest current generation time;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists at the same time, in each service scene, with the N attribute information in the associated information string with the latest current generation time;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene has been calculated.
In the embodiments of the present application, original data containing identification identifiers and attribute information that have an association relationship in different service scenes is acquired; the attribute information that appears most frequently in each service scene in the original data is associated, as association information, with the identification identifier to generate an associated information string; and the types of attribute information to be associated are selected sequentially according to a preset association sequence, and the attribute information with the maximum judgment value among the attribute information corresponding to the selected type in each service scene in the original data is associated, as association information, with the associated information string with the latest current generation time to generate an associated information string and realize information association. User information is thereby associated accurately while device load and resource consumption are reduced.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by hardware, or by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for enabling a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present invention.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above serial numbers of the present invention are merely for description and do not represent the merits of the implementation scenarios.
The above discloses only a few specific implementation scenarios of the present invention; however, the present invention is not limited thereto, and any variations that can be conceived by those skilled in the art are intended to fall within the scope of the present invention.

Claims (8)

1. An information association method, comprising:
acquiring original data containing identification marks and attribute information which have incidence relations under different service scenes;
taking attribute information which appears most frequently in each service scene in the original data as associated information to be associated with the identification mark, and generating an associated information string;
sequentially selecting types of attribute information to be associated according to a preset association sequence, associating attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data as association information with an association information string with the latest current generation time, and generating an association information string to realize information association;
wherein sequentially selecting the types of attribute information to be associated according to the preset association sequence, associating the attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data, as associated information, with the associated information string with the latest current generation time, and generating the associated information string to realize information association specifically comprises:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
2. The method of claim 1, wherein after obtaining the original data including the identification and the attribute information associated with different service scenarios, the method further comprises:
classifying and integrating the obtained original data according to different service scenes to generate an information recording table; the information recording table comprises the name and content of the attribute information associated with the identification identifier, and the name of the service scene, the time, and the number of times the identification identifier was associated with the attribute information.
3. The method according to claim 1 or 2, wherein associating attribute information that appears most frequently in each service scenario in the original data with the identification identifier as association information to generate an association information string specifically includes:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
4. The method according to claim 1, wherein the association information string whose current generation time is latest includes N pieces of attribute information;
wherein calculating the conditional probability that each determined attribute information appears at the same time with the attribute information in the associated information string with the latest current generation time in each service scene specifically comprises the following steps:
step A, judging whether each determined attribute information exists at the same time, in each service scene, with the N attribute information in the associated information string with the latest current generation time;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists at the same time, in each service scene, with the N attribute information in the associated information string with the latest current generation time;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene has been calculated.
5. An information association device, comprising:
the acquisition module is used for acquiring original data containing identification marks and attribute information which have incidence relations under different service scenes;
the first generation module is used for associating attribute information with the largest frequency of occurrence in each service scene in the original data with the identification mark as associated information to generate an associated information string;
the second generation module is used for sequentially selecting the types of the attribute information to be associated according to a preset association sequence, associating the attribute information with the maximum judgment value in the attribute information corresponding to the types under each service scene in the original data as association information with the association information string with the latest current generation time, and generating the association information string to realize information association;
the second generation module is specifically configured to:
sequentially selecting the types of the attribute information to be associated according to a preset association sequence;
determining each attribute information corresponding to the currently selected type in the original data;
if only one attribute information corresponding to the currently selected type is available, selecting the attribute information as the associated information to be associated with the associated information string with the latest current generation time, and generating the associated information string to realize information association;
if a plurality of attribute information corresponding to the currently selected types exist, calculating the conditional probability that each determined attribute information appears simultaneously with the attribute information in the associated information string with the latest current generation time in each service scene;
determining a judgment sub-value of each attribute information under each service scene based on the product of the conditional probability and the preset weight of the corresponding service scene;
summarizing judgment sub-values corresponding to the same attribute information to determine judgment values of the attribute information;
and comparing the judgment values of the attribute information, determining the attribute information with the maximum judgment value, associating the attribute information with the maximum judgment value as associated information with the associated information string with the latest current generation time, and generating the associated information string to realize information association.
6. The apparatus of claim 5, further comprising:
the processing module is used for classifying and integrating the acquired original data according to different service scenes to generate an information recording table; the information recording table comprises the name and content of the attribute information associated with the identification identifier, and the name of the service scene, the time, and the number of times the identification identifier was associated with the attribute information.
7. The device according to claim 5 or 6, wherein the first generation module is specifically configured to:
determining the occurrence frequency of each attribute information in the original data under each service scene;
determining attribute information with the maximum occurrence frequency in each service scene by comparing the occurrence frequency of each attribute information in the same service scene;
determining the attribute information with the maximum occurrence frequency in the original data as initial attribute information to be associated by comparing the occurrence frequency of the attribute information with the maximum occurrence frequency in each service scene;
and associating the initial attribute information to be associated with the identification mark as associated information to generate an associated information string.
8. The apparatus according to claim 5, wherein the association information string whose current generation time is latest includes N pieces of attribute information;
wherein the second generation module calculating the conditional probability that each determined attribute information appears at the same time with the attribute information in the associated information string with the latest current generation time in each service scene specifically comprises the following steps:
step A, judging whether each determined attribute information exists at the same time, in each service scene, with the N attribute information in the associated information string with the latest current generation time;
step B, if the judgment result is yes, calculating the conditional probability that each determined attribute information exists at the same time, in each service scene, with the N attribute information in the associated information string with the latest current generation time;
step C, if the judgment result is no, setting N to N-1 and repeating step A until the conditional probability that the determined attribute information and the attribute information in the associated information string with the latest current generation time appear at the same time in each service scene has been calculated.
CN201511017699.3A 2015-12-29 2015-12-29 Information association method and device Active CN106933829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511017699.3A CN106933829B (en) 2015-12-29 2015-12-29 Information association method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511017699.3A CN106933829B (en) 2015-12-29 2015-12-29 Information association method and device

Publications (2)

Publication Number Publication Date
CN106933829A CN106933829A (en) 2017-07-07
CN106933829B true CN106933829B (en) 2020-08-04

Family

ID=59442286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511017699.3A Active CN106933829B (en) 2015-12-29 2015-12-29 Information association method and device

Country Status (1)

Country Link
CN (1) CN106933829B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580309A (en) * 2019-08-14 2019-12-17 阿里巴巴集团控股有限公司 personal information display device method, device and equipment based on block chain type account book
CN110968785B (en) * 2019-11-26 2023-03-14 腾讯科技(深圳)有限公司 Target account identification method and device, storage medium and electronic device
CN111680248A (en) * 2020-04-28 2020-09-18 五八有限公司 Method and device for generating batch number of message pushed

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620625A (en) * 2009-07-30 2010-01-06 腾讯科技(深圳)有限公司 Method, device and search engine for sequencing searching keywords
CN102368788A (en) * 2011-12-09 2012-03-07 中国电信股份有限公司 Information pushing method and apparatus thereof
CN104573094A (en) * 2015-01-30 2015-04-29 深圳市华傲数据技术有限公司 Online account recognizing and matching method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8892571B2 (en) * 2004-10-12 2014-11-18 International Business Machines Corporation Systems for associating records in healthcare database with individuals

Also Published As

Publication number Publication date
CN106933829A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
US10410128B2 (en) Method, device, and server for friend recommendation
CN107436875B (en) Text classification method and device
CN106682906B (en) Risk identification and service processing method and equipment
US11003896B2 (en) Entity recognition from an image
WO2019024496A1 (en) Enterprise recommendation method and application server
EP3971798A1 (en) Data processing method and apparatus, and computer readable storage medium
CN110941598A (en) Data deduplication method, device, terminal and storage medium
CN107733869A (en) A kind of device identification method and device
WO2021036453A1 (en) Method and device for user identification, and computer device
CN106817390B (en) User data sharing method and device
CN106933829B (en) Information association method and device
CN110990541A (en) Method and device for realizing question answering
US20140052497A1 (en) Correlating location data
US20160248724A1 (en) Social Message Monitoring Method and Apparatus
CN111444364B (en) Image detection method and device
CN114528916A (en) Sample clustering processing method, device, equipment and storage medium
CN104573132A (en) Method and device for finding songs
CN106169979B (en) Service processing method and equipment
CN116703141A (en) Audit data processing method, audit data processing device, computer equipment and storage medium
CN105512270A (en) Method and device for determining related objects
CN118052223A (en) Method, device, equipment and storage medium for generating sensitive data identification model
CN115239066A (en) Communication informationization data management and control platform
CN114970495A (en) Name disambiguation method and device, electronic equipment and storage medium
CN117009430A (en) Data management method, device, storage medium and electronic equipment
CN107480271B (en) Crowd image drawing method and system based on sampling search and index search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1238738

Country of ref document: HK

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.