WO2021000084A1 - Data classification method and related products - Google Patents
- Publication number: WO2021000084A1 (application PCT/CN2019/093971)
- Authority: WO, WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Definitions
- This application relates to the field of communication technology, and specifically relates to a data classification method and related products.
- the embodiments of the present application provide a data classification method and related products, which can improve friend classification efficiency and improve user experience.
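- a data classification method includes:
- application data of a target application of a target object is acquired, and a target user ID of the target object is acquired;
- ID extraction is performed on the application data to obtain multiple user IDs and associated data corresponding to each user ID;
- bucket processing is performed on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID;
- the multiple user IDs are divided into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.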
- an embodiment of the present application provides a data classification device, and the device includes:
- the acquiring unit is used for acquiring application data of a target application of a target object, and acquiring a target user ID of the target object;
- the extraction unit is used for ID extraction of the application data to obtain multiple user IDs and associated data corresponding to each user ID;
- the bucket processing unit is configured to perform bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, and each bucket includes associated data of at least one user ID;
- the dividing unit is configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiments of the present application.
- an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps described in the first aspect.
- embodiments of the present application provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect.
- the computer program product may be a software installation package.
- FIG. 1A is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 1B is a schematic flowchart of a data classification method disclosed in an embodiment of the present application.
- FIG. 1C is a schematic diagram of a data classification method disclosed in an embodiment of the present application.
- FIG. 1D is a schematic diagram showing the structure of a user portrait disclosed in an embodiment of the present application.
- FIG. 1E is a schematic diagram of a locality sensitive hash algorithm disclosed in an embodiment of the present application.
- FIG. 1F is another schematic diagram showing the locality sensitive hash algorithm disclosed in the embodiment of the present application.
- FIG. 2 is a schematic flowchart of another data classification method disclosed in an embodiment of the present application.
- FIG. 3 is a schematic flowchart of another data classification method disclosed in an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of another electronic device disclosed in an embodiment of the present application.
- FIG. 5A is a schematic structural diagram of a data classification device disclosed in an embodiment of the present application.
- FIG. 5B is a schematic structural diagram of another data classification device disclosed in an embodiment of the present application.
- FIG. 5C is a schematic structural diagram of another data classification device disclosed in an embodiment of the present application.
- the electronic devices involved in the embodiments of the present application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), smart home equipment (smart TV, smart air conditioner, smart range hood, smart fan, smart wheelchair, smart dining table, etc.), and so on.
- the above-mentioned devices are collectively referred to as electronic devices, and the above-mentioned electronic devices may also be servers, service platforms, and so on.
- FIG. 1A is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
- the electronic device 100 may include a control circuit, and the control circuit may include a storage and processing circuit 110.
- the storage and processing circuit 110 can include memory, such as hard disk drive memory, non-volatile memory (such as flash memory or other electronically programmable read-only memory used to form a solid-state drive), and volatile memory (such as static or dynamic random access memory), which is not limited in the embodiments of the present application.
- the processing circuit in the storage and processing circuit 110 may be used to control the operation of the electronic device 100.
- the processing circuit can be implemented based on one or more microprocessors, microcontrollers, baseband processors, power management units, audio codec chips, application specific integrated circuits, display driver integrated circuits, etc.
- the storage and processing circuit 110 can be used to run software in the electronic device 100, such as Internet browsing applications, voice over internet protocol (VoIP) phone call applications, email applications, media playback applications, operating system functions, etc. This software can be used to perform control operations, for example, camera-based image capture, ambient light measurement based on ambient light sensors, proximity measurement based on proximity sensors, information display functions based on status indicators such as LED status indicators, touch event detection based on touch sensors, functions associated with displaying information on multiple (e.g., layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and generating audio signals, control operations associated with collecting and processing button press event data, and other functions in the electronic device 100, which are not limited in the embodiments of the present application.
- the electronic device 100 may further include an input-output circuit 150.
- the input-output circuit 150 can be used to enable the electronic device 100 to implement data input and output, that is, allow the electronic device 100 to receive data from an external device and also allow the electronic device 100 to output data from the electronic device 100 to the external device.
- the input-output circuit 150 may further include a sensor 170.
- the sensor 170 may include an ambient light sensor, a proximity sensor based on light and capacitance, a touch sensor (for example, a light-based touch sensor and/or a capacitive touch sensor, where the touch sensor may be part of a touch screen or may be used independently as a touch sensor structure), an acceleration sensor, a gravity sensor, and other sensors.
- the input-output circuit 150 may also include one or more displays, such as the display 130.
- the display 130 may include one or a combination of a liquid crystal display, an organic light emitting diode display, an electronic ink display, a plasma display, and a display using other display technologies.
- the display 130 may include a touch sensor array (i.e., the display 130 may be a touch display screen).
- the touch sensor can be a capacitive touch sensor formed by an array of transparent touch sensor electrodes (such as indium tin oxide (ITO) electrodes), or can be a touch sensor formed using other touch technologies, such as acoustic touch, pressure-sensitive touch, resistive touch, optical touch, etc., which is not limited in the embodiments of the present application.
- the audio component 140 may be used to provide audio input and output functions for the electronic device 100.
- the audio component 140 in the electronic device 100 may include a speaker, a microphone, a buzzer, a tone generator, and other components for generating and detecting sounds.
- the communication circuit 120 may be used to provide the electronic device 100 with the ability to communicate with external devices.
- the communication circuit 120 may include analog and digital input-output interface circuits, and wireless communication circuits based on radio frequency signals and/or optical signals.
- the wireless communication circuit in the communication circuit 120 may include a radio frequency transceiver circuit, a power amplifier circuit, a low noise amplifier, a switch, a filter, and an antenna.
- the wireless communication circuit in the communication circuit 120 may include a circuit for supporting near field communication (NFC) by transmitting and receiving near-field coupled electromagnetic signals.
- the communication circuit 120 may include a near field communication antenna and a near field communication transceiver.
- the communication circuit 120 may also include a cellular phone transceiver and antenna, a wireless local area network transceiver circuit and antenna, and so on.
- the electronic device 100 may further include a battery, a power management circuit, and other input-output units 160.
- the input-output unit 160 may include buttons, joysticks, click wheels, scroll wheels, touch pads, keypads, keyboards, cameras, light emitting diodes, and other status indicators.
- the user can input commands through the input-output circuit 150 to control the operation of the electronic device 100, and can use the output data of the input-output circuit 150 to realize receiving status information and other outputs from the electronic device 100.
- FIG. 1B is a schematic flowchart of a data classification method provided by an embodiment of the present application.
- the data classification method described in this embodiment is applied to the electronic device as shown in FIG. 1A.
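- the data classification method includes:
- 101. Acquire application data of a target application of a target object, and acquire a target user ID of the target object.
- 102. Perform ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID.
- 103. Perform bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID.
- 104. Divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.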
- the target object can be understood as the owner or other users.
- the target application may be at least one of the following: video application, social application, instant messaging application, shopping application, payment application, game application, navigation application, camera application, financial management application, etc., which are not limited here.
- the target application may be one application or a type of application, the target application may include one or more applications, and the target application may be a third-party application or a system application.
- the application data may include at least one of the following: registration application data, application cache data, or instant messaging data, etc., which are not limited here.
- the application data may include user IDs that identify the user, such as the user's cookie, an APP-side browsing behavior identification ID, and an account ID, where a user ID can be a device hardware ID or a character identifier.
- electronic devices may be used by multiple people. From multi-ID relationships (for example, the device IMEI, SSOID, and oppenid), user location data, Internet behavior data, etc., a multi-dimensional feature layer and an ID-mapping relationship layer can be constructed.
- in the natural person recognition layer, the multi-ID relationships can be used together with the letter recognition filtering algorithm and the graph connection algorithm to complete the accurate recognition of natural persons, so that the owner can be accurately identified; after all, the owner is the one using the electronic device most of the time.
- the foregoing step 101, obtaining application data of the target application of the target object may include the following steps:
- the aforementioned preset time period can be set by the user or by system default.
- the preset time period can be understood as a recent period during which the electronic device has been used, or the period between the registration of any one of the at least one user ID and the current time.
- the target object may be the owner.
- the user ID may be at least one of the following: phone number, integrated circuit card identity (ICCID), International Mobile Equipment Identity (IMEI), Single Sign On ID (SSOID), third-party application ID, oppenId, etc., which are not limited here.
- the electronic device can obtain at least one user ID of the target application that has been used in the electronic device, and further, can determine the target application data in the electronic device according to the at least one user ID.
- the electronic device can store all data related to the target application, for example, cache data, application running state data, etc. However, in the embodiment of the present application, only the application data related to the user ID may be extracted.
- when the at least one user ID is a natural person ID, the following steps may be further included before step 101:
- A1. Acquire historical usage data of the target application of the electronic device corresponding to the target object.
- A2. Construct a multi-dimensional feature layer and ID-mapping relationship layer according to the historical usage data.
- A3. Determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
- historical usage data can be understood as the usage data from the time the user first used the target application on the electronic device up to the current time, or as all usage data corresponding to at least one user ID of the target object.
- the historical usage data may include at least one of the following: registration application data, application cache data, instant messaging data, etc., which are not limited here.
- the application data may include user IDs that identify the user, such as the user's cookie, an APP-side browsing behavior identification ID, and an account ID.
- the aforementioned usage data may also include at least one of the following: CPU operating frequency, CPU core number, CPU operating mode, GPU frame rate, GPU resolution, device brightness, device sound, memory parameters, etc.
- the nature of the user ID of the user identity identification can be a device hardware ID or a character identification.
- the electronic device can obtain historical usage data of the target application corresponding to the target object.
- the historical usage data can be obtained from a data source.
- the data source can include at least one of the following: browser, software store, account system, AutoNavi data, shopping data, communication data, game data, social data, office data, smart home data, etc., which are not limited here.
- the ID-mapping relationship layer data can be obtained based on the historical usage data.
- the ID-mapping relationship layer data can include at least one of the following: OSSID <-> IMEI (the mapping relationship between OSSID and IMEI), TEL <-> IMEI, OppenId <-> ICCID, etc., which are not limited here.
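- As a rough illustration of how pairwise ID mappings could be merged into per-person ID groups, the sketch below applies a union-find structure to a few made-up ID pairs. The graph connection algorithm itself is not specified in this text; the function name `link_ids`, the ID strings, and the choice of union-find are all assumptions made for illustration.

```python
from collections import defaultdict


def link_ids(pairs):
    """Group IDs that are connected through any chain of ID-mapping pairs.

    `pairs` is an iterable of (id_a, id_b) tuples. Hypothetical helper:
    union-find stands in for the unspecified graph connection algorithm.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Made-up mapping pairs; each connected component is a candidate natural person.
mappings = [
    ("ssoid:1001", "imei:A"),
    ("tel:555-0100", "imei:A"),
    ("oppenid:x9", "iccid:77"),
]
persons = link_ids(mappings)
```

Here the two IDs that both map to `imei:A` end up in one group, while the `oppenid`/`iccid` pair forms a separate group.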
- Multi-dimensional feature layer data can also be obtained based on historical usage data.
- Multi-dimensional feature layer data can include at least one of the following: device features, APP features, positioning features, etc., which are not limited here.
- based on the multi-dimensional feature layer and the ID-mapping relationship layer, natural person IDs can be determined, and each natural person ID can correspond to a user portrait.
- the user portrait can include at least one of the following: demographic attributes, human-land relationship, interests and hobbies, equipment attributes, assets, business interests, etc., which are not limited here.
- the above-mentioned device features may include at least one of the following: device attributes (such as equipment daily management, model configuration, activation date, etc.), network connection conditions (such as WIFI connection, network IP, base station, connection distribution, etc.), the ID's own attributes (such as ID format, character length, etc.), etc., which are not limited here.
- APP features can include at least one of the following: APP installation, startup, uninstallation, APP type preferences (such as games, applications), APP active periods (working days, holidays, etc.), etc., which are not limited here.
- Positioning features can include at least one of the following: location attributes (for example, home or company, resident business district, frequently visited places), travel preferences (for example, travel mode, travel time, travel frequency, travel trajectory, etc.), and POI preferences (POI arrival, POI search).
- the application data may include multiple user IDs, that is, when the user uses the device to communicate with another user, the application data may record the user ID of the other user.
- the associated data may be at least one of the following: user level, point consumption, activity, preference type, online duration, online time, operating habits, communication times, communication time, user ID, and so on.
- performing ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID may include the following steps:
- the above-mentioned preset ID keyword is understood as a keyword in a specific format, for example, user name: xxx, then xxx is a keyword, and the specific format can be defaulted by the system.
- the electronic device can search application data according to preset ID keywords to obtain multiple IDs, and can also integrate multiple IDs to obtain multiple user IDs.
- the specific integration algorithm can be a clustering algorithm, a locality-sensitive hashing algorithm, etc., which are not limited here.
- the associated data corresponding to the multiple user IDs can then be obtained from the application data, where the associated data can be understood as data related to a user ID; this yields the associated data corresponding to each of the multiple user IDs.
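- The keyword-based extraction described above can be sketched as follows. This is a minimal sketch under assumptions: the pattern `user name: xxx` is taken from the example format, while the sample text, the `|`-separated record layout, and the helper names `extract_user_ids` and `collect_associated_data` are hypothetical.

```python
import re

# Assumed keyword format "user name: xxx"; the real preset format is not given.
ID_PATTERN = re.compile(r"user name:\s*(\w+)")


def extract_user_ids(application_data):
    """Return the distinct user IDs found in the text, in first-seen order."""
    seen = []
    for match in ID_PATTERN.finditer(application_data):
        user_id = match.group(1)
        if user_id not in seen:
            seen.append(user_id)
    return seen


def collect_associated_data(application_data, user_ids):
    """Map each user ID to the lines of application data that mention it."""
    lines = application_data.splitlines()
    return {uid: [line for line in lines if uid in line] for uid in user_ids}


# Made-up application data for illustration only.
sample = (
    "user name: alice | messages: 12\n"
    "user name: bob | messages: 3\n"
    "user name: alice | last seen: 21:05\n"
)
ids = extract_user_ids(sample)
associated = collect_associated_data(sample, ids)
```

Duplicate hits for the same ID are merged here by simple equality; a clustering or LSH-based integration step, as mentioned above, would replace that check when IDs appear in several formats.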
- locality-sensitive hashing (LSH) is one of the most widely used approximate nearest neighbor search algorithms. It has a solid theoretical basis and performs well in high-dimensional data spaces. Its main function is to find similar data within massive data sets, and it can be applied to text similarity detection, web search, and other fields. Its basic idea resembles a transformation between spatial domains.
- the LSH algorithm rests on an assumption: if two items are similar in the original data space, they remain highly similar after being transformed by the hash function; conversely, if they are not similar, they should still not be similar after the transformation.
- the electronic device may perform bucket processing on the associated data of multiple user IDs by using the locality-sensitive hashing algorithm to obtain multiple buckets, each of which contains the associated data of at least one user ID.
- after bucketing, the "user set" in which each user is located is relatively small. Because intimacy only needs to be calculated within the user set of each bucket, the complexity of the user intimacy calculation is reduced; then, after sorting and applying some rules, the resulting intimacy can be used to classify the user's friends.
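- To make the bucketing step concrete, the sketch below uses MinHash signatures with banding, which is one common LSH family for set-valued data. The hash family is not specified in this text, so MinHash, the parameter values, the token sets, and the function names are all assumptions; each user's associated data is assumed to be a non-empty set of string tokens.

```python
import hashlib
from collections import defaultdict


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=8):
    """MinHash signature: the minimum hash of the token set for each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_buckets(associated, num_hashes=8, bands=4):
    """Assign user IDs to buckets keyed by (band index, band of the signature)."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for user_id, tokens in associated.items():
        signature = minhash_signature(tokens, num_hashes)
        for band in range(bands):
            key = (band, signature[band * rows:(band + 1) * rows])
            buckets[key].add(user_id)
    return buckets


# Made-up associated data: u1 and u2 are identical, u3 shares nothing with them.
associated = {
    "u1": {"gaming", "night", "level5"},
    "u2": {"gaming", "night", "level5"},
    "u3": {"shopping", "morning", "level1"},
}
buckets = lsh_buckets(associated)
```

Users with identical token sets always share every bucket, and users with overlapping sets share a bucket with a probability that grows with their Jaccard similarity, so later pairwise comparisons only need to run inside each bucket.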
- the foregoing step 104 dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, may include the following steps:
- the preset threshold can be set by the user or by system default.
- the i-th bucket is any one of the above-mentioned multiple buckets.
- the electronic device can determine the degree of association between each user ID in the i-th bucket and the target user ID according to the associated data of that user ID, to obtain multiple degrees of association, and then select the degrees of association greater than a preset threshold from the multiple degrees of association to obtain at least one target degree of association.
- the user IDs corresponding to the at least one target degree of association can be regarded as a group, that is, the user IDs with high similarity in each bucket are regarded as a group.
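- The threshold selection within one bucket can be sketched as below. The association degrees, the threshold value, and the helper name `group_bucket` are hypothetical; how the degrees themselves are computed is a separate step.

```python
def group_bucket(bucket_ids, association, threshold):
    """Keep the IDs in one bucket whose association degree exceeds the threshold."""
    return [uid for uid in bucket_ids if association[uid] > threshold]


# Hypothetical association degrees with the target user ID for one bucket.
degrees = {"u1": 0.91, "u2": 0.42, "u3": 0.77}
group = group_bucket(["u1", "u2", "u3"], degrees, threshold=0.6)
```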
- step 41 determining the degree of association between the user ID and the target user ID according to the associated data of each user ID in the i-th bucket, to obtain multiple degrees of association, may include the following steps:
- the first user ID is any user ID in the i-th bucket.
- the electronic device can obtain the associated data of the first user ID, and perform feature extraction on the first user ID.
- the feature set may include at least one of the following: geographic location, communication time period, communication content, communication times, etc., which are not limited here.
- the above-mentioned features of each dimension can be represented by a feature value.
- the degree of association between the first user ID and the target user ID can be determined according to the target feature set. For example, the weight value corresponding to each feature in the target feature set can be determined, and then a weighting operation is performed based on each feature and its corresponding weight value to obtain the correlation degree between the first user ID and the target user ID.
- the target feature set includes feature values of multiple dimensions; the above step 413, determining the degree of association between the first user ID and the target user ID according to the target feature set, may include the following steps:
- the target feature set may include feature sets of multiple dimensions, and the electronic device may pre-store the weight value corresponding to the feature value of each dimension, and furthermore, the weight value corresponding to each dimension of the feature values of multiple dimensions can be determined.
- the weight values of multiple dimensions are obtained, and further, a weighting operation can be performed according to the feature values of multiple dimensions and the weight values of multiple dimensions to obtain the degree of association between the first user ID and the target user ID.
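- The weighting operation can be illustrated with a small weighted sum. The dimension names, feature values, and weights below are invented for illustration, and `association_degree` is a hypothetical helper; the pre-stored weights themselves are not given in this text.

```python
def association_degree(feature_values, weights):
    """Weighted sum of per-dimension feature values."""
    if feature_values.keys() != weights.keys():
        raise ValueError("feature values and weights must cover the same dimensions")
    return sum(feature_values[d] * weights[d] for d in feature_values)


# Invented feature values (one per dimension) and pre-stored weights.
features = {"location": 0.8, "comm_count": 0.5, "comm_period": 1.0}
weights = {"location": 0.5, "comm_count": 0.25, "comm_period": 0.25}
score = association_degree(features, weights)
```

With these numbers the score is 0.8 × 0.5 + 0.5 × 0.25 + 1.0 × 0.25 = 0.775.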
- the foregoing step 413, determining the degree of association between the first user ID and the target user ID according to the target feature set may include the following steps:
- C2. Determine the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
- since the target feature set includes both the features of the first user ID and the features of the target user ID, the target feature set can be separated to obtain the first feature set corresponding to the first user ID and the second feature set corresponding to the target user ID. There may be an intersection between the first feature set and the second feature set. Both the first feature set and the second feature set may include at least one of the following features: user level, point consumption, activity, preference type, online duration, online time, operating habits, communication times, communication time, user ID, etc., which are not limited here. Furthermore, the degree of association between the first user ID and the target user ID can be determined according to the first feature set and the second feature set.
- the degree of association between the first user ID and the target user ID can be calculated by Euclidean distance, as follows:
- p represents the first feature set
- q represents the second feature set
- i represents any dimension
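- For reference, the standard Euclidean distance between two feature sets with n dimensions, which is assumed to be the form intended here, is written below; n (the number of dimensions) is a symbol introduced for this sketch.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} \left(p_i - q_i\right)^2}
```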
- the degree of association between the first user ID and the target user ID can be calculated through the Jaccard distance, as follows:
- p represents the first feature set
- q represents the second feature set
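- For reference, the standard Jaccard distance between two feature sets, treated as sets of features, is written below; this is assumed to be the form intended here.

```latex
d_J(p, q) = 1 - \frac{\lvert p \cap q \rvert}{\lvert p \cup q \rvert}
```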
- the degree of association between the first user ID and the target user ID can be calculated through the cosine distance, as follows:
- p represents the first feature set
- q represents the second feature set
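- The three measures named above can be compared side by side with the sketch below. It uses the standard textbook definitions, which are assumed to match the intended formulas: cosine distance is taken as 1 − (p·q) / (‖p‖‖q‖), Euclidean and cosine distance operate on equal-length numeric vectors, and Jaccard distance operates on sets. The function names and sample values are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared differences per dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def jaccard_distance(p, q):
    """One minus the ratio of shared features to all features."""
    p, q = set(p), set(q)
    union = p | q
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(p & q) / len(union)


def cosine_distance(p, q):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


d_euclid = euclidean_distance([0, 0], [3, 4])
d_jaccard = jaccard_distance({"a", "b"}, {"b", "c"})
d_cosine = cosine_distance([1, 0], [0, 1])
```

A smaller distance corresponds to a higher degree of association, so a distance would typically be converted (for example, by negation or inversion) before being compared against the preset threshold; that conversion is not specified in this text.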
- each user ID can correspond to at least one tag, and the tag can be a tag of a user portrait.
- the label can be at least one of the following: age, occupation, income, hobbies, etc., which are not limited here.
- the electronic device can obtain the labels corresponding to the user IDs in group j to obtain multiple labels, where group j is any one of the multiple groups; further, the label that appears most frequently among the multiple labels may be used as the group name of group j.
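- Naming a group by its most frequent label can be sketched with a counter. The tags and the helper name `name_group` are made up for illustration; tie-breaking between equally frequent labels is not specified in this text, so the sketch simply returns whichever label `Counter.most_common` lists first.

```python
from collections import Counter


def name_group(group_ids, tags_by_id):
    """Return the most frequent tag across the group's user IDs."""
    counts = Counter(tag for uid in group_ids for tag in tags_by_id.get(uid, ()))
    if not counts:
        return None  # no tags available for this group
    return counts.most_common(1)[0][0]


# Made-up user-portrait tags for the members of one group.
tags = {
    "u1": ["student", "gamer"],
    "u2": ["gamer"],
    "u3": ["gamer", "engineer"],
}
group_name = name_group(["u1", "u2", "u3"], tags)
```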
- the locality-sensitive hashing algorithm is used not only in user identification and user data classification, but also in many fields that require similarity calculation, such as friend recommendation, document similarity, etc.
- in recommendation, LSH can be used to find similar users and similar products while saving a large amount of computing resources, thereby completing accurate recommendations for users and improving recommendation efficiency.
- the data classification method described in the above embodiments of the present application obtains the application data of the target application of the target object and the target user ID of the target object, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID, and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- in this way, multiple user IDs can be extracted from the application data, the multiple user IDs can be bucketed through the locality-sensitive hashing algorithm, and grouping is finally performed based on the IDs in each bucket, which can reduce the complexity of calculation, save the corresponding time and computing resources, and improve the efficiency of data classification.
- FIG. 2 is a schematic flowchart of another data classification method provided by an embodiment of the present application.
- the data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A. The method may include the following steps:
- the data classification method described in the above embodiments of the present application obtains the application data of the target application of the target object and the target user ID of the target object, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID, divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, obtains the labels corresponding to the user IDs in group j to obtain multiple labels, where group j is any one of the multiple groups, and uses the label with the most occurrences among the multiple labels as the group name of group j.
- in this way, groups are divided based on the IDs in each bucket and each group can be named, which can reduce the complexity of calculation, save the corresponding time and computing resources, and improve the efficiency of data classification.
- FIG. 3 is a schematic flowchart of an embodiment of another data classification method provided by an embodiment of this application.
- the data classification method described in this embodiment is applied to the electronic device as shown in FIG. 1A.
- the method may include the following steps:
- the data classification method described in the above embodiment of the present application first obtains the natural person ID of the user object; obtains the application data of the target application of the target object based on the natural person ID; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- multiple user IDs can be extracted from the application data, the multiple user IDs are divided into buckets through the locality-sensitive hashing algorithm, and groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve the efficiency of data classification.
- FIG. 4 shows an electronic device provided by an embodiment of the present application, including a processor and a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the following steps:
- ID extraction is performed on the application data to obtain multiple user IDs and associated data corresponding to each user ID;
- the multiple user IDs are divided into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- the electronic device described in the above embodiment of the present application obtains the application data of the target application of the target object and the target user ID of the target object; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- the program includes instructions for performing the following steps:
- each of the multiple user IDs corresponds to a natural person
- the program includes instructions for performing the following steps:
- the user IDs corresponding to the at least one target degree of association are taken as one group.
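Claim 3 describes this grouping step as keeping, within one bucket, the user IDs whose degree of association with the target user ID is greater than a preset threshold. A minimal Python sketch under stated assumptions: the degrees of association are taken as already computed and passed in as a dictionary, and the names `group_bucket`, `association_by_user`, and `threshold` are hypothetical.

```python
def group_bucket(bucket_user_ids, association_by_user, threshold):
    """Return the user IDs of one bucket that form a group.

    association_by_user maps user ID -> degree of association with the
    target user ID (assumed to be precomputed). Only degrees strictly
    greater than the threshold are selected.
    """
    return [
        user_id
        for user_id in bucket_user_ids
        if association_by_user[user_id] > threshold
    ]
```

Because candidate IDs come from a single bucket, the comparison against the threshold runs over a small subset of user IDs instead of the full set.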
- the program includes instructions for performing the following steps:
- the degree of association between the first user ID and the target user ID is determined according to the target feature set.
- the target feature set includes feature values of multiple dimensions
- the program includes instructions for executing the following steps:
- the program includes instructions for executing the following steps:
- the program includes instructions for executing the following steps:
- when the at least one user ID is a natural person ID, the program further includes instructions for executing the following steps:
- the natural person ID is determined according to the multi-dimensional feature layer and the ID-mapping relationship layer, and the natural person ID is used as the target user ID.
- each user ID corresponds to at least one tag
- the program further includes instructions for performing the following steps:
- the tag with the most occurrences among the plurality of tags is used as the group name of the group j.
- the electronic device includes hardware structures and/or software modules corresponding to each function.
- this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
- the embodiment of the present application may divide the electronic device into functional units according to the foregoing method examples.
- each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
- FIG. 5A is a schematic structural diagram of a data classification device provided in this embodiment.
- the data classification device is applied to the electronic device shown in FIG. 1A.
- the data classification device includes an obtaining unit 501, an extraction unit 502, a bucket processing unit 503, and a dividing unit 504, wherein:
- the obtaining unit 501 is configured to obtain application data of a target application of a target object, and obtain a target user ID of the target object;
- the extraction unit 502 is configured to perform ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID;
- the bucket processing unit 503 is configured to perform bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each of which includes the associated data of at least one user ID;
- the dividing unit 504 is configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
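The work of the extraction unit 502 can be illustrated with a short Python sketch based on claim 2, which searches the application data using preset ID keywords. Everything concrete here is an assumption for illustration: the application data is modeled as text records containing `key=value` pairs, the keywords `user_id` and `uid` are invented, and the names `ID_KEYWORDS` and `extract_user_ids` are hypothetical. The sketch does not cover the integration step in which several IDs are merged into one user ID per natural person.

```python
import re
from collections import defaultdict

# Hypothetical preset ID keywords; the source does not list any.
ID_KEYWORDS = ("user_id", "uid")


def extract_user_ids(records, keywords=ID_KEYWORDS):
    """Map each user ID found in the records to the records that mention it.

    records is an iterable of strings such as "uid=alice action=call"
    (an assumed format). The records collected for an ID serve as its
    associated data in this sketch.
    """
    pattern = re.compile(
        r"\b(?:%s)=(\w+)" % "|".join(re.escape(keyword) for keyword in keywords)
    )
    associated = defaultdict(list)
    for record in records:
        # dict.fromkeys removes repeated IDs within one record, keeping order.
        for user_id in dict.fromkeys(pattern.findall(record)):
            associated[user_id].append(record)
    return dict(associated)
```

The resulting mapping from user ID to associated records is the kind of input the bucket processing unit 503 would then receive.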
- the data classification device described in the above embodiment of the present application obtains the application data of the target application of the target object and the target user ID of the target object; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- multiple user IDs can be extracted from the application data, the multiple user IDs can be bucketed through the locality-sensitive hashing algorithm, and groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve the efficiency of data classification.
- the extraction unit 502 is specifically configured to:
- each of the multiple user IDs corresponds to a natural person
- the dividing unit 504 is specifically configured to:
- the user IDs corresponding to the at least one target degree of association are taken as one group.
- the dividing unit 504 is specifically configured to:
- the degree of association between the first user ID and the target user ID is determined according to the target feature set.
- the target feature set includes feature values of multiple dimensions
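Claim 5 computes the degree of association as a weighted operation over the feature values of multiple dimensions. A minimal Python sketch, assuming the weighted operation is a plain weighted sum and that features and weights are dictionaries keyed by dimension name; the function name `association_degree` and the dimension names in the usage note are hypothetical.

```python
def association_degree(feature_values, weights):
    """Return the weighted sum of feature values over all dimensions.

    feature_values and weights map dimension name -> number. Both must
    cover exactly the same dimensions; otherwise a ValueError is raised.
    """
    if feature_values.keys() != weights.keys():
        raise ValueError("feature values and weights must cover the same dimensions")
    return sum(feature_values[dim] * weights[dim] for dim in feature_values)
```

For instance, with invented dimensions `contact_frequency = 0.8` (weight 0.6) and `shared_groups = 0.5` (weight 0.4), the degree of association is 0.8 × 0.6 + 0.5 × 0.4 = 0.68, which would then be compared against the preset threshold.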
- the dividing unit 504 is specifically configured to:
- the dividing unit 504 is specifically configured to:
- the obtaining unit 501 is specifically configured to:
- FIG. 5B shows a modified structure of the data classification device shown in FIG. 5A. Compared with FIG. 5A, it further includes an establishing unit 505 and a determining unit 506, as follows:
- the obtaining unit 501 is further configured to obtain historical usage data of the target application of the electronic device corresponding to the target object;
- the establishing unit 505 is configured to construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
- the determining unit 506 is configured to determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
- each user ID corresponds to at least one tag, as shown in FIG. 5C.
- FIG. 5C shows another modified structure of the data classification device shown in FIG. 5A. Compared with FIG. 5A, it further includes a selection unit 507, as follows:
- the obtaining unit 501 is configured to obtain a tag corresponding to a user ID in a group j to obtain multiple tags, where the group j is any one of the multiple groups;
- the selection unit 507 is configured to use the tag with the most occurrences among the plurality of tags as the group name of the group j.
- An embodiment of the present application also provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps of any data transmission method described in the above method embodiments.
- the embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any data transmission method described in the foregoing method embodiments.
- the disclosed device may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be realized in the form of hardware or software program module.
- if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes a number of instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
- the aforementioned memory includes: USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, optical disc, and other media that can store program code.
- the program can be stored in a computer-readable memory, and the memory can include: flash disk, ROM, RAM, magnetic disk, optical disc, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (20)
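- A data classification method, characterized by comprising: obtaining application data of a target application of a target object, and obtaining a target user ID of the target object; performing ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID; performing bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID; and dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- The method according to claim 1, characterized in that the performing ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each ID comprises: searching the application data according to preset ID keywords to obtain multiple IDs; integrating the multiple IDs to obtain the multiple user IDs, each user ID corresponding to one natural person; and obtaining, from the application data, the associated data corresponding to the multiple user IDs, to obtain the associated data corresponding to each of the multiple user IDs.
- The method according to claim 1 or 2, characterized in that the dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups comprises: determining, according to the associated data of each user ID in the i-th bucket, a degree of association between that user ID and the target user ID, to obtain multiple degrees of association, the i-th bucket being any one of the multiple buckets; selecting, from the multiple degrees of association, those greater than a preset threshold, to obtain at least one target degree of association; and taking the user IDs corresponding to the at least one target degree of association as one group.
- The method according to claim 3, characterized in that the determining, according to the associated data of each user ID in the i-th bucket, a degree of association between that user ID and the target user ID, to obtain multiple degrees of association, comprises: obtaining associated data of a first user ID, the first user ID being any user ID in the i-th bucket; performing feature extraction on the associated data of the first user ID to obtain a target feature set; and determining the degree of association between the first user ID and the target user ID according to the target feature set.
- The method according to claim 4, characterized in that the target feature set includes feature values of multiple dimensions; and the determining the degree of association between the first user ID and the target user ID according to the target feature set comprises: determining a weight value corresponding to each dimension of the feature values of the multiple dimensions, to obtain weight values of the multiple dimensions; and performing a weighted operation according to the feature values of the multiple dimensions and the weight values of the multiple dimensions, to obtain the degree of association between the first user ID and the target user ID.
- The method according to claim 4, characterized in that the determining the degree of association between the first user ID and the target user ID according to the target feature set comprises: separating, according to the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID; and determining the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
- The method according to any one of claims 1-6, characterized in that the obtaining application data of a target application of a target object comprises: obtaining at least one user ID of the target application of the target object; and obtaining, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.
- The method according to claim 7, characterized in that, when the at least one user ID is a natural person ID, the method further comprises: obtaining historical usage data of the target application of the electronic device corresponding to the target object; constructing a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data; and determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and using the natural person ID as the target user ID.
- The method according to any one of claims 1-8, characterized in that each user ID corresponds to at least one tag, and the method further comprises: obtaining the tags corresponding to the user IDs in group j to obtain multiple tags, the group j being any one of the multiple groups; and using the tag with the most occurrences among the multiple tags as the group name of the group j.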
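- A data classification device, characterized in that the device comprises: an obtaining unit, configured to obtain application data of a target application of a target object, and obtain a target user ID of the target object; an extraction unit, configured to perform ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID; a bucket processing unit, configured to perform bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID; and a dividing unit, configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
- The device according to claim 10, characterized in that, in terms of performing ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each ID, the extraction unit is specifically configured to: search the application data according to preset ID keywords to obtain multiple IDs; integrate the multiple IDs to obtain the multiple user IDs, each user ID corresponding to one natural person; and obtain, from the application data, the associated data corresponding to the multiple user IDs, to obtain the associated data corresponding to each of the multiple user IDs.
- The device according to claim 10 or 11, characterized in that, in terms of dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, the dividing unit is specifically configured to: determine, according to the associated data of each user ID in the i-th bucket, a degree of association between that user ID and the target user ID, to obtain multiple degrees of association, the i-th bucket being any one of the multiple buckets; select, from the multiple degrees of association, those greater than a preset threshold, to obtain at least one target degree of association; and take the user IDs corresponding to the at least one target degree of association as one group.
- The device according to claim 12, characterized in that, in terms of determining, according to the associated data of each user ID in the i-th bucket, a degree of association between that user ID and the target user ID, to obtain multiple degrees of association, the dividing unit is specifically configured to: obtain associated data of a first user ID, the first user ID being any user ID in the i-th bucket; perform feature extraction on the associated data of the first user ID to obtain a target feature set; and determine the degree of association between the first user ID and the target user ID according to the target feature set.
- The device according to claim 13, characterized in that the target feature set includes feature values of multiple dimensions; and in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the dividing unit is specifically configured to: determine a weight value corresponding to each dimension of the feature values of the multiple dimensions, to obtain weight values of the multiple dimensions; and perform a weighted operation according to the feature values of the multiple dimensions and the weight values of the multiple dimensions, to obtain the degree of association between the first user ID and the target user ID.
- The device according to claim 13, characterized in that, in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the dividing unit is specifically configured to: separate, according to the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID; and determine the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
- The device according to any one of claims 10-15, characterized in that, in terms of obtaining application data of a target application of a target object, the obtaining unit is specifically configured to: obtain at least one user ID of the target application of the target object; and obtain, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.
- The device according to claim 16, characterized in that, when the at least one user ID is a natural person ID, the device further comprises an establishing unit and a determining unit, wherein the obtaining unit is further configured to obtain historical usage data of the target application of the electronic device corresponding to the target object; the establishing unit is configured to construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data; and the determining unit is configured to determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.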
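- An electronic device, characterized by comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the method according to any one of claims 1-9.
- A computer-readable storage medium, characterized by storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-9.
- A computer program product, characterized in that the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method according to any one of claims 1-9.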
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980089586.4A CN113366469A (zh) | 2019-06-29 | 2019-06-29 | Data classification method and related products |
PCT/CN2019/093971 WO2021000084A1 (zh) | 2019-06-29 | 2019-06-29 | Data classification method and related products |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/093971 WO2021000084A1 (zh) | 2019-06-29 | 2019-06-29 | Data classification method and related products |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021000084A1 true WO2021000084A1 (zh) | 2021-01-07 |
Family
ID=74100206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/093971 WO2021000084A1 (zh) | Data classification method and related products | 2019-06-29 | 2019-06-29 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113366469A (zh) |
WO (1) | WO2021000084A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113366469A (zh) * | 2019-06-29 | 2021-09-07 | 深圳市欢太科技有限公司 | Data classification method and related products |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114357302B (zh) * | 2021-12-31 | 2025-06-27 | 广州趣丸网络科技有限公司 | Information storage method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
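CN103198418A (zh) * | 2013-03-15 | 2013-07-10 | 北京亿赞普网络技术有限公司 | Application recommendation method and system |
CN106357895A (zh) * | 2016-08-31 | 2017-01-25 | 上海斐讯数据通信技术有限公司 | Incoming call prompt system and incoming call prompt method |
CN106850924A (zh) * | 2017-01-23 | 2017-06-13 | 北京奇虎科技有限公司 | Address book data processing method and processing terminal |
CN109255640A (zh) * | 2017-07-13 | 2019-01-22 | 阿里健康信息技术有限公司 | Method, device, and system for determining user groups |
CN109815406A (zh) * | 2019-01-31 | 2019-05-28 | 腾讯科技(深圳)有限公司 | Data processing and information recommendation method and device |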
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013178286A1 (en) * | 2012-06-01 | 2013-12-05 | Qatar Foundation | A method for processing a large-scale data set, and associated apparatus |
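CN104239324B (zh) * | 2013-06-17 | 2019-09-17 | 阿里巴巴集团控股有限公司 | Method and system for feature extraction and personalized recommendation based on user behavior |
CN106503015A (zh) * | 2015-09-07 | 2017-03-15 | 国家计算机网络与信息安全管理中心 | Method for constructing a user profile |
CN106548255A (zh) * | 2016-11-24 | 2017-03-29 | 山东浪潮云服务信息科技有限公司 | Product recommendation method based on massive user behavior |
CN113366469A (zh) * | 2019-06-29 | 2021-09-07 | 深圳市欢太科技有限公司 | Data classification method and related products |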
2019
- 2019-06-29 CN CN201980089586.4A patent/CN113366469A/zh active Pending
- 2019-06-29 WO PCT/CN2019/093971 patent/WO2021000084A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113366469A (zh) | 2021-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
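WO2020257990A1 (zh) | | Device recommendation method and related products |
TWI684148B (zh) | | Contact grouping processing method and device |
CN110472145B (zh) | | Content recommendation method and electronic device |
CN113939814B (zh) | | Content push method and related products |
US9241242B2 (en) | | Information recommendation method and apparatus |
CN108280115B (zh) | | Method and device for identifying user relationships |
CN109033156B (zh) | | Information processing method, device, and terminal |
CN111125523B (zh) | | Search method, device, terminal device, and storage medium |
CN107977431A (zh) | | Image processing method, device, computer device, and computer-readable storage medium |
CN108052591A (zh) | | Information recommendation method, device, mobile terminal, and computer-readable storage medium |
CN108121803A (zh) | | Method and server for determining page layout |
CN111444425B (zh) | | Information push method, electronic device, and medium |
CN108205568A (zh) | | Method and device for selecting data based on tags |
CN108399232A (zh) | | Information push method, device, and electronic device |
CN107292235A (zh) | | Fingerprint collection method and related products |
CN104980559A (zh) | | Method and device for setting ring-back tones and determining ring-back tone music |
CN108449481A (zh) | | Contact information recommendation method and terminal |
CN113950817B (zh) | | Content push method and related products |
WO2021000084A1 (zh) | | Data classification method and related products |
CN113940033B (zh) | | User identification method and related products |
CN116307394A (zh) | | Product user experience scoring method, device, medium, and equipment |
CN108595481A (zh) | | Notification message display method and terminal device |
CN107368998A (zh) | | Schedule management method and related products |
CN113366523B (zh) | | Resource push method and related products |
CN107707719B (zh) | | Contact information display method and mobile terminal |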
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19936318 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19936318 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/05/2022) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19936318 Country of ref document: EP Kind code of ref document: A1 |