
WO2021000084A1 - Data classification method and related products - Google Patents

Data classification method and related products

Info

Publication number
WO2021000084A1
WO2021000084A1 PCT/CN2019/093971 CN2019093971W WO2021000084A1 WO 2021000084 A1 WO2021000084 A1 WO 2021000084A1 CN 2019093971 W CN2019093971 W CN 2019093971W WO 2021000084 A1 WO2021000084 A1 WO 2021000084A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
target
data
application
ids
Prior art date
Application number
PCT/CN2019/093971
Other languages
English (en)
French (fr)
Inventor
郭子亮
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司, Oppo广东移动通信有限公司 filed Critical 深圳市欢太科技有限公司
Priority to CN201980089586.4A priority Critical patent/CN113366469A/zh
Priority to PCT/CN2019/093971 priority patent/WO2021000084A1/zh
Publication of WO2021000084A1 publication Critical patent/WO2021000084A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Definitions

  • This application relates to the field of communication technology, and specifically relates to a data classification method and related products.
  • the embodiments of the present application provide a data classification method and related products, which can improve friend classification efficiency and improve user experience.
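  • A data classification method includes:
  • Acquiring application data of a target application of a target object, and acquiring a target user ID of the target object;
  • Performing ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID;
  • Performing bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID;
  • Dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.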
  • an embodiment of the present application provides a data classification device, and the device includes:
  • An acquiring unit for acquiring application data of a target application of a target object, and acquiring a target user ID of the target object;
  • the extraction unit is used for ID extraction of the application data to obtain multiple user IDs and associated data corresponding to each user ID;
  • the bucket processing unit is configured to perform bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID;
  • the dividing unit is configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
  • An embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps described in the first aspect.
  • Embodiments of the present application provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect.
  • the computer program product may be a software installation package.
  • FIG. 1A is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1B is a schematic flowchart of a data classification method disclosed in an embodiment of the present application.
  • FIG. 1C is a schematic diagram of a data classification method disclosed in an embodiment of the present application.
  • FIG. 1D is a schematic diagram showing the structure of a user portrait disclosed in an embodiment of the present application.
  • FIG. 1E is a schematic diagram of a locality sensitive hash algorithm disclosed in an embodiment of the present application.
  • FIG. 1F is another schematic diagram showing the locality sensitive hash algorithm disclosed in the embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another data classification method disclosed in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another data classification method disclosed in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another electronic device disclosed in an embodiment of the present application.
  • FIG. 5A is a schematic structural diagram of a data classification device disclosed in an embodiment of the present application.
  • FIG. 5B is a schematic structural diagram of another data classification device disclosed in an embodiment of the present application.
  • FIG. 5C is a schematic structural diagram of another data classification device disclosed in an embodiment of the present application.
  • The electronic devices involved in the embodiments of the present application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), and smart home equipment (smart TVs, smart air conditioners, smart range hoods, smart fans, smart wheelchairs, smart dining tables, etc.).
  • the above-mentioned devices are collectively referred to as electronic devices, and the above-mentioned electronic devices may also be servers, service platforms, and so on.
  • FIG. 1A is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
  • the electronic device 100 may include a control circuit, and the control circuit may include a storage and processing circuit 110.
  • The storage in the storage and processing circuit 110 can be memory, such as hard disk drive memory, non-volatile memory (for example, flash memory or other electronically programmable read-only memory used to form a solid-state drive), or volatile memory (for example, static or dynamic random access memory), which is not limited in the embodiments of the present application.
  • the processing circuit in the storage and processing circuit 110 may be used to control the operation of the electronic device 100.
  • the processing circuit can be implemented based on one or more microprocessors, microcontrollers, baseband processors, power management units, audio codec chips, application specific integrated circuits, display driver integrated circuits, etc.
  • The storage and processing circuit 110 can be used to run software in the electronic device 100, such as Internet browsing applications, voice over internet protocol (VOIP) phone call applications, email applications, media playback applications, and operating system functions. This software can be used to perform control operations, for example, camera-based image capture, ambient light measurement based on ambient light sensors, proximity measurement based on proximity sensors, information display functions based on status indicators such as LED status indicators, touch event detection based on touch sensors, functions associated with displaying information on multiple (e.g., layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and generating audio signals, control operations associated with collecting and processing button press event data, and other functions in the electronic device 100, which are not limited in the embodiments of the present application.
  • the electronic device 100 may further include an input-output circuit 150.
  • the input-output circuit 150 can be used to enable the electronic device 100 to implement data input and output, that is, allow the electronic device 100 to receive data from an external device and also allow the electronic device 100 to output data from the electronic device 100 to the external device.
  • the input-output circuit 150 may further include a sensor 170.
  • The sensor 170 may include an ambient light sensor, a proximity sensor based on light and capacitance, a touch sensor (for example, a light-based touch sensor and/or a capacitive touch sensor, where the touch sensor may be part of a touch screen or may be used independently as a touch sensor structure), an acceleration sensor, a gravity sensor, and other sensors.
  • the input-output circuit 150 may also include one or more displays, such as the display 130.
  • the display 130 may include one or a combination of a liquid crystal display, an organic light emitting diode display, an electronic ink display, a plasma display, and a display using other display technologies.
  • the display 130 may include a touch sensor array (ie, the display 130 may be a touch display screen).
  • The touch sensor can be a capacitive touch sensor formed by an array of transparent touch sensor electrodes (such as indium tin oxide (ITO) electrodes), or a touch sensor formed using other touch technologies, such as sonic touch, pressure-sensitive touch, resistive touch, or optical touch, which is not limited in the embodiments of the present application.
  • the audio component 140 may be used to provide audio input and output functions for the electronic device 100.
  • the audio component 140 in the electronic device 100 may include a speaker, a microphone, a buzzer, a tone generator, and other components for generating and detecting sounds.
  • the communication circuit 120 may be used to provide the electronic device 100 with the ability to communicate with external devices.
  • the communication circuit 120 may include analog and digital input-output interface circuits, and wireless communication circuits based on radio frequency signals and/or optical signals.
  • the wireless communication circuit in the communication circuit 120 may include a radio frequency transceiver circuit, a power amplifier circuit, a low noise amplifier, a switch, a filter, and an antenna.
  • the wireless communication circuit in the communication circuit 120 may include a circuit for supporting near field communication (NFC) by transmitting and receiving near-field coupled electromagnetic signals.
  • the communication circuit 120 may include a near field communication antenna and a near field communication transceiver.
  • the communication circuit 120 may also include a cellular phone transceiver and antenna, a wireless local area network transceiver circuit and antenna, and so on.
  • the electronic device 100 may further include a battery, a power management circuit, and other input-output units 160.
  • the input-output unit 160 may include buttons, joysticks, click wheels, scroll wheels, touch pads, keypads, keyboards, cameras, light emitting diodes, and other status indicators.
  • the user can input commands through the input-output circuit 150 to control the operation of the electronic device 100, and can use the output data of the input-output circuit 150 to realize receiving status information and other outputs from the electronic device 100.
  • FIG. 1B is a schematic flowchart of a data classification method provided by an embodiment of the present application.
  • The data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A.
  • the data classification method includes:
  • the target object can be understood as the owner or other users.
  • the target application may be at least one of the following: video application, social application, instant messaging application, shopping application, payment application, game application, navigation application, camera application, financial management application, etc., which are not limited here.
  • the target application may be one application or a type of application, the target application may include one or more applications, and the target application may be a third-party application or a system application.
  • the application data may include at least one of the following: registration application data, application cache data, or instant messaging data, etc., which are not limited here.
  • The application data may include user identifiers such as the user's cookie, an app-side browsing-behavior identification ID, and an account ID, where a user ID may be a device hardware ID or a character identifier.
  • Electronic devices may be used by multiple people.
  • For example, a multi-dimensional feature layer and an ID-mapping relationship layer can be constructed from the device IMEI, the SSOID (Session Object Identity), the oppenid, user location data, Internet behavior data, and so on.
  • In the natural person recognition layer, the multi-code relationship, a recognition filtering algorithm, and a graph connection algorithm can be used to identify natural persons accurately, so that the owner can be accurately identified; after all, the owner is the one using the electronic device most of the time.
  • the foregoing step 101, obtaining application data of the target application of the target object may include the following steps:
  • The aforementioned preset time period can be set by the user or defaulted by the system.
  • The preset time period can be understood as a recent period during which the electronic device has been used, or as the period between the registration of any one of the at least one user ID and the current time.
  • the target object may be the owner.
  • The user ID may be at least one of the following: phone number, integrated circuit card identity (ICCID), International Mobile Equipment Identity (IMEI), Single Sign On ID (SSOID), third-party application ID, oppenId, etc., which are not limited here.
  • the electronic device can obtain at least one user ID of the target application that has been used in the electronic device, and further, can determine the target application data in the electronic device according to the at least one user ID.
  • The electronic device can store all data related to the target application, for example, cache data and application running state data. In the embodiments of the present application, however, only the application data related to the user ID may be extracted.
  • When the at least one user ID is a natural person ID, the following steps may further be included before step 101:
  • A1. Acquire historical usage data of the target application of the electronic device corresponding to the target object;
  • A2. Construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
  • A3. Determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
  • Historical usage data can be understood as the usage data from the time the user first used the target application on the electronic device up to the current time, or as all usage data corresponding to the at least one user ID of the target object.
  • The historical usage data may include at least one of the following: registration application data, application cache data, instant messaging data, etc., which are not limited here.
  • The application data may include user identifiers such as the user's cookie, an app-side browsing-behavior identification ID, and an account ID.
  • The aforementioned usage data may also include at least one of the following parameters: CPU operating frequency, number of CPU cores, CPU operating mode, GPU frame rate, GPU resolution, device brightness, device sound, memory parameters, etc.
  • A user ID used for user identification can be a device hardware ID or a character identifier.
  • the electronic device can obtain historical usage data of the target application corresponding to the target object.
  • the historical usage data can be obtained from a data source.
  • The data source can include at least one of the following: browser, software store, account system, AutoNavi data, shopping data, communication data, game data, social data, office data, smart home data, etc., which are not limited here.
  • The ID-mapping relationship layer data can be obtained based on the historical usage data.
  • The ID-mapping relationship layer data can include at least one of the following: OSSID <-> IMEI (the mapping relationship between OSSID and IMEI), TEL <-> IMEI, OppenId <-> ICCID, etc., which are not limited here.
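  • As a minimal sketch of how such mapping pairs could be merged into natural persons, the snippet below treats each mapping as an edge and groups identifiers by connected component with a union-find structure. The text only mentions a graph connection algorithm without detailing it, so union-find is an assumed stand-in, and the function name `natural_person_groups`, the identifier prefixes, and the sample values are all hypothetical.

```python
from collections import defaultdict


def natural_person_groups(id_pairs):
    """Group identifiers linked by ID-mapping pairs (connected components).

    Assumption: every pair (a, b) means "a and b belong to the same person".
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in id_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical mapping pairs; the values are made up for illustration.
pairs = [
    ("SSOID:s1", "IMEI:860000000000001"),
    ("TEL:13800000000", "IMEI:860000000000001"),
    ("OppenId:o9", "ICCID:8986000000000000001"),
]
persons = natural_person_groups(pairs)
```

  • Each inner list in `persons` would then play the role of one candidate natural person; a real system would presumably also weigh the multi-dimensional features before trusting a link.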
  • Multi-dimensional feature layer data can also be obtained based on historical usage data.
  • Multi-dimensional feature layer data can include at least one of the following: device features, APP features, positioning features, etc., which are not limited here.
  • Based on the multi-dimensional feature layer and the ID-mapping relationship layer, natural person IDs can be determined, and each natural person ID can correspond to a user portrait.
  • The user portrait can include at least one of the following: demographic attributes, human-land relationship, interests and hobbies, equipment attributes, assets, business interests, etc., which are not limited here.
  • The above-mentioned device features may include at least one of the following: device attributes (such as equipment daily management, model configuration, activation date, etc.), network connection conditions (such as WIFI connection, network IP, base station, connection distribution, etc.), the ID's own attributes (such as ID format, character length, etc.), etc., which are not limited here.
  • APP features can include at least one of the following: APP installation, startup, and uninstallation, APP type preferences (such as games or applications), APP active periods (working days, holidays, etc.), etc., which are not limited here.
  • Positioning features can include at least one of the following: location attributes (for example, home or company, resident business district, frequently active places), travel preferences (for example, travel mode, travel time, travel frequency, travel trajectory, etc.), and POI preferences (POI arrival, POI search).
  • the application data may include multiple user IDs, that is, when the user uses the device to communicate with another user, the application data may record the user ID of the other user.
  • The associated data may be at least one of the following: user level, point consumption, activity, preference type, online time, operating habits, communication times, communication time, user ID, and so on.
  • The above step of performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID may include the following steps:
  • The above-mentioned preset ID keyword can be understood as a keyword in a specific format; for example, in "user name: xxx", xxx is the keyword, and the specific format can be defaulted by the system.
  • The electronic device can search the application data according to the preset ID keywords to obtain multiple IDs, and can then integrate the multiple IDs to obtain multiple user IDs.
  • The specific integration algorithm can be a clustering algorithm or a locality-sensitive hashing algorithm, etc., which is not limited here.
  • The associated data corresponding to the multiple user IDs can be obtained from the application data.
  • The associated data can be understood as data related to a user ID, so that the associated data corresponding to each of the multiple user IDs is obtained.
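  • The keyword search described above can be sketched as follows, assuming that application data arrives as text records and that the preset ID keyword format is the "user name: xxx" pattern from the example. The regular expression, the record layout, and the function name `extract_user_ids` are illustrative assumptions, and "integration" is reduced here to merging exact duplicates.

```python
import re
from collections import defaultdict

# Assumed preset ID keyword format: "user name: <id>".
ID_PATTERN = re.compile(r"user name:\s*(\w+)")


def extract_user_ids(records):
    """Return {user_id: [records mentioning it]} for the given text records."""
    associated = defaultdict(list)
    for record in records:
        for user_id in ID_PATTERN.findall(record):
            # The whole record is kept as the associated data of this ID.
            associated[user_id].append(record)
    return dict(associated)


# Hypothetical application data records.
records = [
    "user name: alice; communication times: 12",
    "user name: bob; online time: 3h",
    "user name: alice; preference type: games",
    "cache entry without any identifier",
]
ids = extract_user_ids(records)
```

  • Records that contain no ID keyword are simply skipped, which matches the idea that only application data related to a user ID is extracted.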
  • Locality-sensitive hashing (LSH) is a widely used family of approximate nearest neighbor search algorithms. It has a solid theoretical basis and performs well in high-dimensional data spaces. Its main function is to dig out similar data from massive data, and it can be applied to text similarity detection, web search, and other fields. Its basic idea resembles a conversion between spatial domains.
  • The LSH algorithm is based on one assumption: if two texts are similar in the original data space, they remain highly similar after being transformed by the hash function; conversely, if they are not similar to begin with, they should still not be similar after the transformation.
  • The electronic device may perform bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing (LSH) algorithm to obtain multiple buckets, each of which contains the associated data of at least one user ID.
  • After bucketing, the "user set" in which each user is located is relatively small. Because intimacy only needs to be calculated among the users within a bucket, the complexity of the user intimacy calculation is reduced; then, after sorting and applying some rules, the resulting intimacy can be used to classify the user's friends.
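  • The text does not say which LSH family is used, so the following is only a sketch under the assumption that each user's associated data has already been turned into a numeric feature vector and that a random-hyperplane (sign) hash is acceptable. The names `make_hyperplanes`, `signature`, and `bucketize`, the number of bits, and the sample vectors are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def bucketize(vectors, n_bits=4, seed=0):
    """Map {user_id: vector} to {signature: [user_ids]}.

    Vectors pointing in similar directions tend to share a signature,
    so they tend to land in the same bucket.
    """
    dim = len(next(iter(vectors.values())))
    planes = make_hyperplanes(dim, n_bits, seed)
    buckets = defaultdict(list)
    for user_id, vec in vectors.items():
        buckets[signature(vec, planes)].append(user_id)
    return dict(buckets)


# Hypothetical feature vectors (e.g. scaled activity, level, communication times).
vectors = {
    "u1": [1.0, 2.0, 3.0],
    "u2": [2.0, 4.0, 6.0],     # same direction as u1
    "u3": [-1.0, -2.0, -3.0],  # opposite direction to u1
}
buckets = bucketize(vectors)
```

  • Only users that share a bucket need to be compared pairwise afterwards, which is where the reduction in calculation complexity comes from; more bits give smaller buckets at the cost of more missed neighbors.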
  • the foregoing step 104 dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, may include the following steps:
  • the preset threshold can be set by the user or the system defaults.
  • the i-th bucket is any one of the above-mentioned multiple buckets.
  • The electronic device can determine the degree of association between each user ID in the i-th bucket and the target user ID according to the associated data of that user ID, obtaining multiple degrees of association, and then select the degrees of association greater than the preset threshold to obtain at least one target degree of association.
  • The user IDs corresponding to the at least one target degree of association can be regarded as a group; that is, the user IDs with high similarity in each bucket are regarded as a group.
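  • A minimal sketch of this thresholding step, assuming the degrees of association for one bucket have already been computed and stored in a dictionary; the function name `group_from_bucket` and the sample scores are hypothetical.

```python
def group_from_bucket(bucket_relevance, threshold):
    """Keep the user IDs whose degree of association exceeds the threshold."""
    return sorted(
        user_id
        for user_id, relevance in bucket_relevance.items()
        if relevance > threshold
    )


# Hypothetical degrees of association between each ID in bucket i and the target ID.
bucket_i = {"u1": 0.91, "u2": 0.40, "u3": 0.75}
group = group_from_bucket(bucket_i, threshold=0.7)
```

  • The comparison is strict, following the "greater than a preset threshold" wording, so an ID whose score equals the threshold is left out of the group.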
  • step 41 determining the degree of association between the user ID and the target user ID according to the associated data of each user ID in the i-th bucket, to obtain multiple degrees of association, may include the following steps:
  • the first user ID is any user ID in the i-th bucket.
  • the electronic device can obtain the associated data of the first user ID, and perform feature extraction on the first user ID.
  • the feature set may include at least one of the following: geographic location, communication time period, communication content, communication times, etc., which are not limited here.
  • the above-mentioned features of each dimension can be represented by a feature value.
  • the degree of association between the first user ID and the target user ID can be determined according to the target feature set. For example, the weight value corresponding to each feature in the target feature set can be determined, and then a weighting operation is performed based on each feature and its corresponding weight value to obtain the correlation degree between the first user ID and the target user ID.
  • The target feature set includes feature values of multiple dimensions; the above step 413, determining the degree of association between the first user ID and the target user ID according to the target feature set, may include the following steps:
  • The target feature set may include feature values of multiple dimensions, and the electronic device may pre-store the weight value corresponding to the feature value of each dimension, so that the weight value corresponding to each of the multiple dimensions can be determined.
  • Once the weight values of the multiple dimensions are obtained, a weighting operation can be performed on the feature values and the weight values of the multiple dimensions to obtain the degree of association between the first user ID and the target user ID.
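  • The weighting operation can be illustrated with a plain weighted sum, assuming each dimension has already been reduced to a numeric feature value and that the pre-stored weights are looked up by dimension name. The dimension names, the weights, and the function name `weighted_relevance` are hypothetical.

```python
def weighted_relevance(feature_values, weights):
    """Weighted sum of feature values; weights are looked up per dimension."""
    return sum(weights[name] * value for name, value in feature_values.items())


# Hypothetical feature values for one (first user ID, target user ID) pair.
features = {
    "geographic_location": 0.8,
    "communication_period": 0.5,
    "communication_times": 1.0,
}
# Hypothetical pre-stored weights, chosen here to sum to 1.
weights = {
    "geographic_location": 0.5,
    "communication_period": 0.25,
    "communication_times": 0.25,
}
score = weighted_relevance(features, weights)
```

  • With these sample numbers the score is 0.5 × 0.8 + 0.25 × 0.5 + 0.25 × 1.0 = 0.775; a missing weight raises a `KeyError`, which makes an incomplete weight table easy to notice.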
  • the foregoing step 413, determining the degree of association between the first user ID and the target user ID according to the target feature set may include the following steps:
  • C2. Determine the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
  • Since the target feature set includes both the features of the first user ID and the features of the target user ID, the target feature set can be separated to obtain the first feature set corresponding to the first user ID and the second feature set corresponding to the target user ID. There may be an intersection between the first feature set and the second feature set. Both the first feature set and the second feature set may include at least one of the following features: user level, point consumption, activity, preference type, online time, operating habits, communication times, communication time, user ID, etc., which are not limited here. Furthermore, the degree of association between the first user ID and the target user ID can be determined according to the first feature set and the second feature set.
  • The degree of association between the first user ID and the target user ID can be calculated by the Euclidean distance, which in its standard form is $d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$, where p represents the first feature set, q represents the second feature set, and i represents any dimension.
  • The degree of association between the first user ID and the target user ID can be calculated through the Jaccard distance, which in its standard form is $d_J(p, q) = 1 - \frac{|p \cap q|}{|p \cup q|}$, where p represents the first feature set and q represents the second feature set.
  • The degree of association between the first user ID and the target user ID can be calculated through the cosine distance, which in its standard form is $d_{\cos}(p, q) = 1 - \frac{\sum_{i} p_i q_i}{\sqrt{\sum_{i} p_i^2}\,\sqrt{\sum_{i} q_i^2}}$, where p represents the first feature set and q represents the second feature set.
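  • The three measures named above can be sketched with their textbook definitions, assuming numeric feature vectors of equal length for the Euclidean and cosine cases and sets of categorical features for the Jaccard case. The function names are hypothetical, and how a distance is mapped back to a degree of association (a smaller distance meaning a closer association) is left open here.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared differences over all dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def jaccard_distance(p, q):
    """1 minus (size of intersection / size of union) of two feature sets."""
    p, q = set(p), set(q)
    union = p | q
    if not union:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(p & q) / len(union)


def cosine_distance(p, q):
    """1 minus the cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm
```

  • For example, `euclidean_distance([0, 0], [3, 4])` evaluates to 5.0, and two vectors pointing in the same direction have a cosine distance of approximately 0 regardless of their lengths.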
  • Each user ID can correspond to at least one tag, and the tag can be a tag of a user portrait.
  • The tag can be at least one of the following: age, occupation, income, hobbies, etc., which are not limited here.
  • The electronic device can obtain the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups, and the tag that appears most frequently among the multiple tags may be used as the group name of group j.
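  • Naming a group after its most frequent tag can be sketched with a simple counter, assuming the tags of each user ID in group j are available as lists; the function name `group_name` and the sample tags are hypothetical.

```python
from collections import Counter


def group_name(tags_by_user):
    """Return the tag that occurs most often across all users of a group."""
    counts = Counter(tag for tags in tags_by_user.values() for tag in tags)
    if not counts:
        return None  # a group without any tags gets no name
    return counts.most_common(1)[0][0]


# Hypothetical tags for the user IDs in group j.
group_j = {
    "u1": ["gamer", "student"],
    "u3": ["gamer"],
    "u7": ["engineer", "gamer"],
}
name = group_name(group_j)
```

  • The text does not say how ties are resolved; `Counter.most_common` returns the tag that was encountered first among equally frequent ones, which is just one possible choice.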
  • The locality-sensitive hashing algorithm is used not only in user identification and user data classification but also in many fields that require similarity calculation, such as friend recommendation and document similarity.
  • In recommendation, LSH can be used to find similar users and similar products while saving a lot of computing resources, thereby completing accurate recommendations for users and improving recommendation efficiency.
  • The data classification method described in the above embodiments of the present application obtains the application data of the target application of the target object and the target user ID of the target object, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID, and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
  • In this way, multiple user IDs can be extracted from the application data, bucketed through the locality-sensitive hashing algorithm, and finally grouped based on the IDs in each bucket, which can reduce the complexity of calculation, save the corresponding time and computing resources, and improve the efficiency of data classification.
  • FIG. 2 is a schematic flowchart of another data classification method provided by an embodiment of the present application.
  • The data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A, and the method may include the following steps:
  • The data classification method described in the above embodiments of the present application obtains the application data of the target application of the target object and the target user ID of the target object, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID, divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, obtains the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups, and uses the tag with the most occurrences among the multiple tags as the group name of group j.
  • In this way, the groups are divided based on the IDs in each bucket and each group can be named, which can reduce the complexity of calculation, save the corresponding time and computing resources, and improve the efficiency of data classification.
  • FIG. 3 is a schematic flowchart of another data classification method provided by an embodiment of the present application.
  • the data classification method described in this embodiment is applied to the electronic device as shown in FIG. 1A.
  • the method may include the following steps:
  • The data classification method described in the above embodiments of the present application first obtains the natural person ID of the target object, obtains the application data of the target application of the target object based on the natural person ID, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, performs bucket processing on the associated data of the multiple user IDs through a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID, and divides the multiple user IDs into groups based on the associated data of the user IDs in each bucket to obtain multiple groups.
  • In this way, multiple user IDs can be extracted from the application data, bucketed through the locality-sensitive hashing algorithm, and finally grouped based on the IDs in each bucket, which can reduce the complexity of calculation, save the corresponding time and computing resources, and improve the efficiency of data classification.
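  • FIG. 4 shows an electronic device provided by an embodiment of the present application, including a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the following steps:
  • Acquiring application data of a target application of a target object, and acquiring a target user ID of the target object;
  • Performing ID extraction on the application data to obtain multiple user IDs and associated data corresponding to each user ID;
  • Performing bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including associated data of at least one user ID;
  • Dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.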
  • the electronic device described in the above embodiment obtains the application data of the target application of the target object and the target user ID of the target object, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, and buckets the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets.
  • each bucket includes the associated data of at least one user ID.
  • the multiple user IDs are divided into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
  • the program includes instructions for performing the following steps:
  • each of the multiple user IDs corresponds to a natural person
  • the program includes instructions for performing the following steps:
  • the user ID corresponding to the at least one target relevance is regarded as a group.
  • the program includes instructions for performing the following steps:
  • the degree of association between the first user ID and the target user ID is determined according to the target feature set.
  • the target feature set includes feature values of multiple dimensions
  • the program includes instructions for executing the following steps:
  • the program includes instructions for executing the following steps:
  • the program includes instructions for executing the following steps:
  • when the at least one user ID is a natural person ID, the program further includes instructions for executing the following steps:
  • the natural person ID is determined according to the multi-dimensional feature layer and the ID-mapping relationship layer, and the natural person ID is used as the target user ID.
  • each user ID corresponds to at least one tag
  • the program further includes instructions for performing the following steps:
  • the tag with the most occurrences among the plurality of tags is used as the group name of the group j.
  • the electronic device includes hardware structures and/or software modules corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the electronic device into functional units according to the foregoing method examples.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 5A is a schematic structural diagram of a data classification device provided in this embodiment.
  • the data classification device is applied to the electronic equipment as shown in FIG. 1A.
  • the data classification device includes an obtaining unit 501, an extraction unit 502, a bucket processing unit 503, and a dividing unit 504, wherein:
  • the obtaining unit 501 is configured to obtain application data of a target application of a target object, and obtain a target user ID of the target object;
  • the extraction unit 502 is configured to extract ID from the application data to obtain multiple user IDs and associated data corresponding to each user ID;
  • the bucket processing unit 503 is configured to perform bucket processing on the associated data of the multiple user IDs by using a locality-sensitive hashing algorithm to obtain multiple buckets, each of which includes the associated data of at least one user ID;
  • the dividing unit 504 is configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
  • the data classification device described in the above embodiment obtains the application data of the target application of the target object and the target user ID of the target object, performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, and buckets the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID.
  • the multiple user IDs are divided into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
  • in this way, multiple user IDs can be extracted from the application data and bucketed by the locality-sensitive hashing algorithm, and the groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve data classification efficiency.
  • the extraction unit 502 is specifically configured to:
  • each of the multiple user IDs corresponds to a natural person
  • the dividing unit 504 is specifically configured to:
  • the user ID corresponding to the at least one target relevance is regarded as a group.
  • the dividing unit 504 is specifically used for:
  • the degree of association between the first user ID and the target user ID is determined according to the target feature set.
  • the target feature set includes feature values of multiple dimensions
  • the dividing unit 504 is specifically configured to:
  • the dividing unit 504 is specifically configured to:
  • the obtaining unit 501 is specifically configured to:
  • FIG. 5B shows a modified structure of the data classification device shown in FIG. 5A. Compared with FIG. 5A, it further includes an establishing unit 505 and a determining unit 506, as follows:
  • the obtaining unit 501 is further configured to obtain historical usage data of the target application of the electronic device corresponding to the target object;
  • the establishing unit 505 is configured to construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
  • the determining unit 506 is configured to determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
  • each user ID corresponds to at least one label, as shown in FIG. 5C.
  • FIG. 5C shows another modified structure of the data classification device shown in FIG. 5A. Compared with FIG. 5A, it further includes a selection unit 507, as follows:
  • the obtaining unit 501 is configured to obtain a tag corresponding to a user ID in a group j to obtain multiple tags, where the group j is any one of the multiple groups;
  • the selection unit 507 is configured to use the tag with the most occurrences among the plurality of tags as the group name of the group j.
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps of any data classification method described in the above method embodiments.
  • the embodiments of the present application also provide a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any data classification method described in the foregoing method embodiments.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software program module.
  • the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned memory includes: U disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), mobile hard disk, magnetic disk, or optical disk and other media that can store program codes.
  • the program can be stored in a computer-readable memory, and the memory can include: flash disk, ROM, RAM, magnetic disk, or optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

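The embodiments of the present application disclose a data classification method and related products. The method includes: obtaining application data of a target application of a target object, and obtaining a target user ID of the target object; performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; bucketing the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups. The embodiments of the present application can improve friend classification efficiency and user experience.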

Description

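Data classification method and related products
Technical Field
This application relates to the field of communication technology, and specifically to a data classification method and related products.
Background
With the widespread adoption of electronic devices (such as mobile phones and tablet computers), electronic devices support more and more applications and offer increasingly powerful functions. They are developing in a diversified and personalized direction and have become indispensable electronic products in users' lives.
At present, social applications are widely used on mobile phones. During use, however, the user has to classify friends one by one, which is inefficient and degrades the user experience.
Summary
The embodiments of the present application provide a data classification method and related products, which can improve friend classification efficiency and user experience.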
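In a first aspect, an embodiment of the present application provides a data classification method, including:
obtaining application data of a target application of a target object, and obtaining a target user ID of the target object;
performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID;
bucketing the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID;
dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
In a second aspect, an embodiment of the present application provides a data classification device, the device including:
an obtaining unit, configured to obtain application data of a target application of a target object, and obtain a target user ID of the target object;
an extraction unit, configured to perform ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID;
a bucket processing unit, configured to bucket the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID;
a dividing unit, configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.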
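Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1A is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 1B is a schematic flowchart of a data classification method disclosed in an embodiment of the present application;
FIG. 1C is a schematic illustration of a data classification method disclosed in an embodiment of the present application;
FIG. 1D is a schematic illustration of the structure of a user profile disclosed in an embodiment of the present application;
FIG. 1E is a schematic illustration of the locality-sensitive hashing algorithm disclosed in an embodiment of the present application;
FIG. 1F is another schematic illustration of the locality-sensitive hashing algorithm disclosed in an embodiment of the present application;
FIG. 2 is a schematic flowchart of another data classification method disclosed in an embodiment of the present application;
FIG. 3 is a schematic flowchart of another data classification method disclosed in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another electronic device disclosed in an embodiment of the present application;
FIG. 5A is a schematic structural diagram of a data classification device disclosed in an embodiment of the present application;
FIG. 5B is a schematic structural diagram of another data classification device disclosed in an embodiment of the present application;
FIG. 5C is a schematic structural diagram of another data classification device disclosed in an embodiment of the present application.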
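Detailed Description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The electronic devices involved in the embodiments of the present application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), smart home devices (smart TVs, smart air conditioners, smart range hoods, smart fans, smart wheelchairs, smart dining tables, and so on), and the like. For ease of description, the devices mentioned above are collectively referred to as electronic devices. The electronic device may also be a server, a service platform, or the like.
The embodiments of the present application are described in detail below.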
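Please refer to FIG. 1A, which is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application. The electronic device 100 may include a control circuit, and the control circuit may include a storage and processing circuit 110. The storage and processing circuit 110 may include memory, such as hard disk drive memory, non-volatile memory (such as flash memory or other electronically programmable read-only memory used to form solid-state drives), and volatile memory (such as static or dynamic random access memory), which is not limited in the embodiments of the present application. The processing circuit in the storage and processing circuit 110 may be used to control the operation of the electronic device 100. The processing circuit may be implemented based on one or more microprocessors, microcontrollers, baseband processors, power management units, audio codec chips, application-specific integrated circuits, display driver integrated circuits, and the like.
The storage and processing circuit 110 may be used to run software in the electronic device 100, such as an Internet browsing application, a voice over internet protocol (VOIP) phone call application, an email application, a media playback application, and operating system functions. The software may be used to perform control operations such as camera-based image acquisition, ambient light measurement based on an ambient light sensor, proximity measurement based on a proximity sensor, information display based on status indicators such as light-emitting diode status lights, touch event detection based on a touch sensor, functions associated with displaying information on multiple (for example, layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and generating audio signals, control operations associated with collecting and processing button press event data, and other functions in the electronic device 100, which are not limited in the embodiments of the present application.
The electronic device 100 may further include an input-output circuit 150. The input-output circuit 150 enables the electronic device 100 to input and output data, that is, it allows the electronic device 100 to receive data from an external device and to output data to an external device. The input-output circuit 150 may further include a sensor 170. The sensor 170 may include an ambient light sensor, a proximity sensor based on light and capacitance, a touch sensor (for example, an optical touch sensor and/or a capacitive touch sensor, where the touch sensor may be part of a touch display screen or may be used independently as a touch sensor structure), an acceleration sensor, a gravity sensor, and other sensors.
The input-output circuit 150 may also include one or more displays, such as a display 130. The display 130 may include one or a combination of a liquid crystal display, an organic light-emitting diode display, an electronic ink display, a plasma display, and displays using other display technologies. The display 130 may include a touch sensor array (that is, the display 130 may be a touch display screen). The touch sensor may be a capacitive touch sensor formed by an array of transparent touch sensor electrodes (such as indium tin oxide (ITO) electrodes), or may be a touch sensor formed using other touch technologies, such as acoustic wave touch, pressure-sensitive touch, resistive touch, or optical touch, which is not limited in the embodiments of the present application.
An audio component 140 may be used to provide audio input and output functions for the electronic device 100. The audio component 140 in the electronic device 100 may include a speaker, a microphone, a buzzer, a tone generator, and other components for generating and detecting sound.
A communication circuit 120 may be used to provide the electronic device 100 with the ability to communicate with external devices. The communication circuit 120 may include analog and digital input-output interface circuits, and wireless communication circuits based on radio frequency signals and/or optical signals. The wireless communication circuit in the communication circuit 120 may include a radio frequency transceiver circuit, a power amplifier circuit, a low-noise amplifier, a switch, a filter, and an antenna. For example, the wireless communication circuit in the communication circuit 120 may include a circuit for supporting near field communication (NFC) by transmitting and receiving near-field coupled electromagnetic signals. For example, the communication circuit 120 may include a near field communication antenna and a near field communication transceiver. The communication circuit 120 may also include a cellular phone transceiver and antenna, a wireless local area network transceiver circuit and antenna, and so on.
The electronic device 100 may further include a battery, a power management circuit, and other input-output units 160. The input-output unit 160 may include buttons, joysticks, click wheels, scroll wheels, touch pads, keypads, keyboards, cameras, light-emitting diodes and other status indicators, and the like.
The user can input commands through the input-output circuit 150 to control the operation of the electronic device 100, and can use the output data of the input-output circuit 150 to receive status information and other output from the electronic device 100.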
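Please refer to FIG. 1B, which is a schematic flowchart of a data classification method provided by an embodiment of the present application. The data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A, and includes:
101. Obtain application data of a target application of a target object, and obtain a target user ID of the target object.
The target object may be understood as the device owner or another user. The target application may be at least one of the following: a video application, a social application, an instant messaging application, a shopping application, a payment application, a game application, a navigation application, a camera application, a wealth management application, and so on, which is not limited here. The target application may be a single application or a category of applications, may include one or more applications, and may be a third-party application or a system application. In the embodiments of the present application, the application data may include at least one of the following: registration data, application cache data, instant messaging data, and so on, which is not limited here. For example, the application data may include user IDs that identify the user, such as the user's cookie, an APP-side browsing behavior ID, and an account ID, where such a user ID may be a device hardware ID or a character string identifier.
Of course, an electronic device may be used by more than one person. By integrating the device IMEI, SSOID, oppenid, user location data, Internet behavior data, and the like, a multi-dimensional feature layer and an ID-mapping relationship layer can be constructed. In the natural person identification layer, a multi-code relationship trusted-identification filtering algorithm and a graph connectivity algorithm are used to identify the natural person accurately. In this way the device owner can be identified accurately, since the owner is the one using the device most of the time.
In a possible example, obtaining the application data of the target application of the target object in step 101 may include the following steps:
11. Obtain at least one user ID of the target application of the target object;
12. Obtain, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.
The preset time period may be set by the user or defaulted by the system. It may be understood as the most recent period during which the electronic device was used, or as the period from the registration of any one of the at least one user ID to the current time. The target object may be the device owner. In the embodiments of the present application, a user ID may be at least one of the following: a phone number, an integrated circuit card identity (ICCID), an International Mobile Equipment Identity (IMEI), a single sign-on ID (Single Sign On identification, SSOID), an ID of a third-party application, an oppenId, and so on, which is not limited here.
Further, the electronic device may obtain at least one user ID of the target application that has been used on the electronic device, and then determine the target application data on the electronic device according to the at least one user ID. The electronic device may store all data related to the target application, such as cache data and application running status data; in the embodiments of the present application, however, only the application data related to the user IDs needs to be extracted.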
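Further, when the at least one user ID is a natural person ID, the following steps may be included before step 101:
A1. Obtain historical usage data of the target application on the electronic device corresponding to the target object;
A2. Construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
A3. Determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
The historical usage data may be understood as the usage data from the first time the user used the target application on the electronic device up to the current time, or as all usage data corresponding to the at least one user ID of the target object. The historical usage data may include at least one of the following: registration data, application cache data, instant messaging data, and so on, which is not limited here. For example, the application data may include user IDs that identify the user, such as the user's cookie, an APP-side browsing behavior ID, and an account ID, where such a user ID may be a device hardware ID or a character string identifier. The usage data may also be at least one of the following: CPU operating frequency, number of CPU cores, CPU operating mode, GPU frame rate, GPU resolution, device brightness, device volume, and some or all of the memory parameters.
In a specific implementation, as shown in FIG. 1C, the electronic device may obtain the historical usage data of the target application corresponding to the target object. The historical usage data may be obtained from data sources, which may include at least one of the following: a browser, a software store, an account system, AutoNavi (高德) data, shopping data, communication data, game data, social data, office data, smart home data, and so on, which is not limited here. ID-mapping relationship layer data can be obtained from the historical usage data, and may include at least one of the following: SSOID<->IMEI (the mapping between SSOID and IMEI), TEL<->IMEI, OppenId<->ICCID, and so on, which is not limited here. Multi-dimensional feature layer data can also be obtained from the historical usage data, and may include at least one of the following: device features, APP features, positioning features, and so on, which is not limited here. A natural person ID can be determined according to the multi-dimensional feature layer and the ID-mapping relationship layer, and each natural person ID may correspond to one user profile. As shown in FIG. 1D, the user profile may include at least one of the following: demographic attributes, person-location relationships, interests, device attributes, asset status, commercial interests, and so on, which is not limited here.
In addition, the device features may include at least one of the following: attributes of the device itself (such as daily activity tracking points, model configuration, and activation date), network connection status (such as WIFI connections, network IP, base stations, and connection degree distribution), and attributes of the ID itself (such as ID format and character length), which is not limited here. The APP features may include at least one of the following: APP installation, launch, and uninstallation; APP type preferences (such as games or applications); and periods in which the APP is frequently active (working days, holidays, etc.), which is not limited here. The positioning features may include at least one of the following: location attributes (for example, home or company, frequented business districts, and frequently active places), travel preferences (for example, travel mode, travel time, travel frequency, and travel trajectory), and POI preferences (POI arrivals and POI searches).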
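The role of the ID-mapping relationship layer in steps A1 to A3 can be illustrated with a small graph connectivity sketch. This is only an illustration under stated assumptions: the source names a "graph connectivity algorithm" but does not specify it, so a union-find structure is used here as one common choice, the trusted-identification filtering step is omitted, and the identifier strings and the function name `resolve_natural_persons` are hypothetical.

```python
from collections import defaultdict


def resolve_natural_persons(id_pairs):
    """Group identifiers linked by ID-mapping pairs into connected components.

    Each component is treated as one candidate natural person (an assumption;
    the source does not describe how components are filtered or validated).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in id_pairs:
        union(a, b)

    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)
    return list(components.values())


# Hypothetical mapping pairs of the kinds listed in the text
# (SSOID<->IMEI, TEL<->IMEI, OppenId<->ICCID).
pairs = [
    ("SSOID:s1", "IMEI:i1"),
    ("TEL:t1", "IMEI:i1"),
    ("OppenId:o9", "ICCID:c9"),
]
persons = resolve_natural_persons(pairs)
```

Here the first two pairs share an IMEI and therefore collapse into one component, while the third pair forms a separate one. A real system would presumably also use the multi-dimensional feature layer to reject untrustworthy links before merging.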
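102. Perform ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID.
The application data may include multiple user IDs; that is, when the user uses the device to communicate with another user, the application data may record the user ID of that other user. The associated data may be at least one of the following: user level, points consumption, activity level, preference type, login time, online duration, operating habits, number of communications, communication time, user ID, and so on.
In a possible example, performing ID extraction on the application data in step 102 to obtain multiple user IDs and the associated data corresponding to each user ID may include the following steps:
21. Search the application data according to a preset ID keyword to obtain multiple IDs;
22. Consolidate the multiple IDs to obtain the multiple user IDs, each user ID corresponding to one natural person;
23. Obtain the associated data corresponding to the multiple user IDs from the application data, to obtain the associated data corresponding to each of the multiple user IDs.
The preset ID keyword may be understood as a keyword in a specific format. For example, in "username: xxx", xxx is the keyword; the specific format may be defaulted by the system. In a specific implementation, the electronic device may search the application data according to the preset ID keyword to obtain multiple IDs, and may then consolidate the multiple IDs to obtain multiple user IDs. The consolidation algorithm may be a clustering algorithm, a locality-sensitive hashing algorithm, or the like, which is not limited here. Next, the associated data corresponding to the multiple user IDs, which may be understood as the data related to each user ID, can be obtained from the application data, yielding the associated data corresponding to each of the multiple user IDs.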
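Steps 21 and 23 above can be sketched as a keyword search followed by collecting the records that mention each ID. The record layout, the `username: <id>` pattern, and the function name `extract_ids_with_data` are assumptions made for illustration; the source only says the keyword format is system-defined. The consolidation of step 22 (merging IDs that belong to the same natural person) is not shown.

```python
import re
from collections import defaultdict

# Hypothetical keyword format; the actual preset format is not given in the source.
ID_PATTERN = re.compile(r"username:\s*(\w+)")


def extract_ids_with_data(records):
    """Return a mapping from each found ID to the records that mention it."""
    associated = defaultdict(list)
    for record in records:
        for user_id in ID_PATTERN.findall(record):
            associated[user_id].append(record)
    return dict(associated)


# Made-up application data records for illustration only.
records = [
    "username: alice; messages=12; last_online=21:30",
    "username: bob; messages=3; last_online=09:10",
    "username: alice; messages=7; last_online=22:05",
]
result = extract_ids_with_data(records)
```

With this input, `result` has two keys, and the entry for `alice` holds both records that mention her, which stands in for the "associated data" of that user ID.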
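103. Bucket the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID.
Locality-sensitive hashing is one of the most popular approximate nearest neighbor search algorithms. It has a solid theoretical basis and performs well in high-dimensional data spaces. Its main purpose is to find similar items in massive amounts of data, and it can be applied to text similarity detection, web search, and other fields. The basic idea resembles a transformation between spaces. The LSH algorithm rests on one assumption: if two texts are similar in the original data space, they remain highly similar after each is transformed by the hash function; conversely, if they are dissimilar to begin with, they should remain dissimilar after the transformation.
In a specific implementation, the electronic device may bucket the associated data of the multiple user IDs by the locality-sensitive hashing algorithm to obtain multiple buckets, each bucket corresponding to the associated data of at least one user ID.
For example, the basic idea of locality-sensitive hashing (LSH) is to use a series of functions to hash the data into buckets, so that data points close to each other are very likely to fall into the same bucket, while data points far from each other are likely to fall into different buckets. Taking the calculation of intimacy between users as an example, users with high intimacy are assigned to the same bucket with high probability. As shown in FIG. 1E, 1, 2, 3, 4, and 5 may represent five different user IDs. Further, as shown in FIG. 1F, after bucketing, 1 and 2 may be placed in one bucket, 3 and 4 in another, and 5 in a bucket of its own. The "user set" of each user then becomes relatively small. Because intimacy only needs to be calculated within each bucket's user set, the complexity of the intimacy calculation is reduced. After sorting, and with the help of some rules, the user's friends can then be classified according to the corresponding intimacy.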
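The bucketing idea described above can be sketched with random-hyperplane LSH, one well-known LSH family for cosine similarity. The source does not say which hash family is used, so this choice, the feature vectors, the parameter `num_planes`, and the function name `lsh_buckets` are all assumptions for illustration.

```python
import random
from collections import defaultdict


def lsh_buckets(vectors, num_planes=4, seed=0):
    """Bucket user IDs by the sign pattern of their vectors against random hyperplanes.

    Vectors pointing in similar directions tend to share a sign pattern and
    therefore a bucket; identical vectors always share one.
    """
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

    buckets = defaultdict(list)
    for user_id, vec in vectors.items():
        signature = tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )
        buckets[signature].append(user_id)
    return dict(buckets)


# Five hypothetical users, mirroring the 1..5 example in the text.
vectors = {
    1: [0.9, 0.1, 0.8],
    2: [0.9, 0.1, 0.8],
    3: [0.1, 0.9, 0.2],
    4: [0.1, 0.9, 0.2],
    5: [-0.7, -0.6, 0.1],
}
buckets = lsh_buckets(vectors)
```

Users 1 and 2 have identical vectors and so are guaranteed to land in the same bucket, as are users 3 and 4. Whether dissimilar users are separated depends on the random hyperplanes, which is why LSH offers only a probabilistic guarantee; intimacy is then computed only among the members of each bucket.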
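104. Divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
Since the user IDs in each bucket are already related to one another to some degree, these user IDs can be divided further to obtain precise groups.
In a possible example, dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets in step 104 to obtain multiple groups may include the following steps:
41. Determine, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID, to obtain multiple degrees of association, where the i-th bucket is any one of the multiple buckets;
42. Select, from the multiple degrees of association, those greater than a preset threshold to obtain at least one target degree of association;
43. Treat the user IDs corresponding to the at least one target degree of association as one group.
The preset threshold may be set by the user or defaulted by the system. Taking the i-th bucket, which is any one of the multiple buckets, as an example, the electronic device may determine the degree of association between each user ID in the i-th bucket and the target user ID according to that user ID's associated data, obtaining multiple degrees of association. It then selects the degrees of association greater than the preset threshold to obtain at least one target degree of association, and the user IDs corresponding to the at least one target degree of association are treated as one group. In other words, the user IDs in each bucket that are highly associated with the target user ID form one group.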
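Steps 41 to 43 amount to a threshold filter inside one bucket. The following minimal sketch assumes the degrees of association have already been computed; the scores, the threshold value, and the function name `group_from_bucket` are hypothetical.

```python
def group_from_bucket(bucket_ids, association, threshold):
    """Keep the user IDs whose degree of association exceeds the preset threshold."""
    return [uid for uid in bucket_ids if association(uid) > threshold]


# Hypothetical degrees of association between each user ID and the target user ID.
scores = {"u1": 0.92, "u2": 0.40, "u3": 0.75}
group = group_from_bucket(["u1", "u2", "u3"], scores.get, threshold=0.7)
```

The comparison is strictly "greater than", matching the wording of step 42, so a user whose score equals the threshold is left out of the group.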
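Further optionally, determining, in step 41, the degree of association between each user ID in the i-th bucket and the target user ID according to that user ID's associated data to obtain multiple degrees of association may include the following steps:
411. Obtain the associated data of a first user ID, where the first user ID is any user ID in the i-th bucket;
412. Perform feature extraction on the associated data of the first user ID to obtain a target feature set;
413. Determine the degree of association between the first user ID and the target user ID according to the target feature set.
Taking the first user ID, which is any user ID in the i-th bucket, as an example, the electronic device may obtain the associated data of the first user ID and perform feature extraction on it to obtain the target feature set. The target feature set may include at least one of the following: geographic location, communication time period, communication content, number of communications, and so on, which is not limited here. Each of these feature dimensions can be represented by one feature value. The degree of association between the first user ID and the target user ID can then be determined according to the target feature set. For example, a weight may be determined for each feature in the target feature set, and a weighted calculation over the features and their weights yields the degree of association between the first user ID and the target user ID.
In a possible example, the target feature set includes feature values of multiple dimensions, and determining the degree of association between the first user ID and the target user ID according to the target feature set in step 413 may include the following steps:
B1. Determine the weight corresponding to each dimension of the feature values of the multiple dimensions, to obtain weights of multiple dimensions;
B2. Perform a weighted calculation according to the feature values of the multiple dimensions and the weights of the multiple dimensions, to obtain the degree of association between the first user ID and the target user ID.
The target feature set may include feature values of multiple dimensions, and the electronic device may store in advance the weight corresponding to the feature value of each dimension. The weight corresponding to each dimension can therefore be determined, yielding weights of multiple dimensions, and a weighted calculation over the feature values and weights of the multiple dimensions gives the degree of association between the first user ID and the target user ID.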
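Steps B1 and B2 can be read as a weighted sum over feature dimensions. The sketch below assumes that reading; the source says only "weighted calculation", and the feature names, values, weights, and the function name `weighted_association` are hypothetical.

```python
def weighted_association(feature_values, weights):
    """Weighted sum of per-dimension feature values (one reading of step B2)."""
    return sum(weights[name] * value for name, value in feature_values.items())


# Hypothetical feature values in [0, 1] and pre-stored per-dimension weights.
features = {"location": 1.0, "comm_count": 0.5, "comm_period": 0.0}
weights = {"location": 0.5, "comm_count": 0.25, "comm_period": 0.25}
score = weighted_association(features, weights)
```

With these numbers the score is 0.5 * 1.0 + 0.25 * 0.5 + 0.25 * 0.0 = 0.625. Keeping the weights summing to 1 keeps the result on the same scale as the feature values, which makes a single preset threshold easier to choose, although the source does not require this.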
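In a possible example, determining the degree of association between the first user ID and the target user ID according to the target feature set in step 413 may include the following steps:
C1. Separate, from the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID;
C2. Determine the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
Because the target feature set contains both the features of the first user ID and the features of the target user ID, it can be separated into the first feature set corresponding to the first user ID and the second feature set corresponding to the target user ID. The first feature set and the second feature set may overlap, and each may include at least one of the following features: user level, points consumption, activity level, preference type, login time, online duration, operating habits, number of communications, communication time, user ID, and so on, which is not limited here. The degree of association between the first user ID and the target user ID can then be determined according to the first feature set and the second feature set.
For example, the degree of association between the first user ID and the target user ID can be calculated using the Euclidean distance, as follows:
Figure PCTCN2019093971-appb-000001
where p denotes the first feature set, q denotes the second feature set, and i denotes any dimension.
As another example, the degree of association between the first user ID and the target user ID can be calculated using the Jaccard distance, as follows:
Figure PCTCN2019093971-appb-000002
where p denotes the first feature set and q denotes the second feature set.
As another example, the degree of association between the first user ID and the target user ID can be calculated using the cosine distance, as follows:
Figure PCTCN2019093971-appb-000003
where p denotes the first feature set and q denotes the second feature set.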
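The three measures named above can be sketched with their standard textbook definitions. The equation images of the source are not reproduced in this text, so these functions are assumptions about what the images show, not a transcription; in particular, the cosine measure is given as a similarity, and how a distance is converted into a degree of association is not stated in the source.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared differences over all dimensions i."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def jaccard_distance(p, q):
    """One minus the ratio of shared features to all features in either set."""
    p, q = set(p), set(q)
    union = p | q
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(p & q) / len(union)


def cosine_similarity(p, q):
    """Dot product divided by the product of the vector norms."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Hypothetical feature sets p (first user ID) and q (target user ID).
d_euclid = euclidean_distance([0.0, 0.0], [3.0, 4.0])
d_jaccard = jaccard_distance({"games", "music"}, {"music", "travel"})
s_cosine = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

The Euclidean and cosine measures apply to numeric feature vectors, while the Jaccard measure applies to sets of categorical features such as preference types. Smaller distances, or larger cosine similarity, indicate a closer relationship.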
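In a possible example, each user ID corresponds to at least one tag, and the following steps may be included after step 104:
D1. Obtain the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups;
D2. Use the tag that occurs most often among the multiple tags as the group name of group j.
In a specific implementation, each user ID may correspond to at least one tag, and the tag may be a user profile tag. A tag may be at least one of the following: age, occupation, income, interests, and so on, which is not limited here. The electronic device may obtain the tags corresponding to the user IDs in group j, where group j is any one of the multiple groups, to obtain multiple tags, and may then use the tag that occurs most often among them as the group name of group j.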
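Steps D1 and D2 can be sketched with a frequency count over the tags of a group's members. The tag values and the function name `group_name` are hypothetical, and the tie-breaking behavior is an assumption, since the source does not say what happens when two tags are equally frequent.

```python
from collections import Counter


def group_name(group_ids, tags_by_user):
    """Return the most frequent tag among the group's members, or None if there are no tags."""
    counts = Counter(tag for uid in group_ids for tag in tags_by_user.get(uid, []))
    if not counts:
        return None
    # On ties, Counter.most_common keeps the tag that was encountered first.
    return counts.most_common(1)[0][0]


# Hypothetical profile tags for the members of one group.
tags_by_user = {
    "u1": ["student", "gamer"],
    "u2": ["gamer"],
    "u3": ["engineer", "gamer"],
}
name = group_name(["u1", "u2", "u3"], tags_by_user)
```

Here "gamer" appears three times and every other tag once, so the group would be named "gamer".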
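In practice, in the Internet era, having a large number of users and the data they generate is a fortunate position for any data platform, but large amounts of data also bring their own difficulties. On the one hand, large amounts of data require large amounts of storage space. On the other hand, turning the data into real data assets requires large amounts of computing resources, and failing to use those resources sensibly causes great losses to an enterprise. In the embodiments of the present application, both user identification and user data classification involve calculating similarity or intimacy among a large number of users, which consumes a great deal of computing resources. Compared with conventional methods, the locality-sensitive hashing algorithm can significantly reduce the computing resources and time required.
Further, the locality-sensitive hashing algorithm is not limited to user identification and user data classification. It can be applied in many fields that require similarity calculation, such as friend recommendation and document similarity. In recommendation, LSH can be used to find similar users and similar products. It likewise saves a large amount of computing resources, enabling precise recommendations to users and improving recommendation efficiency.
It can be seen that the data classification method described in the above embodiment of the present application obtains the application data of the target application of the target object and the target user ID of the target object; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; buckets the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups. In this way, multiple user IDs can be extracted from the application data and bucketed by the locality-sensitive hashing algorithm, and the groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve data classification efficiency.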
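Consistent with the above, please refer to FIG. 2, which is a schematic flowchart of another data classification method provided by an embodiment of the present application. The data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A, and may include the following steps:
201. Obtain application data of a target application of a target object, and obtain a target user ID of the target object.
202. Perform ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID.
203. Bucket the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID.
204. Divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
205. Obtain the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups.
206. Use the tag that occurs most often among the multiple tags as the group name of group j.
For the specific implementation of steps 201-206, refer to the corresponding description of the method shown in FIG. 1B, which is not repeated here.
It can be seen that the data classification method described in the above embodiment of the present application obtains the application data of the target application of the target object and the target user ID of the target object; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; buckets the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups; obtains the tags corresponding to the user IDs in group j, where group j is any one of the multiple groups, to obtain multiple tags; and uses the tag that occurs most often among the multiple tags as the group name of group j. In this way, multiple user IDs can be extracted from the application data and bucketed by the locality-sensitive hashing algorithm, the groups are finally divided based on the IDs in each bucket, and the groups can also be named, which can reduce computational complexity, save time and computing resources, and improve data classification efficiency.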
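Consistent with the above, please refer to FIG. 3, which is a schematic flowchart of an embodiment of another data classification method provided by an embodiment of the present application. The data classification method described in this embodiment is applied to the electronic device shown in FIG. 1A, and may include the following steps:
301. Obtain historical usage data of the target application on the electronic device corresponding to the target object.
302. Construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data.
303. Determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
304. Obtain application data of the target application of the target object, and obtain the target user ID of the target object.
305. Perform ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID.
306. Bucket the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID.
307. Divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
For the specific implementation of steps 301-307, refer to the corresponding description of the method shown in FIG. 1B, which is not repeated here.
It can be seen that the data classification method described in the above embodiment of the present application can first obtain the natural person ID of the target object; obtain the application data of the target application of the target object based on the natural person ID; perform ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; bucket the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups. In this way, multiple user IDs can be extracted from the application data and bucketed by the locality-sensitive hashing algorithm, and the groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve data classification efficiency.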
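Consistent with the above, please refer to FIG. 4, which shows an electronic device provided by an embodiment of the present application, including: a processor and a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the following steps:
obtaining application data of a target application of a target object, and obtaining a target user ID of the target object;
performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID;
bucketing the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID;
dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
It can be seen that the electronic device described in the above embodiment of the present application obtains the application data of the target application of the target object and the target user ID of the target object; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; buckets the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups. In this way, multiple user IDs can be extracted from the application data and bucketed by the locality-sensitive hashing algorithm, and the groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve data classification efficiency.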
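In a possible example, in terms of performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, the program includes instructions for executing the following steps:
searching the application data according to a preset ID keyword to obtain multiple IDs;
consolidating the multiple IDs to obtain the multiple user IDs, each user ID corresponding to one natural person;
obtaining the associated data corresponding to the multiple user IDs from the application data, to obtain the associated data corresponding to each of the multiple user IDs.
In a possible example, in terms of dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, the program includes instructions for executing the following steps:
determining, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID, to obtain multiple degrees of association, where the i-th bucket is any one of the multiple buckets;
selecting, from the multiple degrees of association, those greater than a preset threshold to obtain at least one target degree of association;
treating the user IDs corresponding to the at least one target degree of association as one group.
In a possible example, in terms of determining, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID to obtain multiple degrees of association, the program includes instructions for executing the following steps:
obtaining the associated data of a first user ID, where the first user ID is any user ID in the i-th bucket;
performing feature extraction on the associated data of the first user ID to obtain a target feature set;
determining the degree of association between the first user ID and the target user ID according to the target feature set.
In a possible example, the target feature set includes feature values of multiple dimensions;
in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the program includes instructions for executing the following steps:
determining the weight corresponding to each dimension of the feature values of the multiple dimensions, to obtain weights of multiple dimensions;
performing a weighted calculation according to the feature values of the multiple dimensions and the weights of the multiple dimensions, to obtain the degree of association between the first user ID and the target user ID.
In a possible example, in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the program includes instructions for executing the following steps:
separating, from the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID;
determining the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
In a possible example, in terms of obtaining the application data of the target application of the target object, the program includes instructions for executing the following steps:
obtaining at least one user ID of the target application of the target object;
obtaining, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.
In a possible example, when the at least one user ID is a natural person ID, the program further includes instructions for executing the following steps:
obtaining historical usage data of the target application on the electronic device corresponding to the target object;
constructing a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and using the natural person ID as the target user ID.
In a possible example, each user ID corresponds to at least one tag, and the program further includes instructions for executing the following steps:
obtaining the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups;
using the tag that occurs most often among the multiple tags as the group name of group j.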
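The foregoing describes the solutions of the embodiments of the present application mainly from the perspective of the method-side execution process. It can be understood that, in order to implement the above functions, the electronic device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.
The embodiments of the present application may divide the electronic device into functional units according to the foregoing method examples. For example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative and is only a logical function division; other division methods are possible in actual implementation.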
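Please refer to FIG. 5A, which is a schematic structural diagram of a data classification device provided by this embodiment. The data classification device is applied to the electronic device shown in FIG. 1A, and includes an obtaining unit 501, an extraction unit 502, a bucket processing unit 503, and a dividing unit 504, wherein:
the obtaining unit 501 is configured to obtain application data of a target application of a target object, and obtain a target user ID of the target object;
the extraction unit 502 is configured to perform ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID;
the bucket processing unit 503 is configured to bucket the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID;
the dividing unit 504 is configured to divide the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
It can be seen that the data classification device described in the above embodiment of the present application obtains the application data of the target application of the target object and the target user ID of the target object; performs ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID; buckets the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID; and divides the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups. In this way, multiple user IDs can be extracted from the application data and bucketed by the locality-sensitive hashing algorithm, and the groups are finally divided based on the IDs in each bucket, which can reduce computational complexity, save time and computing resources, and improve data classification efficiency.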
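In a possible example, in terms of performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID, the extraction unit 502 is specifically configured to:
search the application data according to a preset ID keyword to obtain multiple IDs;
consolidate the multiple IDs to obtain the multiple user IDs, each user ID corresponding to one natural person;
obtain the associated data corresponding to the multiple user IDs from the application data, to obtain the associated data corresponding to each of the multiple user IDs.
In a possible example, in terms of dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups, the dividing unit 504 is specifically configured to:
determine, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID, to obtain multiple degrees of association, where the i-th bucket is any one of the multiple buckets;
select, from the multiple degrees of association, those greater than a preset threshold to obtain at least one target degree of association;
treat the user IDs corresponding to the at least one target degree of association as one group.
In a possible example, in terms of determining, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID to obtain multiple degrees of association, the dividing unit 504 is specifically configured to:
obtain the associated data of a first user ID, where the first user ID is any user ID in the i-th bucket;
perform feature extraction on the associated data of the first user ID to obtain a target feature set;
determine the degree of association between the first user ID and the target user ID according to the target feature set.
In a possible example, the target feature set includes feature values of multiple dimensions;
in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the dividing unit 504 is specifically configured to:
determine the weight corresponding to each dimension of the feature values of the multiple dimensions, to obtain weights of multiple dimensions;
perform a weighted calculation according to the feature values of the multiple dimensions and the weights of the multiple dimensions, to obtain the degree of association between the first user ID and the target user ID.
In a possible example, in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the dividing unit 504 is specifically configured to:
separate, from the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID;
determine the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
In a possible example, in terms of obtaining the application data of the target application of the target object, the obtaining unit 501 is specifically configured to:
obtain at least one user ID of the target application of the target object;
obtain, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.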
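In a possible example, as shown in FIG. 5B, which shows a modified structure of the data classification device shown in FIG. 5A, the device further includes, compared with FIG. 5A, an establishing unit 505 and a determining unit 506, as follows:
the obtaining unit 501 is further configured to obtain historical usage data of the target application on the electronic device corresponding to the target object;
the establishing unit 505 is configured to construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
the determining unit 506 is configured to determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and use the natural person ID as the target user ID.
In a possible example, each user ID corresponds to at least one tag. As shown in FIG. 5C, which shows another modified structure of the data classification device shown in FIG. 5A, the device further includes, compared with FIG. 5A, a selection unit 507, as follows:
the obtaining unit 501 is configured to obtain the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups;
the selection unit 507 is configured to use the tag that occurs most often among the multiple tags as the group name of group j.
It can be understood that the functions of the program modules of the data classification device of this embodiment can be implemented according to the methods in the foregoing method embodiments. For the specific implementation process, refer to the relevant description of the foregoing method embodiments, which is not repeated here.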
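An embodiment of the present application also provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps of any data classification method described in the foregoing method embodiments.
An embodiment of the present application also provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any data classification method described in the foregoing method embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should know, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection displayed or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art can understand that all or part of the steps in the methods of the foregoing embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash disk, ROM, RAM, a magnetic disk, an optical disk, or the like.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help understand the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.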

Claims (20)

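  1. A data classification method, characterized by comprising:
    obtaining application data of a target application of a target object, and obtaining a target user ID of the target object;
    performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID;
    bucketing the associated data of the multiple user IDs by a locality-sensitive hashing algorithm to obtain multiple buckets, each bucket including the associated data of at least one user ID;
    dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups.
  2. The method according to claim 1, characterized in that performing ID extraction on the application data to obtain multiple user IDs and the associated data corresponding to each user ID comprises:
    searching the application data according to a preset ID keyword to obtain multiple IDs;
    consolidating the multiple IDs to obtain the multiple user IDs, each user ID corresponding to one natural person;
    obtaining the associated data corresponding to the multiple user IDs from the application data, to obtain the associated data corresponding to each of the multiple user IDs.
  3. The method according to claim 1 or 2, characterized in that dividing the multiple user IDs into groups based on the associated data of the user IDs in the multiple buckets to obtain multiple groups comprises:
    determining, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID, to obtain multiple degrees of association, where the i-th bucket is any one of the multiple buckets;
    selecting, from the multiple degrees of association, those greater than a preset threshold to obtain at least one target degree of association;
    treating the user IDs corresponding to the at least one target degree of association as one group.
  4. The method according to claim 3, characterized in that determining, according to the associated data of each user ID in the i-th bucket, the degree of association between that user ID and the target user ID to obtain multiple degrees of association comprises:
    obtaining the associated data of a first user ID, where the first user ID is any user ID in the i-th bucket;
    performing feature extraction on the associated data of the first user ID to obtain a target feature set;
    determining the degree of association between the first user ID and the target user ID according to the target feature set.
  5. The method according to claim 4, characterized in that the target feature set includes feature values of multiple dimensions;
    determining the degree of association between the first user ID and the target user ID according to the target feature set comprises:
    determining the weight corresponding to each dimension of the feature values of the multiple dimensions, to obtain weights of multiple dimensions;
    performing a weighted calculation according to the feature values of the multiple dimensions and the weights of the multiple dimensions, to obtain the degree of association between the first user ID and the target user ID.
  6. The method according to claim 4, characterized in that determining the degree of association between the first user ID and the target user ID according to the target feature set comprises:
    separating, from the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID;
    determining the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.
  7. The method according to any one of claims 1-6, characterized in that obtaining the application data of the target application of the target object comprises:
    obtaining at least one user ID of the target application of the target object;
    obtaining, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.
  8. The method according to claim 7, characterized in that, when the at least one user ID is a natural person ID, the method further comprises:
    obtaining historical usage data of the target application on the electronic device corresponding to the target object;
    constructing a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
    determining a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and using the natural person ID as the target user ID.
  9. The method according to any one of claims 1-8, characterized in that each user ID corresponds to at least one tag, and the method further comprises:
    obtaining the tags corresponding to the user IDs in group j to obtain multiple tags, where group j is any one of the multiple groups;
    using the tag that occurs most often among the multiple tags as the group name of group j.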
  10. 一种数据分类装置,其特征在于,所述装置包括:
    获取单元,用于获取目标对象的目标应用的应用数据,以及获取所述目标对象的目标用户ID;
    提取单元,用于对所述应用数据进行ID提取,得到多个用户ID,以及每一用户ID对应的关联数据;
    分桶处理单元,用于通过局部敏感哈希算法对所述多个用户ID的关联数据进行分桶处理,得到多个桶,每一桶包括至少一个用户ID的关联数据;
    划分单元,用于基于所述多个桶中的用户ID的关联数据将所述多个用户ID进行群组划分,得到多个群组。
  11. 根据权利要求10所述的装置,其特征在于,在所述对所述应用数据进行ID提取,得到多个用户ID,以及每一ID对应的关联数据,得到多个用户ID方面,所述提取单元具体用于:
    依据预设ID关键字对所述应用数据进行搜索,得到多个ID;
    对所述多个ID进行整合,所述多个用户ID,每一用户ID对应一自然人;
    从所述应用数据中获取所述多个用户ID对应的关联数据,得到所述多个用户ID中每一用户ID对应的关联数据。
  12. 根据权利要求10或11所述的装置,其特征在于,在所述基于所述多个桶中的用户ID的关联数据将所述多个用户ID进行群组划分,得到多个群组方面,所述划分单元具体用于:
    依据第i个桶中每一用户ID的关联数据确定该用户ID与所述目标用户ID之间的关联度,得到多个关联度,所述第i个桶为所述多个桶中的任意一个桶;
    从所述多个关联度中选取大于预设阈值的关联度,得到至少一个目标关联度;
    将所述至少一个目标关联度对应的用户ID作为一个群组。
  13. 根据权利要求12所述的装置,其特征在于,在所述依据第i个桶中每一用户ID的关联数据确定该用户ID与所述目标用户ID之间的关联度,得到多个关联度方面,所述划分单元具体用于:
    获取第一用户ID的关联数据,所述第一用户ID为所述第i个桶中的任意一个用户ID;
    对所述第一用户ID的关联数据进行特征提取,得到目标特征集;
    依据所述目标特征集确定所述第一用户ID与所述目标用户ID之间的关联度。
  14. 根据权利要求13所述的装置,其特征在于,所述目标特征集包括多个维度的特征值;
    在所述依据所述目标特征集确定所述第一用户ID与所述目标用户ID之间的关联度方面,所述划分单元集体用于:
    确定所述多个维度的特征值中每一维度对应的权重值,得到多个维度的权重值;
    依据所述多个维度的特征值以及所述多个维度的权重值进行加权运算,得到所述第一用户ID与所述目标用户ID之间的关联度。
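  15. The apparatus according to claim 13, wherein, in terms of determining the degree of association between the first user ID and the target user ID according to the target feature set, the division unit is specifically configured to:
    separate, from the target feature set, a first feature set corresponding to the first user ID and a second feature set corresponding to the target user ID;
    determine the degree of association between the first user ID and the target user ID according to the first feature set and the second feature set.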
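  16. The apparatus according to any one of claims 10-15, wherein, in terms of acquiring the application data of the target application of the target object, the acquisition unit is specifically configured to:
    acquire at least one user ID of the target application of the target object;
    acquire, from a preset database according to the at least one user ID, the application data of the target application of the target object within a preset time period.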
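  17. The apparatus according to claim 16, wherein, when the at least one user ID is a natural person ID, the apparatus further comprises an establishing unit and a determining unit, wherein:
    the acquisition unit is further configured to acquire historical usage data of the target application on the electronic device corresponding to the target object;
    the establishing unit is configured to construct a multi-dimensional feature layer and an ID-mapping relationship layer according to the historical usage data;
    the determining unit is configured to determine a natural person ID according to the multi-dimensional feature layer and the ID-mapping relationship layer, and to use the natural person ID as the target user ID.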
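  18. An electronic device, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps of the method according to any one of claims 1-9.
  19. A computer-readable storage medium, storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-9.
  20. A computer program product, comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to any one of claims 1-9.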
PCT/CN2019/093971 2019-06-29 2019-06-29 Data classification method and related products WO2021000084A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980089586.4A CN113366469A (zh) 2019-06-29 2019-06-29 Data classification method and related products
PCT/CN2019/093971 WO2021000084A1 (zh) 2019-06-29 2019-06-29 Data classification method and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/093971 WO2021000084A1 (zh) 2019-06-29 2019-06-29 Data classification method and related products

Publications (1)

Publication Number Publication Date
WO2021000084A1 true WO2021000084A1 (zh) 2021-01-07

Family

ID=74100206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093971 WO2021000084A1 (zh) 2019-06-29 2019-06-29 Data classification method and related products

Country Status (2)

Country Link
CN (1) CN113366469A (zh)
WO (1) WO2021000084A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113366469A (zh) * 2019-06-29 2021-09-07 深圳市欢太科技有限公司 Data classification method and related products

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357302B (zh) * 2021-12-31 2025-06-27 广州趣丸网络科技有限公司 Information storage method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
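CN103198418A (zh) * 2013-03-15 2013-07-10 北京亿赞普网络技术有限公司 Application recommendation method and system
CN106357895A (zh) * 2016-08-31 2017-01-25 上海斐讯数据通信技术有限公司 Incoming call prompt system and incoming call prompt method
CN106850924A (zh) * 2017-01-23 2017-06-13 北京奇虎科技有限公司 Address book data processing method and processing terminal
CN109255640A (zh) * 2017-07-13 2019-01-22 阿里健康信息技术有限公司 Method, apparatus and system for determining user groups
CN109815406A (zh) * 2019-01-31 2019-05-28 腾讯科技(深圳)有限公司 Data processing and information recommendation method and apparatus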

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013178286A1 (en) * 2012-06-01 2013-12-05 Qatar Foundation A method for processing a large-scale data set, and associated apparatus
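CN104239324B (zh) * 2013-06-17 2019-09-17 阿里巴巴集团控股有限公司 Method and system for user-behavior-based feature extraction and personalized recommendation
CN106503015A (zh) * 2015-09-07 2017-03-15 国家计算机网络与信息安全管理中心 Method for constructing a user profile
CN106548255A (zh) * 2016-11-24 2017-03-29 山东浪潮云服务信息科技有限公司 Commodity recommendation method based on massive user behavior
CN113366469A (zh) * 2019-06-29 2021-09-07 深圳市欢太科技有限公司 Data classification method and related products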

Also Published As

Publication number Publication date
CN113366469A (zh) 2021-09-07

Similar Documents

Publication Publication Date Title
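WO2020257990A1 (zh) Device recommendation method and related products
TWI684148B (zh) Contact grouping processing method and apparatus
CN110472145B (zh) Content recommendation method and electronic device
CN113939814B (zh) Content push method and related products
US9241242B2 (en) Information recommendation method and apparatus
CN108280115B (zh) Method and apparatus for identifying user relationships
CN109033156B (zh) Information processing method, apparatus and terminal
CN111125523B (zh) Search method, apparatus, terminal device and storage medium
CN107977431A (zh) Image processing method, apparatus, computer device and computer-readable storage medium
CN108052591A (zh) Information recommendation method, apparatus, mobile terminal and computer-readable storage medium
CN108121803A (zh) Method and server for determining page layout
CN111444425B (zh) Information push method, electronic device and medium
CN108205568A (zh) Method and apparatus for selecting data based on tags
CN108399232A (zh) Information push method, apparatus and electronic device
CN107292235A (zh) Fingerprint collection method and related products
CN104980559A (zh) Method and apparatus for setting ring-back tones and determining ring-back tone music
CN108449481A (zh) Contact information recommendation method and terminal
CN113950817B (zh) Content push method and related products
WO2021000084A1 (zh) Data classification method and related products
CN113940033B (zh) User identification method and related products
CN116307394A (zh) Product user experience scoring method, apparatus, medium and device
CN108595481A (zh) Notification message display method and terminal device
CN107368998A (zh) Schedule management method and related products
CN113366523B (zh) Resource push method and related products
CN107707719B (zh) Contact information display method and mobile terminal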

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19936318

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19936318

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/05/2022)
