CN113449232A - Data processing method, device, equipment and storage medium - Google Patents

Data processing method, device, equipment and storage medium

Info

Publication number
CN113449232A
Authority
CN
China
Prior art keywords
current
historical
parameter information
interactive data
statistical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010231016.9A
Other languages
Chinese (zh)
Inventor
冯大龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010231016.9A
Publication of CN113449232A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a data processing method, apparatus, device and storage medium. The method comprises: acquiring current interaction data of a current object, at least one piece of first historical interaction data, and a pre-counted number of historical repeated occurrences of the first historical interaction data; when the current interaction data is detected to contain target parameter information corresponding to a preset statistical dimension parameter, determining, from the first historical interaction data, second historical interaction data containing the target parameter information; and updating, based on a preset statistical condition, the historical statistical information corresponding to the target parameter information according to the number of historical repeated occurrences of the second historical interaction data and the current interaction data, to obtain current statistical information. The technical solution of the embodiment addresses the limited applicability of the prior art to different application scenarios, implements real-time deduplication statistics, widens the range of applicable scenarios, and meets users' personalized requirements.

Description

Data processing method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to computer technologies, and in particular, to a data processing method, an apparatus, a device, and a storage medium.
Background
With the rapid development of technologies such as artificial intelligence, the Internet and the Internet of Things, real-time computing has become increasingly important. Real-time computing analyzes massive streaming data with second-level latency so that management and decision-making can act on the results immediately. It often requires deduplication statistics, such as counting the number of users who purchased an item or the number of unique visitors (UV) of a website.
Currently, deduplication statistics are typically implemented with a Bloom filter. Specifically, a series of random mapping functions, such as hash functions, computes hash values for each element to be deduplicated; each hash value is used as an index into a binary bit array, and the bit at that index is set to 1. When a new element arrives, the bits at the positions given by its hash values are checked: if they are all 1, the element is treated as already present and is not counted; otherwise, the element is counted.
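A minimal Python sketch of the Bloom filter counting described above, for illustration only. The names BloomFilter and count_unique, the array size, the number of hash functions, and the use of salted SHA-256 as the mapping function are all assumptions and are not taken from the prior art being described.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: each element sets num_hashes positions in a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit for readability; a real filter would pack the bits.
        self.bits = bytearray(num_bits)

    def _positions(self, element):
        # Derive the positions by salting SHA-256 with the hash index (assumed scheme).
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, element):
        for pos in self._positions(element):
            self.bits[pos] = 1

    def might_contain(self, element):
        # True if every position is already 1 (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(element))


def count_unique(elements, bloom=None):
    """Count each element once, on its first appearance."""
    if bloom is None:
        bloom = BloomFilter()
    count = 0
    for element in elements:
        if not bloom.might_contain(element):
            bloom.add(element)
            count += 1
    return count
```

The filter only answers whether an element has probably been seen before; it keeps no record of how many times.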
However, in the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
The existing deduplication approach can only determine whether an element is already present, so an element is counted once on its first occurrence and never again. It therefore cannot serve scenarios in which an element should be counted once only after it has occurred two or more times, for example counting the effective UV of a web page, where a visitor counts as an effective UV only after clicking on the website two or more times. The existing approach is thus limited in the application scenarios it can serve and cannot meet users' personalized requirements.
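To make the effective UV scenario concrete, the following Python sketch computes it in batch from a list of clicks. It is only a reference definition of the quantity, not the real-time method of this application; the function name effective_uv and the parameter min_clicks are hypothetical.

```python
from collections import Counter


def effective_uv(visitor_clicks, min_clicks=2):
    """Number of visitors with at least min_clicks clicks (batch reference)."""
    clicks_per_visitor = Counter(visitor_clicks)
    return sum(1 for n in clicks_per_visitor.values() if n >= min_clicks)
```

Computing this requires a per-visitor occurrence count, which a presence-only structure does not retain.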
Disclosure of Invention
Embodiments of the present invention provide a data processing method, apparatus, device, and storage medium that address the limited applicability of the prior art to different application scenarios, implement real-time deduplication statistics, widen the range of applicable scenarios, and meet users' personalized requirements.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring current interactive data of a current object, at least one first historical interactive data and historical repeated occurrence times of the first historical interactive data counted in advance, wherein the current interactive data comprises current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data comprises historical parameter information corresponding to each preset interactive parameter;
when the current interactive data is detected to contain target parameter information corresponding to preset statistical dimension parameters, second historical interactive data containing the target parameter information is determined from the first historical interactive data, wherein the preset statistical dimension parameters are parameters selected from the preset interactive parameters in advance;
and updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on preset statistical conditions to obtain the current statistical information.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, including:
the interactive data acquisition module is used for acquiring current interactive data of a current object, at least one piece of first historical interactive data and historical repeated occurrence times of the first historical interactive data counted in advance, wherein the current interactive data comprises current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data comprises historical parameter information corresponding to each preset interactive parameter;
the second historical interactive data determining module is used for determining second historical interactive data containing target parameter information from the first historical interactive data when the current interactive data is detected to contain the target parameter information corresponding to a preset statistical dimension parameter, wherein the preset statistical dimension parameter is a parameter selected from the preset interactive parameters in advance;
and the historical statistical information updating module is used for updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on preset statistical conditions to obtain the current statistical information.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method according to any embodiment of the present invention.
In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data processing method according to any embodiment of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
The number of historical repeated occurrences of each piece of first historical interaction data of each object is counted in advance, so the number of historical repeated occurrences of every element to be deduplicated is available. When statistical information for the target parameter information of a preset statistical dimension parameter is computed in real time, for example the number of objects corresponding to the target parameter information, and the current interaction data of the current object is detected to contain the target parameter information, second historical interaction data containing the target parameter information can be determined from the first historical interaction data of the current object. The historical statistical information corresponding to the target parameter information can then be updated, based on a preset statistical condition, according to the number of historical repeated occurrences of the second historical interaction data and the current interaction data, yielding the current statistical information and thereby implementing real-time deduplication statistics. The preset statistical condition can be set according to user requirements: for example, count an object once when the target parameter information has occurred one or more times, or count it once only when the target parameter information has occurred two or more times. The method therefore applies both to scenarios that count once on one or more occurrences and to scenarios that count once on two or more occurrences, which greatly widens the range of applicable scenarios and meets users' personalized requirements.
Drawings
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention;
Fig. 3 is an example of a history byte array according to the second embodiment of the present invention;
Fig. 4 is a flowchart of a data processing method according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data processing apparatus according to a fourth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a data processing apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention. This embodiment is applicable to real-time deduplication statistics on data, and in particular to real-time deduplication statistics scenarios on an e-commerce platform or a website. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware and integrated into a device with data processing capability, such as a desktop computer or a notebook computer. As shown in Fig. 1, the method specifically includes the following steps:
S110, obtaining current interaction data of a current object, at least one piece of first historical interaction data, and a pre-counted number of historical repeated occurrences of the first historical interaction data.
The object may be the entity over which real-time deduplication statistics are performed. For example, the object may be, but is not limited to, a user or a unique visitor (UV): when counting the number of users who purchased a certain item, the object may be a user; when counting the number of unique visitors of a website, the object may be a UV. The current object may be the object that generates the current interaction data at the current time. The current interaction data may be the interaction data generated by the current object at the current time, and the first historical interaction data may be interaction data generated by the current object at a historical time. For example, when counting the unique visitors of a website, the current interaction data may be the log data of the current UV accessing the website at the current time; when counting the number of users who purchased an item, the current interaction data may be the order information of the current user purchasing the item at the current time.
The current interaction data may include current parameter information corresponding to at least one preset interaction parameter, and the first historical interaction data may include historical parameter information corresponding to each preset interaction parameter. The preset interaction parameters may be set in advance to represent each dimension of the interaction data generated by the current object, so that each piece of source data generated by the current object can be stored. The current parameter information may be the value of a preset interaction parameter at the current time, and the historical parameter information may be the value of a preset interaction parameter at a historical time. The number of historical repeated occurrences of a piece of first historical interaction data characterizes how many times each piece of historical parameter information in it has occurred. For example, when the number of historical repeated occurrences of a piece of first historical interaction data is two, each piece of historical parameter information in it has occurred twice.
For example, when performing real-time deduplication statistics on an e-commerce platform, the current interaction data of the current object may be the current item acquisition task data of the current user, that is, the current order data. The preset interaction parameters in the current item acquisition task data may include, but are not limited to, at least one of: current item acquisition platform information, a current item acquisition parent task identifier, a current item acquisition child task identifier, a current item attribution party identifier, and a current item acquisition time. The current item acquisition platform information may be the information of the platform on which the order was placed, which may include but is not limited to an APP (mobile) identifier and a PC (computer) identifier. The current item acquisition parent task identifier may be the parent order identifier of the order, and the current item acquisition child task identifier may be the child order identifier of the order. For example, if the current user places one order for two items handled by different logistics providers, a parent order is generated, and a child order is generated under it for each item. If the current user places an order for a single item, or for several items handled by the same logistics provider, the current item acquisition parent task identifier is the same as the current item acquisition child task identifier. The current item identifier may be the identifier of the item purchased by the current user. The current item attribution party identifier may be the identifier of the store from which the current user purchased the item. The current item acquisition time may be the time at which the current user placed the order.
Similarly, the first historical interaction data of the current object may be the historical item acquisition task data of the current user, that is, historical order data. The historical item acquisition task data may include, but is not limited to, at least one of: historical item acquisition platform information, a historical item acquisition parent task identifier, a historical item acquisition child task identifier, a historical item attribution party identifier, and a historical item acquisition time.
Specifically, a cache unit may be allocated to each object in advance to store the first historical interaction data generated by that object and the pre-counted number of historical repeated occurrences of each piece of first historical interaction data. When the current interaction data is acquired, the first historical interaction data of the current object and the corresponding numbers of historical repeated occurrences are read from the cache unit of the current object that generated the current interaction data.
It should be noted that the current interaction data and each piece of first historical interaction data may be unprocessed source data, so that real-time deduplication statistics for different preset statistical dimension parameters can be performed on the same interaction data. This avoids the excessive performance overhead of the prior art, in which a separate Bloom filter has to be started for each dimension, and greatly saves device resources.
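One possible in-memory layout for the per-object cache units can be sketched in Python as follows. The Interaction field names, the module-level cache dictionary and the helper load_history are hypothetical and chosen only to mirror the order example; the source does not specify how a cache unit is implemented.

```python
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    """One raw interaction record; the field names are illustrative."""
    platform: str          # e.g. "APP" or "PC"
    parent_order_id: str
    child_order_id: str
    item_id: str
    store_id: str
    order_date: str


# One cache unit per object (keyed by object id). Each cache unit maps a
# record to its number of historical repeated occurrences.
cache = defaultdict(dict)


def load_history(object_id):
    """Return (record, count) pairs for the object, or [] if it has no history."""
    return list(cache.get(object_id, {}).items())
```

Reading with cache.get avoids creating an empty cache unit for an object that has not yet produced any interaction data.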
S120, when the current interactive data is detected to contain target parameter information corresponding to the preset statistical dimension parameters, determining second historical interactive data containing the target parameter information from the first historical interactive data.
The preset statistical dimension parameter may be a parameter selected in advance from the preset interaction parameters. The target parameter information may be the value of the preset statistical dimension parameter that the deduplication statistics refer to. For example, to count the number of users who purchased item A, the preset statistical dimension parameter may be set to the item identifier and the corresponding target parameter information to the specific identifier of item A. It should be noted that one or more parameters may be selected as preset statistical dimension parameters. For example, to count the number of users who purchased item A on March 23, the preset statistical dimension parameters may be set to the item identifier and the item acquisition time, and the corresponding target parameter information to the specific identifier of item A and March 23. The second historical interaction data may be the first historical interaction data that is screened out as containing the target parameter information. There may be one or more pieces of second historical interaction data; the exact number depends on the screening result.
Specifically, the current parameter information corresponding to the preset statistical dimension parameter in the current interaction data may be matched against the target parameter information; if the match succeeds, the current interaction data is determined to contain the target parameter information. In that case the current interaction data may affect the historical statistical information corresponding to the target parameter information, that is, the historical statistical information may need to be updated. The historical parameter information corresponding to the preset statistical dimension parameter in each piece of first historical interaction data of the current object may then be matched against the target parameter information; each piece of first historical interaction data that matches is determined to contain the target parameter information and is taken as a piece of second historical interaction data.
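The matching in S120 can be sketched in Python as follows, assuming each record is a plain dictionary of parameter names to values and the history is a list of (record, count) pairs. The function names contains_target and select_second_history and the key names are hypothetical.

```python
def contains_target(record, target):
    """True if the record matches every preset statistical dimension parameter."""
    return all(record.get(param) == value for param, value in target.items())


def select_second_history(current, first_history, target):
    """Return the matching (record, count) pairs, or None if no update is needed.

    None means the current record does not contain the target parameter
    information, so the historical statistic is left unchanged.
    """
    if not contains_target(current, target):
        return None
    return [(rec, count) for rec, count in first_history
            if contains_target(rec, target)]
```

Because the target is a dictionary, one or several statistical dimension parameters can be matched with the same code.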
S130, updating, based on a preset statistical condition, the historical statistical information corresponding to the target parameter information according to the number of historical repeated occurrences of the second historical interaction data and the current interaction data, to obtain current statistical information.
The preset statistical condition may be a deduplication statistics condition set in advance according to user requirements. For example, it may be set so that an object is counted once when the target parameter information has occurred one or more times, or so that an object is counted once only when the target parameter information has occurred two or more times. The method therefore applies both to scenarios that count once on one or more occurrences and to scenarios that count once on two or more occurrences, which greatly widens the range of applicable scenarios and meets users' personalized requirements.
The historical statistical information corresponding to the target parameter information may refer to an object statistical result corresponding to the target parameter information obtained before the current time. For example, when counting the number of users purchasing the item a, the target parameter information is set as a specific identifier of the item a, and the history statistical information corresponding to the target parameter information may refer to the counted number of users purchasing the item a before the current time.
Specifically, in this embodiment, when the current interaction data contains the target parameter information, the number of occurrences of the target parameter information in the current interaction data is taken to be a preset number, for example 1. The number of historical repeated occurrences of a piece of second historical interaction data is the number of historical occurrences of the target parameter information in that piece of data. According to the number of historical occurrences of the target parameter information in each piece of second historical interaction data and its number of occurrences in the current interaction data, the historical statistical information corresponding to the target parameter information can be updated based on the preset statistical condition to obtain the current statistical information, thereby implementing real-time deduplication statistics. For example, suppose the preset statistical condition is that an object is counted once only when the target parameter information has occurred two or more times, the number of occurrences of the target parameter information in the current interaction data is 1, and the historical statistical information corresponding to the target parameter information is an object count of 2. If the total number of historical repeated occurrences over all second historical interaction data is 1, that total does not satisfy the preset statistical condition, so the current object has not been counted in the historical statistical information; that is, the 2 counted objects do not include the current object.
However, the sum of the total number of historical repeated occurrences and the number of occurrences in the current interaction data is 2, which satisfies the preset statistical condition, so the historical statistical information needs to be updated: the object count becomes 3. If instead the total number of historical repeated occurrences over all second historical interaction data is 2, that total already satisfies the preset statistical condition, so the current object has already been counted in the historical statistical information; that is, the 2 counted objects include the current object. In this case, although the sum of the total number of historical repeated occurrences and the number of occurrences in the current interaction data is 3 and also satisfies the preset statistical condition, the historical statistical information does not need to be updated and can be used directly as the current statistical information. Real-time deduplication statistics are thus achieved, the method can be applied to a variety of deduplication statistics scenarios, and the limited scenario adaptability of existing deduplication statistics approaches is overcome.
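The worked example above can be sketched in Python as follows. The function name update_statistic and its parameter names are hypothetical, and the preset statistical condition is modelled as a simple occurrence threshold, which is an assumption that covers the two conditions mentioned in the text.

```python
def update_statistic(historical_count, history_total,
                     current_occurrences=1, threshold=2):
    """Return the current object count after one new matching interaction.

    historical_count:    object count before the current interaction
    history_total:       total historical occurrences of the target for this object
    current_occurrences: occurrences contributed by the current interaction
    threshold:           occurrences required before an object is counted
    """
    already_counted = history_total >= threshold
    now_qualifies = history_total + current_occurrences >= threshold
    if now_qualifies and not already_counted:
        # The object crosses the threshold with this interaction: count it once.
        return historical_count + 1
    # Either still below the threshold or already counted: leave unchanged.
    return historical_count
```

With a threshold of 2 and an object count of 2, a history total of 1 yields 3, while a history total of 2 leaves the count at 2, matching the two cases in the text. Setting the threshold to 1 gives the conventional count-on-first-occurrence behaviour.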
It should be noted that, when no piece of first historical interaction data contains the target parameter information, that is, when no second historical interaction data exists, the total number of historical repeated occurrences may be determined to be 0.
It should be noted that, when the current interaction data is detected not to contain the target parameter information corresponding to the preset statistical dimension parameter, the historical statistical information corresponding to the target parameter information does not need to be updated and may be used directly as the current statistical information.
According to the technical solution of this embodiment, the number of historical repeated occurrences of each piece of first historical interaction data of each object is counted in advance, so the number of historical repeated occurrences of every element to be deduplicated is available. When statistical information for the target parameter information of a preset statistical dimension parameter is computed in real time, for example the number of objects corresponding to the target parameter information, and the current interaction data of the current object is detected to contain the target parameter information, second historical interaction data containing the target parameter information can be determined from the first historical interaction data of the current object. The historical statistical information corresponding to the target parameter information can then be updated, based on a preset statistical condition, according to the number of historical repeated occurrences of the second historical interaction data and the current interaction data, yielding the current statistical information and thereby implementing real-time deduplication statistics.
The preset statistical condition can be set according to user requirements: for example, count an object once when the target parameter information has occurred one or more times, or count it once only when the target parameter information has occurred two or more times. The method therefore applies both to scenarios that count once on one or more occurrences and to scenarios that count once on two or more occurrences, which greatly widens the range of applicable scenarios and meets users' personalized requirements.
On the basis of the above technical solution, after S130, the method may further include: matching each piece of current parameter information in the current interactive data against the corresponding historical parameter information in each first historical interactive data; if third historical interactive data exists that matches every piece of current parameter information in the current interactive data, updating the historical repeated occurrence count of the third historical interactive data; and if no such third historical interactive data exists, storing the current interactive data in the current cache unit corresponding to the current object, where the current cache unit is used to store the first historical interactive data of the current object.
Specifically, after the current statistical information is obtained, the current interactive data needs to be stored in the current cache unit corresponding to the current object, so that every interactive data generated by the current object is kept in the current cache unit in real time. In this embodiment, the current parameter information for each preset interaction parameter in the current interactive data may be matched against the historical parameter information for the same preset interaction parameter in each first historical interactive data. If every piece of current parameter information matches the corresponding historical parameter information in some first historical interactive data, the current interactive data is identical to that first historical interactive data. That first historical interactive data is then determined to be the third historical interactive data, and the current interactive data is stored by updating the historical repeated occurrence count of the third historical interactive data, for example by adding 1 to it. If the current interactive data matches none of the first historical interactive data, that is, no third historical interactive data exists, the current interactive data can be stored in the current cache unit as a new first historical interactive data together with its historical repeated occurrence count, for example 1. In this way each first historical interactive data generated by the current object and its historical repeated occurrence count are kept in the current cache unit in real time, so that subsequent real-time deduplication statistics can be performed accurately.
Illustratively, the current interactive data may further include an interaction mode identifier of the current object. The interaction mode identifier may be an addition identifier or a cancellation identifier. The addition identifier indicates that the current interactive data is to be added to the current cache unit. The cancellation identifier indicates that the current interactive data is to be cancelled, that is, deleted from the current cache unit. For example, if the user places an order to purchase item A, the interaction mode identifier in the user's current interactive data is the addition identifier. If the user cancels the order after placing it, the interaction mode identifier in the user's current interactive data is the cancellation identifier.
Illustratively, updating the historical repeated occurrence count of the third historical interactive data may include: if the interaction mode identifier in the current interactive data is the addition identifier, adding a preset value to the historical repeated occurrence count of the third historical interactive data; and if the interaction mode identifier in the current interactive data is the cancellation identifier, subtracting the preset value from the historical repeated occurrence count of the third historical interactive data. The preset value is the number of occurrences represented by one addition identifier or one cancellation identifier. For example, the preset value may be set to 1.
Specifically, the historical repeated occurrence count of the third historical interactive data can be updated based on the interaction mode identifier in the current interactive data. When the identifier is the addition identifier, the current interactive data is added by increasing the count by the preset value, for example by 1. When the identifier is the cancellation identifier, the current interactive data is deleted by decreasing the count by the preset value, for example by 1, so that data deletion is supported. In the prior art, when a BitMap (a group of binary bits) is used for deduplication statistics, an element is deleted by updating its binary bit from 1 to 0. That approach only reflects whether the most recent operation was an addition or a deletion and cannot track, in real time, the number of occurrences remaining after additions and deletions. For example, when the same element is added twice in succession and then deleted once, its bit in the BitMap is updated to 0, which indicates that the element no longer exists and fails to reflect that one addition is still in effect.
In this embodiment, by contrast, the historical repeated occurrence count is updated in real time based on the interaction mode identifier, so the number of occurrences of the interactive data is stored accurately. For example, when the same element is added twice in succession and then deleted once, its historical repeated occurrence count is updated to 1. The method can therefore be applied to scenarios that involve data deletion, which further broadens its range of application.
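The contrast drawn above can be illustrated with a minimal sketch. It is not taken from the patent: the set stands in for a one-bit-per-element BitMap, and the names `replay_bitmap`, `replay_counts`, and the element label `"order-A"` are hypothetical.

```python
from collections import Counter

def replay_bitmap(ops):
    """Replay (element, delta) operations on a one-bit-per-element structure."""
    present = set()  # stand-in for a BitMap: membership plays the role of bit == 1
    for element, delta in ops:
        if delta > 0:
            present.add(element)      # bit set to 1
        else:
            present.discard(element)  # bit reset to 0
    return present

def replay_counts(ops):
    """Replay the same operations while keeping a signed count per element."""
    counts = Counter()
    for element, delta in ops:
        counts[element] += delta
    return counts

# The same element is added twice and then deleted once.
ops = [("order-A", +1), ("order-A", +1), ("order-A", -1)]
bitmap_state = replay_bitmap(ops)
count_state = replay_counts(ops)
```

After the replay, the bit-style structure no longer contains the element, while the counter still records one remaining occurrence.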
It should be noted that, in this embodiment, the addition identifier and the cancellation identifier may be represented by a sign combined with the preset value. For example, if the preset value is 1, the addition identifier may be +1 and the cancellation identifier may be -1, so that the update can be performed simply by adding the interaction mode identifier to the historical repeated occurrence count of the third historical interactive data, which further simplifies the update operation and improves processing efficiency.
For example, because of network delay and similar causes, the current interactive data for a cancellation may be acquired before the current interactive data for the corresponding addition. Therefore, when no third historical interactive data exists and the current interactive data is stored in the current cache unit, its historical repeated occurrence count can be determined from the interaction mode identifier. When the identifier is the addition identifier, a positive preset value, for example +1, may be used as the count. When the identifier is the cancellation identifier, a negative preset value, for example -1, may be used as the count, which ensures the accuracy of the stored data.
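The cache update described above, including the case where a cancellation arrives before its addition, might be sketched as follows. This is an illustrative sketch only: the dict-based cache, the tuple layout of `params`, and the function name `store_interaction` are assumptions, not structures defined in the source.

```python
def store_interaction(cache, params, mode_flag):
    """Merge one interaction record into the per-object cache.

    cache: dict mapping a tuple of parameter values to its signed count
           (hypothetical stand-in for the current cache unit).
    params: tuple of the current parameter values.
    mode_flag: +1 for an addition identifier, -1 for a cancellation identifier.
    """
    if params in cache:
        # A fully matching record (the "third historical interactive data")
        # exists, so its count is adjusted by the signed identifier.
        cache[params] += mode_flag
    else:
        # No match: store a new record whose initial count is the signed
        # identifier, so an early cancellation is recorded as -1.
        cache[params] = mode_flag
    return cache[params]

cache = {}
record = ("APP", "task-1", "item-A")
after_cancel = store_interaction(cache, record, -1)  # cancellation arrives first
after_add = store_interaction(cache, record, +1)     # delayed addition arrives
```

With the signed initial count, the late addition brings the record back to zero instead of leaving a spurious occurrence.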
On the basis of the above technical solution, S130 may include: determining the total number of historical repeated occurrences according to the historical repeated occurrence count of each second historical interactive data; determining the total number of current repeated occurrences according to the current interactive data and the total number of historical repeated occurrences; and judging whether the total number of historical repeated occurrences and the total number of current repeated occurrences each satisfy the preset statistical condition, and updating the historical statistical information corresponding to the target parameter information according to the judgment result to obtain the current statistical information.
Specifically, the historical repeated occurrence counts of all second historical interactive data may be summed, and the sum is determined as the total number of historical repeated occurrences of the target parameter information. If the current interactive data only ever represents data to be added, the total number of historical repeated occurrences plus 1 can be taken directly as the total number of current repeated occurrences of the target parameter information. Judging whether the total number of historical repeated occurrences satisfies the preset statistical condition shows whether the current object was already counted in the historical statistical information. Judging whether the total number of current repeated occurrences satisfies the preset statistical condition shows whether the current object should be counted at the current moment. Based on these two judgment results, the historical statistical information corresponding to the target parameter information is updated to obtain the current statistical information, thereby achieving real-time deduplication statistics.
For example, judging whether the total number of historical repeated occurrences and the total number of current repeated occurrences satisfy the preset statistical condition, and updating the historical statistical information corresponding to the target parameter information according to the judgment result to obtain the current statistical information, may include:
when the total number of historical repeated occurrences satisfies the preset statistical condition and the total number of current repeated occurrences does not, determining the result of subtracting a preset number from the historical statistical information corresponding to the target parameter information as the current statistical information; when the total number of historical repeated occurrences does not satisfy the preset statistical condition and the total number of current repeated occurrences does, determining the result of adding the preset number to the historical statistical information corresponding to the target parameter information as the current statistical information; and when both totals satisfy the preset statistical condition, or neither does, determining the historical statistical information corresponding to the target parameter information as the current statistical information.
The preset number may be the statistical value contributed by the current object. For example, the preset number may be set to 1. Specifically, when the total number of historical repeated occurrences satisfies the preset statistical condition and the total number of current repeated occurrences does not, the current object has been counted in the historical statistical information but can no longer be counted at the current moment. The result of subtracting the preset number from the historical statistical information corresponding to the target parameter information is therefore determined as the current statistical information; for example, the historically counted number of objects minus 1 is determined as the currently counted number of objects. When the total number of historical repeated occurrences does not satisfy the preset statistical condition and the total number of current repeated occurrences does, the current object has not been counted in the historical statistical information but can be counted at the current moment. The result of adding the preset number to the historical statistical information corresponding to the target parameter information is therefore determined as the current statistical information; for example, the historically counted number of objects plus 1 is determined as the currently counted number of objects.
When the total number of historical repeated occurrences and the total number of current repeated occurrences both satisfy the preset statistical condition, the current object has already been counted in the historical statistical information and remains counted at the current moment, so the historical statistical information needs no modification and can be taken directly as the current statistical information. When neither total satisfies the preset statistical condition, the current object has not been counted in the historical statistical information and cannot be counted at the current moment either, so the historical statistical information again needs no modification and can be taken directly as the current statistical information.
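The four cases above reduce to comparing whether the object was counted before and whether it is counted now. The following is a minimal sketch under the assumption that the preset statistical condition has the form "total occurrences >= some minimum"; the function name `update_statistic` and its parameters are hypothetical.

```python
def update_statistic(historical_stat, historical_total, current_total,
                     min_occurrences=1, preset_number=1):
    """Return the current statistic after one object's occurrence total changes.

    The preset statistical condition is assumed to be
    "total occurrences >= min_occurrences".
    """
    counted_before = historical_total >= min_occurrences
    counted_now = current_total >= min_occurrences
    if counted_before and not counted_now:
        return historical_stat - preset_number  # object leaves the statistic
    if counted_now and not counted_before:
        return historical_stat + preset_number  # object enters the statistic
    return historical_stat                      # both or neither: unchanged

first_purchase = update_statistic(10, 0, 1)       # not counted -> counted
repeat_purchase = update_statistic(10, 1, 2)      # still counted
cancel_only_order = update_statistic(10, 1, 0)    # counted -> not counted
second_purchase_at_two = update_statistic(10, 1, 2, min_occurrences=2)
```

Setting `min_occurrences=2` corresponds to the scenario in which an object is counted once only after it has appeared two or more times.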
For example, when the current interactive data further includes an interaction mode identifier of the current object, determining the total number of current repeated occurrences according to the current interactive data and the total number of historical repeated occurrences may include: if the interaction mode identifier in the current interactive data is the addition identifier, determining the total number of historical repeated occurrences plus a preset value as the total number of current repeated occurrences; and if the interaction mode identifier in the current interactive data is the cancellation identifier, determining the total number of historical repeated occurrences minus the preset value as the total number of current repeated occurrences.
Specifically, the total number of current repeated occurrences may be determined from the interaction mode identifier in the current interactive data and the total number of historical repeated occurrences. When the identifier is the addition identifier, the current interactive data is data to be added, and the total number of historical repeated occurrences plus the preset value, for example plus 1, is determined as the total number of current repeated occurrences. When the identifier is the cancellation identifier, the current interactive data is data to be deleted, and the total number of historical repeated occurrences minus the preset value, for example minus 1, is determined as the total number of current repeated occurrences. Data deletion is thereby supported, so the method is applicable to scenarios that involve data deletion, which further broadens its range of application.
It should be noted that, in this embodiment, when the addition identifier and the cancellation identifier are represented by a sign combined with the preset value, the result of adding the interaction mode identifier to the total number of historical repeated occurrences may be taken directly as the total number of current repeated occurrences, which further simplifies the statistical operation and improves statistical efficiency.
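With a signed identifier, both totals can be derived in two additions. The sketch below assumes the counts of the second historical interactive data are available as a list of integers; `occurrence_totals` is a hypothetical name.

```python
def occurrence_totals(second_history_counts, mode_flag):
    """Return (historical_total, current_total) for the target parameter.

    second_history_counts: signed counts of the matching historical records.
    mode_flag: +1 for an addition identifier, -1 for a cancellation identifier.
    """
    historical_total = sum(second_history_counts)
    current_total = historical_total + mode_flag
    return historical_total, current_total

# Two matching historical records with counts 2 and 1, then one cancellation.
totals_after_cancel = occurrence_totals([2, 1], -1)
# A single earlier order that is now cancelled.
totals_single_cancel = occurrence_totals([1], -1)
```

In the second case the current total drops to 0, so under a condition of "one or more occurrences" the object would stop being counted.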
Example two
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention. Building on the above embodiment, this embodiment describes in detail how the current interactive data and each first historical interactive data are stored. Explanations of terms that are the same as or correspond to those in the above embodiments are omitted here.
Referring to fig. 2, the data processing method provided in this embodiment specifically includes the following steps:
S210, converting the historical parameter information in each first historical interactive data of the current object into byte-type historical parameter information, and generating a historical byte array corresponding to the first historical interactive data according to a preset arrangement order, the byte-type historical parameter information, and the historical repeated occurrence count of the first historical interactive data counted in advance.
Here, the byte type (Byte) refers to a data type containing 8 binary bits. The preset arrangement order refers to a predefined order in which the preset interaction parameters are arranged.
Specifically, the historical parameter information corresponding to each preset interaction parameter may be converted from its original data type (for example int, short, or long) into the byte type to obtain the corresponding byte data, so that all parameter information can be stored uniformly as a byte array. A number of bytes is allocated to each preset interaction parameter based on the maximum length of its parameter information, and the parameter information is stored in the allocated bytes. This embodiment may determine the position of each piece of byte-type historical parameter information in the historical byte array based on the preset arrangement order, and store the pre-counted historical repeated occurrence count of the first historical interactive data at a preset position in the historical byte array, for example in the first byte or the last byte, so that the historical byte array corresponding to each first historical interactive data is obtained.
Illustratively, FIG. 3 gives an example of a historical byte array. As shown in FIG. 3, when real-time deduplication statistics are performed in an e-commerce platform, the first historical interactive data of the current object may be the historical item acquisition task data of the current user. Each historical item acquisition task data may include: historical item acquisition platform information, a historical item acquisition parent task identifier, a historical item acquisition child task identifier, a historical item attribution identifier, a historical item acquisition date, a number of hours, and a number of minutes. The historical item acquisition date is the date on which the item was purchased, accurate to the day, such as March 24, 2020. The number of hours is the number of hours elapsed at the current time since 00:00 on the historical item acquisition date. The number of minutes is the number of minutes elapsed at the current time since 00:00 on the historical item acquisition date. For example, if the current time is 01:05 on March 24, 2020, the number of hours is 1 and the number of minutes is 65. As shown in FIG. 3, the historical parameter information in each historical item acquisition task data and the historical repeated occurrence count of that task data may be stored in the corresponding historical byte array in the arrangement order of FIG. 3. FIG. 3 shows the data structure of each first historical interactive data and its historical repeated occurrence count as stored in the cache unit corresponding to one user; the user identifier may be stored as the primary key in an external data source.
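A fixed-width byte layout of this kind can be sketched with Python's `struct` module. Because FIG. 3 is not reproduced here, the field widths, the numeric encoding of the date, and the position of the count in the last four bytes are all assumptions made for illustration; `RECORD`, `pack_history`, and `unpack_history` are hypothetical names.

```python
import struct

# Hypothetical layout (big-endian, no padding), in a fixed field order:
#   B  platform code          (1 byte)
#   Q  parent task identifier (8 bytes)
#   Q  child task identifier  (8 bytes)
#   I  attribution identifier (4 bytes)
#   I  date as YYYYMMDD       (4 bytes)
#   B  hours since 00:00      (1 byte)
#   H  minutes since 00:00    (2 bytes)
#   i  signed occurrence count, stored last (4 bytes)
RECORD = struct.Struct(">BQQIIBHi")

def pack_history(platform, parent_id, child_id, owner_id,
                 date, hours, minutes, count):
    """Serialize one historical record into a fixed-length byte array."""
    return RECORD.pack(platform, parent_id, child_id, owner_id,
                       date, hours, minutes, count)

def unpack_history(buf):
    """Recover the field tuple from a byte array produced by pack_history."""
    return RECORD.unpack(buf)

# An APP-platform purchase at 01:05 on 2020-03-24 that has occurred twice.
raw = pack_history(0b0011, 1001, 2002, 7, 20200324, 1, 65, 2)
fields = unpack_history(raw)
```

Because every record uses the same offsets, the field for a given preset interaction parameter can be read from the same byte range in any record.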
Exemplarily, "converting the historical parameter information in each of the first historical interaction data of the current object into the historical parameter information of the byte type" in S210 may include: if the target parameter information corresponding to the preset statistical dimension parameter comprises at least two pieces of statistical parameter information, binary coding is carried out on the target parameter information and each piece of statistical parameter information based on a preset coding mode, and binary coding information corresponding to the target parameter information and binary coding information corresponding to each piece of statistical parameter information are obtained; and determining historical binary coding information corresponding to the historical parameter information of the preset statistical dimension parameter in each first historical interactive data of the current object as historical parameter information of the byte type according to the binary coding information corresponding to each statistical parameter information.
The statistical parameter information can be used for representing the dimension value of the preset statistical dimension parameter. The target parameter information corresponding to the preset statistical dimension parameter may include one statistical parameter information, or may include two or more statistical parameter information, and the specific number thereof may be preset based on the service requirement. For example, if the number of users purchasing an article in the APP platform is counted, the preset interaction parameter of the article acquisition platform information is used as a preset statistic dimension parameter, the target parameter information corresponding to the preset statistic dimension parameter is the APP platform, and the target parameter information only includes one statistic parameter information, namely, the APP platform. For another example, if the number of users purchasing an article in a total platform (i.e., an APP platform and/or a PC platform) is counted, the preset interaction parameter of the article acquisition platform information is used as a preset statistical dimension parameter, the target parameter information corresponding to the preset statistical dimension parameter is an ALL platform, and the target parameter information includes two pieces of statistical parameter information, i.e., the APP platform and the PC platform.
The preset coding mode may be a predefined scheme that, when the target parameter information includes at least two pieces of statistical parameter information, binary-codes the target parameter information and each piece of statistical parameter information, which stand in a containing/contained relationship, according to the logic of binary bit operations, so that each piece of statistical parameter information is guaranteed to be contained in the target parameter information. For example, Table 1 gives an example of the binary coded information corresponding to platform information. As shown in Table 1, a bitwise AND of the binary coded information 0b0001 (0b denotes binary) for the target parameter information ALL with the binary coded information 0b0011 for APP yields the binary coded information for ALL, that is, APP & ALL = ALL. Likewise, a bitwise AND of 0b0001 for ALL with the binary coded information 0b0101 for PC also yields the binary coded information for ALL, that is, PC & ALL = ALL. The bit operation thus guarantees that both APP and PC are contained in ALL, which ensures accuracy when deduplication statistics are later performed on the target parameter information ALL.
Table 1. Binary coded information corresponding to platform information
Platform information     | ALL    | APP    | PC
Binary coded information | 0b0001 | 0b0011 | 0b0101
Specifically, based on the binary coded information determined for each piece of statistical parameter information, the historical parameter information corresponding to the preset statistical dimension parameter in each first historical interactive data may be matched against the statistical parameter information, and the binary coded information of the matching statistical parameter information is determined as the historical binary coded information for that historical parameter information and used as the byte-type historical parameter information. For example, if the preset statistical dimension parameter is platform information and the historical platform information in a certain first historical interactive data is APP, the historical binary coded information for that platform information is determined to be 0b0011 based on Table 1. Each piece of historical parameter information related to target parameter information with a containing/contained relationship is thereby binary-coded, so that subsequent information matching can be performed by bit operation, which improves the accuracy of deduplication statistics.
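The coding of Table 1 and its containment property can be checked with a short sketch. The code values come from Table 1; the dictionary `PLATFORM_CODES` and the helpers `encode_platform` and `is_included` are hypothetical names introduced for illustration.

```python
# Binary codes taken from Table 1.
PLATFORM_CODES = {"ALL": 0b0001, "APP": 0b0011, "PC": 0b0101}

def encode_platform(name):
    """Return the one-byte encoding of a platform name."""
    return PLATFORM_CODES[name].to_bytes(1, "big")

def is_included(member, target):
    """True when the member's code carries every bit set in the target's code."""
    member_code = PLATFORM_CODES[member]
    target_code = PLATFORM_CODES[target]
    return (member_code & target_code) == target_code

app_in_all = is_included("APP", "ALL")  # 0b0011 & 0b0001 == 0b0001
pc_in_all = is_included("PC", "ALL")    # 0b0101 & 0b0001 == 0b0001
pc_in_app = is_included("PC", "APP")    # 0b0101 & 0b0011 == 0b0001, not 0b0011
app_byte = encode_platform("APP")
```

Under this coding, APP and PC each share the low bit with ALL but differ from each other in a higher bit, so each is contained in ALL without being contained in the other.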
S220, converting each piece of current parameter information in the current interactive data of the current object into the current parameter information of the byte type, and generating a current byte array corresponding to the current interactive data according to the preset arrangement sequence and the current parameter information of each byte type.
Specifically, the current parameter information corresponding to each preset interaction parameter may be converted from its original data type (for example int, short, or long) into the byte type to obtain the corresponding byte data, so that all parameter information can be stored uniformly as a byte array. The byte-type current parameter information is stored at the corresponding positions in the current byte array according to the same preset arrangement order used in the historical byte array, so that byte positions are identical in the current byte array and the historical byte array, which makes it convenient to retrieve and match the parameter information for the same preset interaction parameter.
For example, when the current interaction data further includes the interaction mode identifier of the current object, the interaction mode identifier of the current object may be stored at a preset position of the current byte array, for example, the interaction mode identifier of the current object may be stored on the first byte or the last byte of the current byte array.
Exemplarily, "converting each current parameter information in the current interaction data of the current object into the current parameter information of the byte type" in S220 may include: when the target parameter information corresponding to the preset statistical dimension parameter comprises at least two pieces of statistical parameter information, determining current binary coding information corresponding to the current parameter information of the preset statistical dimension parameter in the current interactive data of the current object according to the binary coding information corresponding to each piece of statistical parameter information, and using the current binary coding information as the current parameter information of the byte type.
Specifically, if the preset statistical dimension parameter is platform information and the current platform information in the current interactive data is PC, the current binary coding information corresponding to the current platform information may be determined to be 0b0101 based on table 1, so that the current parameter information corresponding to the target parameter information having the relation of "including and included" may be binary coded, and information matching may be performed subsequently based on a bit operation, thereby improving the accuracy of deduplication statistics.
S230, acquiring a current byte array corresponding to the current interactive data of the current object and a historical byte array corresponding to each first historical interactive data.
S240, when the current interactive data is detected to contain target parameter information corresponding to the preset statistical dimension parameters, determining second historical interactive data containing the target parameter information from the first historical interactive data.
Specifically, if the target parameter information corresponding to the preset statistical dimension parameter is non-binary coded information, it may be determined whether the current interactive data includes the target parameter information directly in a character matching manner, for example, it is detected whether the current parameter information corresponding to the preset statistical dimension parameter in the current interactive data is the same as the target parameter information, and if so, it is determined that the current interactive data includes the target parameter information. When it is detected that the current interactive data includes the target parameter information, second historical interactive data including the target parameter information may also be determined from the first historical interactive data based on the character matching method. For example, "determining second historical interaction data containing target parameter information from the first historical interaction data" may include: and detecting whether historical parameter information corresponding to the preset statistical dimension parameter in the first historical interactive data is the same as the target parameter information, if so, determining that the first historical interactive data contains the target parameter information, namely determining the first historical interactive data as second historical interactive data.
For example, if the target parameter information corresponding to the preset statistical dimension parameter is binary coded information, whether the current interactive data contains the target parameter information may be determined by binary bit operation. For instance, a bit operation is performed on the current binary coded information corresponding to the preset statistical dimension parameter in the current interactive data and the target parameter information, and the result is checked against the target parameter information; if they are the same, the current interactive data is determined to contain the target parameter information. When the current interactive data is detected to contain the target parameter information, the second historical interactive data containing the target parameter information may likewise be determined from the first historical interactive data by binary bit operation. For example, "determining second historical interactive data containing target parameter information from the first historical interactive data" may include: performing a bit operation on the historical binary coded information corresponding to the preset statistical dimension parameter in the first historical interactive data and the target parameter information, and checking whether the result is the same as the target parameter information; if so, the first historical interactive data is determined to contain the target parameter information, that is, it is determined to be second historical interactive data.
It should be noted that, in this embodiment, whether the current interactive data and the first historical interactive data include the target parameter information or not can be determined more accurately and conveniently by means of binary bit operation, and the deduplication statistical efficiency is further improved.
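The two containment checks described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the bit operation is assumed to be a bitwise AND, which agrees with the `&` used in the worked example of the third embodiment.

```python
def contains_target_text(current_value: str, target_value: str) -> bool:
    """Character matching for non-binary coded parameter information."""
    return current_value == target_value


def contains_target_bits(current_code: int, target_code: int) -> bool:
    """Bit-operation matching for binary coded parameter information.

    Assumption: the bit operation is a bitwise AND. The data is taken to
    contain the target when AND-ing the two codes gives back the target code.
    """
    return (current_code & target_code) == target_code
```

With these definitions, a record coded 0b0011 contains the target 0b0001 but not the target 0b0101.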
S250, updating, based on a preset statistical condition, the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data, to obtain the current statistical information.
According to the technical scheme, historical parameter information in each first historical interactive data is converted into byte type historical parameter information, a historical byte array corresponding to each first historical interactive data is generated, each current parameter information in the current interactive data is converted into byte type current parameter information, and a current byte array corresponding to the current interactive data is generated, so that information matching can be performed more conveniently by using the byte data in the byte array, information matching can be performed in a binary bit operation mode, and the deduplication statistical efficiency is further improved.
On the basis of the technical scheme, the method further comprises the following steps: generating a historical hash value corresponding to the historical interactive data according to each historical parameter information in the first historical interactive data, and adding the historical hash value to a historical byte array; and generating a current hash value corresponding to the current interactive data according to each current parameter information in the current interactive data, and adding the current hash value to the current byte array.
Specifically, based on a random mapping function, hash calculation may be performed on each historical parameter information in the first historical interactive data to determine a historical hash value, and hash calculation may be performed on each current parameter information in the current interactive data to determine a current hash value. The historical hash value may be stored at a preset position in the historical byte array, and the current hash value may be stored at a preset position in the current byte array. As shown in FIG. 3, the historical hash value may be stored at the first byte in the historical byte array. After S250, whether third historical interactive data that successfully matches each current parameter information in the current interactive data exists among the first historical interactive data may be determined by comparing the current hash value with each historical hash value. If the historical hash value corresponding to a certain first historical interactive data is the same as the current hash value, the first historical interactive data is possibly third historical interactive data. In this case, it may be further detected whether each piece of historical parameter information in the first historical interactive data matches the corresponding current parameter information, and if so, the first historical interactive data is determined to be the third historical interactive data. If the historical hash value corresponding to every first historical interactive data differs from the current hash value, it can be directly determined that no third historical interactive data exists. Whether third historical interactive data matching the current interactive data exists can thus be determined rapidly based on the hash values, which improves the storage efficiency of the current interactive data.
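The hash-based pre-check can be sketched as below. This is only an illustration under stated assumptions: CRC32 truncated to one byte stands in for the unspecified mapping function, records are modelled as tuples of strings rather than byte arrays, and the names `record_hash` and `find_third_historical` are hypothetical.

```python
import zlib
from typing import List, Optional, Sequence, Tuple


def record_hash(params: Sequence[str]) -> int:
    """Map all parameter values of one record to a single byte (0..255).

    Assumption: CRC32 is used only for this sketch; the text does not name
    a specific hash function.
    """
    joined = "\x1f".join(params).encode("utf-8")
    return zlib.crc32(joined) & 0xFF


def find_third_historical(
    current: Sequence[str],
    history: List[Tuple[int, Sequence[str]]],
) -> Optional[int]:
    """Return the index of the historical record equal to `current`, if any.

    `history` holds (stored hash, parameter values) pairs. A differing hash
    rules a record out immediately; an equal hash is only a candidate and is
    confirmed by comparing every parameter value.
    """
    current_hash = record_hash(current)
    for index, (hist_hash, hist_params) in enumerate(history):
        if hist_hash != current_hash:
            continue  # different hash: cannot be the same record
        if tuple(hist_params) == tuple(current):
            return index  # same hash and same parameter values
    return None
```

The full field comparison after a hash hit is what keeps the check correct when two different records happen to share a hash value.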
Example three
Fig. 4 is a flowchart of a data processing method according to a third embodiment of the present invention. This embodiment optimizes the step of "when it is detected that current interactive data includes target parameter information corresponding to a preset statistical dimension parameter, determining second historical interactive data including the target parameter information from the first historical interactive data". Explanations of terms that are the same as or correspond to those of the above embodiments are omitted here.
Referring to fig. 4, the data processing method provided in this embodiment specifically includes the following steps:
S310, obtaining current interaction data of a current object, at least one first history interaction data and the history repeated occurrence times of the first history interaction data counted in advance.
S320, determining target parameter information matched with current parameter information corresponding to the preset statistical dimension parameter in the current interactive data from all candidate parameter information corresponding to the preset statistical dimension parameter.
The candidate parameter information may refer to a parameter value that is applicable to statistics and corresponds to a preset statistical dimension parameter. For example, if the preset statistical dimension parameter is an item identifier, the specific identifier of each item in the item database may be used as candidate parameter information, and/or the specific identifiers of two or more items may be used as candidate parameter information (in this case, binary coded information corresponding to the specific identifiers of the two or more items may be determined as the candidate parameter information). For another example, if the preset statistical dimension parameter is the item acquisition platform information, the binary coded information corresponding to the APP platform, the binary coded information corresponding to the PC platform, and the binary coded information corresponding to the total ALL platform may all be used as candidate parameter information, so that deduplication statistics may be performed on each candidate parameter information.
Specifically, each candidate parameter information corresponding to the preset statistical dimension parameter may be matched with the current parameter information corresponding to the preset statistical dimension parameter in the current interactive data, so as to determine the target parameter information matched with the current parameter information. For example, if the candidate parameter information is non-binary coded information, it may be directly detected whether the candidate parameter information is the same as the current parameter information in a character matching manner, and if so, it is determined that the candidate parameter information is the target parameter information. If the candidate parameter information is binary coded information, bit operation can be performed on the candidate parameter information and the current parameter information in a binary bit operation mode, whether a bit operation result is the same as the candidate parameter information or not is detected, and if yes, the candidate parameter information is target parameter information.
It should be noted that one or more pieces of target parameter information may match the current parameter information. For each piece of target parameter information, the corresponding historical statistical information may be updated through the following steps S330 to S350, so that deduplication statistics can be performed on all candidate parameter information of the preset statistical dimension parameter at the same time based on the same interactive data. This avoids the excessive performance overhead of the prior art, in which a separate Bloom filter must be started for each candidate parameter information on which deduplication statistics is performed, greatly saves device resources, and achieves deduplication statistics based on a dimension rather than on a specific dimension value.
Exemplarily, if the preset statistical dimension parameter is the item acquisition platform information, the corresponding candidate parameter information is: binary coded information 0b0011 corresponding to the APP platform, binary coded information 0b0101 corresponding to the PC platform, and binary coded information 0b0001 corresponding to the total ALL platform. Table 2 gives an example of current interaction data for a current object. As shown in Table 2, the current platform information in the current interactive data is 0b0011 (i.e., APP platform), so that a bit operation can be performed with the binary coded information corresponding to each candidate parameter information: 0b0011 & 0b0011 = 0b0011, which indicates a match, so the candidate parameter information 0b0011 is determined to be target parameter information; 0b0101 & 0b0011 = 0b0001, which indicates a mismatch, so the candidate parameter information 0b0101 is determined not to be target parameter information; 0b0001 & 0b0011 = 0b0001, which indicates a match, so the candidate parameter information 0b0001 is determined to be target parameter information. Two pieces of target parameter information are thus determined, 0b0011 (i.e., APP platform) and 0b0001 (i.e., ALL platform), and deduplication statistics are performed for each piece of target parameter information through the following steps S330-S350.
TABLE 2 Current interaction data for the Current object
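The candidate matching of S320 for the platform example can be sketched as follows. The binary codes are the ones given in the text; the dictionary layout and the name `match_targets` are assumptions made for illustration.

```python
from typing import Dict, List

# Candidate parameter information for the platform dimension, as given in the text.
CANDIDATES: Dict[str, int] = {"APP": 0b0011, "PC": 0b0101, "ALL": 0b0001}


def match_targets(current_code: int, candidates: Dict[str, int]) -> List[str]:
    """Return the candidates whose code is unchanged by AND-ing with the current code."""
    return [
        name
        for name, code in candidates.items()
        if (code & current_code) == code
    ]


# Current platform information 0b0011 (APP platform) from the example.
targets = match_targets(0b0011, CANDIDATES)
print(targets)  # ['APP', 'ALL']
```

A single pass over the candidates yields every target to be updated, which is why one piece of interactive data can drive the statistics of several dimension values at once.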
It should be noted that, in this embodiment, deduplication statistics may be performed based on one preset statistical dimension parameter or based on a plurality of preset statistical dimension parameters. For example, when the preset statistical dimension parameters include two parameters, namely the item acquisition platform information and the item identifier, target parameter information matching the current parameter information corresponding to each preset statistical dimension parameter in the current interactive data can be determined from the candidate parameter information corresponding to that preset statistical dimension parameter. For example, if the determined target parameter information is platform information 0b0011 (i.e., APP platform) and item identifier 21, the number of users purchasing the item with item identifier 21 on the APP platform can be counted. Multi-dimensional deduplication statistics is thereby realized, the excessive performance consumption caused by counting each dimension independently is avoided, and device resources are further saved.
S330, determining second historical interactive data containing target parameter information from the first historical interactive data.
For example, if the preset statistical dimension parameter is an item identifier, the target parameter information that can be determined based on table 2 is: the item identifier 21 is non-binary coded information, so that the historical item identifier and the target parameter information in each first historical interactive data can be directly subjected to character matching. Table 3 gives an example of the respective first historical interaction data of a current object. In table 3, two second historical interaction data may be determined from the three first historical interaction data, which are the first historical interaction data of the first row and the first historical interaction data of the third row.
TABLE 3 respective first historical interaction data for the current object
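The selection in S330 for the item identifier example can be sketched as below. The contents of Table 3 are not reproduced in this text, so the rows here are hypothetical values chosen only to agree with the description: three records, of which the first and third carry item identifier 21, each with a historical repeated occurrence count of 1. The platform codes and the second row are assumptions.

```python
from typing import List, NamedTuple


class HistoricalRecord(NamedTuple):
    platform: int     # binary coded platform information (hypothetical values)
    item_id: str      # non-binary coded item identifier
    occurrences: int  # pre-counted historical repeated occurrence times


# Hypothetical stand-in for the three first historical interaction data of Table 3.
first_history = [
    HistoricalRecord(0b0011, "21", 1),
    HistoricalRecord(0b0101, "35", 1),
    HistoricalRecord(0b0101, "21", 1),
]


def select_second_history(
    records: List[HistoricalRecord], target_item_id: str
) -> List[HistoricalRecord]:
    """Character matching on the item identifier (non-binary coded information)."""
    return [record for record in records if record.item_id == target_item_id]


second_history = select_second_history(first_history, "21")
historical_total = sum(record.occurrences for record in second_history)
```

Under these assumed rows, `second_history` holds the first and third records and `historical_total` is 2, the value used later when the statistics are updated.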
S340, determining historical statistical information corresponding to the target parameter information according to the mapping relation between each candidate parameter information corresponding to the preset statistical dimension parameter and the historical statistical information.
The historical statistical information may refer to an object statistical result corresponding to each candidate parameter information obtained before the current time. Specifically, the present embodiment may pre-store a mapping relationship between each candidate parameter information corresponding to the preset statistical dimension parameter and the historical statistical information, so as to determine the historical statistical information corresponding to the target parameter information based on the mapping relationship, that is, obtain the historical statistical information that needs to be updated currently.
Illustratively, when the preset statistical dimension parameter is the item identifier and the target parameter information is item identifier 21, the historical statistical information corresponding to the target parameter information is the historical number of users, counted before the current time, who purchased the item with item identifier 21.
S350, updating, based on a preset statistical condition, the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data, to obtain the current statistical information.
Illustratively, when the preset statistical dimension parameter is the item identifier, following the examples in Table 2 and Table 3, the historical repeated occurrence times of the two second historical interaction data in Table 3 are 1 and 1, respectively, and the interaction mode identifier in the current interaction data in Table 2 is -1, so that it can be determined that the total number of historical repeated occurrences is 2 and the total number of current repeated occurrences is 1. If the preset statistical condition is that the current object is counted only once when the number of repeated occurrences of the target parameter information is greater than or equal to one, it is determined, based on the total number of historical repeated occurrences 2, that the current object has already been counted in the historical number of users, and it is determined, based on the total number of current repeated occurrences 1, that the current object can still be counted at the current time. The counted historical number of users therefore does not need to be modified; that is, the historical number of users is directly used as the current number of users at the current time. If the preset statistical condition is that the current object is counted only once when the number of repeated occurrences of the target parameter information is greater than or equal to two, it is determined, based on the total number of historical repeated occurrences 2, that the current object has already been counted in the historical number of users, and it is determined, based on the total number of current repeated occurrences 1, that the current object can no longer be counted at the current time. In this case, 1 is subtracted from the counted historical number of users, and the result is determined as the current number of users at the current time, so that the number of users purchasing the item with item identifier 21 is counted in real time based on the current interactive data.
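The update rule walked through above can be sketched as a small function. This is an illustration under assumptions: the add identifier is taken to be +1 (only the cancel value -1 appears in the example), the preset value is taken to be 1, the preset statistical condition is modelled as a threshold on the total number of repeated occurrences, and the starting user count of 100 is a hypothetical figure.

```python
from typing import Tuple

ADD, CANCEL = 1, -1  # assumed numeric values of the interaction mode identifier


def update_statistic(
    historical_stat: int,
    historical_total: int,
    mode: int,
    threshold: int,
    step: int = 1,
) -> Tuple[int, int]:
    """Return (current total of repeated occurrences, current statistic).

    The object is treated as counted when its total number of repeated
    occurrences is greater than or equal to `threshold`.
    """
    current_total = historical_total + mode * step
    was_counted = historical_total >= threshold
    is_counted = current_total >= threshold
    if was_counted and not is_counted:
        current_stat = historical_stat - step  # object drops out of the count
    elif is_counted and not was_counted:
        current_stat = historical_stat + step  # object newly enters the count
    else:
        current_stat = historical_stat  # no change in counted status
    return current_total, current_stat


# Example from the text: historical total 2, cancel identifier -1.
print(update_statistic(100, 2, CANCEL, threshold=1))  # (1, 100)
print(update_statistic(100, 2, CANCEL, threshold=2))  # (1, 99)
```

Because only the transition between counted and not counted changes the statistic, each object contributes at most once however many matching records it has.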
In the technical scheme of this embodiment, the target parameter information matching the current parameter information corresponding to the preset statistical dimension parameter in the current interactive data is determined from the candidate parameter information corresponding to the preset statistical dimension parameter, the historical statistical information corresponding to the target parameter information is determined according to the mapping relation between each candidate parameter information corresponding to the preset statistical dimension parameter and the historical statistical information, and real-time deduplication statistics is performed on that historical statistical information. Deduplication statistics can therefore be performed on all candidate parameter information of the preset statistical dimension parameter at the same time based on the same interactive data. This avoids the excessive performance overhead of the prior art, in which a separate Bloom filter must be started for each candidate parameter information on which deduplication statistics is performed, greatly saves device resources, and realizes deduplication statistics based on a dimension rather than on a specific dimension value.
The following is an embodiment of a data processing apparatus according to an embodiment of the present invention, which belongs to the same inventive concept as the data processing methods of the above embodiments, and reference may be made to the above embodiments of the data processing method for details that are not described in detail in the embodiments of the data processing apparatus.
Example four
Fig. 5 is a schematic structural diagram of a data processing apparatus according to a fourth embodiment of the present invention, which is applicable to a situation of performing real-time deduplication statistics on data, and the apparatus specifically includes: an interactive data acquisition module 410, a second historical interactive data determination module 420, and a historical statistical information update module 430.
The interactive data acquiring module 410 is configured to acquire current interactive data of a current object, at least one first historical interactive data, and historical repeated occurrence times of the first historical interactive data counted in advance, where the current interactive data includes current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data includes historical parameter information corresponding to each preset interactive parameter; a second historical interactive data determining module 420, configured to determine, when it is detected that the current interactive data includes target parameter information corresponding to a preset statistical dimension parameter, second historical interactive data including the target parameter information from the first historical interactive data, where the preset statistical dimension parameter is a parameter selected from preset interactive parameters in advance; and a historical statistical information updating module 430, configured to update the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on a preset statistical condition, so as to obtain the current statistical information.
According to the technical scheme of the embodiment, the history repeated occurrence times of each first history interactive data of each object are counted in advance, so that the history repeated occurrence times corresponding to each element needing deduplication can be counted. When the statistical information corresponding to the target parameter information of the preset statistical dimension parameters is counted in real time, for example, when the number of objects corresponding to the target parameter information is counted, when it is detected that the current interactive data of the current object contains the target parameter information, second historical interactive data containing the target parameter information can be determined from each first historical interactive data of the current object, and according to the number of times of repeated occurrences of the history of the second historical interactive data and the current interactive data, the historical statistical information corresponding to the target parameter information can be updated based on preset statistical conditions to obtain the current statistical information, so that real-time deduplication statistics is realized. 
The preset statistical condition can be set correspondingly based on the user requirements, for example, the preset statistical condition can be set to be only counted once when the repeated occurrence frequency of the target parameter information is greater than or equal to one time, or can be set to be only counted once when the repeated occurrence frequency of the target parameter information is greater than or equal to two times, so that the method can be applied to a scene in which only one-time statistics is performed when one or more times occur, and can also be applied to a scene in which only one-time statistics is performed when two or more times occur, the application range of the application scene is greatly improved, and the personalized requirements of the user are met.
Optionally, the apparatus further comprises:
the parameter information matching module is used for matching each piece of current parameter information in the current interactive data with corresponding historical parameter information in each piece of first historical interactive data after updating the historical statistical information corresponding to the target parameter information and obtaining the current statistical information;
the history repeated occurrence frequency updating module is used for updating the history repeated occurrence frequency of the third history interactive data if the third history interactive data successfully matched with each piece of current parameter information in the current interactive data exists;
and the current interactive data storage module is used for storing the current interactive data into a current cache unit corresponding to the current object if third history interactive data which are successfully matched with each piece of current parameter information in the current interactive data do not exist, wherein the current cache unit is used for storing the first history interactive data of the current object.
Optionally, the current interaction data further includes an interaction mode identifier of the current object, and the interaction mode identifier includes an addition identifier and a cancellation identifier;
correspondingly, the history repeated occurrence number updating module is specifically configured to: if the interactive mode identifier in the current interactive data is the addition identifier, update the historical repeated occurrence times of the third historical interactive data by adding a preset value; and if the interactive mode identifier in the current interactive data is the cancellation identifier, update the historical repeated occurrence times of the third historical interactive data by subtracting the preset value.
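The store-or-update behaviour of these modules can be sketched as follows. Several points are assumptions: a dictionary keyed by the parameter tuple replaces the byte-array comparison, the add and cancel identifiers are taken to be +1 and -1, the preset value is 1, and a newly stored record is given an initial count equal to the preset value, which the text does not specify.

```python
from typing import Dict, Tuple

ADD, CANCEL = 1, -1  # assumed numeric values of the interaction mode identifier

# object id -> {parameter tuple of a first historical record -> occurrence count}
Cache = Dict[str, Dict[Tuple[str, ...], int]]


def store_or_update(
    cache: Cache,
    object_id: str,
    current: Tuple[str, ...],
    mode: int,
    step: int = 1,
) -> int:
    """Update the count of the matching record, or store the record if none matches."""
    unit = cache.setdefault(object_id, {})  # cache unit of the current object
    if current in unit:
        # A third historical record exists: add or subtract the preset value.
        unit[current] += mode * step
    else:
        # No match: store the current data (initial count is an assumption).
        unit[current] = step
    return unit[current]
```

For example, two add events followed by one cancel event for the same record leave its count at 1.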
Optionally, the historical statistical information updating module 430 includes:
the history repeated occurrence total number determining unit is used for determining the history repeated occurrence total number according to the history repeated occurrence number of each second history interactive data;
the current repeated occurrence total number determining unit is used for determining the current repeated occurrence total number according to the current interactive data and the historical repeated occurrence total number;
and the history statistical information updating unit is used for judging whether the total repeated occurrence frequency of the history and the total repeated occurrence frequency of the current meet the preset statistical conditions or not, and updating the history statistical information corresponding to the target parameter information according to the judgment result to obtain the current statistical information.
Optionally, the historical statistical information updating unit is specifically configured to: when the total number of historical repeated occurrences meets the preset statistical condition and the total number of current repeated occurrences does not meet the preset statistical condition, determine the result of subtracting a preset number from the historical statistical information corresponding to the target parameter information as the current statistical information; when the total number of historical repeated occurrences does not meet the preset statistical condition and the total number of current repeated occurrences meets the preset statistical condition, determine the result of adding the preset number to the historical statistical information corresponding to the target parameter information as the current statistical information; and when the total number of historical repeated occurrences and the total number of current repeated occurrences both meet the preset statistical condition, or both fail to meet the preset statistical condition, determine the historical statistical information corresponding to the target parameter information as the current statistical information.
Optionally, the current interaction data further includes an interaction mode identifier of the current object, and the interaction mode identifier includes an addition identifier and a cancellation identifier;
correspondingly, the current total number of repeated occurrences determining unit is specifically configured to: if the interactive mode identifier in the current interactive data is the added identifier, determining the result of accumulating the historical repeated occurrence total times by a preset numerical value as the current repeated occurrence total times; and if the interactive mode identifier in the current interactive data is the cancel identifier, determining the result of subtracting a preset numerical value from the historical repeated occurrence total number as the current repeated occurrence total number.
Optionally, the apparatus further comprises:
the historical byte array generating module is used for converting historical parameter information in the first historical interactive data of the current object into historical parameter information of byte types before acquiring the current interactive data of the current object, at least one first historical interactive data and the historical repeated occurrence times of the first historical interactive data counted in advance, and generating a historical byte array corresponding to the first historical interactive data according to a preset arrangement sequence, the historical parameter information of each byte type and the historical repeated occurrence times of the first historical interactive data counted in advance;
and the current byte array generating module is used for converting each current parameter information in the current interactive data of the current object into the current parameter information of the byte type, and generating the current byte array corresponding to the current interactive data according to the preset arrangement sequence and the current parameter information of each byte type.
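The byte array generation can be sketched with Python's `struct` module. The layout is hypothetical: a 1-byte platform code, a 4-byte item identifier and a 2-byte occurrence count, in big-endian order. The text only requires a preset arrangement order, not these particular fields or widths.

```python
import struct

# Hypothetical preset arrangement: platform code (1 byte), item id (4 bytes),
# then, for historical records only, the occurrence count (2 bytes).
CURRENT_LAYOUT = struct.Struct(">BI")
HISTORY_LAYOUT = struct.Struct(">BIH")


def build_current_bytes(platform_code: int, item_id: int) -> bytes:
    """Pack the current parameter information in the preset order."""
    return CURRENT_LAYOUT.pack(platform_code, item_id)


def build_history_bytes(platform_code: int, item_id: int, occurrences: int) -> bytes:
    """Pack the historical parameter information followed by its occurrence count."""
    return HISTORY_LAYOUT.pack(platform_code, item_id, occurrences)


history_bytes = build_history_bytes(0b0011, 21, 1)
current_bytes = build_current_bytes(0b0011, 21)

# With a shared field order, the parameter part of both arrays can be compared directly.
same_parameters = history_bytes[: CURRENT_LAYOUT.size] == current_bytes
```

Keeping the same field order in both arrays is what allows the parameter comparison, and the bit operation on the platform byte, to work on raw bytes without parsing the record.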
Optionally, the history byte array generating module is further configured to: if the target parameter information corresponding to the preset statistical dimension parameter comprises at least two pieces of statistical parameter information, binary coding is carried out on the target parameter information and each piece of statistical parameter information based on a preset coding mode, and binary coding information corresponding to the target parameter information and binary coding information corresponding to each piece of statistical parameter information are obtained; according to binary coding information corresponding to each statistical parameter information, determining historical binary coding information corresponding to historical parameter information of a preset statistical dimension parameter in each first historical interactive data of the current object as historical parameter information of a byte type;
accordingly, the current byte array generating module is further configured to: and determining current binary coding information corresponding to the current parameter information of the preset statistical dimension parameter in the current interactive data of the current object as the current parameter information of the byte type according to the binary coding information corresponding to each piece of statistical parameter information.
Optionally, the second historical interaction data determining module 420 is further configured to:
if the target parameter information corresponding to the preset statistical dimension parameter is binary coded information, performing bit operation on the historical binary coded information corresponding to the preset statistical dimension parameter in the first historical interactive data and the target parameter information, and determining that the first historical interactive data contains the target parameter information when the bit operation result is the same as the target parameter information; if the target parameter information corresponding to the preset statistical dimension parameter is non-binary coded information, detecting whether the historical parameter information corresponding to the preset statistical dimension parameter in the first historical interactive data is the same as the target parameter information, and if so, determining that the first historical interactive data contains the target parameter information.
Optionally, the apparatus further comprises:
the history hash value adding module is used for generating a history hash value corresponding to the history interactive data according to each history parameter information in the first history interactive data and adding the history hash value to the history byte array;
and the current hash value adding module is used for generating a current hash value corresponding to the current interactive data according to each current parameter information in the current interactive data and adding the current hash value to the current byte array.
Optionally, the second historical interaction data determining module 420 is specifically configured to: determining target parameter information matched with current parameter information corresponding to the preset statistical dimension parameters in the current interactive data from all candidate parameter information corresponding to the preset statistical dimension parameters, and determining second historical interactive data containing the target parameter information from the first historical interactive data;
correspondingly, the apparatus further comprises a historical statistical information determination module, configured to: before the historical statistical information corresponding to the target parameter information is updated, based on the preset statistical condition, according to the historical repeated occurrence times of the second historical interactive data and the current interactive data to obtain the current statistical information, determine the historical statistical information corresponding to the target parameter information according to the mapping relation between each candidate parameter information corresponding to the preset statistical dimension parameter and the historical statistical information.
Optionally, the current interactive data of the current object refers to current item acquisition task data of the current user; the current item acquisition task data comprises at least one of: current item acquisition platform information, a current item acquisition parent task identifier, a current item acquisition child task identifier, a current item attribution party identifier and a current item acquisition time;
the first historical interactive data of the current object refers to historical item acquisition task data of the current user; the historical item acquisition task data comprises the following steps: at least one of historical item acquisition platform information, historical item acquisition parent task identification, historical item acquisition child task identification, historical item attribution identification and historical item acquisition time.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional module and the beneficial effect of executing the data processing method.
It should be noted that, in the embodiment of the data processing apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
EXAMPLE five
Fig. 6 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention. Fig. 6 illustrates a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the present invention. The device 12 shown in fig. 6 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present invention.
As shown in FIG. 6, device 12 is in the form of a general purpose computing device. The components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with device 12, and/or with any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, to implement a data processing method provided by the embodiment of the present invention, the method including:
acquiring current interactive data of a current object, at least one first historical interactive data and historical repeated occurrence times of the first historical interactive data counted in advance, wherein the current interactive data comprises current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data comprises historical parameter information corresponding to each preset interactive parameter;
when the current interactive data is detected to contain target parameter information corresponding to preset statistical dimension parameters, second historical interactive data containing the target parameter information is determined from the first historical interactive data, wherein the preset statistical dimension parameters are parameters selected from preset interactive parameters in advance;
and updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on preset statistical conditions to obtain the current statistical information.
Of course, those skilled in the art can understand that the processor can also implement the technical solution of the data processing method provided by any embodiment of the present invention.
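The first two steps of the method above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the `Record` layout, the function name `prepare_update`, and the use of plain strings for parameter information are all hypothetical. The sketch detects whether the current interactive data carries target parameter information for the chosen statistical dimension, selects the second historical interactive data, and sums its pre-counted occurrence times; how that total is then used depends on the preset statistical condition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    """One piece of interactive data (hypothetical layout)."""
    params: Dict[str, str]  # preset interactive parameter -> parameter information
    count: int = 0          # pre-counted historical repeated occurrence times


def prepare_update(
    current: Record, history: List[Record], dimension: str
) -> Optional[Tuple[str, List[Record], int]]:
    """Return (target, second historical data, historical total), or None.

    `dimension` is the preset statistical dimension parameter, chosen in
    advance from the preset interactive parameters.
    """
    # Detect target parameter information in the current interactive data.
    target = current.params.get(dimension)
    if target is None:
        return None  # nothing to update for this dimension
    # Second historical interactive data: first historical data that
    # contains the same target parameter information.
    second = [r for r in history if r.params.get(dimension) == target]
    # Total of the pre-counted occurrence times over the selected records.
    historical_total = sum(r.count for r in second)
    return target, second, historical_total
```

For instance, with a hypothetical dimension such as `"shop"`, only the historical records of the same shop are selected, and the returned total is what the statistics update would then evaluate.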
EXAMPLE six
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a data processing method according to any embodiment of the invention, the method comprising:
acquiring current interactive data of a current object, at least one first historical interactive data and historical repeated occurrence times of the first historical interactive data counted in advance, wherein the current interactive data comprises current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data comprises historical parameter information corresponding to each preset interactive parameter;
when the current interactive data is detected to contain target parameter information corresponding to preset statistical dimension parameters, second historical interactive data containing the target parameter information is determined from the first historical interactive data, wherein the preset statistical dimension parameters are parameters selected from preset interactive parameters in advance;
and updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on preset statistical conditions to obtain the current statistical information.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, and that they may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (15)

1. A data processing method, comprising:
acquiring current interactive data of a current object, at least one first historical interactive data and historical repeated occurrence times of the first historical interactive data counted in advance, wherein the current interactive data comprises current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data comprises historical parameter information corresponding to each preset interactive parameter;
when the current interactive data is detected to contain target parameter information corresponding to preset statistical dimension parameters, second historical interactive data containing the target parameter information is determined from the first historical interactive data, wherein the preset statistical dimension parameters are parameters selected from the preset interactive parameters in advance;
and updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on preset statistical conditions to obtain the current statistical information.
2. The method according to claim 1, wherein after updating the historical statistical information corresponding to the target parameter information and obtaining the current statistical information, the method further comprises:
matching each current parameter information in the current interactive data with corresponding historical parameter information in each first historical interactive data;
if third historical interactive data that successfully matches each piece of current parameter information in the current interactive data exists, updating the historical repeated occurrence times of the third historical interactive data;
and if no third historical interactive data that successfully matches each piece of current parameter information in the current interactive data exists, storing the current interactive data into a current cache unit corresponding to the current object, wherein the current cache unit is used for storing the first historical interactive data of the current object.
3. The method of claim 2, wherein the current interactive data further comprises an interaction mode identifier of the current object, wherein the interaction mode identifier comprises an addition identifier and a cancellation identifier;
accordingly, updating the historical repeated occurrence times of the third historical interactive data comprises:
if the interaction mode identifier in the current interactive data is the addition identifier, updating by adding a preset value to the historical repeated occurrence times of the third historical interactive data;
and if the interaction mode identifier in the current interactive data is the cancellation identifier, updating by subtracting a preset value from the historical repeated occurrence times of the third historical interactive data.
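As an illustration of the cache maintenance described in claims 2 and 3, the following Python sketch keeps one counter per fully matching record. It is a sketch under assumptions rather than the claimed implementation: the identifiers `ADD` and `CANCEL`, the helper `record_key`, and the dictionary used as the cache unit are hypothetical, and the initial counter given to a newly stored record (the preset value for an addition, zero otherwise) is an assumption, since the claims do not state it.

```python
from typing import Dict, Tuple

Key = Tuple[Tuple[str, str], ...]
ADD, CANCEL = "add", "cancel"  # hypothetical interaction mode identifiers


def record_key(params: Dict[str, str]) -> Key:
    """Order-independent key built from every piece of parameter information."""
    return tuple(sorted(params.items()))


def update_cache(
    cache: Dict[Key, int], params: Dict[str, str], mode: str, step: int = 1
) -> None:
    """Update the per-object cache unit with the current interactive data.

    `step` plays the role of the preset value in claim 3.
    """
    if mode not in (ADD, CANCEL):
        raise ValueError(f"unknown interaction mode: {mode!r}")
    key = record_key(params)
    if key in cache:
        # A third historical record matches on every parameter:
        # accumulate or subtract the preset value.
        cache[key] += step if mode == ADD else -step
    else:
        # No full match: store the current data in the cache unit.
        # The initial counter below is an assumption of this sketch.
        cache[key] = step if mode == ADD else 0
```

Keying the cache on all parameters at once makes the full match a single dictionary lookup instead of a field-by-field comparison against every cached record.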
4. The method of claim 1, wherein updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on a preset statistical condition to obtain the current statistical information comprises:
determining a historical total number of repeated occurrences according to the historical repeated occurrence times of the second historical interactive data;
determining a current total number of repeated occurrences according to the current interactive data and the historical total number of repeated occurrences;
and judging whether the historical total number of repeated occurrences and the current total number of repeated occurrences meet the preset statistical condition, and updating the historical statistical information corresponding to the target parameter information according to the judgment result to obtain the current statistical information.
5. The method according to claim 4, wherein judging whether the historical total number of repeated occurrences and the current total number of repeated occurrences meet the preset statistical condition, and updating the historical statistical information corresponding to the target parameter information according to the judgment result to obtain the current statistical information comprises:
when the historical total number of repeated occurrences meets the preset statistical condition and the current total number of repeated occurrences does not meet the preset statistical condition, determining the result of subtracting a preset number from the historical statistical information corresponding to the target parameter information as the current statistical information;
when the historical total number of repeated occurrences does not meet the preset statistical condition and the current total number of repeated occurrences meets the preset statistical condition, determining the result of adding the preset number to the historical statistical information corresponding to the target parameter information as the current statistical information;
and when the historical total number of repeated occurrences and the current total number of repeated occurrences both meet the preset statistical condition, or both do not meet the preset statistical condition, determining the historical statistical information corresponding to the target parameter information as the current statistical information.
6. The method of claim 4, wherein the current interactive data further comprises an interaction mode identifier of the current object, wherein the interaction mode identifier comprises an addition identifier and a cancellation identifier;
correspondingly, determining the current total number of repeated occurrences according to the current interactive data and the historical total number of repeated occurrences comprises:
if the interaction mode identifier in the current interactive data is the addition identifier, determining the result of adding a preset value to the historical total number of repeated occurrences as the current total number of repeated occurrences;
and if the interaction mode identifier in the current interactive data is the cancellation identifier, determining the result of subtracting the preset value from the historical total number of repeated occurrences as the current total number of repeated occurrences.
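The update rule of claims 4 to 6 amounts to changing the statistic only when the total crosses the preset statistical condition. The Python sketch below illustrates this under assumptions: the function names, the `ADD`/`CANCEL` identifiers, and the representation of the condition as a callable are hypothetical, and the example condition in the usage note (a total of at least one) is chosen only for illustration.

```python
from typing import Callable, Iterable

ADD, CANCEL = "add", "cancel"  # hypothetical interaction mode identifiers


def current_total(historical_total: int, mode: str, step: int = 1) -> int:
    """Claim 6: add or subtract the preset value according to the mode."""
    if mode == ADD:
        return historical_total + step
    if mode == CANCEL:
        return historical_total - step
    raise ValueError(f"unknown interaction mode: {mode!r}")


def update_statistic(
    historical_stat: int,
    second_history_counts: Iterable[int],
    mode: str,
    condition: Callable[[int], bool],
    preset_number: int = 1,
    step: int = 1,
) -> int:
    """Claims 4 and 5: return the current statistical information."""
    hist_total = sum(second_history_counts)          # historical total
    cur_total = current_total(hist_total, mode, step)  # current total
    met_before = condition(hist_total)
    met_after = condition(cur_total)
    if met_before and not met_after:
        return historical_stat - preset_number  # condition no longer met
    if not met_before and met_after:
        return historical_stat + preset_number  # condition newly met
    return historical_stat                      # no crossing, no change
```

With a hypothetical condition such as `lambda total: total >= 1`, the statistic behaves like a distinct count: a first addition raises it by the preset number, repeated additions leave it unchanged, and it drops again only when cancellations bring the total back to zero.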
7. The method of claim 1, further comprising, before acquiring the current interactive data of the current object, the at least one first historical interactive data and the pre-counted historical repeated occurrence times of the first historical interactive data:
converting historical parameter information in first historical interactive data of a current object into historical parameter information of byte types, and generating a historical byte array corresponding to the first historical interactive data according to a preset arrangement sequence, the historical parameter information of each byte type and the historical repeated occurrence times of the first historical interactive data counted in advance;
converting each current parameter information in the current interactive data of the current object into the current parameter information of the byte type, and generating a current byte array corresponding to the current interactive data according to the preset arrangement sequence and the current parameter information of each byte type.
8. The method of claim 7, wherein converting the historical parameter information in each of the first historical interaction data of the current object into byte-type historical parameter information comprises:
if the target parameter information corresponding to the preset statistical dimension parameter comprises at least two pieces of statistical parameter information, binary coding is carried out on the target parameter information and each piece of statistical parameter information based on a preset coding mode, and binary coding information corresponding to the target parameter information and binary coding information corresponding to each piece of statistical parameter information are obtained;
according to binary coding information corresponding to each statistical parameter information, determining historical binary coding information corresponding to historical parameter information of a preset statistical dimension parameter in each first historical interactive data of the current object as historical parameter information of a byte type;
correspondingly, converting each current parameter information in the current interactive data of the current object into the current parameter information of byte type, including:
and determining current binary coding information corresponding to the current parameter information of the preset statistical dimension parameter in the current interactive data of the current object as the current parameter information of the byte type according to the binary coding information corresponding to each statistical parameter information.
9. The method of claim 8, wherein determining second historical interaction data including the target parameter information from the first historical interaction data comprises:
if the target parameter information corresponding to the preset statistical dimension parameter is binary coded information, performing bit operation on the historical binary coded information corresponding to the preset statistical dimension parameter in the first historical interactive data and the target parameter information, and determining that the first historical interactive data contains the target parameter information when the bit operation result is the same as the target parameter information;
if the target parameter information corresponding to the preset statistical dimension parameter is non-binary coded information, detecting whether the historical parameter information corresponding to the preset statistical dimension parameter in the first historical interactive data is the same as the target parameter information, and if so, determining that the first historical interactive data contains the target parameter information.
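The containment test of claims 8 and 9 can be illustrated with a bit mask. This Python sketch makes several assumptions that the claims do not fix: each piece of statistical parameter information is given one bit (a one-hot code), composite information is the bitwise OR of its parts, the bit operation is a bitwise AND, and binary coded information is represented as `int` while non-binary information is represented as `str`. All function names and example values are hypothetical.

```python
from typing import Dict, Iterable, Union


def build_codes(values: Iterable[str]) -> Dict[str, int]:
    """Assign one bit per statistical parameter information (assumed coding)."""
    return {value: 1 << i for i, value in enumerate(values)}


def encode(values: Iterable[str], codes: Dict[str, int]) -> int:
    """Binary coding of composite information: OR of the individual bits."""
    mask = 0
    for value in values:
        mask |= codes[value]
    return mask


def contains_target(historical: Union[int, str], target: Union[int, str]) -> bool:
    """Claim 9: decide whether a historical record contains the target.

    Binary coded targets (int) use a bit operation; the record contains the
    target when the AND result equals the target. Non-binary targets (str)
    fall back to an equality check.
    """
    if isinstance(target, int):
        return isinstance(historical, int) and (historical & target) == target
    return historical == target


# Hypothetical example values for one statistical dimension.
codes = build_codes(["app", "web", "mini_program"])
target = encode(["app", "web"], codes)
```

A record encoded from all three values contains this target, while a record encoded from `"app"` alone does not, because the AND result then differs from the target.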
10. The method of claim 7, further comprising:
generating a historical hash value corresponding to the historical interactive data according to each historical parameter information in the first historical interactive data, and adding the historical hash value to the historical byte array;
and generating a current hash value corresponding to the current interactive data according to each piece of current parameter information in the current interactive data, and adding the current hash value to the current byte array.
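The byte arrays of claims 7 and 10 can be sketched with Python's standard `struct` and `zlib` modules. The layout here is an assumption made for illustration: the field names in `FIELD_ORDER`, the 32-bit big-endian encoding of each parameter, the use of CRC-32 as the hash, and the placement of the hash in front of the fields are all hypothetical choices, not details taken from the claims.

```python
import struct
import zlib
from typing import Dict

# Hypothetical preset arrangement order of the interactive parameters.
FIELD_ORDER = ("platform", "parent_task", "child_task", "owner")


def pack_fields(params: Dict[str, int]) -> bytes:
    """Byte-type parameter information, laid out in the preset order."""
    return struct.pack(f">{len(FIELD_ORDER)}I", *(params[n] for n in FIELD_ORDER))


def history_bytes(params: Dict[str, int], count: int) -> bytes:
    """Historical byte array: hash | fields | repeated occurrence times."""
    body = pack_fields(params)
    return struct.pack(">I", zlib.crc32(body)) + body + struct.pack(">i", count)


def current_bytes(params: Dict[str, int]) -> bytes:
    """Current byte array: hash | fields (no counter yet)."""
    body = pack_fields(params)
    return struct.pack(">I", zlib.crc32(body)) + body


def same_record(history: bytes, current: bytes) -> bool:
    """Full match: the hash and every field agree (prefix comparison)."""
    return history[: len(current)] == current


def stored_count(history: bytes) -> int:
    """Read the counter from the last four bytes of a historical array."""
    return struct.unpack(">i", history[-4:])[0]
```

Because both arrays share the same prefix layout, a full match reduces to one byte comparison, and the leading hash lets most non-matching records be rejected within the first four bytes.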
11. The method according to any one of claims 1 to 10, wherein when it is detected that the current interactive data includes target parameter information corresponding to a preset statistical dimension parameter, determining second historical interactive data including the target parameter information from the first historical interactive data includes:
determining target parameter information matched with current parameter information corresponding to preset statistical dimension parameters in the current interactive data from each candidate parameter information corresponding to the preset statistical dimension parameters, and determining second historical interactive data containing the target parameter information from the first historical interactive data;
correspondingly, before updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence frequency of the second historical interaction data and the current interaction data based on a preset statistical condition to obtain the current statistical information, the method further includes:
and determining historical statistical information corresponding to the target parameter information according to the mapping relation between each candidate parameter information corresponding to the preset statistical dimension parameter and the historical statistical information.
12. The method according to any one of claims 1 to 10, wherein
the current interactive data of the current object refers to current item acquisition task data of the current user; the current item acquisition task data comprises at least one of: current item acquisition platform information, a current item acquisition parent task identifier, a current item acquisition child task identifier, a current item attribution party identifier, and a current item acquisition time;
the first historical interactive data of the current object refers to historical item acquisition task data of the current user; the historical item acquisition task data comprises at least one of: historical item acquisition platform information, a historical item acquisition parent task identifier, a historical item acquisition child task identifier, a historical item attribution party identifier, and a historical item acquisition time.
13. A data processing apparatus, comprising:
the interactive data acquisition module is used for acquiring current interactive data of a current object, at least one piece of first historical interactive data and historical repeated occurrence times of the first historical interactive data counted in advance, wherein the current interactive data comprises current parameter information corresponding to at least one preset interactive parameter, and the first historical interactive data comprises historical parameter information corresponding to each preset interactive parameter;
the second historical interactive data determining module is used for determining second historical interactive data containing target parameter information from the first historical interactive data when the current interactive data is detected to contain the target parameter information corresponding to a preset statistical dimension parameter, wherein the preset statistical dimension parameter is a parameter selected from the preset interactive parameters in advance;
and the historical statistical information updating module is used for updating the historical statistical information corresponding to the target parameter information according to the historical repeated occurrence times of the second historical interactive data and the current interactive data based on preset statistical conditions to obtain the current statistical information.
14. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a data processing method as claimed in any one of claims 1-12.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 12.
CN202010231016.9A 2020-03-27 2020-03-27 Data processing method, device, equipment and storage medium Pending CN113449232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010231016.9A CN113449232A (en) 2020-03-27 2020-03-27 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113449232A true CN113449232A (en) 2021-09-28

Family

ID=77807939

Country Status (1)

Country Link
CN (1) CN113449232A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722078A (en) * 2022-03-16 2022-07-08 百果园技术(新加坡)有限公司 Data statistical method, device, equipment, storage medium and program product

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8799601B1 (en) * 2012-06-28 2014-08-05 Emc Corporation Techniques for managing deduplication based on recently written extents
US20160171009A1 (en) * 2014-12-10 2016-06-16 International Business Machines Corporation Method and apparatus for data deduplication
CN107911232A (en) * 2017-10-27 2018-04-13 北京神州绿盟信息安全科技股份有限公司 Method and device for determining business operation rules
CN108920668A (en) * 2018-07-05 2018-11-30 平安科技(深圳)有限公司 Uniform resource locator (URL) deduplication method and device
WO2020006909A1 (en) * 2018-07-05 2020-01-09 平安科技(深圳)有限公司 Method and device for deduplicating urls
CN110751227A (en) * 2019-10-28 2020-02-04 中国建设银行股份有限公司 Data processing method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHIHAO HUANG et al.: "SS-dedup: A high throughput stateful data routing algorithm for cluster deduplication system", 《2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)》, 6 February 2017 (2017-02-06) *
周升; 陶敏: "Research on a general access method for real-time/historical database platforms", 浙江电力 (Zhejiang Electric Power), no. 12, 25 December 2012 (2012-12-25) *
庞超; 刘倩; 魏虹雨: "Research on a cloud-storage-based big data backup mode for postal training", 自动化应用 (Automation Application), no. 09, 25 September 2018 (2018-09-25) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination