CN114708003B - Abnormal data detection method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN114708003B (application number CN202210458381.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- data
- clustering
- abnormal
- processing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0609—Buyer or seller confidence or verification
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application relates to the field of data processing, and in particular to an abnormal data detection method, device, equipment and readable storage medium. The method acquires first information, namely commodity sales data information of at least one store; sends the first information to a data preprocessing model to obtain second information, namely price characteristic data information and sales volume characteristic data information obtained by preprocessing the first information; sends the second information to an anomaly detection model for anomaly detection to obtain third information, namely abnormal commodity sales data obtained by performing cluster screening on the second information twice; and verifies the third information to obtain the abnormal commodity sales data screened by the model after its parameters have been verified. By combining two clustering algorithms, the method draws on the strengths of each while compensating for their weaknesses, so that abnormal data can be identified efficiently and accurately.
Description
Technical Field
The present application relates to the field of data processing, and in particular, to a method, apparatus, device, and readable storage medium for detecting abnormal data.
Background
In recent years, Internet technology has developed rapidly and the e-commerce industry has entered a period of fast growth. Online shopping is increasingly popular because it is convenient and fast, saves time and effort, and delivers goods to the door. As platforms keep growing and the number of commodities keeps increasing, improper practices such as false price marking and fake sales have also appeared; these seriously violate the E-Commerce Law, so such commodity data must be identified accurately. Given the huge number of commodities, screening them purely by manual review would involve an enormous workload and would still miss or misjudge cases. A data detection method is therefore needed that can accurately locate abnormal commodities, reduce the cost of manual intervention, and lower the error rate.
Disclosure of Invention
An object of the present application is to provide an abnormal data detection method, apparatus, device, and readable storage medium that address the above problems. To achieve this object, the application adopts the following technical solutions:
In a first aspect, the present application provides an abnormal data detection method, comprising: acquiring first information, wherein the first information is commodity sales data information of at least one store; sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales volume characteristic data information obtained by preprocessing the first information; sending the second information to an anomaly detection model for anomaly data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing cluster screening on the second information twice; and sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In a second aspect, an embodiment of the present application provides an abnormal data detection apparatus, including:
a first acquisition unit configured to acquire first information, the first information being merchandise sales data information of at least one store;
the first processing unit is used for sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
the second processing unit is used for sending the second information to an anomaly detection model for anomaly data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by carrying out clustering screening on the second information twice;
and the third processing unit is used for sending the third information to the verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
In a third aspect, an embodiment of the present application provides an abnormal data detection device, including a memory and a processor. The memory is used for storing a computer program; the processor is used for implementing the steps of the abnormal data detection method when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described abnormal data detection method.
The beneficial effects of the application are as follows:
the application extracts features from commodity sales records and performs two rounds of clustering with two different clustering algorithms, which locates abnormal commodities accurately, reduces manual intervention, and lowers the error rate. A high-efficiency clustering method processes the data first, substantially reducing the amount of data to be examined; a high-accuracy clustering method then processes only the data retained by the first clustering. The approach thus combines the strengths of the two algorithms, offsets their individual weaknesses, and identifies abnormal data both efficiently and accurately.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an abnormal data detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an abnormal data detecting apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an abnormal data detection device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1
As shown in FIG. 1, the present embodiment provides an abnormal data detection method, which includes step S1, step S2, step S3, and step S4.
Step S1: acquiring first information, wherein the first information is commodity sales data information of at least one store;
Step S2: sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
Step S3: sending the second information to an anomaly detection model for anomaly data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing cluster screening on the second information twice;
and Step S4: sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
It is understood that the above-described abnormal data is abnormal data in the commodity sales data.
It can be understood that, by extracting features from commodity sales records and performing two rounds of clustering with two different clustering algorithms, the application locates abnormal commodities accurately and reduces both manual intervention and the error rate. A high-efficiency clustering method first processes the data to shrink the amount that must be examined, and a high-accuracy clustering method then processes the data retained by the first clustering; the method thus combines the strengths of both algorithms, offsets their weaknesses, and judges abnormal data efficiently and accurately.
In a specific embodiment of the disclosure, the step S2 includes a step S21, a step S22, and a step S23.
Step S21: performing data processing on the commodity sales data information, removing invalid data from the first information, and filling incomplete data in the first information with mean values to obtain first sub-information;
Step S22: calculating characteristic data of the first information based on the first sub-information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
and Step S23: normalizing the characteristic data of the first information, and smoothing the normalized characteristic data to obtain the preprocessed first information.
It can be understood that, by preprocessing the commodity sales data, the application removes invalid data and fills incomplete data with mean values: the values of the same data item for the other months are summed and averaged, and this average is used as the fill value for the missing entry. This reduces the error introduced during feature extraction and increases clustering accuracy.
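The preprocessing of steps S21 to S23 can be sketched as follows. This is a minimal sketch under assumptions not stated in the source: each commodity's history is a list of monthly values with `None` marking a missing month; a series with no observed month or with any negative value is treated as invalid (a hypothetical rule); normalization is min-max scaling; and smoothing is a centered three-month moving average. The "preset calculation formula" of step S22 is not disclosed, so feature calculation is omitted, and all function names are illustrative.

```python
from __future__ import annotations
from typing import Optional


def fill_missing_with_mean(monthly: list[Optional[float]]) -> list[float]:
    """Replace each missing month (None) with the mean of the observed months."""
    observed = [v for v in monthly if v is not None]
    if not observed:
        raise ValueError("series has no observed months to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in monthly]


def min_max_normalize(values: list[float]) -> list[float]:
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def preprocess(raw: dict[str, list[Optional[float]]]) -> dict[str, list[float]]:
    """Hypothetical S21-S23 pipeline: drop invalid series, mean-fill, normalize, smooth."""
    result = {}
    for item_id, monthly in raw.items():
        # Assumed validity rule: no observed month, or any negative value, means invalid.
        if all(v is None for v in monthly) or any(v is not None and v < 0 for v in monthly):
            continue
        filled = fill_missing_with_mean(monthly)
        result[item_id] = moving_average(min_max_normalize(filled))
    return result
```

For example, `preprocess({"sku-1": [10.0, None, 30.0], "sku-2": [None, None]})` fills the missing month with 20.0, drops `sku-2`, and returns `{"sku-1": [0.25, 0.5, 0.75]}`.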
In a specific embodiment of the disclosure, the step S3 includes a step S31, a step S32, and a step S33.
Step S31: sending the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is the abnormal data information in the price characteristic data information;
Step S32: mapping the first clustering information onto the sales volume characteristic data information in the second information by data correspondence to obtain second sub-information, wherein the second sub-information comprises the sales volume characteristic data information corresponding to the first clustering information;
and Step S33: sending the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is the abnormal commodity sales data obtained by performing cluster screening on the second information twice.
It can be understood that the above steps first cluster the price characteristic data information, then map the result of this first clustering onto the sales volume characteristic data to select the data for the second clustering, and finally perform the second clustering. The first clustering screens the data efficiently, and the second clustering only examines the data corresponding to what the first clustering flagged, so the strengths of both are combined while efficiency is preserved.
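The data flow of steps S31 to S33 can be sketched as a two-stage filter. The sketch assumes that both feature sets are keyed by a commodity identifier and that each clustering stage is passed in as a function returning the identifiers it flags; `map_to_sales_features`, `two_stage_screen` and the two toy detectors at the bottom are hypothetical stand-ins, not the patent's clustering modules.

```python
from __future__ import annotations
from collections.abc import Callable, Hashable, Iterable


def map_to_sales_features(flagged_ids: Iterable[Hashable], sales_features: dict) -> dict:
    """Step S32 sketch: keep only the sales features of commodities flagged by stage one."""
    flagged = set(flagged_ids)
    return {cid: feats for cid, feats in sales_features.items() if cid in flagged}


def two_stage_screen(price_features: dict,
                     sales_features: dict,
                     first_stage: Callable[[dict], Iterable[Hashable]],
                     second_stage: Callable[[dict], Iterable[Hashable]]) -> list:
    """Run the fast detector on prices, then the accurate one on the mapped subset only."""
    candidates = first_stage(price_features)                     # S31
    subset = map_to_sales_features(candidates, sales_features)   # S32
    return sorted(second_stage(subset))                          # S33


# Toy stand-ins for the two clustering modules (hypothetical rules).
def flag_expensive(prices: dict) -> list:
    return [cid for cid, price in prices.items() if price > 100]


def flag_spiky(sales: dict) -> list:
    return [cid for cid, series in sales.items() if max(series) > 50]


prices = {"a": 120.0, "b": 20.0, "c": 300.0}
sales = {"a": [10, 90], "b": [5, 5], "c": [1, 2]}
print(two_stage_screen(prices, sales, flag_expensive, flag_spiky))  # ['a']
```

Because the second stage only receives the subset produced by the mapping, its cost grows with the number of commodities flagged in the first stage rather than with the full catalogue.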
In a specific embodiment of the disclosure, the step S31 includes a step S311, a step S312, a step S313, and a step S314.
Step S311: traversing the price characteristic data information based on preset first initial parameter information, and processing the price characteristic data with the clustering feature (CF) tree construction method of the BIRCH algorithm to obtain a price clustering feature tree;
Step S312: obtaining at least one clustering feature cluster from the price clustering feature tree, and obtaining the threshold range corresponding to each clustering feature cluster;
Step S313: analyzing all the threshold ranges, and taking the minimum threshold range among them as the normal threshold range for identifying normal points;
and Step S314: determining abnormal points in the price clustering feature tree based on the normal threshold range, and identifying abnormal data information in the price characteristic data information based on the abnormal points.
It can be understood that the above steps configure the BIRCH algorithm with the preset first initial parameters, build a clustering feature tree from the price characteristic data information, and cluster the price characteristic data information with this tree to obtain at least one cluster. The extent of each cluster is then analyzed, the minimum range is selected as the threshold range for normal data, and data falling outside it are judged abnormal. This processes the data quickly and efficiently and reduces the amount of data the second clustering must handle.
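The core of steps S311 and S312 is BIRCH's clustering feature, the triple (N, LS, SS) of point count, linear sum and squared sum, from which a cluster's centroid LS/N and radius sqrt(SS/N - (LS/N)^2) follow without storing the points. The sketch below is a simplified, leaf-level version for one-dimensional price features: each point joins the nearest entry if that entry's radius would stay within a threshold, and otherwise starts a new entry. It assumes the "first initial parameter" is this radius threshold and omits the non-leaf nodes, branching factor and node splits of a full CF tree (scikit-learn's `sklearn.cluster.Birch` implements the full algorithm). Each entry also tracks its minimum and maximum, as one possible way to obtain the per-cluster range used in step S312.

```python
from __future__ import annotations
import math
from dataclasses import dataclass


@dataclass
class CFEntry:
    """BIRCH clustering feature for 1-D data: count, linear sum, squared sum, plus min/max."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0
    lo: float = math.inf
    hi: float = -math.inf

    def centroid(self) -> float:
        return self.ls / self.n

    def radius_if_added(self, x: float) -> float:
        n, ls, ss = self.n + 1, self.ls + x, self.ss + x * x
        # R = sqrt(SS/N - (LS/N)^2); clamp tiny negatives caused by rounding.
        return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))

    def add(self, x: float) -> None:
        self.n += 1
        self.ls += x
        self.ss += x * x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)


def build_cf_entries(prices: list[float], threshold: float) -> list[CFEntry]:
    """Leaf-level insertion: absorb each point into the nearest entry if its radius stays <= threshold."""
    entries: list[CFEntry] = []
    for x in prices:
        nearest = min(entries, key=lambda e: abs(e.centroid() - x), default=None)
        if nearest is not None and nearest.radius_if_added(x) <= threshold:
            nearest.add(x)
        else:
            entry = CFEntry()
            entry.add(x)
            entries.append(entry)
    return entries


entries = build_cf_entries([10.0, 11.0, 12.0, 50.0, 51.0, 200.0], threshold=2.0)
print([(e.n, e.lo, e.hi) for e in entries])  # [(3, 10.0, 12.0), (2, 50.0, 51.0), (1, 200.0, 200.0)]
```

Because CF triples are additive, two entries can be merged by summing their (N, LS, SS) components, which is what lets BIRCH summarize large datasets in a single pass.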
In a specific embodiment of the disclosure, the step S33 includes a step S331, a step S332, a step S333, and a step S334.
Step S331: performing data processing based on a preset second initial parameter and the data information in the second sub-information, wherein the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system, and the mutual reachability distance between every pair of coordinate data points is computed;
Step S332: generating a weighted distance graph from the mutual reachability distances, and generating a minimum spanning tree of the mutual reachability distances from the weighted distance graph;
Step S333: converting the minimum spanning tree into the components of a hierarchical cluster structure according to the mutual reachability distances, and constructing the hierarchical cluster structure from these components;
and Step S334: compressing (condensing) the hierarchical cluster structure, and classifying the data information in the second sub-information based on the condensed hierarchical cluster structure to obtain the abnormal data in the second sub-information.
It can be understood that, in the above steps, the second initial parameters set the parameters of the clustering algorithm, the second sub-information is transformed into spatial coordinates, and the mutual reachability distance between every pair of coordinate points is computed, which makes the algorithm more robust to noise. A minimum spanning tree is then built on the mutual reachability distances and used to cluster the coordinate points, yielding the abnormal data in the second sub-information; in this way the abnormal entries in the sales characteristic data information can be determined more accurately.
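Steps S331 to S334 closely follow the structure of the HDBSCAN algorithm. Its mutual reachability distance is d_mreach(a, b) = max(core_k(a), core_k(b), d(a, b)), where core_k(x) is the distance from x to its k-th nearest neighbour; inflating distances in sparse regions is what makes the later clustering robust to noise. The sketch below assumes Euclidean coordinates, interprets the "second initial parameter" as k, and counts the k-th nearest *other* point (libraries differ on whether a point counts as its own neighbour).

```python
from __future__ import annotations
import math


def core_distances(points: list[tuple[float, ...]], k: int) -> list[float]:
    """Distance from each point to its k-th nearest other point (requires 1 <= k < len(points))."""
    cores = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(others[k - 1])
    return cores


def mutual_reachability(points: list[tuple[float, ...]], k: int) -> list[list[float]]:
    """Dense symmetric matrix of max(core_k(a), core_k(b), d(a, b)); the diagonal stays 0."""
    core = core_distances(points, k)
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = max(core[i], core[j], math.dist(points[i], points[j]))
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

With the points (0, 0), (1, 0), (0, 1) and (10, 10) and k = 1, the isolated point gets a core distance of sqrt(181), so every mutual reachability distance involving it is at least that large, while the three nearby points stay close to one another.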
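Steps S332 to S334 can be approximated as below: Prim's algorithm builds the minimum spanning tree over a dense mutual reachability matrix, and a single flat cut of that tree stands in for the hierarchy. This is a deliberate simplification, not HDBSCAN's condensed-tree step: instead of selecting clusters by stability, it drops edges heavier than a fixed `cut` and labels components smaller than `min_cluster_size` as noise (-1), i.e. as candidate abnormal records. Both parameter names are hypothetical.

```python
from __future__ import annotations
import math


def prim_mst(dist: list[list[float]]) -> list[tuple[int, int, float]]:
    """Prim's algorithm on a dense symmetric matrix; returns n-1 edges (u, v, weight)."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def flat_clusters(n: int, edges: list[tuple[int, int, float]],
                  cut: float, min_cluster_size: int) -> list[int]:
    """Keep MST edges with weight <= cut; components below min_cluster_size become noise (-1)."""
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, w in edges:
        if w <= cut:
            root[find(u)] = find(v)
    components: dict[int, list[int]] = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)
    labels = [-1] * n
    next_label = 0
    for members in components.values():
        if len(members) >= min_cluster_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels
```

Cutting the MST at a single height is equivalent to cutting the single-linkage dendrogram at that height; HDBSCAN instead examines all heights and keeps the most stable clusters, which is why it needs no fixed `cut`.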
In a specific embodiment of the disclosure, the step S4 includes a step S41, a step S42, a step S43, a step S44, a step S45, and a step S46.
Step S41: obtaining third sub-information, wherein the third sub-information is historical normal commodity sales data information and historical abnormal commodity sales data information;
Step S42: dividing the third sub-information into a test set and a verification set, and sending the test set of the third sub-information to the second clustering module for processing to obtain historical abnormal commodity sales data;
Step S43: comparing the historical abnormal commodity sales data with the verification set to obtain verification result information;
Step S44: performing grey relational analysis between the verification result information and all initial parameters in the anomaly detection model to obtain the association degree between the verification result information and each initial parameter;
Step S45: adjusting the initial parameters based on the verification result information and the association degrees to obtain an anomaly detection model with adjusted initial parameters, wherein, if the detection result for the test set is inconsistent with the verification set, the initial parameter with the highest association degree with the verification result is adjusted, and this is repeated until the two are consistent;
and Step S46: sending the first information to the anomaly detection model with adjusted initial parameters for a second anomaly detection, and using the result of the second anomaly detection to screen the third information, obtaining the screened abnormal commodity sales data.
It can be understood that the method divides the historical data, runs the anomaly detection model on it, and performs grey relational analysis between the detection result and the initial parameters to obtain their association degrees. The first and second initial parameters are adjusted according to these association degrees to obtain an adjusted anomaly detection model, which then performs a second round of abnormal data screening to obtain the screened abnormal commodity sales data.
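The comparison in step S43 and the consistency check in step S45 can be sketched as a set comparison. The source does not say how the comparison is scored; this sketch assumes records are identified by IDs, that the verification set supplies the known abnormal records against which the detections are checked, and that "consistent" means the two sets match exactly. Precision and recall are added as a softer measure; the function name is illustrative.

```python
from __future__ import annotations


def compare_with_labels(detected: set, labelled_abnormal: set) -> dict:
    """Verification result: overlap statistics plus an exact-match flag (assumed S45 criterion)."""
    hits = len(detected & labelled_abnormal)
    return {
        # Share of detections that are truly abnormal (1.0 when nothing was detected).
        "precision": hits / len(detected) if detected else 1.0,
        # Share of known abnormal records that were detected (1.0 when none are known).
        "recall": hits / len(labelled_abnormal) if labelled_abnormal else 1.0,
        "consistent": detected == labelled_abnormal,
    }


print(compare_with_labels({"r1", "r2"}, {"r1", "r3"}))
# {'precision': 0.5, 'recall': 0.5, 'consistent': False}
```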
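The grey relational analysis of step S44 can be illustrated with Deng's grey relational grade. For a reference sequence x0 and comparison sequences xi, the method takes the absolute differences Δi(k) = |x0(k) - xi(k)| after normalization, sets Δmin and Δmax to the smallest and largest difference over all i and k, computes the coefficients ξi(k) = (Δmin + ρΔmax) / (Δi(k) + ρΔmax) with distinguishing coefficient ρ (commonly 0.5), and averages them over k to obtain the grade of sequence i. The sketch assumes, as the source does not specify, that a verification metric recorded over several tuning runs is the reference sequence, that each initial parameter's values over those runs form a comparison sequence, and that sequences are mean-normalized (so each must have a nonzero mean). The parameter names below are hypothetical.

```python
from __future__ import annotations


def _mean_normalize(seq: list[float]) -> list[float]:
    """Divide by the sequence mean so that parameters on different scales are comparable."""
    mean = sum(seq) / len(seq)
    return [x / mean for x in seq]  # assumes a nonzero mean


def grey_relational_grades(reference: list[float],
                           candidates: dict[str, list[float]],
                           rho: float = 0.5) -> dict[str, float]:
    """Deng's grey relational grade of each candidate sequence with respect to the reference."""
    x0 = _mean_normalize(reference)
    deltas = {
        name: [abs(a - b) for a, b in zip(x0, _mean_normalize(seq))]
        for name, seq in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        return {name: 1.0 for name in candidates}  # every sequence matches the reference
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


run_errors = [1.0, 2.0, 3.0]                      # hypothetical verification metric per run
params = {"threshold": [2.0, 4.0, 6.0], "min_cluster_size": [3.0, 2.0, 1.0]}
grades = grey_relational_grades(run_errors, params)
print(max(grades, key=grades.get))                 # threshold
```

Under the rule in step S45, the parameter with the highest grade (here the hypothetical `threshold`, grade 1.0 against 5/9 for `min_cluster_size`) would be adjusted first.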
Example 2
As shown in FIG. 2, the present embodiment provides an abnormal data detection apparatus including a first acquisition unit 701, a first processing unit 702, a second processing unit 703, and a third processing unit 704.
A first acquiring unit 701 configured to acquire first information, which is merchandise sales data information of at least one store;
the first processing unit 702 is configured to send the first information to a data preprocessing model to obtain second information, where the second information is price feature data information and sales feature data information obtained by preprocessing the first information;
the second processing unit 703 is configured to send the second information to an anomaly detection model for anomaly data detection, to obtain third information, where the third information is abnormal commodity sales data obtained by performing cluster screening on the second information twice;
and the third processing unit 704 is configured to send the third information to the verification module for processing to obtain fourth information, where the fourth information is the abnormal commodity sales data screened by the model after its parameters have been verified.
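The chaining of units 701 to 704 can be sketched as a small composition class. This is an illustrative structure only: the class and field names are hypothetical, and each unit is injected as a plain callable so that the preprocessing, two-stage clustering and verification sketches (or any other implementations) can be plugged in.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AbnormalDataDetectionApparatus:
    """Hypothetical wiring of units 701-704, each supplied as a callable."""
    acquire: Callable[[], Any]          # first acquisition unit 701 -> first information
    preprocess: Callable[[Any], Any]    # first processing unit 702 -> second information
    detect: Callable[[Any], Any]        # second processing unit 703 -> third information
    verify: Callable[[Any], Any]        # third processing unit 704 -> fourth information

    def run(self) -> Any:
        first = self.acquire()
        second = self.preprocess(first)
        third = self.detect(second)
        return self.verify(third)
```

Injecting the units keeps each one independently testable, which mirrors how the apparatus embodiment describes the units as separable modules.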
In one embodiment of the present disclosure, the first processing unit 702 includes a first processing subunit 7021, a second processing subunit 7022, and a third processing subunit 7023.
A first processing subunit 7021, configured to perform data processing on the commodity sales data information, remove invalid data in the first information, and perform average filling on incomplete data in the first information to obtain first sub-information;
a second processing subunit 7022, configured to calculate, based on the first sub-information and a preset calculation formula, to obtain feature data of the first information, where the feature data of the first information includes price feature data and sales feature data;
and a third processing subunit 7023, configured to normalize the feature data of the first information, and smooth the feature data after normalization to obtain preprocessed first information.
In a specific embodiment of the disclosure, the second processing unit 703 includes a first clustering subunit 7031, a fourth processing subunit 7032, and a second clustering subunit 7033.
The first clustering subunit 7031 is configured to send the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, where the first clustering information is abnormal data information in the price characteristic data information;
a fourth processing subunit 7032, configured to perform data mapping on the first cluster information and the sales volume feature data information in the second information, so as to obtain second sub-information, where the second sub-information includes sales volume feature data information corresponding to the first cluster information;
the second clustering subunit 7033 is configured to send the second sub-information to a second clustering module for processing to obtain second clustering information, where the second clustering information is the abnormal commodity sales data obtained by performing cluster screening on the second information twice.
In one embodiment of the present disclosure, the first clustering subunit 7031 includes a third clustering subunit 70311, a fourth clustering subunit 70312, a fifth clustering subunit 70313, and a sixth clustering subunit 70314.
A third clustering subunit 70311, configured to traverse the price characteristic data information based on preset first initial parameter information, and process the price characteristic data with the clustering feature (CF) tree construction method of the BIRCH algorithm to obtain a price clustering feature tree;
a fourth clustering subunit 70312, configured to obtain at least one cluster feature cluster based on the price cluster feature tree, and obtain a threshold range corresponding to each cluster feature cluster;
a fifth clustering subunit 70313, configured to analyze all the threshold ranges, and use a minimum threshold range in all the threshold ranges as a normal threshold range for determining a normal point;
a sixth clustering subunit 70314 is configured to determine an abnormal point in the price clustering feature tree based on the normal threshold range, and determine abnormal data information in the price feature data information based on the abnormal point.
In one embodiment of the present disclosure, the second clustering subunit 7033 includes a seventh clustering subunit 70331, an eighth clustering subunit 70332, a ninth clustering subunit 70333, and a tenth clustering subunit 70334.
A seventh clustering subunit 70331, configured to perform data processing based on a preset second initial parameter and the data information in the second sub-information, where the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system, and to compute the mutual reachability distance between every pair of coordinate data points;
an eighth clustering subunit 70332, configured to generate a weighted distance graph from the mutual reachability distances and to generate a minimum spanning tree of the mutual reachability distances from the weighted distance graph;
a ninth clustering subunit 70333, configured to convert the minimum spanning tree into the components of a hierarchical cluster structure according to the mutual reachability distances, and to construct the hierarchical cluster structure from these components;
and a tenth clustering subunit 70334, configured to compress (condense) the hierarchical cluster structure and classify the data information in the second sub-information based on the condensed hierarchical cluster structure to obtain the abnormal data in the second sub-information.
In one embodiment of the disclosure, the third processing unit 704 includes a first acquisition subunit 7041, a fifth processing subunit 7042, a sixth processing subunit 7043, a seventh processing subunit 7044, an eighth processing subunit 7045, and a ninth processing subunit 7046.
A first acquisition subunit 7041, configured to acquire third sub-information, where the third sub-information is historical normal commodity sales data and historical abnormal commodity sales data;
a fifth processing subunit 7042, configured to divide the third sub-information into a test set and a verification set, and to send the test set of the third sub-information to the second clustering module for processing, so as to obtain historical abnormal commodity sales data;
a sixth processing subunit 7043, configured to compare the historical abnormal commodity sales data with the verification set to obtain verification result information;
a seventh processing subunit 7044, configured to perform grey relational analysis between the verification result information and all initial parameters in the anomaly detection model, so as to obtain the degree of association between the verification result information and each initial parameter;
an eighth processing subunit 7045, configured to adjust the initial parameters based on the verification result information and the degrees of association, to obtain an anomaly detection model with adjusted initial parameters, where, if the verification result shows that the test set is inconsistent with the verification set, the initial parameter with the highest degree of association with the verification result is adjusted until the verification result shows that the test set is consistent with the verification set;
and a ninth processing subunit 7046, configured to send the first information to the anomaly detection model with adjusted initial parameters to perform a second anomaly detection, and to screen the third information from the data after the second anomaly detection to obtain the screened abnormal commodity sales data.
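The grey correlation (grey relational) analysis of subunit 7044 and the "adjust the most associated parameter" rule of subunit 7045 can be sketched with Deng's grey relational grade. This is a minimal sketch under assumptions not stated in the source: the verification result is represented as a numeric sequence over tuning rounds (for example, a mismatch rate between the test-set output and the verification set), each initial parameter is represented by its values over the same rounds, sequences are mean-normalized, and the distinguishing coefficient is the conventional rho = 0.5. The parameter names used with it are hypothetical.

```python
from typing import Dict, List


def _mean_normalize(seq: List[float]) -> List[float]:
    """Divide by the sequence mean so differently scaled sequences are comparable."""
    m = sum(seq) / len(seq)
    if m == 0:
        raise ValueError("mean normalization needs a non-zero mean")
    return [x / m for x in seq]


def grey_relational_grades(reference: List[float],
                           candidates: Dict[str, List[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Deng's grey relational grade of each candidate sequence against the reference."""
    ref = _mean_normalize(reference)
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, _mean_normalize(seq))]
        for name, seq in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every sequence matches the reference exactly
        return {name: 1.0 for name in candidates}
    grades = {}
    for name, ds in deltas.items():
        # Relational coefficient at each point, averaged into a single grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


def parameter_to_adjust(reference: List[float],
                        candidates: Dict[str, List[float]]) -> str:
    """Name of the initial parameter most strongly associated with the verification result."""
    grades = grey_relational_grades(reference, candidates)
    return max(grades, key=grades.get)
```

For example, with a verification series `[0.30, 0.20, 0.10]` and hypothetical parameters `price_threshold = [3.0, 2.0, 1.0]` and `min_samples = [5.0, 5.0, 6.0]`, the first tracks the reference almost exactly after normalization and would be selected for adjustment.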
It should be noted that, for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be repeated here.
Example 3
Corresponding to the above method embodiments, the present disclosure further provides an abnormal data detection apparatus; the abnormal data detection apparatus described below and the abnormal data detection method described above may be cross-referenced with each other.
Fig. 3 is a block diagram of an abnormal data detection apparatus 800 according to an exemplary embodiment. As shown in Fig. 3, the abnormal data detection apparatus 800 may include a processor 801 and a memory 802. The abnormal data detection apparatus 800 may further include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the abnormal data detection apparatus 800 so as to perform all or some of the steps of the abnormal data detection method described above. The memory 802 is configured to store various types of data to support operation of the abnormal data detection apparatus 800; such data may include, for example, instructions for any application or method running on the abnormal data detection apparatus 800, as well as application-related data such as contact data, messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signals may be further stored in the memory 802 or transmitted through the communication component 805. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual or physical buttons.
The communication component 805 is used for wired or wireless communication between the abnormal data detection apparatus 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of these; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the abnormal data detection apparatus 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the abnormal data detection method described above.
In another exemplary embodiment, a computer readable storage medium is also provided that includes program instructions that, when executed by a processor, implement the steps of the abnormal data detection method described above. For example, the computer-readable storage medium may be the above-described memory 802 including program instructions executable by the processor 801 of the abnormal data detection apparatus 800 to perform the above-described abnormal data detection method.
Example 4
Corresponding to the above method embodiments, the present disclosure further provides a readable storage medium; the readable storage medium described below and the abnormal data detection method described above may be cross-referenced with each other.
The readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the abnormal data detection method of the above method embodiments.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The above descriptions are merely preferred embodiments of the present application and are not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive of shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. An abnormal data detection method, comprising:
acquiring first information, wherein the first information is commodity sales data information of at least one store;
sending the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales volume characteristic data information obtained by preprocessing the first information;
sending the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing cluster screening on the second information twice;
sending the third information to a verification module for processing to obtain fourth information, wherein the fourth information is abnormal commodity sales data screened by the model after its parameters have been verified;
wherein sending the second information to the anomaly detection model for abnormal data detection to obtain the third information comprises:
transmitting the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is abnormal data information in the price characteristic data information;
mapping the first clustering information to the sales volume characteristic data information in the second information by data correspondence to obtain second sub-information, wherein the second sub-information comprises the sales volume characteristic data information corresponding to the first clustering information;
sending the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is abnormal commodity sales data obtained by performing cluster screening on the second information twice;
wherein sending the second sub-information to the second clustering module for processing to obtain the second clustering information comprises:
performing data processing based on a preset second initial parameter and the data information in the second sub-information, wherein the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system, and the mutual reachability distance between each pair of coordinate data points is obtained from those points;
generating a weighted distance graph based on the mutual reachability distances, and generating a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;
converting the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and constructing the hierarchical cluster structure from these components;
and compressing the hierarchical cluster structure, and classifying the data information in the second sub-information based on the compressed hierarchical cluster structure to obtain the abnormal data in the second sub-information.
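Converting a minimum spanning tree into a hierarchical cluster structure can be illustrated by merging the tree's edges in ascending order of distance with a disjoint-set structure, which yields a single-linkage hierarchy. The flat extraction shown here is a deliberate simplification, not the claimed compression step: HDBSCAN-style methods condense the hierarchy using a minimum cluster size and select clusters by stability, whereas this sketch simply cuts the tree at a hypothetical distance `cut` and flags points in components smaller than a hypothetical `min_cluster_size` as abnormal.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]


class DisjointSet:
    """Union-find with union by size and path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> int:
        """Merge the components of a and b; return the size of the merged component."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return self.size[ra]
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return self.size[ra]


def single_linkage_hierarchy(n: int, mst: List[Edge]) -> List[Tuple[int, int, float, int]]:
    """Merge records (u, v, distance, merged_size) in ascending distance order."""
    ds = DisjointSet(n)
    return [(u, v, w, ds.union(u, v)) for u, v, w in sorted(mst, key=lambda e: e[2])]


def flag_small_components(n: int, mst: List[Edge], cut: float,
                          min_cluster_size: int) -> List[int]:
    """Simplified extraction: keep edges up to `cut`, flag points in small components."""
    ds = DisjointSet(n)
    for u, v, w in mst:
        if w <= cut:
            ds.union(u, v)
    return [i for i in range(n) if ds.size[ds.find(i)] < min_cluster_size]
```

The merge records play the role of the hierarchy's components: each one states which two groups join and at what mutual reachability distance, so a record that only joins at a large distance is a natural anomaly candidate.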
2. The abnormal data detection method according to claim 1, wherein sending the first information to the data preprocessing model to obtain the second information, the second information being information obtained by preprocessing the first information, comprises:
performing data processing on the commodity sales data information, removing invalid data from the first information, and filling incomplete data in the first information with mean values to obtain first sub-information;
calculating feature data of the first information based on the first sub-information and a preset calculation formula, wherein the feature data of the first information comprises price feature data and sales feature data;
and normalizing the feature data of the first information, and smoothing the normalized feature data to obtain the preprocessed first information.
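A minimal sketch of the preprocessing in claim 2, assuming records with hypothetical `price` and `qty` fields. The source does not define "invalid data", the "preset calculation formula", or the normalization and smoothing methods, so this sketch assumes: a record is invalid if every field is missing or any field is negative, the features are the raw field values, min-max normalization, and a trailing moving average as the smoother.

```python
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]
FIELDS = ("price", "qty")  # hypothetical field names


def remove_invalid(records: List[Record]) -> List[Record]:
    """Assumed rule: invalid if every field is missing or any present field is negative."""
    kept = []
    for r in records:
        vals = [r.get(f) for f in FIELDS]
        if all(v is None for v in vals) or any(v is not None and v < 0 for v in vals):
            continue
        kept.append(r)
    return kept


def mean_fill(records: List[Record]) -> List[Dict[str, float]]:
    """Fill each missing field with the mean of that field over the remaining records."""
    means = {f: mean([r[f] for r in records if r.get(f) is not None]) for f in FIELDS}
    return [{f: (r[f] if r.get(f) is not None else means[f]) for f in FIELDS}
            for r in records]


def min_max(xs: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def moving_average(xs: List[float], window: int = 3) -> List[float]:
    """Trailing moving average; early points average over the values available so far."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def preprocess(records: List[Record], window: int = 3) -> Dict[str, List[float]]:
    """Clean, mean-fill, normalize, and smooth each feature column."""
    filled = mean_fill(remove_invalid(records))
    return {f: moving_average(min_max([r[f] for r in filled]), window) for f in FIELDS}
```

Invalid records are dropped before the means are computed, so a negative price does not distort the value used to fill gaps.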
3. The abnormal data detection method according to claim 1, wherein sending the price characteristic data information in the second information to the first clustering module for clustering to obtain the first clustering information comprises:
traversing the price characteristic data information based on preset first initial parameter information, and processing the price characteristic data according to the clustering feature (CF) tree generation method of the BIRCH algorithm to obtain a price clustering feature tree;
obtaining at least one clustering feature cluster based on the price clustering feature tree, and obtaining the threshold range corresponding to each clustering feature cluster;
analyzing all the threshold ranges, and taking the smallest of them as the normal threshold range for judging normal points;
and determining abnormal points in the price clustering feature tree based on the normal threshold range, and determining the abnormal data information in the price characteristic data information based on the abnormal points.
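One possible reading of claim 3, sketched with BIRCH clustering features (N, LS, SS) for one-dimensional price data. Assumptions not in the source: only the leaf-level absorption rule is shown (no non-leaf nodes or branching factor), the "first initial parameter" is the absorption threshold on the cluster radius, each cluster's "threshold range" is the span between its smallest and largest absorbed price (tracked alongside the CF), and only clusters with at least a hypothetical `min_points` members are considered when choosing the smallest range.

```python
import math
from typing import List, Tuple


class ClusteringFeature:
    """BIRCH clustering feature (N, LS, SS) for 1-D data, plus the min/max seen (an addition)."""

    def __init__(self, x: float) -> None:
        self.n = 1
        self.ls = x       # linear sum
        self.ss = x * x   # squared sum
        self.lo = self.hi = x

    def centroid(self) -> float:
        return self.ls / self.n

    def radius_if_added(self, x: float) -> float:
        """Radius sqrt(SS/N - (LS/N)^2) the entry would have after absorbing x."""
        n, ls, ss = self.n + 1, self.ls + x, self.ss + x * x
        return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))

    def add(self, x: float) -> None:
        self.n += 1
        self.ls += x
        self.ss += x * x
        self.lo, self.hi = min(self.lo, x), max(self.hi, x)


def build_leaf_entries(prices: List[float], threshold: float) -> List[ClusteringFeature]:
    """Single pass: absorb each price into the nearest entry if its radius stays within threshold."""
    entries: List[ClusteringFeature] = []
    for x in prices:
        nearest = min(entries, key=lambda e: abs(e.centroid() - x), default=None)
        if nearest is not None and nearest.radius_if_added(x) <= threshold:
            nearest.add(x)
        else:
            entries.append(ClusteringFeature(x))
    return entries


def normal_range(entries: List[ClusteringFeature], min_points: int = 2) -> Tuple[float, float]:
    """Smallest [lo, hi] span among entries with at least min_points members."""
    candidates = [e for e in entries if e.n >= min_points]  # raises below if empty
    best = min(candidates, key=lambda e: e.hi - e.lo)
    return best.lo, best.hi


def flag_abnormal(prices: List[float], rng: Tuple[float, float]) -> List[float]:
    """Prices outside the normal threshold range are treated as abnormal points."""
    lo, hi = rng
    return [x for x in prices if not lo <= x <= hi]
```

CF vectors are additive, which is why BIRCH can test an absorption with `radius_if_added` from the three summary numbers alone, without revisiting the raw prices.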
4. An abnormal data detection apparatus, comprising:
a first acquisition unit, configured to acquire first information, the first information being commodity sales data information of at least one store;
a first processing unit, configured to send the first information to a data preprocessing model to obtain second information, wherein the second information is price characteristic data information and sales characteristic data information obtained by preprocessing the first information;
a second processing unit, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, wherein the third information is abnormal commodity sales data obtained by performing cluster screening on the second information twice;
a third processing unit, configured to send the third information to the verification module for processing to obtain fourth information, wherein the fourth information is abnormal commodity sales data screened by the model after its parameters have been verified;
wherein the second processing unit includes:
a first clustering subunit, configured to send the price characteristic data information in the second information to a first clustering module for clustering to obtain first clustering information, wherein the first clustering information is abnormal data information in the price characteristic data information;
a fourth processing subunit, configured to map the first clustering information to the sales volume characteristic data information in the second information by data correspondence, so as to obtain second sub-information, where the second sub-information includes the sales volume characteristic data information corresponding to the first clustering information;
a second clustering subunit, configured to send the second sub-information to a second clustering module for processing to obtain second clustering information, wherein the second clustering information is abnormal commodity sales data obtained by performing cluster screening on the second information twice;
wherein the second clustering subunit includes:
a seventh clustering subunit, configured to perform data processing based on a preset second initial parameter and the data information in the second sub-information, where the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system, and the mutual reachability distance between each pair of coordinate data points is obtained from those points;
an eighth clustering subunit, configured to generate a weighted distance graph based on the mutual reachability distances, and to generate a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;
a ninth clustering subunit, configured to convert the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and to construct the hierarchical cluster structure from these components;
and a tenth clustering subunit, configured to compress the hierarchical cluster structure and to classify the data information in the second sub-information based on the compressed hierarchical cluster structure, so as to obtain the abnormal data in the second sub-information.
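The data-correspondence mapping between the two clustering stages (price clustering first, then clustering of the matching sales-volume features) can be sketched as follows. This assumes each record is identified by a hashable key shared by its price and sales-volume features, for example a hypothetical `(store_id, sku, date)` tuple; the two stage functions are placeholders for the first and second clustering modules, not the patent's clustering logic.

```python
from typing import Callable, Dict, Hashable, List

Key = Hashable  # hypothetical record key, e.g. (store_id, sku, date)
Stage = Callable[[Dict[Key, float]], List[Key]]


def map_to_sales(price_anomalies: List[Key], sales: Dict[Key, float]) -> Dict[Key, float]:
    """Second sub-information: sales-volume features of the records flagged in stage one."""
    return {k: sales[k] for k in price_anomalies if k in sales}


def two_stage_detect(prices: Dict[Key, float], sales: Dict[Key, float],
                     first_stage: Stage, second_stage: Stage) -> List[Key]:
    """Run price clustering, map flagged keys to sales volumes, then run the second clustering."""
    flagged_by_price = first_stage(prices)
    second_sub = map_to_sales(flagged_by_price, sales)
    return second_stage(second_sub)
```

Passing the stages in as callables keeps the mapping independent of the clustering methods, so simple threshold rules can stand in for the BIRCH and hierarchical stages when testing the plumbing.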
5. The abnormal data detection apparatus according to claim 4, wherein the apparatus comprises:
a first processing subunit, configured to perform data processing on the commodity sales data information, remove invalid data from the first information, and fill incomplete data in the first information with mean values to obtain first sub-information;
a second processing subunit, configured to calculate characteristic data of the first information based on the first sub-information and a preset calculation formula, wherein the characteristic data of the first information comprises price characteristic data and sales characteristic data;
and a third processing subunit, configured to normalize the characteristic data of the first information and smooth the normalized characteristic data to obtain the preprocessed first information.
6. The abnormal data detection apparatus according to claim 4, wherein the apparatus comprises:
a third clustering subunit, configured to traverse the price characteristic data information based on preset first initial parameter information, and to process the price characteristic data according to the clustering feature (CF) tree generation method of the BIRCH algorithm to obtain a price clustering feature tree;
a fourth clustering subunit, configured to obtain at least one clustering feature cluster based on the price clustering feature tree and to obtain the threshold range corresponding to each clustering feature cluster;
a fifth clustering subunit, configured to analyze all the threshold ranges and to take the smallest of them as the normal threshold range for judging normal points;
and a sixth clustering subunit, configured to determine abnormal points in the price clustering feature tree based on the normal threshold range, and to determine the abnormal data information in the price characteristic data information based on the abnormal points.
7. An abnormal data detection apparatus, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the abnormal data detection method according to any one of claims 1 to 3 when executing the computer program.
8. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the abnormal data detection method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210458381.2A CN114708003B (en) | 2022-04-27 | 2022-04-27 | Abnormal data detection method, device, equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210458381.2A CN114708003B (en) | 2022-04-27 | 2022-04-27 | Abnormal data detection method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114708003A | 2022-07-05 |
CN114708003B | 2023-11-10 |
Family ID: 82177116
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210458381.2A (CN114708003B, Active) | 2022-04-27 | 2022-04-27 | Abnormal data detection method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114708003B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809448A (en) * | 2014-12-30 | 2016-07-27 | 阿里巴巴集团控股有限公司 | Account transaction clustering method and system thereof |
US9454785B1 (en) * | 2015-07-30 | 2016-09-27 | Palantir Technologies Inc. | Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data |
CN106529968A (en) * | 2016-09-29 | 2017-03-22 | 深圳大学 | Customer classification method and system thereof based on transaction data |
KR101834260B1 (en) * | 2017-01-18 | 2018-03-06 | 한국인터넷진흥원 | Method and Apparatus for Detecting Fraudulent Transaction |
CN107918905A (en) * | 2017-11-22 | 2018-04-17 | 阿里巴巴集团控股有限公司 | Abnormal transaction identification method, apparatus and server |
CN109389453A (en) * | 2017-08-11 | 2019-02-26 | 苏宁云商集团股份有限公司 | Price analysis method and device |
CN110046889A (en) * | 2019-03-20 | 2019-07-23 | 腾讯科技(深圳)有限公司 | Method, device and server for detecting abnormal-behavior entities |
CN110400220A (en) * | 2019-07-23 | 2019-11-01 | 上海氪信信息技术有限公司 | Intelligent suspicious transaction detection method based on a semi-supervised graph neural network |
CN113988148A (en) * | 2020-07-10 | 2022-01-28 | 华为技术有限公司 | Data clustering method, system, computer equipment and storage medium |
CN114077872A (en) * | 2021-11-29 | 2022-02-22 | 税友软件集团股份有限公司 | Data anomaly detection method and related device |
CN114186626A (en) * | 2021-12-09 | 2022-03-15 | 中国建设银行股份有限公司 | Anomaly detection method and device, electronic equipment and computer readable medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11108835B2 (en) * | 2019-03-29 | 2021-08-31 | Paypal, Inc. | Anomaly detection for streaming data |
US20210333280A1 (en) * | 2020-04-23 | 2021-10-28 | YatHing Biotechnology Company Limited | Methods related to the diagnosis of prostate cancer |
CN114548276A (en) * | 2022-02-22 | 2022-05-27 | Oppo广东移动通信有限公司 | Method and device for clustering data, electronic equipment and storage medium |
CN115510982A (en) * | 2022-09-29 | 2022-12-23 | 联想(北京)有限公司 | Clustering method, device, equipment and computer storage medium |
2022-04-27: CN CN202210458381.2A, patent CN114708003B (active)
Non-Patent Citations (14)
Title |
---|
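A survey of anomaly detection techniques in financial domain; Mohiuddin Ahmed et al.; Future Generation Computer Systems; Vol. 55; 278-288 * |
Amaretto: An Active Learning Framework for Money Laundering Detection; Danilo Labanca et al.; IEEE Access; Vol. 10; 41720-41739 * |
Anomaly Detection Based on Enhanced DBScan Algorithm; Zhenguo Chen et al.; SciVerse ScienceDirect; Vol. 15; 178-182 * |
Critical Analysis of Machine Learning Based Approaches for Fraud Detection in Financial Transactions; Thushara Amarasinghe; Proceedings of the 2018 International Conference on Machine Learning Technologies; 12-17 * |
DBSCAN Clustering Algorithm Applied to Identify Suspicious Financial Transactions; Yan Yang; 2014 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery; 60-65 * |
DESIGN AND SIMULATION OF AN EFFICIENT MODEL FOR CREDIT CARDS FRAUD DETECTION; Ibrahim K. Ogundoyin et al.; Journal of Engineering and Technology; Vol. 16 (No. 1); 88-99 * |
Identification of abnormal P2P online loan borrowing based on a "multi-level classification" method (基于"多层次分类"方法的异常P2P网贷借款识别); 罗钦芳 et al.; 管理工程学报; Vol. 31 (No. 3); 201-209 * |
Research on the parallelization of Spark-based hierarchical clustering algorithms (基于Spark的层次聚类算法的并行化研究); 余胜辉; 计算机技术与发展; Vol. 30 (No. 6); 19-22 * |
Isolation forest algorithm based on qualitative data clustering (基于定性数据聚类的孤立森林算法); 陈敏昊; CNKI优秀硕士学位论文全文库; Vol. 2022 (No. 3); 1-56 * |
Research on machine-learning-based credit card fraud detection schemes (基于机器学习的信用卡欺诈检测方案的研究); 王红雨; CNKI优秀硕士学位论文全文库; Vol. 2019 (No. 08); 1-66 * |
Research on kernel-based hierarchical clustering algorithms (基于核的层次聚类算法研究); 韩鑫; CNKI优秀硕士学位论文全文库; Vol. 2021 (No. 9); 1-65 * |
Research on oversampling-based ensemble classification algorithms for imbalanced data (基于过采样的不平衡数据集成分类算法研究); 赵学华; CNKI优秀硕士学位论文全文库; Vol. 2021 (No. 02); 1-79 * |
朱琳. Research on money laundering mining models for bank transaction big data and their application (银行交易大数据洗钱挖掘模型及应用研究). 中国优秀硕士学位论文全文数据库信息科技辑. 2021, (No. 2), 7-9, 16-17, 40-46. * |
Research on money laundering mining models for bank transaction big data and their application (银行交易大数据洗钱挖掘模型及应用研究); 朱琳; 中国优秀硕士学位论文全文数据库信息科技辑 (No. 2); 7-9, 16-17, 40-46 * |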
Also Published As
Publication number | Publication date |
---|---|
CN114708003A (en) | 2022-07-05 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |