CN114708003B - An abnormal data detection method, device, equipment and readable storage medium - Google Patents

An abnormal data detection method, device, equipment and readable storage medium

Info

Publication number
CN114708003B
Authority
CN
China
Prior art keywords
information
data
clustering
abnormal
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210458381.2A
Other languages
Chinese (zh)
Other versions
CN114708003A (en)
Inventor
范华琦
刘恒
周杲
蒋挺
向吴优
陈赛
庞苏川
杨柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jiaoda Big Data Technology Co ltd
Southwest Jiaotong University
Original Assignee
Chengdu Jiaoda Big Data Technology Co ltd
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jiaoda Big Data Technology Co ltd, Southwest Jiaotong University
Priority to CN202210458381.2A
Publication of CN114708003A
Application granted
Publication of CN114708003B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to the field of data processing, and in particular to an abnormal data detection method, device, equipment and readable storage medium. The method comprises: obtaining first information, the first information being product sales data of at least one store; sending the first information to a data preprocessing model to obtain second information, the second information being the price feature data and sales volume feature data obtained by preprocessing the first information; sending the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being the abnormal product sales data obtained by screening the second information through two rounds of clustering; and verifying the third information to obtain the abnormal product sales data screened by the model after its parameters have been verified. By combining the strengths of two clustering algorithms, the application offsets the weaknesses of each and identifies abnormal data efficiently and accurately.

Description

An abnormal data detection method, device, equipment and readable storage medium

Technical field

The present invention relates to the field of data processing, and in particular to an abnormal data detection method, device, equipment and readable storage medium.

Background

In recent years, Internet technology has developed rapidly and the e-commerce industry has grown quickly with it. Online shopping is increasingly popular because it is convenient, saves time and effort, and delivers to the door. As platforms keep expanding and the number of products keeps growing, improper business practices such as falsely marked prices and fake orders placed to inflate sales have also appeared. These practices seriously violate e-commerce law, so the product data involved must be identified accurately. With such a large number of products, purely manual inspection and screening not only means a huge workload but also leads to omissions and errors. A data detection method is therefore needed that can accurately locate abnormal products, reduce the cost of manual intervention and lower the error rate.

Summary of the invention

The purpose of the present invention is to provide an abnormal data detection method, device, equipment and readable storage medium that address the above problems. To achieve this purpose, the present invention adopts the following technical solutions:

In a first aspect, the present application provides an abnormal data detection method, the method comprising: obtaining first information, the first information being product sales data of at least one store; sending the first information to a data preprocessing model to obtain second information, the second information being the price feature data and sales volume feature data obtained by preprocessing the first information; sending the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being the abnormal product sales data obtained by screening the second information through two rounds of clustering; and sending the third information to a verification module for processing to obtain fourth information, the fourth information being the abnormal product sales data screened by the model after its parameters have been verified.

In a second aspect, an embodiment of the present application provides an abnormal data detection device, comprising:

a first acquisition unit, configured to obtain first information, the first information being product sales data of at least one store;

a first processing unit, configured to send the first information to a data preprocessing model to obtain second information, the second information being the price feature data and sales volume feature data obtained by preprocessing the first information;

a second processing unit, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being the abnormal product sales data obtained by screening the second information through two rounds of clustering;

a third processing unit, configured to send the third information to a verification module for processing to obtain fourth information, the fourth information being the abnormal product sales data screened by the model after its parameters have been verified.

In a third aspect, an embodiment of the present application provides abnormal data detection equipment comprising a memory and a processor. The memory stores a computer program; the processor implements the steps of the above abnormal data detection method when executing the computer program.

In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above abnormal data detection method are implemented.

The beneficial effects of the present invention are:

By extracting features from product sales receipts and clustering twice with two different clustering algorithms, the application can accurately locate abnormal products, reducing manual intervention and the error rate. The invention first processes the data with a high-efficiency clustering method, which effectively reduces the amount of data that needs to be examined, and then processes the output of the first clustering with a high-precision clustering method. Combining the two algorithms in this way offsets the weaknesses of each and identifies abnormal data efficiently and accurately.

Other features and advantages of the invention are set out in the description that follows; some will be apparent from the description, and others can be learned by practising embodiments of the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the written description, the claims and the drawings.

Brief description of the drawings

To explain the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly introduced below. The following drawings show only some embodiments of the invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.

Figure 1 is a schematic flow chart of the abnormal data detection method described in an embodiment of the present invention;

Figure 2 is a schematic structural diagram of the abnormal data detection device described in an embodiment of the present invention;

Figure 3 is a schematic structural diagram of the abnormal data detection equipment described in an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Note that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be defined or explained again in later drawings. In the description of the invention, the terms "first", "second" and so on are used only to distinguish items and must not be understood as indicating or implying relative importance.

Embodiment 1

As shown in Figure 1, this embodiment provides an abnormal data detection method comprising steps S1, S2, S3 and S4.

Step S1: obtain first information, the first information being product sales data of at least one store;

Step S2: send the first information to a data preprocessing model to obtain second information, the second information being the price feature data and sales volume feature data obtained by preprocessing the first information;

Step S3: send the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being the abnormal product sales data obtained by screening the second information through two rounds of clustering;

Step S4: send the third information to a verification module for processing to obtain fourth information, the fourth information being the abnormal product sales data screened by the model after its parameters have been verified.

It can be understood that the abnormal data referred to above is the abnormal data within the product sales data.

It can be understood that, by extracting features from product sales receipts and clustering twice with two different clustering algorithms, the application can accurately locate abnormal products, reducing manual intervention and the error rate. The invention first processes the data with a high-efficiency clustering method, which effectively reduces the amount of data that needs to be examined, and then processes the output of the first clustering with a high-precision clustering method. Combining the two algorithms in this way offsets the weaknesses of each and identifies abnormal data efficiently and accurately.

In a specific implementation of the present disclosure, step S2 comprises steps S21, S22 and S23.

Step S21: process the product sales data by removing invalid data from the first information and mean-filling the missing data in the first information, to obtain first sub-information;

Step S22: calculate the feature data of the first information from the first sub-information and a preset calculation formula, the feature data comprising price feature data and sales volume feature data;

Step S23: normalize the feature data of the first information and smooth the normalized feature data, to obtain the preprocessed first information.

It can be understood that the application preprocesses the product sales data by removing invalid data and mean-filling missing values. To fill a missing value, the corresponding data for the other months are summed and averaged, and that mean is used as the fill value. This reduces the error introduced during feature extraction and improves clustering accuracy.
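
The preprocessing in steps S21 and S23 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say which normalization or smoothing method is used, so min-max scaling and a centered moving average are assumptions, and the function names are hypothetical. The preset feature formula of step S22 is not given in the text and is left out.

```python
from statistics import mean

def mean_fill(monthly):
    """Replace missing months (None) with the mean of the observed months."""
    observed = [v for v in monthly if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in monthly]

def min_max_normalize(values):
    """Assumed normalization: scale linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series carries no spread
    return [(v - lo) / (hi - lo) for v in values]

def moving_average(values, window=3):
    """Assumed smoothing: centered moving average, window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def preprocess(monthly):
    """Fill, then normalize, then smooth, in the order steps S21 and S23 describe."""
    return moving_average(min_max_normalize(mean_fill(monthly)))
```

For a series such as `[10, None, 30]`, the missing month is filled with 20 before scaling and smoothing.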

In a specific implementation of the present disclosure, step S3 comprises steps S31, S32 and S33.

Step S31: send the price feature data in the second information to a first clustering module for clustering to obtain first clustering information, the first clustering information being the abnormal data within the price feature data;

Step S32: map the first clustering information onto the sales volume feature data in the second information to obtain second sub-information, the second sub-information comprising the sales volume feature data corresponding to the first clustering information;

Step S33: send the second sub-information to a second clustering module for processing to obtain second clustering information, the second clustering information being the abnormal product sales data obtained by screening the second information through two rounds of clustering.

It can be understood that the above steps cluster the price feature data once, map the result of the first clustering onto the sales volume feature data to select the data for the second clustering, and then cluster a second time. The first clustering screens the data efficiently, and the second clustering only has to screen the data corresponding to what the first pass selected, so the strengths of both methods are combined while efficiency is preserved.
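
The data flow of steps S31 to S33 can be sketched as below. The sketch assumes that price and sales volume features are keyed by a product identifier and that each clustering module can be treated as a function returning the set of flagged identifiers; both are illustrative assumptions, and `two_stage_screen` and `above` are hypothetical names. The toy threshold detectors stand in for the real clustering modules.

```python
def two_stage_screen(price_features, sales_features, first_detector, second_detector):
    """price_features / sales_features: dicts keyed by product id.
    Each detector takes such a dict and returns the set of flagged ids."""
    # Stage 1 (S31): fast screening on price features only.
    price_flagged = first_detector(price_features)
    # Mapping (S32): keep the sales features of products flagged in stage 1.
    candidates = {pid: sales_features[pid]
                  for pid in price_flagged if pid in sales_features}
    # Stage 2 (S33): more precise screening on the reduced candidate set.
    return second_detector(candidates)

def above(threshold):
    """Toy stand-in for a clustering module: flag values above a threshold."""
    return lambda feats: {pid for pid, v in feats.items() if v > threshold}

prices = {"a": 0.1, "b": 0.9, "c": 0.95}
sales = {"a": 0.9, "b": 0.2, "c": 0.8}
result = two_stage_screen(prices, sales, above(0.5), above(0.5))
```

Product "a" has an unusual sales value but is never examined in stage 2 because its price passed stage 1, which is the cost of the reduced workload.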

In a specific implementation of the present disclosure, step S31 comprises steps S311, S312, S313 and S314.

Step S311: traverse the price feature data using preset first initial parameters, and process the price feature data according to the clustering feature tree construction method of the BIRCH algorithm to obtain a price clustering feature tree;

Step S312: obtain at least one clustering feature cluster from the price clustering feature tree, and calculate the threshold range corresponding to each clustering feature cluster;

Step S313: analyse all of the threshold ranges, and take the smallest of them as the normal threshold range used to judge normal points;

Step S314: determine the abnormal points in the price clustering feature tree based on the normal threshold range, and determine the abnormal data in the price feature data from those abnormal points.

It can be understood that the above steps configure the BIRCH algorithm with the preset first initial parameters, generate a clustering feature tree from the price feature data, and cluster the price feature data on the basis of that tree to obtain at least one cluster. The size ranges of the clusters are then analysed, the smallest range is selected as the threshold range for normal data, and the threshold range for abnormal data is inferred from it to obtain the abnormal data. This processes the data quickly and efficiently and reduces the amount of data the second clustering has to handle.
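
The clustering feature (CF) that BIRCH stores in its tree can be sketched as below for one-dimensional price features. This is a simplified illustration under stated assumptions: a flat list of entries stands in for the leaves of the CF tree (no branching factor or node splitting), and `threshold` plays the role of one of the preset first initial parameters. The text does not spell out how the "smallest" threshold range is chosen, so the sketch takes the normal range as an argument rather than guessing that rule. All names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    """Clustering feature of a 1-D cluster: count, linear sum, sum of squares."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def added(self, x):
        # CFs are additive, so absorbing a point only updates three numbers.
        return CF(self.n + 1, self.ls + x, self.ss + x * x)

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        # sqrt(E[x^2] - E[x]^2), clamped at 0 against rounding error.
        return math.sqrt(max(self.ss / self.n - self.centroid ** 2, 0.0))

def build_entries(values, threshold):
    """Absorb each value into the nearest entry if the resulting radius stays
    within threshold; otherwise open a new entry."""
    entries = []
    for x in values:
        if entries:
            i = min(range(len(entries)), key=lambda j: abs(entries[j].centroid - x))
            merged = entries[i].added(x)
            if merged.radius <= threshold:
                entries[i] = merged
                continue
        entries.append(CF().added(x))
    return entries

def outside(values, normal_range):
    """Values falling outside the chosen normal threshold range."""
    lo, hi = normal_range
    return [x for x in values if not lo <= x <= hi]
```

With values `[1.0, 1.1, 0.9, 5.0]` and a threshold of 0.2, the first three values share one entry and 5.0 opens its own, which is the kind of separation step S314 relies on.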

In a specific implementation of the present disclosure, step S33 comprises steps S331, S332, S333 and S334.

Step S331: perform data processing based on preset second initial parameters and the data in the second sub-information: convert the data in the second sub-information into coordinate data points in a spatial coordinate system, and from those points obtain the mutual reachability distance between every pair of coordinate data points;

Step S332: generate a weighted distance graph from the mutual reachability distances, and generate the minimum spanning tree of the mutual reachability distances from the weighted distance graph;

Step S333: convert the minimum spanning tree into the components of a hierarchical cluster structure according to the mutual reachability distances, and build the hierarchical cluster structure from those components;

Step S334: condense the hierarchical cluster structure, and classify the data in the second sub-information based on the condensed structure to obtain the abnormal data in the second sub-information.

It can be understood that the above steps configure the clustering algorithm with the second initial parameters, convert the second sub-information into spatial coordinates, and compute the mutual reachability distance between every pair of coordinate points, which makes the algorithm more robust to noise. A minimum spanning tree is then built from the mutual reachability distances and the coordinate points are clustered to obtain the abnormal data in the second sub-information, so the abnormal data in the sales volume feature data can be determined more precisely.
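
Steps S331 and S332 follow the same outline as the first stages of HDBSCAN, although this passage does not name the algorithm. The sketch below assumes the usual definition of mutual reachability, max(core(a), core(b), d(a, b)), where core(x) is the distance from x to its k-th nearest neighbour, and uses Prim's algorithm for the minimum spanning tree. The parameter `k` stands in for one of the second initial parameters; the hierarchy building and condensing of steps S333 and S334 are not shown.

```python
import math

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores

def mutual_reachability(points, k):
    """Dense matrix of max(core(a), core(b), d(a, b)); zero on the diagonal."""
    core = core_distances(points, k)
    n = len(points)
    return [[0.0 if i == j
             else max(core[i], core[j], math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def prim_mst(weights):
    """Prim's algorithm on a dense weight matrix; returns (u, v, weight) edges."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
edges = prim_mst(mutual_reachability(points, k=2))
longest = max(edges, key=lambda e: e[2])
```

In this toy data the longest tree edge is the one attaching the isolated point (10, 10); sorting edges by weight and cutting the longest ones is what exposes sparse points when the hierarchy is built.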

In a specific implementation of the present disclosure, step S4 comprises steps S41, S42, S43, S44, S45 and S46.

Step S41: obtain third sub-information, the third sub-information comprising normal sales data and abnormal sales data of historical products;

Step S42: split the third sub-information into a test set and a verification set, and send the test set to the second clustering module for processing to obtain historical abnormal product sales data;

Step S43: compare the historical abnormal product sales data with the verification set to obtain verification result information;

Step S44: perform grey relational analysis on the verification result information and all initial parameters of the anomaly detection model to obtain the degree of correlation between the verification result information and each initial parameter;

Step S45: adjust the initial parameters based on the verification result information and the degrees of correlation to obtain an anomaly detection model with adjusted initial parameters; if the verification result shows that the test set is inconsistent with the verification set, adjust the initial parameter most strongly correlated with the verification result until the verification result shows that the test set result agrees with the verification set;

Step S46: send the first information to the anomaly detection model with adjusted initial parameters for a second round of anomaly detection, and use the output of that second round to filter the third information, obtaining the screened abnormal product sales data.

It can be understood that the invention splits the historical data, sends it to the anomaly detection model for detection, performs grey relational analysis on the detection results and the initial parameters to obtain the degree of correlation between them, adjusts the first and second initial parameters according to that degree of correlation to obtain a tuned anomaly detection model, and then performs a second round of abnormal data screening to obtain the screened abnormal product sales data.
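
The grey relational analysis of step S44 can be sketched as below. The sketch uses the common form of the grey relational coefficient, (Δmin + ρΔmax) / (Δ + ρΔmax) with distinguishing coefficient ρ = 0.5, averaged over the sequence to give a degree per parameter. It assumes the verification scores and parameter values have been recorded over several runs and already normalized to comparable scales; the parameter names `threshold` and `min_cluster_size` and the numbers are hypothetical.

```python
def grey_relational_degrees(reference, candidates, rho=0.5):
    """reference: list of floats (e.g. verification scores per run).
    candidates: dict mapping a parameter name to a list of the same length.
    Returns a dict mapping each name to its grey relational degree."""
    # Absolute differences between the reference and each candidate sequence.
    deltas = {name: [abs(r - c) for r, c in zip(reference, seq)]
              for name, seq in candidates.items()}
    all_d = [d for seq in deltas.values() for d in seq]
    d_min, d_max = min(all_d), max(all_d)
    if d_max == 0:
        # Every sequence coincides with the reference.
        return {name: 1.0 for name in candidates}
    return {name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in seq) / len(seq)
            for name, seq in deltas.items()}

scores = [1.0, 0.8, 0.6]                   # hypothetical verification scores
params = {
    "threshold": [1.0, 0.8, 0.6],          # hypothetical first initial parameter
    "min_cluster_size": [0.2, 0.9, 0.1],   # hypothetical second initial parameter
}
degrees = grey_relational_degrees(scores, params)
most_related = max(degrees, key=degrees.get)
```

Under step S45, `most_related` would be the parameter to adjust first when the test set result disagrees with the verification set.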

Embodiment 2

As shown in Figure 2, this embodiment provides an abnormal data detection device comprising a first acquisition unit 701, a first processing unit 702, a second processing unit 703 and a third processing unit 704.

The first acquisition unit 701 is configured to obtain first information, the first information being product sales data of at least one store;

The first processing unit 702 is configured to send the first information to a data preprocessing model to obtain second information, the second information being the price feature data and sales volume feature data obtained by preprocessing the first information;

The second processing unit 703 is configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being the abnormal product sales data obtained by screening the second information through two rounds of clustering;

The third processing unit 704 is configured to send the third information to a verification module for processing to obtain fourth information, the fourth information being the abnormal product sales data screened by the model after its parameters have been verified.

In a specific implementation of the present disclosure, the first processing unit 702 comprises a first processing subunit 7021, a second processing subunit 7022 and a third processing subunit 7023.

The first processing subunit 7021 is configured to process the product sales data by removing invalid data from the first information and mean-filling the missing data in the first information, to obtain first sub-information;

The second processing subunit 7022 is configured to calculate the feature data of the first information from the first sub-information and a preset calculation formula, the feature data comprising price feature data and sales volume feature data;

The third processing subunit 7023 is configured to normalize the feature data of the first information and smooth the normalized feature data, to obtain the preprocessed first information.

In a specific implementation of the present disclosure, the second processing unit 703 includes a first clustering subunit 7031, a fourth processing subunit 7032, and a second clustering subunit 7033.

The first clustering subunit 7031 is configured to send the price feature data information in the second information to a first clustering module for clustering to obtain first clustering information, the first clustering information being the abnormal data information in the price feature data information;

The fourth processing subunit 7032 is configured to map the first clustering information to the corresponding sales volume feature data information in the second information to obtain second sub-information, the second sub-information including the sales volume feature data information corresponding to the first clustering information;

The second clustering subunit 7033 is configured to send the second sub-information to a second clustering module for processing to obtain second clustering information, the second clustering information being the abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information.
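The two-stage flow above (price clustering, mapping to sales volume features, second clustering) can be sketched as a small pipeline. The sketch assumes that each record carries an identifier shared by its price feature and its sales volume feature; that identifier, the dictionary layout, and both function names are hypothetical, and the two detectors are passed in as plain callables rather than being the clustering modules themselves.

```python
def map_price_anomalies_to_sales(price_anomaly_ids, sales_features):
    """Return the sales volume features of the records flagged in the price stage."""
    flagged = set(price_anomaly_ids)
    return {rid: feat for rid, feat in sales_features.items() if rid in flagged}

def two_stage_screen(price_features, sales_features, price_detector, sales_detector):
    """Run the price detector, map its hits to sales features, then run the sales detector."""
    first_clustering = price_detector(price_features)      # IDs flagged in stage one
    second_sub_info = map_price_anomalies_to_sales(first_clustering, sales_features)
    return sales_detector(second_sub_info)                 # IDs flagged in both stages
```

Only records flagged by the price stage ever reach the second detector, so the final result is always a subset of the first-stage result.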

In a specific implementation of the present disclosure, the first clustering subunit 7031 includes a third clustering subunit 70311, a fourth clustering subunit 70312, a fifth clustering subunit 70313, and a sixth clustering subunit 70314.

The third clustering subunit 70311 is configured to traverse the price feature data information based on preset first initial parameter information, and to process the price feature data according to the clustering feature tree generation method of the BIRCH algorithm, to obtain a price clustering feature tree;

The fourth clustering subunit 70312 is configured to obtain at least one clustering feature cluster based on the price clustering feature tree, and to calculate the threshold range corresponding to each clustering feature cluster;

The fifth clustering subunit 70313 is configured to analyze all of the threshold ranges and to take the smallest of them as the normal threshold range used to judge normal points;

The sixth clustering subunit 70314 is configured to determine abnormal points in the price clustering feature tree based on the normal threshold range, and to determine the abnormal data information in the price feature data information based on the abnormal points.
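To make the clustering feature (CF) idea concrete, the sketch below maintains BIRCH-style CF entries for one-dimensional price features. It is a deliberate simplification: a flat list of entries stands in for the tree, `threshold` stands in for the preset first initial parameter, and all names are hypothetical. The source does not define how each cluster's threshold range is computed, so the sketch stops at the centroid and radius from which such a range could be derived.

```python
import math

class CF:
    """Clustering feature of a 1-D cluster: count N, linear sum LS, square sum SS."""
    def __init__(self):
        self.n, self.ls, self.ss = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        self.ls += x
        self.ss += x * x

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        # Root-mean-square distance of the members from the centroid.
        return math.sqrt(max(self.ss / self.n - self.centroid ** 2, 0.0))

def radius_if_added(cf, x):
    """Radius the entry would have after absorbing x, without modifying it."""
    n, ls, ss = cf.n + 1, cf.ls + x, cf.ss + x * x
    return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))

def build_cf_entries(prices, threshold):
    """Absorb each price into the nearest entry if its radius stays within threshold."""
    entries = []
    for x in prices:
        closest = min(entries, key=lambda e: abs(e.centroid - x), default=None)
        if closest is not None and radius_if_added(closest, x) <= threshold:
            closest.add(x)
        else:
            fresh = CF()
            fresh.add(x)
            entries.append(fresh)
    return entries
```

Because each entry stores only three numbers, a point can be absorbed and the radius rechecked without revisiting earlier data, which is what lets the tree be built in a single traversal.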

In a specific implementation of the present disclosure, the second clustering subunit 7033 includes a seventh clustering subunit 70331, an eighth clustering subunit 70332, a ninth clustering subunit 70333, and a tenth clustering subunit 70334.

The seventh clustering subunit 70331 is configured to perform data processing based on a preset second initial parameter and the data information in the second sub-information, wherein the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system and, based on each coordinate data point, the mutual reachability distance between the coordinate data points is obtained;

The eighth clustering subunit 70332 is configured to generate a weighted distance graph based on the mutual reachability distances, and to generate a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;

The ninth clustering subunit 70333 is configured to convert the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and to build the hierarchical cluster structure based on those components;

The tenth clustering subunit 70334 is configured to compress the hierarchical cluster structure and to classify the data information in the second sub-information based on the compressed hierarchical cluster structure, to obtain the abnormal data in the second sub-information.
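The first two steps above, mutual reachability distances and their minimum spanning tree, resemble the opening stages of HDBSCAN-style clustering and can be sketched in plain Python. The sketch assumes Euclidean distance and treats the neighbour count `k` as the preset second initial parameter; both choices, and all function names, are assumptions rather than details from the source. The hierarchy construction and compression steps are not shown.

```python
import math

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores

def mutual_reachability(points, k):
    """Matrix of max(core(a), core(b), d(a, b)) for every pair of points."""
    core = core_distances(points, k)
    n = len(points)
    return [[0.0 if i == j
             else max(core[i], core[j], math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense weight matrix; returns (i, j, weight) edges."""
    n = len(weights)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: weights[e[0]][e[1]])
        edges.append((i, j, weights[i][j]))
        in_tree.add(j)
    return edges
```

Sorting the tree edges by weight and removing them from heaviest to lightest splits the data into progressively smaller components; a point that is attached to the rest only through a very heavy edge is a natural outlier candidate.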

In a specific implementation of the present disclosure, the third processing unit 704 includes a first acquisition subunit 7041, a fifth processing subunit 7042, a sixth processing subunit 7043, a seventh processing subunit 7044, an eighth processing subunit 7045, and a ninth processing subunit 7046.

The first acquisition subunit 7041 is configured to acquire third sub-information, the third sub-information being normal sales data information of historical commodities and abnormal sales data information of historical commodities;

The fifth processing subunit 7042 is configured to divide the third sub-information into a test set and a verification set, and to send the test set of the third sub-information to the second clustering module for processing, to obtain historical abnormal commodity sales data;

The sixth processing subunit 7043 is configured to compare the historical abnormal commodity sales data with the verification set to obtain verification result information;

The seventh processing subunit 7044 is configured to perform grey relational analysis between the verification result information and all initial parameters in the anomaly detection model, to obtain the degree of correlation between the verification result information and each of the initial parameters;

The eighth processing subunit 7045 is configured to adjust the initial parameters based on the verification result information and the degrees of correlation, to obtain an anomaly detection model with adjusted initial parameters, wherein if the verification result is that the test set and the verification set are inconsistent, the initial parameter with the greatest degree of correlation with the verification result is adjusted until the verification result is that the test set and the verification set are consistent;

The ninth processing subunit 7046 is configured to send the first information to the anomaly detection model with adjusted initial parameters for a second round of anomaly detection, and to screen the third information using the data from the second round of anomaly detection, to obtain the screened abnormal commodity sales data.
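The grey relational analysis step can be illustrated with the standard grey relational degree, which scores how closely each candidate sequence tracks a reference sequence. The sketch assumes that the verification results and each initial parameter have been recorded over the same series of trials and already scaled to comparable ranges; the resolution coefficient `rho = 0.5` is the conventional default rather than a value from the source, and the function name is hypothetical.

```python
def grey_relational_degree(reference, candidates, rho=0.5):
    """Return one grey relational degree per candidate sequence.

    reference:  verification results over a series of trials (assumed scaled)
    candidates: one sequence per model parameter over the same trials
    rho:        resolution coefficient, conventionally 0.5
    """
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return [1.0 for _ in candidates]
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees
```

Under this reading, the parameter whose sequence yields the largest degree is the one adjusted first when the verification result shows an inconsistency.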

It should be noted that, for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the method and is not elaborated here.

Example 3

Corresponding to the above method embodiment, an embodiment of the present disclosure further provides an abnormal data detection device. The abnormal data detection device described below and the abnormal data detection method described above may be referred to in correspondence with each other.

FIG. 3 is a block diagram of an abnormal data detection device 800 according to an exemplary embodiment. As shown in FIG. 3, the abnormal data detection device 800 may include a processor 801 and a memory 802. The abnormal data detection device 800 may further include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.

The processor 801 is used to control the overall operation of the abnormal data detection device 800 so as to complete all or part of the steps of the abnormal data detection method described above. The memory 802 is used to store various types of data to support operation of the abnormal data detection device 800. Such data may include, for example, instructions for any application program or method operating on the abnormal data detection device 800, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may be further stored in the memory 802 or sent via the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the abnormal data detection device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the abnormal data detection device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the abnormal data detection method described above.

In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When the program instructions are executed by a processor, the steps of the abnormal data detection method described above are implemented. For example, the computer-readable storage medium may be the above-mentioned memory 802 including program instructions, and the program instructions may be executed by the processor 801 of the abnormal data detection device 800 to complete the abnormal data detection method described above.

Example 4

Corresponding to the above method embodiment, an embodiment of the present disclosure further provides a readable storage medium. The readable storage medium described below and the abnormal data detection method described above may be referred to in correspondence with each other.

A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the abnormal data detection method of the above method embodiment.

The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. An abnormal data detection method, characterized by comprising:
acquiring first information, the first information being commodity sales data information of at least one store;
sending the first information to a data preprocessing model to obtain second information, the second information being price feature data information and sales volume feature data information obtained by preprocessing the first information;
sending the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
sending the third information to a verification module for processing to obtain fourth information, the fourth information being the abnormal commodity sales data screened by the model after its parameters have been verified;
wherein sending the second information to the anomaly detection model for abnormal data detection to obtain the third information comprises:
sending the price feature data information in the second information to a first clustering module for clustering to obtain first clustering information, the first clustering information being the abnormal data information in the price feature data information;
mapping the first clustering information to the corresponding sales volume feature data information in the second information to obtain second sub-information, the second sub-information including the sales volume feature data information corresponding to the first clustering information;
sending the second sub-information to a second clustering module for processing to obtain second clustering information, the second clustering information being the abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
wherein sending the second sub-information to the second clustering module for processing to obtain the second clustering information comprises:
performing data processing based on a preset second initial parameter and the data information in the second sub-information, wherein the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system and, based on each coordinate data point, the mutual reachability distance between the coordinate data points is obtained;
generating a weighted distance graph based on the mutual reachability distances, and generating a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;
converting the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and building the hierarchical cluster structure based on those components;
compressing the hierarchical cluster structure, and classifying the data information in the second sub-information based on the compressed hierarchical cluster structure to obtain the abnormal data in the second sub-information.

2. The abnormal data detection method according to claim 1, characterized in that sending the first information to the data preprocessing model to obtain the second information, the second information being the information obtained by preprocessing the first information, comprises:
processing the commodity sales data information by clearing invalid data from the first information and filling incomplete data in the first information with mean values, to obtain first sub-information;
calculating feature data of the first information based on the first sub-information and a preset calculation formula, the feature data of the first information including price feature data and sales volume feature data;
normalizing the feature data of the first information and smoothing the normalized feature data, to obtain the preprocessed first information.

3. The abnormal data detection method according to claim 1, characterized in that sending the price feature data information in the second information to the first clustering module for clustering to obtain the first clustering information comprises:
traversing the price feature data information based on preset first initial parameter information, and processing the price feature data according to the clustering feature tree generation method of the BIRCH algorithm, to obtain a price clustering feature tree;
obtaining at least one clustering feature cluster based on the price clustering feature tree, and calculating the threshold range corresponding to each clustering feature cluster;
analyzing all of the threshold ranges, and taking the smallest of them as the normal threshold range used to judge normal points;
determining abnormal points in the price clustering feature tree based on the normal threshold range, and determining the abnormal data information in the price feature data information based on the abnormal points.

4. An abnormal data detection apparatus, characterized by comprising:
a first acquisition unit, configured to acquire first information, the first information being commodity sales data information of at least one store;
a first processing unit, configured to send the first information to a data preprocessing model to obtain second information, the second information being price feature data information and sales volume feature data information obtained by preprocessing the first information;
a second processing unit, configured to send the second information to an anomaly detection model for abnormal data detection to obtain third information, the third information being abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
a third processing unit, configured to send the third information to a verification module for processing to obtain fourth information, the fourth information being the abnormal commodity sales data screened by the model after its parameters have been verified;
wherein the second processing unit comprises:
a first clustering subunit, configured to send the price feature data information in the second information to a first clustering module for clustering to obtain first clustering information, the first clustering information being the abnormal data information in the price feature data information;
a fourth processing subunit, configured to map the first clustering information to the corresponding sales volume feature data information in the second information to obtain second sub-information, the second sub-information including the sales volume feature data information corresponding to the first clustering information;
a second clustering subunit, configured to send the second sub-information to a second clustering module for processing to obtain second clustering information, the second clustering information being the abnormal commodity sales data obtained by performing two rounds of clustering screening on the second information;
wherein the second clustering subunit comprises:
a seventh clustering subunit, configured to perform data processing based on a preset second initial parameter and the data information in the second sub-information, wherein the data information in the second sub-information is converted into coordinate data points in a spatial coordinate system and, based on each coordinate data point, the mutual reachability distance between the coordinate data points is obtained;
an eighth clustering subunit, configured to generate a weighted distance graph based on the mutual reachability distances, and to generate a minimum spanning tree of the mutual reachability distances based on the weighted distance graph;
a ninth clustering subunit, configured to convert the minimum spanning tree into components of a hierarchical cluster structure according to the mutual reachability distances, and to build the hierarchical cluster structure based on those components;
a tenth clustering subunit, configured to compress the hierarchical cluster structure and to classify the data information in the second sub-information based on the compressed hierarchical cluster structure, to obtain the abnormal data in the second sub-information.

5. The abnormal data detection apparatus according to claim 4, characterized in that the apparatus comprises:
a first processing subunit, configured to process the commodity sales data information by clearing invalid data from the first information and filling incomplete data in the first information with mean values, to obtain first sub-information;
a second processing subunit, configured to calculate feature data of the first information based on the first sub-information and a preset calculation formula, the feature data of the first information including price feature data and sales volume feature data;
a third processing subunit, configured to normalize the feature data of the first information and to smooth the normalized feature data, to obtain the preprocessed first information.

6. The abnormal data detection apparatus according to claim 4, characterized in that the apparatus comprises:
a third clustering subunit, configured to traverse the price feature data information based on preset first initial parameter information, and to process the price feature data according to the clustering feature tree generation method of the BIRCH algorithm, to obtain a price clustering feature tree;
a fourth clustering subunit, configured to obtain at least one clustering feature cluster based on the price clustering feature tree, and to calculate the threshold range corresponding to each clustering feature cluster;
a fifth clustering subunit, configured to analyze all of the threshold ranges and to take the smallest of them as the normal threshold range used to judge normal points;
a sixth clustering subunit, configured to determine abnormal points in the price clustering feature tree based on the normal threshold range, and to determine the abnormal data information in the price feature data information based on the abnormal points.

7. An abnormal data detection device, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the abnormal data detection method according to any one of claims 1 to 3 when executing the computer program.

8. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the abnormal data detection method according to any one of claims 1 to 3 are implemented.
CN202210458381.2A 2022-04-27 2022-04-27 An abnormal data detection method, device, equipment and readable storage medium Active CN114708003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210458381.2A CN114708003B (en) 2022-04-27 2022-04-27 An abnormal data detection method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210458381.2A CN114708003B (en) 2022-04-27 2022-04-27 An abnormal data detection method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114708003A CN114708003A (en) 2022-07-05
CN114708003B true CN114708003B (en) 2023-11-10

Family

ID=82177116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210458381.2A Active CN114708003B (en) 2022-04-27 2022-04-27 An abnormal data detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114708003B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809448A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Account transaction clustering method and system thereof
US9454785B1 (en) * 2015-07-30 2016-09-27 Palantir Technologies Inc. Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data
CN106529968A (en) * 2016-09-29 2017-03-22 深圳大学 Customer classification method and system thereof based on transaction data
KR101834260B1 (en) * 2017-01-18 2018-03-06 한국인터넷진흥원 Method and Apparatus for Detecting Fraudulent Transaction
CN107918905A (en) * 2017-11-22 2018-04-17 阿里巴巴集团控股有限公司 Abnormal transaction identification method, apparatus and server
CN109389453A (en) * 2017-08-11 2019-02-26 苏宁云商集团股份有限公司 A kind of price analysis method and device
CN110046889A (en) * 2019-03-20 2019-07-23 腾讯科技(深圳)有限公司 A kind of detection method, device and the server of abnormal behaviour main body
CN110400220A (en) * 2019-07-23 2019-11-01 上海氪信信息技术有限公司 A kind of suspicious transaction detection method of intelligence based on semi-supervised figure neural network
CN113988148A (en) * 2020-07-10 2022-01-28 华为技术有限公司 Data clustering method, system, computer equipment and storage medium
CN114077872A (en) * 2021-11-29 2022-02-22 税友软件集团股份有限公司 Data anomaly detection method and related device
CN114186626A (en) * 2021-12-09 2022-03-15 中国建设银行股份有限公司 Abnormity detection method and device, electronic equipment and computer readable medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11108835B2 (en) * 2019-03-29 2021-08-31 Paypal, Inc. Anomaly detection for streaming data
WO2021213494A1 (en) * 2020-04-23 2021-10-28 YatHing Biotechnology Company Limited Methods related to the diagnosis of prostate cancer
CN114548276A (en) * 2022-02-22 2022-05-27 Oppo广东移动通信有限公司 Method and device for clustering data, electronic equipment and storage medium
CN115510982A (en) * 2022-09-29 2022-12-23 联想(北京)有限公司 Clustering method, device, equipment and computer storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809448A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Account transaction clustering method and system thereof
US9454785B1 (en) * 2015-07-30 2016-09-27 Palantir Technologies Inc. Systems and user interfaces for holistic, data-driven investigation of bad actor behavior based on clustering and scoring of related data
CN106529968A (en) * 2016-09-29 2017-03-22 深圳大学 Customer classification method and system thereof based on transaction data
KR101834260B1 (en) * 2017-01-18 2018-03-06 한국인터넷진흥원 Method and Apparatus for Detecting Fraudulent Transaction
CN109389453A (en) * 2017-08-11 2019-02-26 苏宁云商集团股份有限公司 Price analysis method and device
CN107918905A (en) * 2017-11-22 2018-04-17 阿里巴巴集团控股有限公司 Abnormal transaction identification method, apparatus and server
CN110046889A (en) * 2019-03-20 2019-07-23 腾讯科技(深圳)有限公司 Detection method, device and server for abnormal behavior subjects
CN110400220A (en) * 2019-07-23 2019-11-01 上海氪信信息技术有限公司 Intelligent suspicious transaction detection method based on semi-supervised graph neural network
CN113988148A (en) * 2020-07-10 2022-01-28 华为技术有限公司 Data clustering method, system, computer equipment and storage medium
CN114077872A (en) * 2021-11-29 2022-02-22 税友软件集团股份有限公司 Data anomaly detection method and related device
CN114186626A (en) * 2021-12-09 2022-03-15 中国建设银行股份有限公司 Anomaly detection method and device, electronic equipment and computer readable medium

Non-Patent Citations (14)

Title
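A survey of anomaly detection techniques in financial domain; Mohiuddin Ahmed et al.; Future Generation Computer Systems; Vol. 55; 278-288 *
Amaretto: An Active Learning Framework for Money Laundering Detection; Danilo Labanca et al.; IEEE Access; Vol. 10; 41720-41739 *
Anomaly Detection Based on Enhanced DBScan Algorithm; Zhenguo Chen et al.; SciVerse ScienceDirect; Vol. 15; 178-182 *
Critical Analysis of Machine Learning Based Approaches for Fraud Detection in Financial Transactions; Thushara Amarasinghe; Proceedings of the 2018 International Conference on Machine Learning Technologies; 12-17 *
DBSCAN Clustering Algorithm Applied to Identify Suspicious Financial Transactions; Yan Yang; 2014 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery; 60-65 *
Design and Simulation of an Efficient Model for Credit Cards Fraud Detection; Ibrahim K. Ogundoyin et al.; Journal of Engineering and Technology; Vol. 16 (No. 1); 88-99 *
Identification of abnormal P2P online lending loans based on a "multi-level classification" method; 罗钦芳 et al.; Journal of Industrial Engineering and Engineering Management (管理工程学报); Vol. 31 (No. 3); 201-209 *
Research on parallelization of a hierarchical clustering algorithm based on Spark; 余胜辉; Computer Technology and Development (计算机技术与发展); Vol. 30 (No. 6); 19-22 *
Isolation forest algorithm based on qualitative data clustering; 陈敏昊; CNKI Excellent Master's Theses Full-text Database; Vol. 2022 (No. 3); 1-56 *
Research on a credit card fraud detection scheme based on machine learning; 王红雨; CNKI Excellent Master's Theses Full-text Database; Vol. 2019 (No. 08); 1-66 *
Research on kernel-based hierarchical clustering algorithms; 韩鑫; CNKI Excellent Master's Theses Full-text Database; Vol. 2021 (No. 9); 1-65 *
Research on ensemble classification algorithms for imbalanced data based on oversampling; 赵学华; CNKI Excellent Master's Theses Full-text Database; Vol. 2021 (No. 02); 1-79 *
朱琳. Research on money laundering mining models for bank transaction big data and their application. China Excellent Master's Theses Full-text Database, Information Science and Technology Series. 2021, (No. 2), 7-9, 16-17, 40-46. *
Research on money laundering mining models for bank transaction big data and their application; 朱琳; China Excellent Master's Theses Full-text Database, Information Science and Technology Series (No. 2); 7-9, 16-17, 40-46 *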

Also Published As

Publication number Publication date
CN114708003A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
JP6697584B2 (en) Method and apparatus for identifying data risk
CN113379123A (en) Fault prediction method, device, server and storage medium based on digital twin
CN109101989B (en) Merchant classification model construction and merchant classification method, device and equipment
WO2018028546A1 (en) Key point positioning method, terminal, and computer storage medium
CN110874778A (en) Abnormal order detection method and device
CN109376631B (en) Loop detection method and device based on neural network
WO2019019628A1 (en) Test method, apparatus, test device and medium for mobile application
WO2019148729A1 (en) Luxury goods identification method, electronic device, and storage medium
CN109242307B (en) Anti-fraud policy analysis method, server, electronic device and storage medium
CN110674873B (en) Image classification method, device, mobile terminal and storage medium
CN114708003B (en) An abnormal data detection method, device, equipment and readable storage medium
WO2020151315A1 (en) Method and device for generating face recognition fusion model
CN111784053A (en) Transaction risk detection method, device and readable storage medium
CN114387089A (en) Customer credit risk assessment method, device, equipment and storage medium
CN112839047A (en) A method, apparatus, device and medium for asset vulnerability scanning on a cloud platform
CN117474881A (en) System and method for detecting on-line quality of medicinal glass bottle
CN117151855A (en) Fraud risk prediction method, apparatus, computer device, and readable storage medium
CN113111734B (en) Watermark classification model training method and device
CN112700013B (en) Parameter configuration method, device, equipment and storage medium based on federated learning
WO2022199078A1 (en) Method and apparatus for generating item dynamic information, device, and computer-readable medium
CN115048996A (en) Quality assessment model training and using method, equipment and storage medium
CN109165962B (en) Data transmission control method and agricultural trade system based on e-commerce
CN114820003A (en) Pricing information abnormity identification method and device, electronic equipment and storage medium
CN112766391B (en) Method, system, equipment and medium for making document
CN109740671B (en) An image recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant