CN110555051B - Product test abnormal behavior detection system based on behavior sequence analysis - Google Patents
- Publication number
- CN110555051B (application CN201810456933.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- sequence
- module
- similarity
- behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A system for detecting abnormal behavior in product testing based on behavior sequence analysis comprises a data preprocessing module, a sequence model construction module, a storage module and a prediction module. The data preprocessing module acquires and analyzes the quality detection data record file, generates structured data, and outputs it to both the sequence model construction module and the storage module. The sequence model construction module calculates the sequence similarity of each group of data, clusters according to that similarity, and outputs the cluster center representing the conventional behavior cluster to the storage module and the prediction module as the conventional behavior model. The prediction module calculates the offset between any batch of data and the conventional behavior model and detects abnormal behavior by comparing the offset with a set threshold. By analyzing differences in the similarity of data sequences, the invention establishes a conventional data recording behavior model and detects abnormalities in the data recording process, thereby obtaining a reliability evaluation of the product quality detection data.
Description
Technical Field
The invention relates to a technology in the field of information processing, in particular to a product test abnormal behavior detection system based on behavior sequence analysis.
Background
In the manufacturing industry, an inspector may not actually test every product but instead forge data from some of the genuine test results using some strategy, so that the forged data also fall within a reasonable error range. Such false data are difficult to find, yet they make the quality inspection result unreliable. Existing abnormal behavior detection methods take two forms: learning the characteristics of abnormal behaviors from a large amount of labeled data and then checking whether those known abnormal behaviors appear in new data; or, when the abnormal pattern cannot be identified and represented by features, establishing a conventional behavior model and finding abnormal behaviors by detecting deviations from it. Different inspectors may adopt different strategies when forging data, and it is difficult to build a model for each abnormal behavior when little labeled data is available.
Disclosure of Invention
To address the above defects of the prior art, the invention provides a product test abnormal behavior detection system based on behavior sequence analysis. By analyzing differences in the similarity of data sequences, it establishes a conventional data recording behavior model and detects abnormalities in the data recording process, so that the reliability of product quality detection data can be evaluated.
The invention is realized by the following technical scheme:
The invention comprises a data preprocessing module, a sequence model construction module, a storage module and a prediction module, wherein: the data preprocessing module acquires and analyzes the quality detection data record file, generates structured data, and outputs it to both the sequence model construction module and the storage module; the sequence model construction module calculates the sequence similarity of each group of data, clusters according to that similarity, and outputs the cluster center representing the conventional behavior cluster to the storage module and the prediction module as the conventional behavior model; and the prediction module calculates the offset between any batch of data and the conventional behavior model and detects abnormal behavior by comparing the offset with a set threshold.
Technical effects
Compared with the prior art, the invention evaluates data reliability by modeling data recording behavior from quality test data, using the order information in the data and a behavior model analysis method. Sequence similarity is calculated for the different sub-data segments to obtain the highest subsequence similarity score within each segment. For a single data sequence, the differences between the highest sequence similarity scores of its segments reflect changes of strategy during the generation of that sequence. The internal sequence similarity differences of the data sequences are clustered, and the largest cluster represents conventional behavior. The reliability of a batch of data can then be predicted from its deviation from the conventional behavior model.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a block diagram of an embodiment of the present invention.
Detailed Description
As shown in FIG. 2, the present embodiment includes a data preprocessing module, a sequence model construction module, a storage module and a prediction module.
The data preprocessing module acquires quality detection data record files, each of which holds the product inspection results of one batch. It analyzes each data file, extracts the data it contains, and converts the sequence of data records into an ordered list, as follows:
each element in the list represents the test result of one product in the batch; data cleaning then removes data rows with missing values or obvious abnormal values; the list is divided evenly into a set number of non-overlapping sub-data segments; and the resulting structured data is stored in the storage module and simultaneously sent to the sequence model construction module for further processing.
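The cleaning and segmentation steps described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `clean` and `split_segments`, the plausible-range check used to decide what counts as an "obvious abnormal value", and the choice to drop trailing elements that do not fill a whole segment are all assumptions.

```python
import math
from typing import List, Optional, Sequence


def clean(values: Sequence[Optional[float]], low: float, high: float) -> List[float]:
    """Drop missing entries and values outside an assumed plausible range [low, high]."""
    return [
        v for v in values
        if v is not None and not math.isnan(v) and low <= v <= high
    ]


def split_segments(values: Sequence[float], n_segments: int) -> List[List[float]]:
    """Split the list into n_segments equal-length, non-overlapping segments.

    Trailing elements that do not fill a whole segment are dropped (an assumption;
    the source only says the list is divided uniformly).
    """
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    seg_len = len(values) // n_segments
    if seg_len == 0:
        raise ValueError("not enough data for the requested number of segments")
    return [list(values[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]


# Hypothetical batch: one missing value (None) and one obvious outlier (55.0).
raw = [10.1, 10.0, None, 9.9, 55.0, 10.2, 10.1, 9.8, 10.0]
cleaned = clean(raw, low=5.0, high=15.0)
segments = split_segments(cleaned, n_segments=3)
```

Keeping the order of the surviving elements matters here, because the later similarity analysis depends on the recording sequence rather than on the values alone.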
The sequence model construction module receives the structured segment data D, where D denotes the data list obtained after processing one data file and comprises n equal-length sequences $d_1, d_2, \ldots, d_n$. The module divides each segmented data sequence $d_i$ into subsequences, producing four groups of two subsequences each. For each group, it performs a local sequence alignment of the two subsequences with a dynamic programming algorithm, calculates the sequence similarity score matrix, and takes the maximum value in that matrix as the maximum-similarity subsequence score $s_i$ of the segment, i.e. the highest sequence similarity within it.
The division allows for a data segment that was modified and pasted being either adjacent to the original data segment or separated from it by an interval, so two cases are considered. In the first, the sequence is split directly at the middle into two sequences $a$ and $b$. In the second, with L the length of the sequence to be divided and gap_length a chosen interval value, the blocks [0, gap_length), [2 × gap_length, 3 × gap_length), … are spliced into one subsequence $a_g$ and the remaining blocks are spliced into the other subsequence $b_g$.
Because a pasted data segment may also appear in reverse order relative to the original data segment, one subsequence of each of the two divisions is additionally reversed, giving the pairs $\overleftarrow{a}, b$ and $\overleftarrow{a_g}, b_g$.
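The two divisions and their reversed variants can be sketched as below, yielding the four groups of subsequence pairs for one segment. The function names are hypothetical, and reversing the first subsequence of each pair (rather than the second) is an assumption, since the extracted text does not preserve that detail.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def split_middle(seq: Sequence[float]) -> Pair:
    """Case 1: cut the segment at its midpoint into a and b."""
    mid = len(seq) // 2
    return list(seq[:mid]), list(seq[mid:])


def split_interleaved(seq: Sequence[float], gap_length: int) -> Pair:
    """Case 2: blocks [0, g), [2g, 3g), ... go to a_g; the remaining blocks go to b_g."""
    if gap_length <= 0:
        raise ValueError("gap_length must be positive")
    a_g: List[float] = []
    b_g: List[float] = []
    for start in range(0, len(seq), gap_length):
        block = seq[start:start + gap_length]
        # Even-numbered blocks form a_g, odd-numbered blocks form b_g.
        if (start // gap_length) % 2 == 0:
            a_g.extend(block)
        else:
            b_g.extend(block)
    return a_g, b_g


def four_pairs(seq: Sequence[float], gap_length: int) -> List[Pair]:
    """Return (a, b), (a_g, b_g) and the same two pairs with the first part reversed."""
    a, b = split_middle(seq)
    a_g, b_g = split_interleaved(seq, gap_length)
    # Reversing the first subsequence is an assumption about which one is reversed.
    return [(a, b), (a_g, b_g), (a[::-1], b), (a_g[::-1], b_g)]


pairs = four_pairs([1, 2, 3, 4, 5, 6, 7, 8], gap_length=2)
```

Each pair would then be passed to the local alignment step, so every segment contributes one score per group.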
The dynamic programming algorithm calculates the sequence similarity score matrix in the following manner
Wherein: $A$ is the sequence similarity score matrix; $a$ and $b$ denote the two subsequences to be compared; $s(a_i, b_j)$ is the similarity between the $i$-th element of sequence $a$ and the $j$-th element of sequence $b$; $A_{i,j}$ is the highest subsequence similarity score when the two sequences are aligned from front to back up to elements $a_i$ and $b_j$; $c$ is the gap penalty; and $n$ and $m$ are the lengths of sequences $a$ and $b$ respectively. The calculation first initializes the matrix with $A_{i,0} = A_{0,j} = 0$ for $0 \le i \le n$, $0 \le j \le m$. Each subsequent $A_{i,j}$ is then calculated from element scores that have already been computed.
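The recurrence itself does not appear in this text. One form that is consistent with the definitions given here (local alignment, zero initialization, a gap penalty $c$, and taking the maximum entry of the matrix) is the standard Smith-Waterman recurrence with a linear gap penalty. The following is offered as an assumption, not as a transcription of the patent's formula, and it assumes $c \ge 0$ is subtracted for each gap position:

```latex
A_{i,j} = \max\bigl\{\, 0,\;
    A_{i-1,j-1} + s(a_i, b_j),\;
    A_{i-1,j} - c,\;
    A_{i,j-1} - c \,\bigr\},
\qquad 1 \le i \le n,\; 1 \le j \le m,
\qquad
s_{\max} = \max_{0 \le i \le n,\; 0 \le j \le m} A_{i,j}.
```

Under this reading, the zero inside the maximum is what makes the alignment local: a poor prefix never drags down a later well-matching stretch.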
By introducing a gap penalty mechanism, this algorithm can also find sequences with high similarity in addition to matching identical sequences.
The gap penalty mechanism works as follows: for two sequence segments, if one matches the other only after skipping some elements or repeating some elements, that gap is penalized. In the algorithm, each gap of length 1 incurs a penalty of c.
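A minimal sketch of the local alignment score with a gap penalty is given below, assuming the standard Smith-Waterman recurrence with a linear gap penalty. The element similarity `match_score` (+1 for near-equal values, -1 otherwise), the tolerance, and the default penalty of 1.0 are hypothetical choices; the source does not specify them.

```python
from typing import Callable, Sequence


def match_score(x: float, y: float, tol: float = 1e-9) -> float:
    """Hypothetical element similarity s(a_i, b_j): +1 if (nearly) equal, else -1."""
    return 1.0 if abs(x - y) <= tol else -1.0


def max_local_similarity(
    a: Sequence[float],
    b: Sequence[float],
    gap_penalty: float = 1.0,
    score: Callable[[float, float], float] = match_score,
) -> float:
    """Return the largest entry of the local-alignment score matrix A."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero: A[i][0] = A[0][j] = 0.
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(
                0.0,                                         # restart the local alignment
                A[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align a_i with b_j
                A[i - 1][j] - gap_penalty,                   # gap in b
                A[i][j - 1] - gap_penalty,                   # gap in a
            )
            best = max(best, A[i][j])
    return best


# A copied run with one inserted element still scores well thanks to the gap term.
score_with_gap = max_local_similarity([1, 2, 3, 4], [1, 2, 9, 3, 4])
```

With these hypothetical scores, an exact copy of length k scores k, while a copy interrupted by one inserted value loses only the single gap penalty instead of breaking the match in two.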
The clustering works on the four groups obtained for the data list D of each data file. For each group, all $s_i$ are taken and their coefficient of variation and relative range are calculated, namely: coefficient of variation $CV = \sigma/\mu$ and relative range $RR = (\max - \min)/\mu$, where $\sigma$ and $\mu$ are the standard deviation and mean of the $s_i$. The four $(CV, RR)$ pairs represent the differences in sequence similarity inside each batch of data D. The $(CV, RR)$ values from a large number of different batches of test data are clustered; the largest cluster represents conventional behavior, and its cluster center is selected as the conventional behavior model and stored in the storage module.
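The (CV, RR) computation and the selection of the largest cluster's center can be sketched as follows. Several things here are assumptions: the source does not name a clustering algorithm, so a small k-means is used as a stand-in; the number of clusters `k`, the use of the population standard deviation, and the sample points are hypothetical; and whether the four groups are clustered together or separately is left open.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cv_rr(scores: Sequence[float]) -> Point:
    """Return (CV, RR) for the per-segment scores s_i of one group."""
    mu = mean(scores)
    if mu == 0:
        raise ValueError("mean similarity is zero; CV and RR are undefined")
    # Population standard deviation is an assumption; the source only says sigma.
    return pstdev(scores) / mu, (max(scores) - min(scores)) / mu


def kmeans_labels(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means, seeded with the first k points so runs are repeatable."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((mean(x for x, _ in members), mean(y for _, y in members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's old center
        if updated == centers:
            break
        centers = updated
    return labels


def normal_behavior_model(points: Sequence[Point], labels: List[int]) -> Point:
    """Center of the largest cluster, taken as the conventional behavior model."""
    largest = max(set(labels), key=labels.count)
    members = [p for p, lab in zip(points, labels) if lab == largest]
    return mean(x for x, _ in members), mean(y for _, y in members)


# Hypothetical (CV, RR) points from six batches; four look alike, two do not.
points = [(0.10, 0.30), (0.12, 0.32), (0.11, 0.29), (0.09, 0.31), (0.90, 2.00), (0.95, 2.10)]
labels = kmeans_labels(points, k=2)
model = normal_behavior_model(points, labels)
```

Taking the largest cluster as "conventional" relies on the premise stated in the text that most batches are recorded honestly, so no labels are needed.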
The prediction module receives the four value pairs, calculated by the sequence model construction module, that represent the internal sequence similarity differences of any batch of data, retrieves the conventional behavior model from the storage module, and calculates the distance between each pair and the model. Mapping the distance into the range 0 to 1 with the tanh function gives the offset of the batch data from the conventional behavior model, expressed as a percentage. The maximum of the four offsets of a batch is compared with a set threshold: if it exceeds the threshold, the quality data of that batch of products is judged to have been forged during recording; otherwise, it is judged to be free of non-compliant behavior.
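The prediction step can be sketched as below. The source says only "distance", so the Euclidean distance used here is an assumption, as are the function names, the unscaled input to tanh, the example model, and the threshold value of 0.5.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def offset(pair: Point, model: Point) -> float:
    """Map the (assumed Euclidean) distance to the model into [0, 1) with tanh."""
    return math.tanh(math.dist(pair, model))


def is_forged(pairs: Sequence[Point], model: Point, threshold: float) -> bool:
    """Flag a batch when the largest of its offsets exceeds the threshold."""
    return max(offset(p, model) for p in pairs) > threshold


# Hypothetical conventional behavior model and two batches of four (CV, RR) pairs.
model = (0.105, 0.305)
normal_batch = [(0.10, 0.30), (0.11, 0.31), (0.12, 0.30), (0.10, 0.32)]
suspect_batch = [(0.10, 0.30), (0.11, 0.31), (0.90, 2.00), (0.10, 0.32)]

normal_flag = is_forged(normal_batch, model, threshold=0.5)
suspect_flag = is_forged(suspect_batch, model, threshold=0.5)
```

Using the maximum of the four offsets means a single division (for example the reversed, interleaved one) that exposes copying is enough to flag the batch.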
The storage module comprises a database and a file system. The database stores the processed structured data, so that the model can be recalculated if parameters are adjusted. The file system stores the model files obtained in the modeling process; the prediction module retrieves the model from it to judge the category of a given batch of data and analyze its reliability.
The system detects abnormal behavior as follows. The data preprocessing module reads a product quality data file, analyzes it to obtain the raw test data of one batch of products, passes the structured data produced by data cleaning and segmentation to the sequence model construction module, and stores a copy in the storage module. The sequence model construction module calculates the highest internal sequence similarity of each data segment, represents the differences in sequence similarity between segments by the coefficient of variation and the relative range, finds the largest cluster through clustering to represent conventional behavior, takes its cluster center as the conventional behavior model, and outputs that model to the storage module and the prediction module. The prediction module obtains the reliability of the batch of product data from the deviation between the conventional behavior model and the internal sequence similarity differences of the data under test.
Table 1 compares the technical indicators and effects of the invention with those of similar products at home and abroad.
Table 1. Comparison of inventive effects
Compared with the prior art, the invention needs no auxiliary information to be collected while the product quality data is gathered; it analyzes the obtained product quality data directly. During analysis, the data characteristics that a behavior sequence would produce are inferred from the irregular behaviors a data recorder might engage in, and a conventional behavior model is built from unlabeled data for comparison with the behavior of new data sequences, thereby finding data forgery and enabling analysis of data reliability. Because the method depends neither on expert opinion nor on additional information, it addresses the problem of verifying product quality data and suggests an approach to reliability analysis for other types of data.
The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.
Claims (3)
1. A system for detecting abnormal behavior in product testing based on behavior sequence analysis, comprising a data preprocessing module, a sequence model construction module, a storage module and a prediction module, wherein: the data preprocessing module acquires and analyzes the quality detection data record file, generates structured data, and outputs the structured data to the sequence model construction module and the storage module respectively; the sequence model construction module calculates the sequence similarity of each group of data, clusters according to the sequence similarity, and outputs the cluster center representing the conventional behavior cluster to the storage module and the prediction module as the conventional behavior model; and the prediction module calculates the offset between any batch of data and the conventional behavior model and realizes abnormal behavior detection by comparing the offset;
the analysis refers to: converting the sequence of the data records into an ordered list, each element of which represents the test result of one product in the batch; removing data rows with missing values or obvious abnormal values from the list through data cleaning; and uniformly dividing the list into a plurality of non-overlapping sub-data segments, namely the structured data;
the sequence similarity of each group of data refers to: each segmented data sequence $d_i$ of the data list D obtained after processing the data file is divided into subsequences; the two subsequences obtained after division are locally aligned using a dynamic programming algorithm; and after the sequence similarity score matrix is calculated, the maximum value in that matrix is taken as the maximum-similarity subsequence score $s_i$ of the segment, i.e. the highest sequence similarity therein;
the sequence similarity scoring matrix is as follows:
wherein: $A$ is the sequence similarity score matrix; $a$ and $b$ respectively denote the two subsequences to be compared; $s(a_i, b_j)$ is the similarity between the $i$-th element of sequence $a$ and the $j$-th element of sequence $b$; $A_{i,j}$ is the highest subsequence similarity score when the two sequences are aligned from front to back up to elements $a_i$ and $b_j$; and $c$ is the gap penalty;
the clustering works on the four groups obtained for the data list D of each data file: for each group, all $s_i$ are taken and their coefficient of variation and relative range are calculated, namely coefficient of variation $CV = \sigma/\mu$ and relative range $RR = (\max - \min)/\mu$; the four $(CV, RR)$ pairs represent the differences in sequence similarity inside each batch of data D; the $(CV, RR)$ values of the four groups are clustered based on a large number of different batches of test data, wherein the largest cluster represents conventional behavior, and the cluster center of that cluster is selected as the conventional behavior model and stored in the storage module;
the dividing comprises the following steps:
(1) dividing the sequence directly from the middle into two sequences a, b;
(2) when the length of the sequence to be divided is L, taking an interval value gap_length, splicing [0, gap_length), [2 × gap_length, 3 × gap_length), … into one subsequence $a_g$, and splicing the remaining part into the other subsequence $b_g$; for each of the two divisions, one subsequence is additionally reversed, giving $\overleftarrow{a}, b$ and $\overleftarrow{a_g}, b_g$;
the gap penalty is: for two sequence segments, if one matches the other only after skipping some elements or repeating some elements, a penalty is applied to that gap;
the comparison of the offset is as follows: the prediction module receives the four value pairs, calculated by the sequence model construction module, that represent the internal sequence similarity differences of any batch of data, retrieves the conventional behavior model from the storage module, calculates the distance between each pair and the model, obtains the offset of the batch of data from the conventional behavior model, expressed as a percentage, by mapping the distance into the range 0 to 1 with a tanh function, and selects the maximum of the four offsets of the batch for comparison with a set threshold.
2. The system of claim 1, wherein the storage module has a database for storing the processed structured data.
3. An abnormal behavior detection method based on the system of claim 1 or 2, characterized in that a product quality data file is read by the data preprocessing module and analyzed to obtain the raw test data of a batch of products, and the structured data obtained through data cleaning and segmentation is transmitted to the sequence model construction module and stored in the storage module; the sequence model construction module calculates the highest internal sequence similarity of the segmented data, represents the differences in sequence similarity between segments by the coefficient of variation and the relative range, finds the largest cluster through clustering to represent conventional behavior, takes the cluster center as the conventional behavior model, and outputs the conventional behavior model to the storage module and the prediction module; and the prediction module obtains the reliability of the batch of product data from the deviation between the conventional behavior model and the internal sequence similarity differences of the data to be detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456933.XA CN110555051B (en) | 2018-05-14 | 2018-05-14 | Product test abnormal behavior detection system based on behavior sequence analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110555051A (en) | 2019-12-10 |
CN110555051B (en) | 2023-04-28 |
Family
ID=68733648
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456933.XA Active CN110555051B (en) | 2018-05-14 | 2018-05-14 | Product test abnormal behavior detection system based on behavior sequence analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110555051B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561878A (en) * | 2009-05-31 | 2009-10-21 | 河海大学 | Unsupervised anomaly detection method and system based on improved CURE clustering algorithm |
JP5342708B1 (en) * | 2013-06-19 | 2013-11-13 | 株式会社日立パワーソリューションズ | Anomaly detection method and apparatus |
Non-Patent Citations (6)
Title |
---|
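An unsupervised anomaly detection method based on clustering; Yang Bin et al.; Computer Engineering and Applications (《计算机工程与应用》); 20080101 (No. 01); full text * |
Evaluation of the quality stability of blow-molding polyethylene products; Wang Lidong et al.; Petrochemical Industry Technology (《石化技术》); 20180428; full text * |
Detection of abnormal agricultural product price data based on K-Means clustering; Han Lin et al.; Computer Systems & Applications (《计算机系统应用》); 20170315 (No. 03); full text * |
Homework similarity detection system based on sequence matching; Wang Xiaoying et al.; Computer Engineering (《计算机工程》); 20121220 (No. 24); Section 3 of the main text * |
Flywheel anomaly detection based on data correlation analysis; Gong Xuebing et al.; Acta Aeronautica et Astronautica Sinica (《航空学报》); 20140701 (No. 03); full text * |
A survey of research on clustering-based intrusion detection; Xiao Min et al.; Journal of Computer Applications (《计算机应用》); 20080615; full text * |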
Also Published As
Publication number | Publication date |
---|---|
CN110555051A (en) | 2019-12-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||