CN115510302A - Intelligent factory data classification method based on big data statistics

Info

Publication number: CN115510302A
Application number: CN202211432355.9A
Authority: CN
Inventors: 冯璟煕; 陈柏林; 乔迁
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2022-12-23
Anticipated expiration: 2042-11-16
Also published as: CN115510302B

Abstract

The invention relates to the field of data processing, in particular to an intelligent factory data classification method based on big data statistics.

Description

Intelligent factory data classification method based on big data statistics

Technical Field

The application relates to the field of data processing, in particular to an intelligent factory data classification method based on big data statistics.

Background

With the continuous development of intelligent technology, intelligent monitoring and intelligent management are being vigorously developed in various industries. For example, large-scale factories now monitor their operation digitally, so that abnormal operation of the factory is reflected by abnormal monitoring data. However, the operation data of a factory keeps growing over long-term operation, so operation monitoring requires analyzing a large amount of data. To obtain abnormal data quickly during operation monitoring, the data need to be classified according to how abnormal the original data are, that is, abnormal data and normal data are classified and stored separately, which in turn requires an anomaly analysis of the data.

Existing anomaly analysis mainly relies on the differences between data and the distribution density of the data, as in existing clustering algorithms. Clustering, however, considers only differences in data magnitude and cannot properly reflect anomalies in data that follow a changing trend, so anomalies in factory operation data cannot be judged accurately. The present method determines the degree of abnormality of the data from both the overall distribution of the data and the differences of the data along the time sequence. The overall abnormal score of the data is analyzed with the CBLOF algorithm; because conventional CBLOF clustering depends excessively on the distinction between large and small clusters and neglects the characteristics of the clusters themselves, its abnormal score is one-sided and not very reliable. The final abnormal score is therefore determined by combining the cluster size with the time span of the data within the cluster.

Disclosure of Invention

In order to solve the technical problem, the invention provides an intelligent factory data classification method based on big data statistics, which comprises the following steps:

acquiring an intelligent factory data sequence formed by intelligent factory data, and obtaining a plurality of clusters according to the intelligent factory data sequence;

obtaining the time span of the cluster to which each intelligent factory data belongs according to each cluster, and obtaining the abnormal score of each intelligent factory data according to a plurality of clusters, the time span and the number of data contained in the clusters; obtaining a plurality of time windows according to the intelligent factory data sequence; obtaining the difference of each smart factory data relative to the time window according to the difference of the adjacent data in the time window and the abnormal score; obtaining a first abnormal degree of each smart factory data according to the difference of each smart factory data relative to each time window; obtaining a second abnormal degree of each intelligent factory data according to the first abnormal degree and the abnormal score of the intelligent factory data;

and obtaining an abnormal data set and a normal data set according to the intelligent factory data sequence and the second abnormal degree of each intelligent factory data, and performing distributed storage on the abnormal data set and the normal data set.

Preferably, the method for obtaining the abnormal score of each smart plant data according to the plurality of clusters, the time span and the number of data included in the cluster includes:

acquiring the number of data contained in the cluster to which the intelligent factory data belongs, and recording the number of the data as the first number of the clusters to which the intelligent factory data belongs; acquiring the number of data contained in each cluster and recording the number as a second number, and acquiring the maximum value of the second number of all clusters and recording the maximum number; acquiring the distance between the data of each intelligent factory in each cluster and the center of the cluster to which the data belongs, and recording the distance as a first distance;

and obtaining the abnormal score of each intelligent factory data according to the time span, the first number, the maximum number and the first distance of the cluster to which each intelligent factory data belongs.

Preferably, the abnormal score of each smart factory data is obtained from the time span, the first number, the maximum number and the first distance of the cluster to which each smart factory data belongs by a formula (the formula itself is not reproduced in this text), wherein the terms of the formula denote, in order:

the maximum number;

the first number of the cluster to which the given smart factory data belongs;

the time span of the cluster to which the given smart factory data belongs;

the first distance of the given smart factory data;

the abnormal score of the given smart factory data.

Preferably, the method for obtaining the difference of each smart plant data relative to the time window according to the difference of the neighboring data in the time window and the abnormal score includes:

obtaining the time windows to which each smart factory data belongs; calculating the time difference between each smart factory data and every data in each of those time windows; obtaining, according to the time differences, a plurality of neighboring data of each smart factory data in each time window to which it belongs; obtaining the standard deviation of each time window to which each smart factory data belongs; and obtaining the difference of each smart factory data relative to each time window to which it belongs according to the neighboring data in that window, the abnormal score of each neighboring data and the standard deviation of that window.

Preferably, the difference of each smart factory data relative to each time window to which it belongs is obtained from the neighboring data in that window, the abnormal score of each neighboring data and the standard deviation of that window by a formula (the formula itself is not reproduced in this text), wherein the terms of the formula denote, in order:

the given smart factory data;

one of the neighboring data of the given smart factory data in a given time window to which it belongs;

the normalized value of the abnormal score of that neighboring data;

the number of neighboring data contained in that time window;

the standard deviation of that time window;

the difference of the given smart factory data relative to that time window.

Preferably, the method for obtaining the first abnormal degree of each smart factory data according to the difference of each smart factory data relative to each time window includes:

obtaining the time windows to which each smart factory data belongs; obtaining the standard deviation of each such time window from the data in that window; and obtaining the first abnormal degree of each smart factory data according to the difference of each smart factory data relative to each time window and the standard deviation of each time window to which it belongs.

Preferably, the first abnormal degree of each smart factory data is obtained from the difference of each smart factory data relative to each time window and the standard deviation of each time window to which it belongs by a formula (the formula itself is not reproduced in this text), wherein the terms of the formula denote, in order:

the difference of the given smart factory data relative to a given time window to which it belongs;

the number of time windows to which the given smart factory data belongs;

the standard deviation of the differences of the given smart factory data over all of its time windows;

the first abnormal degree of the given smart factory data.

Preferably, the method for obtaining the time span of the cluster to which each smart plant data belongs according to each cluster includes:

and forming a data pair by any two intelligent factory data in the cluster to which the intelligent factory data belong, calculating the time difference of the two intelligent factory data in each data pair, and obtaining the time span of the cluster to which each intelligent factory data belongs according to the time difference of all the data pairs in the cluster to which the intelligent factory data belong.

The embodiment of the invention at least has the following beneficial effects: firstly, reflecting the possibility of the cluster itself having abnormality according to the size of the cluster, and highlighting the influence of the cluster size on the data abnormality; and then, judging the influence relationship among the data of the same cluster according to the time sequence span of the data contained in the cluster, namely, considering the influence of the data time sequence relationship on data abnormity judgment, and performing more accurate data abnormity judgment.

Then, the relative difference of the data is determined according to the difference of the time sequence data and the relative relation of the window data in the calculation window on the time sequence, the data difference abnormity caused by the larger difference between the trend change data is avoided, the influence of the abnormal score of other data on the window calculation is considered in the window calculation, and the influence of abnormal data in the window on the abnormal judgment of other data is avoided.

Moreover, in the abnormal degree of the data time sequence, the influence of the data abnormal score on the judgment of the data time sequence abnormality is introduced, the common influence of the data abnormal score and the judgment of the data time sequence abnormality is strengthened, the final abnormal degree of the data is obtained and is used as the basis for data abnormality classification, namely, the data is classified more accurately.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flowchart of an intelligent plant data classification method based on big data statistics according to the present invention.

Detailed Description

To further illustrate the technical means and effects of the present invention for achieving the predetermined objects, the following detailed description, structures, features and effects of the method for classifying intelligent plant data based on big data statistics according to the present invention are provided with the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" refers to not necessarily the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following describes a specific scheme of the intelligent factory data classification method based on big data statistics in detail with reference to the accompanying drawings.

Referring to fig. 1, a flowchart illustrating steps of a big data statistics-based intelligent plant data classification method according to an embodiment of the present invention is shown, where the method includes the following steps:

Step S001: acquiring data to obtain a smart factory data sequence, and obtaining a plurality of clusters according to the smart factory data sequence.

1. Collecting data:

To keep track of the operating state of a smart factory in time, real-time operation monitoring is required, and to facilitate data analysis the generated data are transmitted to a unified data management platform for analysis and management. The data collected in this scheme are the operation monitoring data of the smart factory, that is, the data handled by the factory's unified data management platform.

The collected smart factory operation monitoring data are arranged in time order to obtain the smart factory data sequence, and each data in the sequence is called smart factory data, for example vibration data of an equipment engine or temperature data of equipment.

2. Obtaining a plurality of clusters according to the intelligent factory data sequence:

Each data in the smart factory data sequence is clustered using the K-Means clustering algorithm to obtain K clusters; in this scheme K is taken as 10, and the mean of all data in each cluster is taken as the center data of that cluster.
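The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the readings are assumed to be scalar values, and the function name `kmeans_1d`, the iteration cap and the seeded random initialization are all hypothetical choices; the source only states that K-Means is used with 10 clusters and that each center is the mean of its cluster.

```python
import random


def kmeans_1d(values, k, iterations=50, seed=0):
    """Plain Lloyd-style K-Means on scalar readings.

    Returns (labels, centers). Assumes len(values) >= k.
    The initialization scheme is an assumption; the source does not specify one.
    """
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each reading goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    readings = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
    print(kmeans_1d(readings, k=2))
```

With the scheme's setting, the call would be `kmeans_1d(readings, k=10)` on the full data sequence.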

Step S002: obtaining the abnormal score of each smart factory data according to the clusters.

Smart factory operation data are mainly used to monitor and analyze abnormal operation of the smart factory. Abnormal operation is mainly reflected in abnormal data, and abnormal data must be retrieved frequently during anomaly monitoring and analysis, so the abnormal data should be convenient to retrieve for that analysis.

First, the number of smart factory data contained in each of the K clusters is obtained, where K denotes the number of clusters. The clusters are then arranged by this number from small to large, and the first W clusters in the resulting cluster sequence are selected as small clusters; W = 3 is set in the invention, and the remaining clusters are large clusters.

The small clusters and the large clusters obtained in this way are each denoted by their own indexed symbols (not reproduced in this text), where n denotes the number of small clusters and a further symbol denotes the number of large clusters.
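The split into small and large clusters can be sketched as below. The function name `split_small_large` and the tie-breaking by cluster id are assumptions made for illustration; the source only states that clusters are ordered by size and the first W = 3 are treated as small.

```python
from collections import Counter


def split_small_large(labels, w=3):
    """Return (small, large) sets of cluster ids.

    Clusters are ordered by member count, ascending; the first `w` are
    'small' and the rest are 'large'. Ties are broken by cluster id,
    which is an assumption not stated in the source.
    """
    sizes = Counter(labels)
    ordered = sorted(sizes, key=lambda c: (sizes[c], c))
    return set(ordered[:w]), set(ordered[w:])


if __name__ == "__main__":
    labels = [0] * 5 + [1] * 1 + [2] * 2 + [3] * 9 + [4] * 3
    print(split_small_large(labels))
```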

For a given smart factory data in the sequence, any two smart factory data in the cluster to which it belongs form a data pair. The time difference of the two smart factory data in each data pair is calculated, and the maximum time difference over all data pairs in that cluster is taken as the time span of the cluster to which the given smart factory data belongs.
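Because the largest pairwise time difference in a cluster equals the latest timestamp minus the earliest, the time span can be computed without enumerating pairs. The sketch below assumes numeric timestamps and a hypothetical function name `cluster_time_spans`; a cluster with a single member has no pairs and is given a span of 0 here, which is an assumption.

```python
def cluster_time_spans(timestamps, labels):
    """Map each cluster id to its time span.

    The span is the maximum time difference over all pairs in the cluster,
    which equals (latest timestamp - earliest timestamp).
    """
    earliest, latest = {}, {}
    for t, c in zip(timestamps, labels):
        earliest[c] = min(t, earliest.get(c, t))
        latest[c] = max(t, latest.get(c, t))
    return {c: latest[c] - earliest[c] for c in earliest}


if __name__ == "__main__":
    print(cluster_time_spans([0, 5, 2, 9, 4], [0, 0, 1, 1, 0]))
```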

The number of data contained in the cluster to which the given smart factory data belongs is obtained and recorded as the first number of that cluster. The number of data contained in each cluster is obtained and recorded as a second number, and the maximum of the second numbers over all clusters is recorded as the maximum number. When the cluster to which the given smart factory data belongs is a large cluster, the Euclidean distance between the data and the center data of its cluster is recorded as the first distance of the data. When the cluster to which it belongs is a small cluster, the Euclidean distance between the data and the center data of each large cluster is obtained and recorded as the first distance of the data.
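The first distance can be sketched as follows for scalar readings. For data in a small cluster the text refers to the distance to "each large cluster center" without saying how these are reduced to one value; the sketch assumes the minimum over the large clusters, which is the usual CBLOF convention, and this choice is an assumption rather than something the source confirms. The function name `first_distances` is hypothetical.

```python
def first_distances(values, labels, centers, large):
    """First distance of every reading (scalar case).

    - Reading in a large cluster: distance to its own cluster center.
    - Reading in a small cluster: distance to the nearest large-cluster
      center (ASSUMPTION: minimum over large clusters, as in standard CBLOF).
    """
    distances = []
    for v, c in zip(values, labels):
        if c in large:
            distances.append(abs(v - centers[c]))
        else:
            distances.append(min(abs(v - centers[g]) for g in large))
    return distances


if __name__ == "__main__":
    values = [1.0, 2.0, 10.0, 50.0]
    labels = [0, 0, 1, 2]
    centers = [1.5, 10.0, 50.0]
    print(first_distances(values, labels, centers, large={0, 1}))
```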

From the time span of the cluster to which a given smart factory data belongs, the first number, the maximum number and the first distance of that data, the abnormal score of the data is obtained by a formula (the formula itself is not reproduced in this text), wherein:

the maximum number and the first number of the cluster to which the data belongs together express the relative size of that cluster: the smaller the number of data in the cluster, the more likely the cluster itself is abnormal, and so the larger the abnormal score of the data;

the time span of the cluster to which the data belongs: the larger its value, the larger the time-sequence span of the data in that cluster, the weaker the influence relationship among those data, and the more likely the data in the cluster are abnormal, that is, the larger the abnormal score of the data;

the first distance of the data: the larger its value, the farther the data lies from the cluster center, that is, the more it differs from most of the data, and so the larger the abnormal score of the data;

the result is the abnormal score of the given smart factory data.

The abnormal score of each smart factory data is thus obtained. In analyzing it, the size of the cluster is first used to reflect the possibility that the cluster itself is abnormal, which highlights the influence of cluster size on data abnormality; the time-sequence span of the data contained in the cluster is then used to judge the influence relationship among data of the same cluster, so that the time-sequence relationship of the data is taken into account in judging abnormality. This yields a more accurate abnormal score and hence a more accurate classification of the factory data.

Step S003, a second abnormal degree of each intelligent factory data is obtained according to the plurality of clusters and the abnormal score of each intelligent factory data.

1. Calculating the difference of each smart factory data relative to each time window:

Smart factory data may include data that are stable and unchanging as well as data that follow a certain trend, so the final degree of abnormality must be determined from how the data change along the time sequence; this degree serves as the basis for classifying the final abnormal data.

A time window of a set size is used; in this scheme the size is taken as 40. The time window slides over the smart factory data sequence with a step length of 1, each slide corresponds to one time window, and a plurality of time windows are obtained in the sliding process.
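The sliding-window construction can be sketched as below, treating each window as a range of positions in the data sequence. The function names `sliding_windows` and `windows_containing` are hypothetical, and measuring the window size in number of data positions (rather than in elapsed time) is an assumption.

```python
def sliding_windows(n, size=40, step=1):
    """All windows over a sequence of n readings, as ranges of positions."""
    return [range(start, start + size) for start in range(0, n - size + 1, step)]


def windows_containing(i, n, size=40, step=1):
    """The windows that contain position i (the windows it 'belongs to')."""
    return [w for w in sliding_windows(n, size, step) if i in w]


if __name__ == "__main__":
    print(len(sliding_windows(100)))         # number of windows over 100 readings
    print(len(windows_containing(50, 100)))  # windows covering position 50
```

Under these assumptions, readings near the ends of the sequence belong to fewer windows than readings in the middle, which is why the number of windows per reading appears as its own quantity in the later calculation.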

All time windows that contain a given smart factory data are obtained and recorded as the plurality of time windows to which that smart factory data belongs;

the time difference between the given smart factory data and each data in a given time window to which it belongs is calculated; the data in that time window are arranged by time difference from small to large, and the leading data in this order are taken as the plurality of neighboring data of the given smart factory data in that time window, the number of neighboring data being taken as 10 in this scheme;

the standard deviation of all data in the given time window is calculated and taken as the standard deviation of that time window for the given smart factory data.
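Selecting the neighboring data of one reading inside one window can be sketched as follows. The function name `nearest_in_time` is hypothetical; excluding the reading itself (whose time difference is zero) and breaking ties by position are assumptions that the source does not settle.

```python
def nearest_in_time(i, window, timestamps, m=10):
    """Positions of the m readings in `window` closest in time to reading i.

    ASSUMPTIONS: the reading itself is excluded, and ties in time
    difference are broken by position.
    """
    others = [j for j in window if j != i]
    others.sort(key=lambda j: (abs(timestamps[j] - timestamps[i]), j))
    return others[:m]


if __name__ == "__main__":
    timestamps = list(range(20))
    print(nearest_in_time(5, range(0, 10), timestamps, m=4))
```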

From the time windows to which a given smart factory data belongs, and from the neighboring data and the standard deviation of each such window, the difference of the data relative to each time window is obtained by a formula (the formula itself is not reproduced in this text), wherein:

the given smart factory data and one of its neighboring data in a given time window enter through their difference: the larger this value, the more the value of the smart factory data differs from the values of the data adjacent to it in the time sequence;

the normalized value of the abnormal score of that neighboring data: the larger this value, the larger the abnormal score of the neighboring data, and the smaller its reference value when the given smart factory data is analyzed against it;

the number of neighboring data contained in the time window;

the standard deviation of the time window: the larger it is, the larger the differences among the data in the window, and the smaller the difference of the given smart factory data relative to that window;

the result is the difference of the given smart factory data relative to that time window.

2. Calculating a first degree of anomaly of each smart plant data:

Within the time sequence, each smart factory data may lie in several time windows to which it belongs, so the data has several relative differences with respect to the window data. From the relative differences in these time windows, the first abnormal degree of the data in the time sequence is determined by a formula (the formula itself is not reproduced in this text), wherein:

the difference of the given smart factory data relative to a given time window to which it belongs: the larger this value, the greater the degree of abnormality of the data;

the number of time windows to which the given smart factory data belongs;

the mean of the relative differences over the time windows to which the data belongs, which reflects the overall relative difference of the data in the time sequence: the larger the mean, the greater the degree of abnormality of the data in the time sequence;

the standard deviation of the differences of the given smart factory data over all of its time windows;

the result is the first abnormal degree of the given smart factory data.

In determining the first abnormal degree, the relative difference of the data is determined from the differences of the time-sequence data and the relative relationship of the window data within a time window. This avoids spurious abnormality caused by large differences between data that follow a trend. The window calculation also takes the abnormal scores of the other data into account, so that data with high abnormal scores in the window do not distort the abnormality judgment of other data. Finally, the degree of abnormality of the data in the time sequence is obtained from the relative differences over the several calculation windows containing the data, and the spread of these relative differences across the windows further reflects the local abnormality of the data.

3. Calculating a second abnormal degree of each intelligent factory data:

The abnormal score of a given smart factory data is combined with its first abnormal degree to determine the second abnormal degree of the data by a formula (the formula itself is not reproduced in this text), in which the terms denote the abnormal score and the first abnormal degree of the data: the greater the first abnormal degree of the smart factory data, the larger its second abnormal degree.

The second abnormal degree of each smart factory data is thus obtained. In the analysis, the CBLOF algorithm is combined with the time-sequence change of the data to obtain an abnormal score and a degree of abnormality in the time sequence, so that data abnormality is judged from both the overall data distribution and the time sequence. In addition, the abnormal score is introduced into the time-sequence degree of abnormality, which strengthens the joint influence of the two; the resulting final degree of abnormality serves as the basis for abnormality classification, so that the data are classified more accurately.

Step S004: obtaining an abnormal data set and a normal data set according to the second abnormal degree of each smart factory data, and performing distributed storage on the abnormal data set and the normal data set.

The smart factory data are arranged by second abnormal degree from large to small. The set formed by the leading smart factory data in this ordering (the count is not reproduced in this text) is taken as the abnormal data set, and the set formed by the remaining smart factory data in the smart factory data sequence is taken as the normal data set.

And the abnormal data set and the normal data set are stored in a distributed manner, so that the abnormal data can be inquired and called quickly.
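The final split can be sketched as follows. The number of readings assigned to the abnormal set does not survive in this text, so `n_abnormal` is a hypothetical parameter, as is the function name `split_by_degree`; the degrees passed in stand for second abnormal degrees computed by the earlier steps.

```python
def split_by_degree(degrees, n_abnormal):
    """Split positions into (abnormal, normal) by second abnormal degree.

    Positions are ranked by degree, descending; the top `n_abnormal`
    form the abnormal set and the rest form the normal set. Both lists
    are returned in original sequence order.
    """
    order = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
    abnormal = sorted(order[:n_abnormal])
    normal = sorted(order[n_abnormal:])
    return abnormal, normal


if __name__ == "__main__":
    print(split_by_degree([0.1, 0.9, 0.3, 0.7], n_abnormal=2))
```

The two index lists can then be written to separate stores, which matches the stated aim of querying abnormal data quickly.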

In summary, the embodiment of the present invention provides an intelligent factory data classification method based on big data statistics, which reflects the possibility that a cluster itself has an abnormality according to the size of the cluster, and highlights the influence of the cluster size on data abnormality; and then, judging the influence relationship among the data of the same cluster according to the time sequence span of the data contained in the cluster, namely considering the influence of the data time sequence relationship on the data abnormity judgment, and performing more accurate data abnormity judgment. Then, the relative difference of the data is determined according to the difference of the time sequence data and the relative relation of the window data in the calculation window on the time sequence, the data difference abnormity caused by the larger difference between the trend change data is avoided, the influence of the abnormal score of other data on the window calculation is considered in the window calculation, and the influence of abnormal data in the window on the abnormal judgment of other data is avoided. And in the abnormal degree of the data time sequence, the influence of the data abnormal score on the abnormal judgment of the data time sequence is introduced, the common influence of the data abnormal score and the abnormal judgment of the data time sequence is strengthened, the final abnormal degree of the data is obtained and is used as the basis for data abnormal classification, namely, the data is classified more accurately.

It should be noted that: the sequence of the above embodiments of the present invention is only for description, and does not represent the advantages or disadvantages of the embodiments. And specific embodiments thereof have been described above. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit of the present invention are intended to be included therein.

Claims

1. The intelligent factory data classification method based on big data statistics is characterized by comprising the following steps:

2. The intelligent plant data classification method based on big data statistics as claimed in claim 1, wherein the method for obtaining the abnormal score of each intelligent plant data according to the plurality of clusters, the time span and the number of data included in the cluster comprises:

acquiring the number of data contained in the cluster to which the intelligent factory data belongs, and recording the number of the data as the first number of the clusters to which the intelligent factory data belongs; acquiring the number of data contained in each cluster and recording the number as a second number, and acquiring the maximum value of the second number of all clusters and recording the maximum number; acquiring the distance between each intelligent factory data in each cluster and the center of the cluster to which the intelligent factory data belongs, and recording the distance as a first distance;

3. The intelligent plant data classification method based on big data statistics as claimed in claim 2, wherein the abnormal score of each intelligent plant data is obtained from the time span, the first number, the maximum number and the first distance of the cluster to which each intelligent plant data belongs by a formula (the formula itself is not reproduced in this text), wherein the terms of the formula denote, in order:

the maximum number;

the first number of the cluster to which the given smart factory data belongs;

the time span of the cluster to which the given smart factory data belongs;

the first distance of the given smart factory data;

the abnormal score of the given smart factory data.

4. The intelligent big data statistics-based plant data classification method of claim 1, wherein the method for obtaining the variance of each intelligent plant data with respect to a time window according to the variance of neighboring data in the time window and the anomaly score comprises:

5. The intelligent big data statistics-based plant data classification method according to claim 4, wherein the difference of each smart plant data relative to each time window to which it belongs is obtained from the neighboring data in that window, the abnormal score of each neighboring data and the standard deviation of that window by a formula (the formula itself is not reproduced in this text), wherein the terms of the formula denote, in order:

the given smart factory data;

one of the neighboring data of the given smart factory data in a given time window to which it belongs;

the normalized value of the abnormal score of that neighboring data;

the number of neighboring data contained in that time window;

the standard deviation of that time window;

the difference of the given smart factory data relative to that time window.

6. The method of claim 1, wherein the method for obtaining the first abnormal degree of each smart plant data according to the difference of each smart plant data with respect to each time window comprises:

the method comprises the steps of obtaining a plurality of time affiliated time windows of each intelligent factory data, obtaining the standard deviation of each affiliated time window of each intelligent factory data by utilizing data in each affiliated time window of each intelligent factory data, and obtaining the first abnormal degree of each intelligent factory data according to the difference of each intelligent factory data relative to each time window and the standard deviation of each affiliated time window of each intelligent factory data.

7. The intelligent plant data classification method based on big data statistics as claimed in claim 6, wherein the first abnormal degree of each smart plant data is obtained from the difference of each smart plant data relative to each time window and the standard deviation of each time window to which it belongs by a formula (the formula itself is not reproduced in this text), wherein the terms of the formula denote, in order:

the difference of the given smart factory data relative to a given time window to which it belongs;

the number of time windows to which the given smart factory data belongs;

the standard deviation of the differences of the given smart factory data over all of its time windows;

the first abnormal degree of the given smart factory data.

8. The intelligent big data statistics-based plant data classification method of claim 1, wherein the method for obtaining the time span of the cluster to which each intelligent plant data belongs according to each cluster comprises: