CN112529031A

CN112529031A - Microseismic signal clustering method and device based on improved K-means

Info

Publication number: CN112529031A
Application number: CN202010734874.5A
Authority: CN
Inventors: 丁琳琳; 张明; 潘一山; 孙明馨; 陈泽; 刘媛媛; 刘丽; 侯俊敏
Original assignee: Xinwen Mining Group Co Ltd
Current assignee: Xinwen Mining Group Co Ltd
Priority date: 2020-07-28
Filing date: 2020-07-28
Publication date: 2021-03-19
Anticipated expiration: 2040-07-28
Also published as: CN112529031B

Abstract

The invention discloses a microseismic signal clustering method and device based on improved K-means. The method comprises the following steps: S1, transmitting the microseismic signal into a K-means controller to generate sample data and an initial clustering center; S2, calculating the DTW distance between the sample data and the initial clustering center; S3, comparing the DTW distances, assigning labels, performing the first clustering, and transmitting the result of the first clustering to a DBA updater to obtain a new clustering center; S4, comparing whether the initial clustering center is equal to the updated clustering center, and if so, executing step S5, otherwise executing step S6; S5, outputting the clustering result; S6, returning to steps S3-S4 and carrying out the next clustering. The method is better suited to microseismic signal clustering analysis, retains the waveform characteristics to the greatest extent, and offers high processing efficiency and good reliability.

Description

Microseismic signal clustering method and device based on improved K-means

Technical Field

The invention relates to the technical field of data mining, in particular to a microseismic signal clustering method and device based on improved K-means.

Background

In coal mine production, the roof and floor strata commonly fracture, and pressure-relief and tunneling blasting are commonly carried out, so that the rock mass is subjected to external forces and releases energy, producing vibration that lasts for a period of time. Vibration waves with a frequency below 100 Hz and an event energy of 10³ to 10¹¹ J are usually called microseisms. The microseismic signal generated during the deformation and fracture of coal rock is short in duration and changes abruptly. Interference noise from external environmental factors such as mine blasting and mechanical vibration can also affect the waveform of the microseismic signal to some extent and distort the signal. In summary, the propagation of the microseismic signal is disturbed by many factors and therefore needs to be studied in depth.

With increasing mining depth and expanding goaf area, mine microseisms occur more and more frequently. Microseismic monitoring systems are widely used in mines and mainly comprise indoor cabinet-type microseismic monitoring systems, field suspension-type real-time microseismic monitoring systems, and shallow-surface WiFi real-time microseismic monitoring and positioning systems. These systems continuously generate large numbers of microseismic signals, often accompanied by a variety of noise signals. Because microseismic events are subject to considerable interference and fall into different types, the type of a microseismic event must be determined both when formulating a scheme for handling the event and when formulating a scheme for resuming work after an accident. At present, researchers at home and abroad are studying microseismic signals in increasing depth. Exploiting the nonlinear, non-stationary nature of the signals, wavelet analysis and empirical mode decomposition have made great progress in extracting and studying microseismic signal features, and the detection of microseismic waves can be realized by methods such as the moving average method, the STA/LTA method and the fractal method. The proposed clustering device enables efficient clustering of microseismic signals. Microseismic signal clustering analysis is grounded in clustering-algorithm theory; it can not only reveal the latent characteristics of microseismic events but also effectively reduce the subjectivity of manual classification, and can reveal the activity patterns and distribution characteristics of microseismic events.
Because the microseismic signal is a non-stationary, nonlinear signal, its latent distribution characteristics are relatively difficult to find, and the traditional K-means algorithm can only analyze and cluster sets of data points. The proposed device therefore combines the DTW (Dynamic Time Warping) distance, which is designed for time-series data, with K-means to cluster microseismic events. In summary, mining useful information from large numbers of microseismic signals and making predictive judgments about microseismic events has become both a research hotspot and a difficulty.

Disclosure of Invention

The invention mainly solves the technical problems in the prior art, and provides the microseismic signal clustering method and device based on the improved K-means, which are suitable for microseismic signal clustering analysis, high in processing efficiency and good in reliability.

The technical problem of the invention is mainly solved by the following technical scheme:

The invention provides a microseismic signal clustering method based on improved K-means, which comprises the following steps:

S1, transmitting the microseismic signal into a K-means controller to generate sample data and an initial clustering center;

S2, calculating the DTW distance between the sample data and the initial clustering center;

S3, comparing the DTW distances, assigning labels, performing the first clustering, and transmitting the result of the first clustering to a DBA updater to obtain a new clustering center;

S4, comparing whether the initial clustering center is equal to the updated clustering center, and if so, executing step S5, otherwise executing step S6;

S5, outputting the clustering result;

S6, returning to steps S3-S4 and carrying out the next clustering.
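The loop formed by steps S1-S6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`dtw_distance`, `dtw_path`, `dba_update`, `cluster`) are hypothetical, the initial centers are passed in by the caller, a single barycenter-averaging pass in the style of DTW Barycenter Averaging stands in for the DBA updater, and `max_iter` is an added safeguard that the original does not mention.

```python
def dtw_distance(x, t):
    """Return (DTW distance, accumulated matrix) with squared-difference local cost."""
    n, m = len(x), len(t)
    inf = float("inf")
    # One padding row/column so that acc[i][j] plays the role of Y(i, j), 1-based.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - t[j - 1]) ** 2
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[n][m], acc


def dtw_path(acc):
    """Backtrack the warping path (0-based index pairs) from the accumulated matrix."""
    i, j = len(acc) - 1, len(acc[0]) - 1
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)  # predecessor with the smallest accumulated cost
    return path[::-1]


def dba_update(center, members):
    """One averaging pass: each center point becomes the mean of the values aligned to it."""
    buckets = [[] for _ in center]
    for seq in members:
        _, acc = dtw_distance(center, seq)
        for ci, si in dtw_path(acc):
            buckets[ci].append(seq[si])
    return [sum(b) / len(b) for b in buckets]


def cluster(signals, initial_centers, max_iter=50):
    """S2-S6: assign by nearest DTW distance, update centers, stop when unchanged."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):  # max_iter is an assumption, not part of the source
        # S2/S3: label every signal with the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: dtw_distance(s, centers[k])[0])
            for s in signals
        ]
        # S3: update each center from its members (an empty cluster keeps its center).
        new_centers = []
        for k, c in enumerate(centers):
            members = [s for s, lab in zip(signals, labels) if lab == k]
            new_centers.append(dba_update(c, members) if members else c)
        # S4: identical centers end the loop (S5); otherwise iterate again (S6).
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Calling `cluster(signals, [signals[0], signals[3]])` on a list of equal-length or unequal-length numeric sequences returns one label per signal together with the final centers. Exact equality is used for the stopping test because the source asks whether the centers are equal; a tolerance could be substituted if floating-point noise is a concern.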

Further, the step S1 is preceded by the following step:

S0, manually labeling the microseismic signal, wherein the manual labeling comprises changing the file name format.

Further, the step S1 includes: randomly generating parameters in advance by a K-means controller, wherein the parameters comprise the clustering number m and the incoming microseismic signal sequences X₁-X_n.

Further, the step S2 includes:

S21, vectorizing all microseismic signal sequences X₁-X_n of the K-means controller by the DTW vector machine;

S22, determining an initial clustering center according to the number of clusters, and vectorizing the initial clustering center;

S23, the K-means controller transmits each sample microseismic signal vector X_i and the initial clustering center vector T_i to a DTW stripper; the DTW stripper forms a distance matrix Y_ij from the X_i vector and the T_i vector, wherein the Euclidean distance d of each element in the distance matrix Y_ij is calculated as follows:

d = (X_i,i - T_i,j)²

where X_i,i is the i-th point of the X_i vector and T_i,j is the j-th point of the T_i vector.

Further, the step S3 includes:

S31, the DTW stripper transmits the distance matrix Y_ij to a DTW distance calculator, and the DTW distance calculator calculates the accumulated distance Y(i, j) on the premise that the best matching path is satisfied, wherein the accumulated distance Y(i, j) is calculated as follows:

Y(i,j) = d(X_i,i, T_i,j) + min{Y(i-1,j-1), Y(i-1,j), Y(i,j-1)}

S32, the K-means controller obtains the Euclidean distances d between all sample microseismic signals and the clustering centers, and sorts them by distance to obtain the first clustering result;

S33, the K-means controller transmits the clustered label data of the i-th class, X_i (x_i1, x_i2, x_i3, x_i4, …, x_im), to the DBA updater, with one of the sequences, X_im, taken as the average sequence;

S34, by comparison with X_im, the other label data are updated into new sequences, and the new sequences are averaged to give a new clustering center; these steps are repeated in the DBA updater until a relatively stable new clustering center T_i is obtained, and the new clustering center T_i is returned to the K-means controller.
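The recurrence in S31 does not say how the first row and first column of the accumulated matrix are filled. A common DTW convention, assumed here rather than stated in the original, is to treat them as running sums. Writing d(i, j) as shorthand for the local distance between the i-th point of the signal and the j-th point of the center, with n and m the two sequence lengths:

```latex
\begin{aligned}
Y(1,1) &= d(1,1), \\
Y(i,1) &= d(i,1) + Y(i-1,1), && i = 2,\dots,n, \\
Y(1,j) &= d(1,j) + Y(1,j-1), && j = 2,\dots,m, \\
Y(i,j) &= d(i,j) + \min\{\,Y(i-1,j-1),\ Y(i-1,j),\ Y(i,j-1)\,\}, && i, j \ge 2 .
\end{aligned}
```

Under this convention the DTW distance between the two sequences is the last entry, Y(n, m).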

Further, the best matching path includes:

a. matching proceeds from the start point of the distance matrix Y_ij to the end point Y(n, m);

b. consecutive matched points in the distance matrix Y_ij must be adjacent to each other;

c. during matching, if the point Y(i, j) is to be matched to a next point, the next matching point is one of Y(i+1, j), Y(i, j+1) or Y(i+1, j+1), selected as follows: the Euclidean distances d between each of Y(i+1, j), Y(i, j+1), Y(i+1, j+1) and Y(i, j) are calculated, and the point with the minimum Euclidean distance d is selected.

Further, in step S4, the comparing whether the initial cluster center and the updated cluster center are equal includes:

the K-means controller transmits the clustering center obtained by the DBA updater to the clustering generator for storage each time;

the clustering generator compares the newly obtained clustering center T_i with the previous clustering center T_i-1 to judge whether the two clustering centers are completely consistent.
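The comparison in step S4 can be sketched as a small helper. The name `centers_converged` and the optional `tol` parameter are illustrative assumptions; with the default `tol=0.0` the check corresponds to the "completely consistent" test described above.

```python
def centers_converged(previous, current, tol=0.0):
    """Return True when every point of every center differs by at most tol."""
    if len(previous) != len(current):
        return False
    for p, c in zip(previous, current):
        if len(p) != len(c):
            return False
        if any(abs(a - b) > tol for a, b in zip(p, c)):
            return False
    return True
```

A non-zero `tol` is a practical relaxation for floating-point centers and is not part of the original description.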

The invention provides a microseismic signal clustering device based on improved K-means, which comprises: a K-means controller, a DTW vector machine, a DTW stripper, a DTW distance calculator, a DBA updater and a cluster generator, wherein the K-means controller is respectively connected with the DTW vector machine, the DTW stripper, the DTW distance calculator, the DBA updater and the cluster generator; wherein,

the DTW vector machine is used for vectorizing the sample data and the clustering center sequences input from the K-means controller;

the DTW stripper is used for forming a distance matrix Y_ij from each sample microseismic signal vector X_i and the initial clustering center vector T_i;

the DTW distance calculator is used for calculating the cumulative distance Y(i, j) of the distance matrix Y_ij on the premise that the best matching path is satisfied;

the DBA updater is used for updating a new clustering center according to the clustering result;

and the clustering generator is used for outputting a clustering result.

And the K-means controller is respectively connected with the DTW vector machine, the DTW stripper, the DTW distance calculator, the DBA updater and the clustering generator.

The invention has the following beneficial effects: by taking the time-series nature of microseismic signals into account, using the DTW distance as the similarity measure between samples, improving the cluster-center updating strategy of the K-means algorithm, and adopting the DBA (DTW Barycenter Averaging) centroid updating method, which builds on the characteristics of DTW, the updating process of the cluster center is iteratively optimized, so that the method is better suited to microseismic signal cluster analysis, retains waveform characteristics to the greatest extent, and offers high processing efficiency and good reliability.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a graph of the clustering results of the improved K-means based microseismic signal clustering method of the present invention;

FIG. 2 is a flow chart of a microseismic signal clustering method based on improved K-means;

FIG. 3 is a schematic structural diagram of the improved K-means-based microseismic signal clustering device of the present invention;

FIG. 4 is an exemplary waveform of a set of microseismic signals of various features of the present invention;

FIG. 5 is a schematic structural diagram of a microseismic signal clustering device based on improved K-means according to a first embodiment of the present invention;

FIG. 6 is a flowchart of a microseismic signal clustering method based on improved K-means according to a first embodiment of the present invention;

FIG. 7 is a schematic diagram of DTW stripping vectors according to a first embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the present invention can be more easily understood by those skilled in the art and the scope of protection of the present invention can be defined more clearly.

Mine microseisms are vibrations caused by the release of accumulated internal energy when mine rock fractures or is disturbed by fluids. Referring to FIG. 1, the first category is microseismic events generated by the clustering device, and the second category is random noise generated by organic and man-made objects during mining-area production.

Referring to FIG. 2, the microseismic signal clustering method based on the improved K-means of the present invention comprises the following steps:

S5, outputting the clustering result;

S6, returning to steps S3-S4 and carrying out the next clustering.

According to the method, the characteristic that the microseismic signals have time sequences is considered, the DTW distance is used as similarity measurement between samples, the cluster center updating strategy of a K-means algorithm is improved, and the DBA centroid updating method combined with the DTW characteristic is adopted to iteratively optimize the updating process of the cluster center, so that the method is more suitable for microseismic signal cluster analysis, the characteristics of waveforms are retained to the greatest extent, and the method is high in processing efficiency and good in reliability.

Specifically, step S1 of the present invention is preceded by the following step:

S0, manually labeling the microseismic signal, wherein the manual labeling comprises changing the file name format. This allows the clustering result to be verified more easily. For example, the beginning of the file name is named according to the clustering result; this naming is used only to obtain a reference against which the clustering result is compared and has no other influence on the clustering performed by the K-means controller.

Step S1 of the present invention includes: using the principle of K-means, the K-means controller randomly generates parameters in advance, wherein the parameters comprise the clustering number m and the incoming microseismic signal sequences X₁-X_n (X₁, X₂, X₃, …, X_n).

Step S2 of the present invention includes:

S22, determining an initial clustering center according to the number of clusters, and vectorizing the initial clustering center; specifically, according to the number of clusters m, m sample microseismic signals are manually designated as the initial clustering centers, which can improve the average clustering accuracy for microseismic signals, and the initial clustering centers are likewise vectorized by the DTW (Dynamic Time Warping) vector machine.

S23, the K-means controller transmits each sample microseismic signal vector X_i and the initial clustering center vector T_i to a DTW stripper; the DTW stripper forms a distance matrix Y_ij from the X_i vector and the T_i vector, wherein the Euclidean distance d of each element in the distance matrix Y_ij is calculated as follows:

d = (X_i,i - T_i,j)²

Step S3 of the present invention includes:

S31, the DTW stripper transmits the distance matrix Y_ij to a DTW distance calculator, and the DTW distance calculator calculates the accumulated distance Y(i, j) on the premise that the best matching path is satisfied, wherein the accumulated distance Y(i, j) is calculated as follows:

Y(i,j) = d(X_i,i, T_i,j) + min{Y(i-1,j-1), Y(i-1,j), Y(i,j-1)}

S33, the K-means controller transmits the clustered label data of the i-th class, X_i (x_i1, x_i2, x_i3, x_i4, …, x_im), to the DBA updater, with one of the sequences, X_im, taken as the average sequence;

S34, by comparison with X_im, the other label data are updated into new sequences, and the new sequences are averaged to give a new clustering center; these steps are repeated in the DBA updater until a stable new clustering center T_i is obtained, and the new clustering center T_i is returned to the K-means controller.

The best matching path in the invention comprises:

a. Boundary criterion: assuming that the two vectors to be matched have lengths n and m respectively, matching must proceed from the start point of the distance matrix Y_ij to the end point Y(n, m);

b. Continuity criterion: consecutive matched points in the distance matrix Y_ij must be adjacent (adjacency includes diagonal adjacency), so that the entire best matching path is a continuous path;

c. Monotonicity criterion: during matching, if the point Y(i, j) is to be matched to a next point, the next matching point is one of Y(i+1, j), Y(i, j+1) or Y(i+1, j+1), selected as follows: the Euclidean distances d between each of Y(i+1, j), Y(i, j+1), Y(i+1, j+1) and Y(i, j) are calculated, and the point with the minimum Euclidean distance d is selected.
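The three criteria can be expressed as a check on a candidate path. The sketch below is illustrative only: the function name `is_valid_warping_path` is hypothetical, indices are 1-based to match Y(i, j), and the start point is assumed to be Y(1, 1), which the text implies but does not state explicitly.

```python
def is_valid_warping_path(path, n, m):
    """Check boundary, continuity and monotonicity for a list of 1-based (i, j) pairs."""
    # Boundary: start at the assumed start point (1, 1) and end at (n, m).
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    # Continuity and monotonicity: each step moves to (i+1, j), (i, j+1) or (i+1, j+1).
    allowed_steps = {(1, 0), (0, 1), (1, 1)}
    for (i, j), (next_i, next_j) in zip(path, path[1:]):
        if (next_i - i, next_j - j) not in allowed_steps:
            return False
    return True
```

Restricting every step to one of the three allowed moves enforces both adjacency and forward-only progress, so criteria b and c reduce to a single test per step.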

In step S4 of the present invention, comparing whether the initial cluster center and the updated cluster center are equal includes:

when the cluster generator obtains the new clustering center T_i (T_i,1, T_i,2, …, T_i,m), it compares it with the previous clustering center T_i-1 (T_i-1,1, T_i-1,2, …, T_i-1,m) to determine whether the two clustering centers are completely consistent; if they are, clustering is finished, the cluster generator generates the clustering result, and the result is transmitted to the K-means controller.

Referring to FIG. 3, in the microseismic signal clustering device based on the improved K-means of the present invention, the K-means controller is respectively connected with the DTW vector machine, the DTW stripper, the DTW distance calculator, the DBA updater and the cluster generator; wherein,

the DTW stripper is used for forming a distance matrix Y_ij from each sample microseismic signal vector X_i and the initial clustering center vector T_i;

and the clustering generator is used for outputting a clustering result.

The first embodiment is as follows:

Referring to FIGS. 4-6, first, 200 microseismic signals are randomly extracted from the mine area; the waveform of each microseismic signal is similar to that shown in FIG. 4. Each microseismic signal waveform is in fact a time series composed of thousands of sample points.

Each microseismic signal is converted into vector form by the DTW vector machine and stored as X₁, X₂, X₃, …, X_n. The DTW vector machine sends these vectors back to the K-means controller for storage. According to the clustering number K, K sample microseismic signal vectors are designated as the initial clustering center vectors T₁, T₂, …, T_k.

Then, each microseismic signal X_i and each cluster center T_k are transmitted to the DTW stripper, which calculates the DTW distance matrices, as shown in FIG. 7. Taking X₁, X₂, X₃, …, X_n and the clustering center T₁ as an example, the distance matrices Y₁₁, Y₂₁, …, Y_n1 are obtained respectively. Taking the matrix Y₁₁ as an example, each point Y₁₁(i, j) in the matrix is the Euclidean distance d between the i-th value of the microseismic signal X₁ and the j-th value of the cluster center T₁. If the microseismic signal X₁ is {1,2,1,3,4} and the cluster center T₁ is {4,2,6,2,1}, the distances are calculated according to the DTW principle, and the resulting Y₁₁ matrix has the form:
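With d = (X - T)² as defined for the elements of the distance matrix, the entries for this example can be recomputed directly. A minimal sketch follows; the function name `local_cost_matrix` is illustrative and not from the original, and rows are taken to correspond to the values of X₁ and columns to the values of T₁.

```python
def local_cost_matrix(x, t):
    """Entry (i, j) is the squared difference between x[i] and t[j]."""
    return [[(xi - tj) ** 2 for tj in t] for xi in x]


X1 = [1, 2, 1, 3, 4]
T1 = [4, 2, 6, 2, 1]
Y11 = local_cost_matrix(X1, T1)
for row in Y11:
    print(row)
# [9, 1, 25, 1, 0]
# [4, 0, 16, 0, 1]
# [9, 1, 25, 1, 0]
# [1, 1, 9, 1, 4]
# [0, 4, 4, 4, 9]
```

The upper-left entry is 9 and its three neighbors are 1, 4 and 0, which agrees with the neighbor set {0,1,4} used at the start of the matching walk in this embodiment.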

After all the distance matrices Y are obtained, they are transmitted to the DTW distance calculator, which calculates the DTW distance between the corresponding sample data and the clustering center by matching the best path.

Taking the Y₁₁ matrix as an example: matching starts from the first element in the upper left corner of Y₁₁, and the smallest of the adjacent elements {0,1,4} is selected for matching, at which point the cumulative distance is V₁₁ = 9+0 = 9. Matching then continues from this position among the adjacent elements {1,25,16}, giving the cumulative distance V₁₁ = 9+0+1 = 10. Matching continues from this position among the adjacent elements {1,25,9}, giving V₁₁ = 9+0+1+1 = 11. Matching continues from this position among the adjacent elements {4,4,9}, giving V₁₁ = 9+0+1+1+4 = 15. Because the best matching path is followed here, the closest path, that is, the 4 on the diagonal, is selected. Matching continues in this way until the final cumulative distance V₁₁ = 9+0+1+1+4+4+9 = 28 is obtained. The best path matched according to the DTW principle is shown in Y'₁₁:
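The matching walk described above can be reproduced with a short sketch. It assumes, as the text suggests, that ties between equal neighbors are broken in favor of the diagonal step; `greedy_match` and the 0-based index convention are illustrative choices, not names from the original. The cost matrix is the one obtained from X₁ = {1,2,1,3,4} and T₁ = {4,2,6,2,1} with d = (X - T)².

```python
def greedy_match(cost):
    """Walk from the upper-left to the lower-right entry, always taking the cheapest neighbor."""
    n, m = len(cost), len(cost[0])
    i = j = 0
    path = [(i, j)]
    total = cost[0][0]
    while (i, j) != (n - 1, m - 1):
        # Diagonal is listed first so that min() prefers it when costs are tied.
        steps = [(i + 1, j + 1), (i + 1, j), (i, j + 1)]
        steps = [(a, b) for a, b in steps if a < n and b < m]
        i, j = min(steps, key=lambda p: cost[p[0]][p[1]])
        path.append((i, j))
        total += cost[i][j]
    return total, path


Y11 = [
    [9, 1, 25, 1, 0],
    [4, 0, 16, 0, 1],
    [9, 1, 25, 1, 0],
    [1, 1, 9, 1, 4],
    [0, 4, 4, 4, 9],
]
total, path = greedy_match(Y11)
print(total)  # 28
print(path)   # [(0, 0), (1, 1), (2, 1), (3, 1), (4, 2), (4, 3), (4, 4)]
```

For this example the walk accumulates 9, 0, 1, 1, 4, 4 and 9, which sums to 28. A forward greedy walk of this kind is not guaranteed in general to reach the minimum given by the recurrence for Y(i, j); in this example the two values coincide.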

All DTW distances obtained in this way are transmitted to the K-means controller, and the first clustering is performed according to the nearest-distance principle. A new time series is formed on the basis of the best matching relation obtained by DTW, the average sequence within the cluster is obtained on that basis and taken as the new in-class center, the operation is repeated on the samples in the cluster, and the cluster center is updated iteratively.

The idea of DBA update of the invention is as follows:

(1) According to the DTW best-path matching principle, let the sequences A, B and C be the best matching sequences obtained by the DTW distance calculator, where A is the initially designated average sequence and B and C are the sequences of samples in the cluster.

First, B and C are each matched with A according to the best matching idea.

(2) After the best matching relation is obtained, the sequences are merged and split according to the one-to-many and many-to-one ideas; since A is the average sequence, A does not change, and only the sequences B and C are changed.

Taking the matching between A and C as an illustration: before matching, <2> in A corresponds to <1,1> in C, so the <1,1> in C is merged into <1> according to the one-to-many principle; and <3> in C corresponds to <0,0> in A, so the <3> in C is split into <3,3> according to the one-to-many principle.

(3) After the B and C sequences are updated, the average sequence (here, the cluster center) is updated next: the corresponding elements of all the sequences are averaged to obtain a new average sequence Arg = <1/3, 4/3, 21/3, 11/9, 1>. The operations in (1) and (2) are then repeated, matching A, B and C to Arg respectively, to update the average sequence until the cluster center converges and no longer changes.

A = <0,2,10,0,0>

B = <1,1,9,2/3,0>

C = <0,1,2,3,3>

Arg = <1/3,4/3,21/3,11/9,1>
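The averaging in step (3) can be checked with exact fractions. This is a small sketch; `elementwise_mean` is an illustrative helper name, and the inputs are the sequences A, B and C listed above after B and C have been updated.

```python
from fractions import Fraction

A = [0, 2, 10, 0, 0]
B = [1, 1, 9, Fraction(2, 3), 0]
C = [0, 1, 2, 3, 3]


def elementwise_mean(*seqs):
    """Average the corresponding elements of equal-length sequences exactly."""
    return [sum(Fraction(v) for v in column) / len(seqs) for column in zip(*seqs)]


arg = elementwise_mean(A, B, C)
print([str(v) for v in arg])  # ['1/3', '4/3', '7', '11/9', '1']
```

The third element prints as 7, which is the reduced form of 21/3, so the result matches the sequence Arg given above.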

(4) At this point the first clustering is finished, but the clustering result does not yet meet the requirement. Therefore, once the new clustering center is obtained, it is fed back into the K-means controller and the cycle is repeated from the initial operation until the clustering center obtained in the DBA updater no longer changes, which marks the end of clustering; the cluster generator then outputs the clustering result.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution that can be conceived without inventive effort shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection defined by the claims.

Claims

1. A microseismic signal clustering method based on improved K-means is characterized by comprising the following steps:

S5, outputting the clustering result;

S6, returning to steps S3-S4 and carrying out the next clustering.

2. The improved K-means based microseismic signal clustering method of claim 1 wherein the step S1 is preceded by the steps of:

3. The improved K-means based microseismic signal clustering method of claim 1 wherein the step S1 comprises: randomly generating parameters in advance by a K-means controller, wherein the parameters comprise the clustering number m and the incoming microseismic signal sequences X₁-X_n.

4. The improved K-means based microseismic signal clustering method of claim 3 wherein the step S2 comprises:

S23, the K-means controller transmits each sample microseismic signal vector X_i and the initial clustering center vector T_i to a DTW stripper; the DTW stripper forms a distance matrix Y_ij from the X_i vector and the T_i vector, wherein the Euclidean distance d of each element in the distance matrix Y_ij is calculated as follows:

d = (X_i,i - T_i,j)²

5. The improved K-means based microseismic signal clustering method of claim 4 wherein the step S3 comprises:

Y(i,j) = d(X_i,i, T_i,j) + min{Y(i-1,j-1), Y(i-1,j), Y(i,j-1)}

6. The improved K-means based microseismic signal clustering method of claim 5 wherein the best matching path comprises:

a. matching proceeds from the start point of the distance matrix Y_ij to the end point Y(n, m);

7. The improved K-means based microseismic signal clustering method of claim 6 wherein, in the step S4, comparing whether the initial cluster center and the updated cluster center are equal comprises:

8. An apparatus employing the improved K-means based microseismic signal clustering method of any one of claims 1-7, comprising: a K-means controller, a DTW vector machine, a DTW stripper, a DTW distance calculator, a DBA updater and a cluster generator, wherein the K-means controller is respectively connected with the DTW vector machine, the DTW stripper, the DTW distance calculator, the DBA updater and the cluster generator; wherein,

the DTW distance calculator is used for calculating the cumulative distance Y(i, j) of the distance matrix Y_ij on the premise that the best matching path is satisfied;

a clustering generator for outputting a clustering result;

and the K-means controller is electrically connected with the DTW vector machine, the DTW stripper, the DTW distance calculator, the DBA updater and the clustering generator respectively.