CN118035885A - Multi-feature fusion-based aliasing signal modulation identification method - Google Patents
- Publication number
- CN118035885A (application CN202410116526.XA, publication CN 118035885 A)
- Authority
- CN
- China
- Prior art keywords
- capsule
- layer
- features
- feature
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2131—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on a transform domain processing, e.g. wavelet transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/309—Measuring or estimating channel quality parameters
- H04B17/345—Interference values
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention belongs to the technical field of electromagnetic signal processing, and particularly relates to an electromagnetic data analysis method based on deep learning. An aliased-signal modulation identification method based on multi-feature fusion comprises the following steps: A. data preprocessing; B. artificial feature extraction; C. automatic feature extraction; D. feature fusion; E. classification. Exploiting the fact that the electromagnetic signals produced by a low-orbit electromagnetic system are numerous and mostly aliased, the method converts the aliased-signal identification task into a multi-label classification task, fuses manually extracted features with the deep features of the signal extracted by a deep neural network, assigns corresponding weights to the different features through an attention mechanism, obtains the most reliable classification result under different signal-to-noise ratios, and thereby completes modulation-type identification of the aliased signals. Compared with traditional signal modulation recognition schemes, the method is better suited to practical application scenarios and achieves higher recognition accuracy.
Description
Technical Field
The invention belongs to the technical field of electromagnetic signal processing, and particularly relates to an electromagnetic data analysis method based on deep learning.
Background
In recent years, with the development of communication technology, global communication networks have become increasingly complex. Dense device deployment and scarce spectrum resources make aliasing interference between signals in the same channel unavoidable. In this case, co-channel signals interfere with each other, so the receiver cannot accurately separate and identify each individual signal during demodulation. Such interference produces time-frequency overlap and spatial interleaving in space communications, further exacerbates the complexity of the electromagnetic environment, and thus poses significant challenges for the design and optimization of communication systems. Modulation identification is critical to efficiently managing such complex environments, and it also has important applications in radio spectrum management, reconnaissance monitoring, radio interference analysis, and the like. Modulation identification of aliased electromagnetic signals is therefore an important problem in wireless communications. Traditional methods perform poorly in complex electromagnetic environments; applying artificial-intelligence algorithms to signal processing is currently an effective way to address their low recognition efficiency, long computation time, and related shortcomings.
Disclosure of Invention
The purpose of the invention is to provide, based on the technical framework of a low-orbit electromagnetic system and the characteristics of electromagnetic data, a method suitable for processing complex aliased signals.
The technical scheme of the invention is as follows: an aliasing signal modulation identification method based on multi-feature fusion comprises the following steps:
A. Preprocessing data;
A1. For data labels: the modulation schemes are represented by multi-hot encoding;
A2. For data values: the raw data are standardized to zero mean and unit standard deviation by Z-score normalization;
B. Extracting artificial features;
Selecting the instantaneous features, power-spectrum features and higher-order-cumulant features of the signal from the time-domain, frequency-domain and statistical-domain features as the manually extracted features; wherein:
expressing the instantaneous characteristics of the signal in three aspects of instantaneous amplitude, instantaneous frequency and instantaneous phase;
Obtaining the spectrum characteristics of the signals through Welch spectrum analysis;
Extracting a fourth-order cumulant of the signal from the high-order cumulant feature;
C. Automatic feature extraction;
For the original data and the amplitude-phase representation, shallow automatic features are extracted with convolutional layers, and high-level features are then extracted through an attention mechanism, a capsule layer and a BiLSTM;
D. Feature fusion;
D1. manually extracting feature combination;
Converting the instantaneous, power-spectrum and higher-order-cumulant features obtained in step B into three feature vectors Feature1, Feature2 and Feature3, splicing them together, and extracting F2 from the spliced vector with a deep neural network;
D2. Automatic feature combination of the original data and the amplitude phase;
Flattening the shallow automatic features of the original data and amplitude phase extracted by the convolutional layers into one-dimensional arrays and splicing them to form F0;
Splicing the high-level features extracted from the original data and amplitude phase in step C, and inputting them into a capsule network and a BiLSTM network to extract still higher-level features, forming the high-dimensional automatic feature F1;
D3. Multi-mode feature fusion;
Splicing and fusing the extracted artificial feature F2, the shallow automatic feature F0 and the high-dimensional automatic feature F1 into FC;
D4. Learning a corresponding weight vector Qc with an attention mechanism and using it to fuse FC into Fa;
E. Classifying;
Obtaining the probability of each category through a sigmoid classifier, selecting the optimal output threshold according to the F1-score under different signal-to-noise ratios, and outputting the final classification result.
Beneficial effects: by exploiting the fact that the electromagnetic signals produced by a low-orbit electromagnetic system are numerous and mostly aliased, the invention converts the aliased-signal identification task into a multi-label classification task, fuses manually extracted features with the deep features of the signal extracted by a deep neural network, assigns corresponding weights to the different features through an attention mechanism, obtains the most reliable classification result under different signal-to-noise ratios, and completes modulation-type identification of the aliased signals. Modulation-type identification is thus achieved even when signals are aliased; compared with traditional signal modulation recognition schemes, the method is better suited to practical application scenarios and achieves higher recognition accuracy.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a flow chart of step D1 of the present invention;
FIG. 3 is a flow chart of step D2 of the present invention;
FIG. 4 is a flow chart of step D3 of the present invention;
FIG. 5 is a flow chart of step D4 of the present invention;
FIGS. 6-8 are graphs comparing performance at various signal-to-noise ratios using the present method in the examples.
Detailed Description
Example 1: referring to fig. 1, an aliasing signal modulation identification method based on multi-feature fusion comprises the following steps:
A. Preprocessing data;
A1. For data labels: assuming there are n modulation schemes in total, the label is a one-dimensional array of 0s and 1s in which each index corresponds to one modulation scheme; if the data contains a given modulation scheme, the value at the corresponding index is 1. For example, suppose the received data covers 11 modulation schemes in total and a given sample contains ("BPSK", "QPSK", "8PSK", "16QAM"). Multi-hot encoding then represents this mixture as an 11-element vector with a 1 at each of the four corresponding indices, e.g. [1,1,1,0,1,0,0,0,0,0,0].
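The multi-hot encoding of step A1 can be sketched as follows. The modulation list matches the 11 schemes named in Example 2, but the index order and the sample mixture are illustrative assumptions, not values fixed by the patent:

```python
# Illustrative modulation list; the index order is an assumption.
MODULATIONS = ["8PSK", "BPSK", "CPFSK", "GFSK", "PAM4",
               "QAM16", "QAM64", "QPSK", "WBFM", "AM-DSB", "AM-SSB"]

def multi_hot(present, classes=MODULATIONS):
    """Return a 0/1 list with a 1 at the index of every scheme present."""
    return [1 if m in present else 0 for m in classes]

# A sample that aliases four modulations yields a length-11 vector with four 1s.
label = multi_hot({"BPSK", "QPSK", "8PSK", "QAM16"})
```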
A2. For data values: preprocessing the data values is an important step in data mining and machine learning, whose purpose is to convert raw data into a format better suited to algorithmic processing. Because multiple features of the signal must be extracted, the individual features differ in dimension and value range. During feature fusion, if the scale difference between features is too large, the large-scale features may exert excessive influence on the model's result. This example therefore applies Z-score normalization during data preprocessing.
Z-score normalization (standardization): Z-score normalization rescales the raw data to zero mean and unit standard deviation.
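A minimal sketch of the Z-score normalization used in step A2; the sample values are illustrative:

```python
import numpy as np

def z_score(x):
    """Shift a feature to zero mean and scale it to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

normalized = z_score([2.0, 4.0, 6.0, 8.0])
```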
B. Extracting artificial features;
Selecting the instantaneous features, power-spectrum features and higher-order-cumulant features of the signal from the time-domain, frequency-domain and statistical-domain features as the manually extracted features; the specific implementation is as follows:
Given a signal sequence a(i), i = 1, 2, …, N_s, where N_s is the number of samples:
Average value of the signal sequence: m_a = (1/N_s) Σ_{i=1}^{N_s} a(i);
Normalization of the signal sequence: a_n(i) = a(i)/m_a;
Centering (shifting) of the normalized signal sequence: a_cn(i) = a_n(i) − 1.
(1) Instantaneous features
The scheme expresses the instantaneous characteristics of the signal through the instantaneous amplitude, instantaneous frequency and instantaneous phase; their expressions are given below.
① Standard deviation of the absolute value of the normalized, centered instantaneous amplitude:
σ_aa = sqrt( (1/N_s) Σ_{i=1}^{N_s} a_cn²(i) − [ (1/N_s) Σ_{i=1}^{N_s} |a_cn(i)| ]² )
② Standard deviation of the absolute value of the nonlinear component of the centered instantaneous phase:
σ_ap = sqrt( (1/C) Σ_{a_n(i)>a_t} φ_NL²(i) − [ (1/C) Σ_{a_n(i)>a_t} |φ_NL(i)| ]² )
where φ_NL(i) is the nonlinear component of the instantaneous phase, C is the number of samples satisfying a_n(i) > a_t, and a_t is the decision threshold for weak signals.
③ Standard deviation of the absolute value of the normalized, centered instantaneous frequency:
σ_af = sqrt( (1/C) Σ_{a_n(i)>a_t} f_N²(i) − [ (1/C) Σ_{a_n(i)>a_t} |f_N(i)| ]² )
where f_N(i) = (f(i) − E[f(i)])/R_s and R_s is the symbol rate.
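As an illustration, the instantaneous amplitude can be obtained from the analytic signal. The sketch below assumes an FFT-based Hilbert transform and illustrative test tones, and implements only the amplitude feature σ_aa of the three features above:

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (same idea as scipy.signal.hilbert)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def sigma_aa(x):
    """Std-dev of the absolute normalized, centered instantaneous amplitude."""
    a = np.abs(analytic_signal(x))      # instantaneous amplitude a(i)
    a_cn = a / a.mean() - 1.0           # normalize by m_a, then center
    v = np.mean(a_cn ** 2) - np.mean(np.abs(a_cn)) ** 2
    return np.sqrt(max(v, 0.0))        # guard against tiny negative round-off

t = np.arange(1024) / 1024.0
pure_tone = np.cos(2 * np.pi * 100 * t)                       # constant envelope
am_tone = (1 + 0.5 * np.cos(2 * np.pi * 8 * t)) * pure_tone   # varying envelope
```

A constant-envelope tone yields σ_aa near zero while an amplitude-modulated tone yields a clearly positive value, which is what makes this feature discriminative.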
(2) Higher-order cumulant features
The higher-order cumulant is a measure of non-Gaussianity in a data set and is mainly used in signal processing to capture the non-Gaussian, nonlinear characteristics of a signal. Because very high orders are too expensive to compute, this scheme extracts the fourth-order cumulants of the signal, whose formulas are:
C40 = M40 − 3·M20²
C42 = M42 − |M20|² − 2·M21²
where M_pq = E[ x^{p−q} (x*)^q ] denotes the p-th-order mixed moment of the signal x with q conjugations.
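Assuming the standard mixed-moment definitions M_pq = E[x^(p−q)·conj(x)^q], the fourth-order cumulants can be estimated from IQ samples as below; the sample signals are illustrative:

```python
import numpy as np

def fourth_order_cumulants(x):
    """Estimate C40 and C42 from samples, using M_pq = E[x^(p-q) * conj(x)^q]."""
    x = np.asarray(x, dtype=complex)
    M20 = np.mean(x ** 2)
    M21 = np.mean(x * np.conj(x))
    M40 = np.mean(x ** 4)
    M42 = np.mean((x * np.conj(x)) ** 2)
    C40 = M40 - 3 * M20 ** 2
    C42 = M42 - np.abs(M20) ** 2 - 2 * M21 ** 2
    return C40, C42

rng = np.random.default_rng(0)
# Circular complex Gaussian noise: fourth-order cumulants tend to zero.
noise = (rng.standard_normal(200_000) + 1j * rng.standard_normal(200_000)) / np.sqrt(2)
C40_n, C42_n = fourth_order_cumulants(noise)
# BPSK-like +/-1 symbols: C40 = -2 exactly for unit-power real symbols.
bpsk = rng.choice([-1.0, 1.0], 200_000)
C40_b, _ = fourth_order_cumulants(bpsk)
```

The contrast between near-zero cumulants for Gaussian noise and distinctly nonzero values for digital constellations is what makes this feature useful for modulation identification.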
(3) Power spectral features
The power-spectrum feature is a spectral representation commonly used in signal processing to analyze the energy distribution of a signal over frequency, describing the signal's energy density in the frequency domain. Its advantages include: it reflects the signal's distribution characteristics in the frequency domain and provides information about the signal's frequency components; compared with the raw signal, it is more stable and less affected by instantaneous fluctuations; and it emphasizes the strong (higher-energy) frequency components of the signal, which helps distinguish different signal types.
The Welch spectrum is a commonly used power-spectral-density estimation method. It estimates the signal's energy distribution in the frequency domain by segmenting the signal and Fourier-transforming each segment. Welch spectral analysis yields information such as the strength of frequency components, the spectral shape, and the band energy distribution of the signal.
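A Welch-style estimate can be sketched as an average of windowed, overlapping periodograms. The segment length, overlap and window below are illustrative choices, not parameters prescribed by the patent:

```python
import numpy as np

def welch_psd(x, fs=1.0, nperseg=256):
    """Averaged periodogram over 50%-overlapping Hann-windowed segments
    (a minimal stand-in for scipy.signal.welch)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    step = nperseg // 2
    scale = 1.0 / (fs * np.sum(win ** 2))
    segs = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg] * win          # window each segment
        segs.append(scale * np.abs(np.fft.rfft(seg)) ** 2)
    psd = np.mean(segs, axis=0)
    psd[1:-1] *= 2                                    # fold to one-sided PSD
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd

fs = 1024.0
t = np.arange(4096) / fs
freqs, psd = welch_psd(np.sin(2 * np.pi * 100.0 * t), fs=fs)
peak_freq = freqs[np.argmax(psd)]
```

For a pure 100 Hz tone, the estimated PSD peaks at the tone's frequency bin, illustrating how the feature exposes the dominant frequency components.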
C. Automatic feature extraction;
A suitable deep-learning model is chosen to extract features of the electromagnetic signals according to the requirements of the task and the characteristics of the data. For the original data and the amplitude-phase representation, this example extracts shallow automatic features with convolutional layers and then extracts high-level features through an attention mechanism, a capsule layer and a BiLSTM; the specific procedure is given below:
The convolutional layers extract features from the data; the first convolutional layer takes the preprocessed data as input and convolves the input with its weight set. Let layer l have the three-dimensional input d_l[x, y, c], and let K_l be the convolution kernel of layer l, represented by the four-dimensional array [k_x, k_y, c_l, c] with 0 ≤ k_x ≤ K_x − 1 and 0 ≤ k_y ≤ K_y − 1. The output d_{l+1} of the convolutional layer is related to the input by:
d_{l+1}[x, y, c] = Σ_{k_x} Σ_{k_y} Σ_{c_l} K_l[k_x, k_y, c_l, c] · d_l[x + k_x, y + k_y, c_l]
Attention mechanisms are used to focus on the important parts of sequence data; here the scheme combines channel attention and spatial attention to attend over the channel and spatial dimensions of the data. Channel attention seeks the most important channels. It computes, for each channel, the average a_avg[c] via global average pooling (GAP) and the maximum a_max[c] via global max pooling (GMP):
a_avg[c] = (1/(XY)) Σ_x Σ_y d_{l+1}[x, y, c]
a_max[c] = max_{x,y} d_{l+1}[x, y, c]
Next, the results of GAP and GMP are concatenated and processed through the full concatenation layer and activation function to get channel attention weights. Assuming that the weights and biases of the fully connected layers are W 1,W2,b1,b2, the activation functions are ReLU and sigmoid, respectively, the process can be expressed as:
FCCon=concat(aavg,amax)
FC1=ReLU(W1*FCCon+b1)
FC2=σ(W2*FC1+b2)
where concat denotes the concatenation operation, * denotes matrix multiplication, + denotes matrix addition, and σ denotes the sigmoid function.
Finally, the channel attention weight obtained by FC2 is applied to the input data d l+1, resulting in an attention weighted output f c [ x, y, c ]:
fc[x,y,c]=FC2*dl+1[x,y,c]。
For the spatial attention mechanism, the scheme first computes the average AP[x, y] and maximum MP[x, y] of the input feature map f_c[x, y, c] along the channel direction (the c dimension):
AP[x, y] = (1/C) Σ_c f_c[x, y, c]
MP[x, y] = max_c f_c[x, y, c]
where C is the number of channels and max_c denotes the maximum over the c dimension. AP and MP are then superimposed to obtain M[x, y]:
M[x,y]=AP[x,y]+MP[x,y]
And processing M through a convolution layer and an activation function to obtain the spatial attention weight. The size of the convolution layer is 1x1, the convolution kernel is W s, and the bias term is b s:
as[x,y]=σ(Ws*M[x,y]+bs)
where * denotes the convolution operation.
The input feature map is weighted by using the spatial attention weight, and an output fs [x,y,c] after spatial attention processing is obtained:
fs[x,y,c]=as[x,y]×fc[x,y,c]
where × denotes element-wise multiplication.
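The channel-plus-spatial attention steps above can be sketched as follows. All weight shapes and values are illustrative stand-ins, and the 1×1 convolution on M is reduced to a scalar multiply for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(d, Wc1, bc1, Wc2, bc2, ws, bs):
    """Apply channel attention (GAP/GMP -> two dense layers) then spatial
    attention (channel mean + max -> sigmoid) to d of shape [X, Y, C]."""
    # --- channel attention ---
    a_avg = d.mean(axis=(0, 1))                  # GAP: a_avg[c]
    a_max = d.max(axis=(0, 1))                   # GMP: a_max[c]
    fc_con = np.concatenate([a_avg, a_max])      # concat(a_avg, a_max)
    fc1 = np.maximum(0.0, Wc1 @ fc_con + bc1)    # ReLU layer
    w_ch = sigmoid(Wc2 @ fc1 + bc2)              # per-channel weight in (0,1)
    f_c = d * w_ch                               # broadcast over x, y
    # --- spatial attention ---
    M = f_c.mean(axis=2) + f_c.max(axis=2)       # AP[x,y] + MP[x,y]
    a_s = sigmoid(ws * M + bs)                   # 1x1 conv reduced to a scalar
    return f_c * a_s[:, :, None]

rng = np.random.default_rng(1)
d = rng.standard_normal((4, 4, 8))
out = channel_spatial_attention(
    d,
    Wc1=0.1 * rng.standard_normal((8, 16)), bc1=np.zeros(8),
    Wc2=0.1 * rng.standard_normal((8, 8)), bc2=np.zeros(8),
    ws=0.5, bs=0.0)
```

Because both attention weights lie in (0, 1), the output preserves the input's shape while attenuating less important channels and positions.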
The output processed by the attention mechanism passes through one more convolutional layer to give f[x, y, c]. The capsule layer then converts the features extracted by the convolutional layers into vectors, avoiding the information loss caused by pooling operations in traditional convolutional neural networks and further learning high-dimensional spatial features.
First, a set of vector units of size 1 × C_1 is obtained through the primary capsule layer. Specifically, the output vector v_j for capsule j of layer l+1 and the weights c_ij can be expressed as:
v_j = squash(s_j),  s_j = Σ_i c_ij · u_i,  c_ij = exp(b_ij) / Σ_k exp(b_ik)
where u_i is the output vector of capsule i of layer l, b_ij is the coupling coefficient linking capsule i to capsule j of layer l+1, and squash is a nonlinear activation function.
The primary capsules themselves are recorded as:
u_i = squash(W_u * f[x, y, c]),  i = 1, 2, …, N_1
where W_u is the convolution-kernel weight parameter of the primary capsule layer, N_1 is the number of capsules output by the primary capsule layer, C_1 = 16 is the capsule dimension, and the squash function is the capsules' nonlinear activation. Since each capsule consists of a group of vector neurons, it can represent richer feature information about an object.
The primary capsules are then weighted and fused to finally obtain N groups of capsules, where the length of each capsule represents the probability that the target belongs to the corresponding class. The input is the primary capsules u_i and the output is N capsules:
vj=[v1,v2,…,vN],j=1,2,…,N,
The advanced capsule is obtained from the primary capsule, and is mainly determined by a dynamic routing algorithm, which is also the core of a capsule network, and the specific flow is as follows:
Each advanced capsule v_j is a weighted sum of the output vectors u_i of all primary capsules, where the weight c_ij represents the contribution of the i-th primary capsule to the j-th advanced capsule:
s_j = Σ_i c_ij · u_i
s_j is then converted to v_j by a squash function to ensure that the length of v_j does not exceed 1:
v_j = squash(s_j)
The weight c ij is dynamically determined by a "routing" process. Initially, all weights c ij are initialized to equal values. These weights are then updated by a number of iterations, based on the consistency of the predictions for each primary capsule with the actual output of each advanced capsule. Specifically, the present scheme calculates a "consistency" score for each primary capsule i and each advanced capsule j:
aij=ui·vj
where·represents the dot product of the vector.
The scheme then updates the weights c_ij with a softmax function:
b_ij ← b_ij + a_ij,  c_ij = exp(b_ij) / Σ_k exp(b_ik)
This process can be iterated multiple times until the weights c_ij converge; increasing the number of iterations can improve performance slightly, but the gain is not significant. The scheme therefore uses 5 iterations to balance computational complexity against model performance.
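A minimal sketch of the dynamic-routing loop described above, in the simplified form used here (agreement a_ij = u_i · v_j, no per-pair transformation matrices); capsule counts and dimensions are illustrative:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """squash(s) = (||s||^2 / (1 + ||s||^2)) * s / ||s||; output length < 1."""
    n2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u, n_out, n_iter=5):
    """Route primary capsules u [n_in, dim] to n_out advanced capsules."""
    n_in, _ = u.shape
    b = np.zeros((n_in, n_out))                    # logits start equal
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = c.T @ u                                # s_j = sum_i c_ij * u_i
        v = squash(s)                              # v_j = squash(s_j)
        b = b + u @ v.T                            # b_ij += a_ij = u_i . v_j
    return v

rng = np.random.default_rng(4)
u = rng.standard_normal((6, 8))                    # 6 primary capsules, dim 8
v = dynamic_routing(u, n_out=3)
```

The squash nonlinearity keeps every output capsule's length below 1, so the length can be read as a class-presence probability.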
The N output capsules also contain rich information, and an LSTM can handle long-range dependencies in sequence data. The BiLSTM can be expressed as follows:
Forward pass: for the output capsules v_j = [v_1, v_2, …, v_N], the forward LSTM computes a hidden state at each time step t (F denotes forward):
h_t^F = LSTM_forward(v_t, h_{t−1}^F)
where LSTM_forward is the forward LSTM computation function, whose inputs are the input v_t at the current time and the hidden state h_{t−1}^F at the previous time.
Backward pass: the reverse LSTM computes a hidden state at each time step t (B denotes backward):
h_t^B = LSTM_backward(v_t, h_{t+1}^B)
where LSTM_backward is the computation function of the reverse LSTM, whose inputs are the input v_t at the current time and the hidden state h_{t+1}^B at the next time.
Output: finally, the forward and backward hidden states are spliced to obtain the final output y_t:
y_t = [h_t^F ; h_t^B]
where [· ; ·] denotes concatenation of vectors.
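The forward/backward passes and their concatenation can be sketched with a minimal NumPy LSTM cell; all parameters are random illustrative stand-ins, not trained weights:

```python
import numpy as np

def lstm_pass(xs, Wx, Wh, b, reverse=False):
    """One directional LSTM pass over xs [T, d_in]; gate order i, f, g, o."""
    T = xs.shape[0]
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((T, H))
    order = range(T - 1, -1, -1) if reverse else range(T)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    for t in order:
        z = xs[t] @ Wx + h @ Wh + b                # all four gates at once
        i, f, g, o = np.split(z, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)       # cell-state update
        h = sig(o) * np.tanh(c)                    # hidden state
        out[t] = h
    return out

def bilstm(xs, params_fwd, params_bwd):
    """y_t = [h_t^F ; h_t^B]: concatenate forward and backward hidden states."""
    h_f = lstm_pass(xs, *params_fwd)
    h_b = lstm_pass(xs, *params_bwd, reverse=True)
    return np.concatenate([h_f, h_b], axis=1)

rng = np.random.default_rng(2)
d_in, H, T = 8, 16, 10
params = lambda: (0.1 * rng.standard_normal((d_in, 4 * H)),
                  0.1 * rng.standard_normal((H, 4 * H)),
                  np.zeros(4 * H))
y = bilstm(rng.standard_normal((T, d_in)), params(), params())
```

Each output step has dimension 2H because the forward and backward hidden states are concatenated.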
For the manually extracted features, the scheme continues to learn further with fully connected layers.
D. Feature fusion;
The multi-modal feature fusion based on the attention mechanism is a method for fusing a plurality of feature sources and weighting by using the attention mechanism so as to realize more accurate classification tasks.
D1. and (5) manually extracting feature combination.
As shown in fig. 2, the instantaneous, power-spectrum and higher-order-cumulant features obtained in step B are converted into three feature vectors Feature1, Feature2 and Feature3, which are spliced together; a deep neural network then extracts F2 from the spliced vector.
D2. Automatic feature combining of raw data and amplitude phase.
The shallow automatic features of the original data and amplitude phase extracted by the convolutional layers are flattened into one-dimensional arrays and spliced to form F0.
As shown in fig. 3, the high-level features extracted from the raw data and amplitude phase in step C are spliced together and input into the capsule network and BiLSTM network to extract higher-order features, forming the high-dimensional automatic feature F1.
D3. Multi-mode feature fusion;
The purpose of feature fusion is to combine information from different sources or different modalities together in order to obtain a more comprehensive representation. Stitching is a common way of connecting different features together in sequence, preserving the complete information of each feature, and providing more ways of feature combination. The splicing fusion has flexibility and simplicity, can be combined with various machine learning and deep learning models, and simultaneously considers multi-mode information. Therefore, in this scheme, a splice fusion scheme is used.
As shown in fig. 4, the extracted artificial feature F2, the shallow automatic feature F0 and the high-dimensional automatic feature F1 are spliced and fused into FC.
D4. Not all features in FC contribute to classification, however; in a few scenarios some features may be immature or even harmful owing to insufficient learning. The scheme therefore uses an attention mechanism to learn a corresponding weight vector Qc, obtained from two fully connected layers whose number of neurons equals the number of features in FC and whose activation function is Tanh:
P1 = Tanh(W1·FC + b1)
P2 = Tanh(W2·P1 + b2)
Fa = Qc ⊙ FC, with Qc = P2
where ⊙ denotes the element-wise multiplication of two one-dimensional arrays.
As shown in fig. 5, FC is fused by learning a corresponding weight vector Qc with an attention mechanism to form Fa.
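Step D4's weighting can be sketched as two Tanh layers producing Qc followed by element-wise reweighting; the feature count, weight shapes and values below are illustrative:

```python
import numpy as np

def attention_fuse(Fc, W1, b1, W2, b2):
    """Two Tanh dense layers produce the weight vector Qc, which then
    reweights the fused feature vector element-wise: Fa = Qc * Fc."""
    P1 = np.tanh(W1 @ Fc + b1)
    Qc = np.tanh(W2 @ P1 + b2)
    return Qc * Fc                     # element-wise multiplication

rng = np.random.default_rng(3)
n = 12                                 # number of fused features (illustrative)
Fc = rng.standard_normal(n)
Fa = attention_fuse(Fc,
                    W1=0.1 * rng.standard_normal((n, n)), b1=np.zeros(n),
                    W2=0.1 * rng.standard_normal((n, n)), b2=np.zeros(n))
```

Since Tanh outputs lie in (−1, 1), the learned weights can attenuate or flip each feature but never amplify it beyond its original magnitude.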
E. Classifying;
And obtaining the probability of each category through a sigmoid classifier, selecting and outputting the optimal threshold value according to F1-score under different signal to noise ratios, and finally outputting the final classification result. The specific process is as follows:
(1) The probability of each category is obtained through a sigmoid classifier: the sigmoid function is an activation function whose output ranges between 0 and 1. This makes it an ideal activation function in a classification task, as it can represent the probability that a class exists. The result is a predictive probability for each category.
(2) The threshold with the best output under each signal-to-noise ratio is selected according to the F1-score: the F1-score is an index for evaluating classification performance, defined as the harmonic mean of precision and recall. It ranges from 0 to 1, with 1 representing perfect classification and 0 representing completely inaccurate classification. The model's F1-score is measured at different SNRs to determine the best decision threshold at each SNR level.
(3) Finally, the classification result is output: based on the selected optimal threshold, the model outputs a final label for each class. For example, if the predicted probability of a class exceeds the determined threshold, the model marks that class as present (1); otherwise it marks the class as absent (0).
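The threshold-selection step can be sketched as follows, assuming sigmoid probabilities are already available for a held-out set at one SNR level (the data here is synthetic and purely illustrative):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(probs, y_true, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximizes F1 on held-out data."""
    scores = [f1_score(y_true, (probs >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))]

# Toy validation data for a single SNR level.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
probs = np.clip(0.5 * y + 0.7 * rng.random(200), 0.0, 1.0)

t = best_threshold(probs, y)
labels = (probs >= t).astype(int)   # final binary decisions at this SNR
```

The same search is repeated per SNR level, giving one decision threshold per level.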
Example 2: the procedure described in Example 1 is verified as follows.
An aliased signal dataset was created based on RML2016.10A. RML2016.10A is a public dataset widely used in machine learning and digital signal processing; it contains 220,000 data samples covering 11 modulation modes: 8 digital modulations (8PSK, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64 and QPSK) and 3 analog modulations (WBFM, AM-DSB and AM-SSB). Each modulation has 20 signal-to-noise ratio (SNR) levels (from -20 dB to 18 dB) with 1000 IQ samples per level; each IQ sample consists of two channels (real and imaginary parts) with 128 sampling points.
In this scenario, the scheme sets up 3 independent transmitters at each signal-to-noise ratio of the dataset. Each transmitter can generate signals corresponding to those in the RML2016.10A dataset, and these signals are independently selectable and controllable. The scheme also provides a receiver whose task is to receive the signals emitted by these transmitters. Owing to the physical environment and the propagation characteristics of electromagnetic waves, the signal actually received tends to be a superposition of the signals sent by the transmitters. Moreover, because of the physical distance between each transmitter and the receiver, speed differences between them, and so on, the signal from each transmitter may arrive with some time delay and frequency offset. Finally, the different signals are overlapped with their time delays and frequency offsets, yielding 1,000,000 data samples in total, which are divided into training, validation, and test sets in the ratio 6:2:2.
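A hypothetical sketch of how one such aliased sample might be formed: superposing independently delayed and frequency-shifted baseband signals at a single receiver. Toy complex signals stand in for RML2016.10A frames, and a cyclic shift approximates the time delay:

```python
import numpy as np

rng = np.random.default_rng(42)
n, fs = 128, 1.0                     # samples per IQ frame; normalized rate

def superpose(signals, delays, freq_offsets):
    """Overlap independently delayed / frequency-shifted signals,
    mimicking what one receiver sees from several transmitters."""
    t = np.arange(n) / fs
    mixed = np.zeros(n, dtype=complex)
    for s, d, df in zip(signals, delays, freq_offsets):
        shifted = np.roll(s, d)                         # time delay (cyclic approximation)
        mixed += shifted * np.exp(2j * np.pi * df * t)  # frequency offset
    return mixed

# Three toy complex baseband signals from three transmitters.
sigs = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(3)]
x = superpose(sigs, delays=[0, 5, 11], freq_offsets=[0.0, 0.01, -0.02])

iq = np.stack([x.real, x.imag])      # 2-channel IQ sample, shape (2, 128)
```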
Processing this data with the method above gives the following recognition results:
The performance of this example at each SNR, i.e., the F1-score, precision and recall at the optimal threshold for each SNR, is shown in fig. 6. At high SNR, the overall recognition accuracy of each modulation mode reaches 80%, as shown in fig. 7. Figure 8 shows the confusion matrix for each category at -12 dB, 0 dB, 6 dB, and 12 dB.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.
Claims (4)
1. The aliasing signal modulation identification method based on multi-feature fusion is characterized by comprising the following steps of:
A. Preprocessing data;
A1. For data labels: the modulation modes are represented by multi-hot encoding;
A2. For data values: the original data are converted to a standard normal distribution by Z-score standardization;
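The two preprocessing steps can be sketched as follows (the abbreviated class list is illustrative, not the full 11-class set):

```python
import numpy as np

MODS = ["8PSK", "BPSK", "QPSK", "QAM16"]   # abbreviated class list for illustration

def multi_hot(present, classes=MODS):
    """Multi-hot label: 1 for every modulation present in the mixture."""
    y = np.zeros(len(classes))
    for m in present:
        y[classes.index(m)] = 1.0
    return y

def zscore(x):
    """Z-score standardization: zero mean, unit variance."""
    return (x - x.mean()) / (x.std() + 1e-12)

y = multi_hot(["BPSK", "QAM16"])           # aliased sample containing two signals
x = zscore(np.random.randn(2, 128) * 3 + 5)
```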
B. Extracting artificial features;
selecting the instantaneous features, power spectrum features and high-order cumulant features of the signal from the time-domain, frequency-domain and statistical-domain features as the manually extracted features; wherein:
expressing the instantaneous characteristics of the signal in three aspects of instantaneous amplitude, instantaneous frequency and instantaneous phase;
Obtaining the spectrum characteristics of the signals through Welch spectrum analysis;
extracting the fourth-order cumulants of the signal as the high-order cumulant features;
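A sketch of the first two manual feature families, assuming the analytic-signal route for the instantaneous features and SciPy's Welch estimator for the spectrum (signal parameters are illustrative):

```python
import numpy as np
from scipy.signal import hilbert, welch

def instantaneous_features(x, fs):
    """Instantaneous amplitude, phase, and frequency via the analytic signal."""
    z = hilbert(x)                            # analytic signal of the real waveform
    amp = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) / (2 * np.pi) * fs  # phase derivative -> frequency
    return amp, phase, freq

fs, n, f0 = 100.0, 1000, 10.0                 # an integer number of cycles
t = np.arange(n) / fs
x = np.cos(2 * np.pi * f0 * t)                # 10 Hz test tone

amp, phase, freq = instantaneous_features(x, fs)
f, pxx = welch(x, fs=fs, nperseg=256)         # Welch power spectrum features
```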
C. Automatic feature extraction;
for the original data and amplitude phase, extracting shallow automatic features by adopting a convolution layer, and extracting advanced features by an attention mechanism, a capsule layer and BiLSTM;
D. Feature fusion;
D1. manually extracting feature combination;
converting the instantaneous features, power spectrum features and high-order cumulant features obtained in step B into three feature vectors, Feature1, Feature2 and Feature3, splicing the three feature vectors, and extracting features from the spliced vector with a deep neural network to obtain F2;
D2. Automatic feature combination of the original data and the amplitude phase;
expanding the shallow automatic features of the original data and the amplitude phase, extracted by the convolution layer in step C, into one-dimensional arrays, and then splicing them to form F0;
Splicing the high-level features extracted from the original data and the amplitude phase in the step C, inputting the high-level features into a capsule network and a BiLSTM network to extract higher-level features so as to form a high-dimensional automatic feature F 1;
D3. Multi-mode feature fusion;
Splicing and fusing the extracted artificial feature F 2, the shallow automatic feature F 0 and the high-dimensional automatic feature F 1 into F C;
D4. Learning a corresponding weight vector Q c by adopting an attention mechanism to fuse F C to form F a;
E. Classifying;
obtaining the probability of each category through a sigmoid classifier, selecting the optimal decision threshold according to the F1-score under different signal-to-noise ratios, and finally outputting the classification result.
2. The method for identifying modulation of aliasing signals based on multi-feature fusion according to claim 1, wherein in the step B:
Given a signal sequence a(i), i = 1, 2, …, Ns, where Ns is the number of samples,
Average value of the signal sequence: ma = (1/Ns) Σ a(i);
Normalization of the signal sequence: a_n(i) = a(i)/ma;
Centering (shifting) of the normalized signal sequence: a_cn(i) = a_n(i) − 1;
The instantaneous features are expressed by the following formulas:
Absolute standard deviation of the normalized-centered instantaneous amplitude:
σ_aa = sqrt( (1/Ns) Σ a_cn²(i) − ( (1/Ns) Σ |a_cn(i)| )² )
Absolute standard deviation of the nonlinear component of the centered instantaneous phase:
σ_ap = sqrt( (1/C) Σ_{a_n(i)>a_t} φ_NL²(i) − ( (1/C) Σ_{a_n(i)>a_t} |φ_NL(i)| )² )
wherein φ_NL(i) is the nonlinear component of the instantaneous phase, C is the number of samples with a_n(i) > a_t, and a_t is the decision threshold for weak signals;
Absolute standard deviation of the nonlinear-centered instantaneous frequency:
σ_af = sqrt( (1/C) Σ_{a_n(i)>a_t} f_N²(i) − ( (1/C) Σ_{a_n(i)>a_t} |f_N(i)| )² )
wherein f_N(i) = (f(i) − E[f(i)])/Rs and Rs is the symbol rate;
The fourth-order cumulants extracted as the high-order cumulant features are:
C40 = M40 − 3M20², C41 = M41 − 3M20M21, C42 = M42 − |M20|² − 2M21²
wherein Mpq = E[x^(p−q)(x*)^q] is the p-th order mixed moment of the signal x with q conjugated terms.
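To make the cumulant features concrete, here is a small numpy sketch using the commonly used definitions C40 = M40 − 3M20², C41 = M41 − 3M20M21, C42 = M42 − |M20|² − 2M21² with Mpq = E[x^(p−q)(x*)^q]; this is an illustrative sketch under those standard definitions, not a transcription of the patent's implementation. For BPSK symbols (±1), all three values are theoretically −2:

```python
import numpy as np

def moment(x, p, q):
    """Mixed moment M_pq = E[x^(p-q) * conj(x)^q] of a complex signal."""
    return np.mean(x ** (p - q) * np.conj(x) ** q)

def fourth_order_cumulants(x):
    """Fourth-order cumulants commonly used as modulation features."""
    x = x - x.mean()                      # remove any DC component first
    m20, m21 = moment(x, 2, 0), moment(x, 2, 1)
    m40, m41, m42 = moment(x, 4, 0), moment(x, 4, 1), moment(x, 4, 2)
    c40 = m40 - 3 * m20 ** 2
    c41 = m41 - 3 * m20 * m21
    c42 = m42 - np.abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c41, c42

# BPSK symbols (±1): theoretically C40 = C41 = C42 = -2.
x = np.random.default_rng(0).choice([-1.0, 1.0], 10000).astype(complex)
c40, c41, c42 = fourth_order_cumulants(x)
```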
3. The method for identifying the modulation of the aliasing signal based on the multi-feature fusion according to claim 1 or 2, wherein the following method is adopted in the step C:
The first convolution layer takes the preprocessed original data and amplitude phase as input and performs a convolution operation between the input and a set of weights. Denote the convolution layer by l, with three-dimensional input data d_l[x, y, c]; K_l is the convolution kernel of layer l, denoted by the four-dimensional array [k_x, k_y, c_l, c], with 0 ≤ k_x ≤ K_x − 1 and 0 ≤ k_y ≤ K_y − 1. The output d_{l+1} of the convolution layer for the input data is expressed as:
d_{l+1}[x, y, c] = Σ_{k_x} Σ_{k_y} Σ_{c_l} K_l[k_x, k_y, c_l, c] · d_l[x + k_x, y + k_y, c_l]
channel attention and spatial attention are used in combination to focus on the channel dimension and spatial dimension of the data;
Channel attention tries to find the most important channels; it is computed from global average pooling (GAP) and global maximum pooling (GMP), which give the average a_avg[c] and maximum a_max[c] of each channel:
a_avg[c] = (1/(XY)) Σ_x Σ_y d_{l+1}[x, y, c]
a_max[c] = max_{x,y} d_{l+1}[x, y, c]
connecting the results of GAP and GMP and processing the results through a full connection layer and an activation function to obtain a channel attention weight;
Assuming that the weights and biases of the full connection layer are W 1,W2,b1,b2, and the activation functions are ReLU and sigmoid, the process is expressed as follows:
FCcon = concat(aavg, amax)
FC1=ReLU(W1*FCcon+b1)
FC2=σ(W2*FC1+b2)
wherein concat represents a join operation, x represents a matrix multiplication, + represents a matrix addition, and σ represents a sigmoid function;
The channel attention weight obtained from FC2 is applied to the input data d_{l+1}, yielding the attention-weighted output f_c[x, y, c]:
f_c[x, y, c] = FC2[c] · d_{l+1}[x, y, c]
For the spatial attention mechanism, the average AP[x, y] and maximum MP[x, y] of the input feature map f_c[x, y, c] along the channel direction (the c-dimension) are first calculated:
AP[x, y] = (1/C) Σ_c f_c[x, y, c]
MP[x, y] = max_c f_c[x, y, c]
where C is the number of channels and max_c represents the maximum over the c-dimension;
The AP and MP are overlapped together to obtain M [ x, y ]:
M[x,y]=AP[x,y]+MP[x,y]
Processing M through a convolution layer and an activation function to obtain a spatial attention weight; the size of the convolution layer is 1x1, the convolution kernel is W s, and the bias term is b s:
as[x,y]=σ(Ws*M[x,y]+bs)
wherein * represents the convolution operation;
The input feature map is weighted by using the spatial attention weight, and the output f s[x,y,c] after spatial attention processing is obtained:
fs[x,y,c]=as[x,y]×fc[x,y,c]
where × represents element-wise multiplication;
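A compact numpy sketch of the channel-plus-spatial attention described above, with random placeholder weights; the FC shapes and the scalar weight standing in for the 1x1 convolution kernel W_s are simplifying assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(d, W1, b1, W2, b2):
    """GAP and GMP per channel, fused through two FC layers (ReLU, sigmoid)."""
    a_avg = d.mean(axis=(0, 1))                 # (C,)
    a_max = d.max(axis=(0, 1))                  # (C,)
    fc_con = np.concatenate([a_avg, a_max])     # (2C,)
    fc1 = np.maximum(0, W1 @ fc_con + b1)
    fc2 = sigmoid(W2 @ fc1 + b2)                # per-channel weights in (0, 1)
    return d * fc2                              # broadcast over x, y

def spatial_attention(f, ws, bs):
    """Average and max over the channel axis, mixed and passed through sigmoid."""
    ap = f.mean(axis=2)                         # AP[x, y]
    mp = f.max(axis=2)                          # MP[x, y]
    a_s = sigmoid(ws * (ap + mp) + bs)          # (X, Y) spatial weights
    return f * a_s[..., None]                   # broadcast over channels

rng = np.random.default_rng(0)
X, Y, C = 4, 8, 16
d = rng.standard_normal((X, Y, C))
W1 = rng.standard_normal((C, 2 * C)) * 0.1
W2 = rng.standard_normal((C, C)) * 0.1

fc = channel_attention(d, W1, np.zeros(C), W2, np.zeros(C))
out = spatial_attention(fc, ws=1.0, bs=0.0)
```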
The output processed by the attention mechanism is passed through one more convolution layer to obtain f[x, y, c]; the capsule layer then converts the features extracted by the convolution layers from scalars into vectors:
First, a group of vector elements of size 1 × C1 is obtained through the primary capsule layer. The output vector u_i of capsule i in layer l and the weight c_ij are expressed as:
u_i = squash(s_i), c_ij = exp(b_ij) / Σ_k exp(b_ik)
wherein s_i is the input vector of capsule i in layer l, b_ij is the coupling term linking capsule i to capsule j of layer l + 1, and squash is a nonlinear activation function;
The primary capsules are recorded as:
u_i = squash(W_u * f), i = 1, 2, …, N1
wherein W_u represents the convolution kernel weight parameters of the primary capsule layer, N1 represents the number of capsules output by the primary capsule layer, C1 is 16, and the squash function is the nonlinear activation function of the capsules;
Weighting and fusing the primary capsules finally yields N groups of capsules, where the length of each capsule represents the probability that the target belongs to the corresponding category; the input is the set of primary capsules u_i, and the output is N capsules:
vj=[v1,v2,…,vN],j=1,2,…,N,
Each advanced capsule v_j is a weighted sum of the output vectors u_i of all primary capsules, where the weight c_ij represents the contribution of the i-th primary capsule to the j-th advanced capsule:
s_j = Σ_i c_ij u_i
s_j is then converted to v_j by the squash function, ensuring that the length of v_j does not exceed 1:
v_j = squash(s_j)
Initially, all weights c ij are initialized to equal values, and then updated by a number of iterations, based on the consistency of the predictions for each primary capsule with the actual outputs of each advanced capsule; for each primary capsule i and each advanced capsule j, a "consistency" score is calculated:
aij=ui·vj
wherein · represents the dot product of vectors;
The logits are accumulated as b_ij ← b_ij + a_ij, and the weights c_ij are updated using the softmax function:
c_ij = exp(b_ij) / Σ_k exp(b_ik)
This is iterated several times until the weights c_ij converge;
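The routing-by-agreement loop described here can be sketched as follows; shapes and the iteration count are illustrative, and the primary-capsule outputs u are treated directly as the predictions routed to the advanced capsules:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Nonlinear capsule activation: keeps direction, bounds length below 1."""
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1 + norm2)) * s / np.sqrt(norm2 + eps)

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u, n_iter=3):
    """Routing-by-agreement between N1 primary and N advanced capsules.
    u: primary-capsule predictions of shape (N1, N, D)."""
    n1, n, _ = u.shape
    b = np.zeros((n1, n))                            # logits start equal
    for _ in range(n_iter):
        c = softmax(b, axis=1)                       # coupling weights c_ij
        s = np.einsum("ij,ijd->jd", c, u)            # weighted sum per capsule j
        v = squash(s)                                # advanced capsules (N, D)
        b = b + np.einsum("ijd,jd->ij", u, v)        # agreement a_ij = u_i . v_j
    return v

u = np.random.default_rng(0).standard_normal((16, 5, 8))
v = dynamic_routing(u)
lengths = np.linalg.norm(v, axis=-1)                 # class-presence probabilities
```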
The BiLSTM calculation process is expressed as:
Forward propagation: for the capsule output sequence v = [v_1, v_2, …, v_N], the forward LSTM calculates the hidden state h_t^f at each time step t (f denotes forward):
h_t^f = LSTM_forward(v_t, h_{t−1}^f)
wherein LSTM_forward is the forward LSTM computation function, whose inputs are the input v_t at the current time and the hidden state h_{t−1}^f at the previous time;
Backward propagation: the reverse LSTM calculates the hidden state h_t^b at each time step t (b denotes backward):
h_t^b = LSTM_backward(v_t, h_{t+1}^b)
wherein LSTM_backward is the computation function of the reverse LSTM, whose inputs are the input v_t at the current time and the hidden state h_{t+1}^b at the next time;
The forward and backward hidden states are spliced to obtain the final output y_t:
y_t = [h_t^f ; h_t^b]
wherein [;] represents the concatenation of vectors.
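A minimal numpy BiLSTM sketch matching the formulas above: a forward and a backward pass over the capsule sequence, with the hidden states concatenated at each step. Weights are random placeholders and all sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b pack the input/forget/cell/output gates."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)          # cell state update
    h = o * np.tanh(c)                  # hidden state
    return h, c

def bilstm(seq, params_f, params_b, hidden):
    """Run forward and backward LSTMs, concatenate hidden states per step."""
    T = len(seq)
    hf, cf = np.zeros(hidden), np.zeros(hidden)
    hb, cb = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [None] * T, [None] * T
    for t in range(T):                  # forward pass over v_1 .. v_N
        hf, cf = lstm_step(seq[t], hf, cf, *params_f)
        fwd[t] = hf
    for t in reversed(range(T)):        # backward pass over v_N .. v_1
        hb, cb = lstm_step(seq[t], hb, cb, *params_b)
        bwd[t] = hb
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # y_t = [h_t^f ; h_t^b]

rng = np.random.default_rng(0)
D, H, T = 8, 6, 5                       # capsule dim, hidden size, sequence length
def make_params():
    return (rng.standard_normal((4 * H, D)) * 0.1,
            rng.standard_normal((4 * H, H)) * 0.1,
            np.zeros(4 * H))

ys = bilstm([rng.standard_normal(D) for _ in range(T)], make_params(), make_params(), H)
```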
4. The method for identifying modulation of aliasing signals based on multi-feature fusion of claim 3, wherein in step D4:
The weight vector Qc is obtained through two fully connected layers; each layer has the same number of neurons as there are features in FC, and the activation function is Tanh, according to the formulas:
P1 = Tanh(W1FC + b1)
P2 = Tanh(W2P1 + b2)
Qc = P2, Fa = Qc ⊙ FC
wherein ⊙ represents the element-wise multiplication of two one-dimensional arrays.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410116526.XA CN118035885B (en) | 2024-01-26 | 2024-01-26 | Multi-feature fusion-based aliasing signal modulation identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118035885A true CN118035885A (en) | 2024-05-14 |
CN118035885B CN118035885B (en) | 2024-08-09 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112702294A (en) * | 2021-03-24 | 2021-04-23 | 四川大学 | Modulation recognition method for multi-level feature extraction based on deep learning |
CN114254680A (en) * | 2022-02-28 | 2022-03-29 | 成都大公博创信息技术有限公司 | Deep learning network modulation identification method based on multi-feature information |
CN115238749A (en) * | 2022-08-04 | 2022-10-25 | 中国人民解放军军事科学院系统工程研究院 | Feature fusion modulation identification method based on Transformer |
CN116070136A (en) * | 2023-01-29 | 2023-05-05 | 北京林业大学 | Multi-mode fusion wireless signal automatic modulation recognition method based on deep learning |
CN116471154A (en) * | 2023-05-23 | 2023-07-21 | 西安电子科技大学 | Modulation signal identification method based on multi-domain mixed attention |
Non-Patent Citations (2)
Title |
---|
BOYU DENG ET AL.: "Beam Scheduling with Various Mission Demands in Data Relay Satellite Systems", Journal of Communications and Information Networks, 31 December 2021 (2021-12-31), pages 396-410 *
ZENG Hongran et al.: "An aliased signal recognition method using a pre-classification neural network", Engineering and Applications, 31 December 2023 (2023-12-31), pages 1138-1144 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |