CN109817276B

CN109817276B - A Method for Protein Secondary Structure Prediction Based on Deep Neural Network

Info

Publication number: CN109817276B
Application number: CN201910085554.9A
Authority: CN
Inventors: 周树森; 邹海林; 柳婵娟; 臧睦君; 刘通
Original assignee: Ludong University
Current assignee: Jiangxi Qixin Raincoat Manufacturing Co ltd
Priority date: 2019-01-29
Filing date: 2019-01-29
Publication date: 2023-05-23
Anticipated expiration: 2039-01-29
Also published as: CN109817276A

Abstract

The invention relates to a method for predicting protein secondary structure with a deep neural network. Convolutional neural networks of several different depths fuse the interdependent features of a protein sequence while also extracting single amino acid features. These features are then fed into a recurrent neural network for further fusion, and a fully connected layer maps them to the 8 protein secondary structure classes. Finally, the deep neural network is trained with the RMSProp optimizer on the cross-entropy error between labels and logits, which improves the accuracy of protein secondary structure prediction.

Description

A Method for Protein Secondary Structure Prediction Based on Deep Neural Network

Technical Field

The invention relates to a method for predicting protein secondary structure based on a deep neural network. It draws on convolutional neural networks, recurrent neural networks, and protein structure prediction.

Background

Proteins have four levels of structure. Secondary structure is the first level of folding: the common local structures formed when segments of the protein chain fold. The structure a protein chain adopts depends entirely on its amino acid sequence, but the rules by which a sequence folds are still not fully understood. Protein structure plays an important role in research on protein function and in pharmacology, so predicting the structure of a protein from its amino acid sequence is one of the challenges facing bioinformatics. Accurate prediction of protein structure and function depends in part on the accuracy of secondary structure prediction.

Protein secondary structure prediction has been studied extensively. Methods such as neural networks, hidden Markov models, and support vector machines have been applied successfully to predicting the 3 states of secondary structure, while only a few methods predict its 8 states. Predicting the 8-state secondary structure provides more detailed structural information, but it also makes prediction harder.

In recent years, deep learning methods, which build multi-layer architectures on the idea of hierarchy in order to learn complex concepts, have achieved breakthroughs in computer vision, speech processing, natural language processing, bioinformatics, and other fields. Among them, the convolutional neural network, a neural network specialized for data with a grid-like structure, performs well on time-series data and image data. The recurrent neural network, a neural network specialized for sequence data, performs well on time-series data.

Summary of the Invention

The technical problem addressed by the invention is that few existing methods predict the 8 states of protein secondary structure, and their prediction accuracy is too low to meet the needs of routine use.

To solve this problem in the prior art, the invention provides a protein secondary structure prediction method based on a deep neural network. Convolutional neural networks fuse the interdependent features of the protein sequence while also extracting single amino acid features; these features are then fed into a recurrent neural network for further fusion; finally, a fully connected layer maps them to the 8 protein classes. The technical solution comprises the following steps:

Extracting protein sequence features: the input features of the network are amino acid sequence and structure information; the data are downloaded from the PDB and annotated with the DSSP system. There are 21 amino acid sequence features and 21 amino acid structure features, so 42 features per amino acid are used to predict its secondary structure;

Multi-layer combined convolutional neural network feature extraction: convolutional neural networks of several different depths each extract features, which are combined with the original features and passed to the next layer;

Bidirectional long short-term memory neural network feature extraction: a bidirectional long short-term memory neural network extracts features and passes them to the next layer;

Fully connected layer feature mapping: a fully connected layer maps the features to the 8 different protein classes;

Training the deep neural network: the RMSProp optimizer trains the deep neural network built in the preceding steps on the cross-entropy error between labels and logits, and L2 regularization is used to prevent overfitting.

In a further technical solution of the invention, the multi-layer combined convolutional neural network feature extraction uses one-dimensional convolution kernels of several different lengths to emulate windows of different sizes when extracting amino acid sequence features. The combined convolutional neural network is constructed in the following steps:

1-layer one-dimensional convolutional neural network construction: one-dimensional convolutional networks with kernel sizes of 3, 5, 7, 9, and 11 each convolve the amino acid sequence of the protein to extract features. Each network has 42 input channels, corresponding to the 42 features of an amino acid, and 50 output channels. The output of the network is activated with the ReLU function, and batch normalization is then applied to prevent the model from overfitting;

k-layer one-dimensional convolutional neural network construction: applying the 1-layer construction above k times in succession yields a k-layer one-dimensional convolutional neural network, in which every layer from the second onward has 50 input channels and 50 output channels;

Combined output of the original features and the convolutional network features: in the invention, 1-layer, 2-layer, and 3-layer one-dimensional convolutional neural networks each extract features from the amino acid sequence, and their outputs are combined with the original features, giving a total of 42 + 50×5×3 = 792 features that are passed to the bidirectional long short-term memory neural network for further processing.
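The channel bookkeeping in the construction steps above can be checked with a short sketch. The constants come from the text (five kernel sizes, depths 1 to 3, 42 input features, 50 output channels); the function names are illustrative and not part of the patent.

```python
# Values below are taken from the text; names are illustrative only.
KERNEL_SIZES = (3, 5, 7, 9, 11)
DEPTHS = (1, 2, 3)
INPUT_FEATURES = 42
CONV_CHANNELS = 50


def channel_plan(depth):
    """Return the (in_channels, out_channels) pair of every layer in one stack."""
    plan = []
    for layer in range(depth):
        # Only the first layer sees the 42 raw features; later layers see 50.
        in_channels = INPUT_FEATURES if layer == 0 else CONV_CHANNELS
        plan.append((in_channels, CONV_CHANNELS))
    return plan


def combined_feature_count():
    """Original features plus the output of every (depth, kernel size) stack."""
    conv_features = CONV_CHANNELS * len(KERNEL_SIZES) * len(DEPTHS)
    return INPUT_FEATURES + conv_features


print(channel_plan(3))           # [(42, 50), (50, 50), (50, 50)]
print(combined_feature_count())  # 792
```

The count only reaches 792 when all five kernel sizes are included for each of the three depths, that is 42 + 50×5×3.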

Technical effect of the invention: by combining a multi-layer combined convolutional neural network with a bidirectional long short-term memory neural network, the method lets the system predict protein secondary structure automatically and addresses the low accuracy of traditional prediction methods. Within the multi-layer combined convolutional neural network, one-dimensional convolution kernels of several different lengths emulate windows of different sizes to extract amino acid sequence features. This avoids the failure of classical methods to extract useful information between amino acid sequences and lets the system extract both inter-sequence and intra-sequence features at the same time, further improving the feature extraction ability of the deep architecture.

Brief Description of the Drawings

Fig. 1 is a flowchart of the invention.

Fig. 2 is a flowchart of the multi-layer combined convolutional neural network feature extraction of the invention.

Fig. 3 is a flowchart of the 1-layer one-dimensional convolutional neural network of the invention.

Detailed Description

The technical solution of the invention is further described below with reference to specific embodiments.

As shown in Fig. 1, a specific embodiment of the invention provides a protein secondary structure prediction method based on a deep neural network, comprising the following steps:

Step 100: Extract protein sequence features. Among the input features of the network, there are 21 amino acid sequence features and 21 amino acid structure features, so 42 features per amino acid are used to predict its secondary structure. Each protein sequence contains at most 700 amino acids, so each protein can be represented by a 700×42 matrix; if a protein has fewer than 700 amino acids, the features of the remaining positions are filled with 0. A protein sequence can be expressed as:

$$X = [x_1, x_2, \ldots, x_L] \in \mathbb{R}^{D \times L}$$

where L is the length of the protein sequence and D is the number of features per amino acid. Each column of X is an amino acid x. An amino acid with all of its features can be regarded as a vector in the space $\mathbb{R}^{D}$, whose j-th coordinate corresponds to the j-th feature. In the invention, L = 700 and D = 42.

Y is the set of labels corresponding to the L amino acids and can be expressed as:

$$Y = [y_1, y_2, \ldots, y_L] \in \mathbb{R}^{C \times L}$$

where C is the number of classes in the dataset. In the invention, C = 8. Each column of Y is a vector in the space $\mathbb{R}^{C}$, whose j-th coordinate corresponds to the j-th class.

The invention trains the deep architecture on the L amino acids to construct a mapping function X→Y. After training, when a new amino acid x is input, the deep architecture uses the mapping function to determine the label y corresponding to x.
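The input layout of step 100 can be sketched as follows. The sizes (700, 42, 8) come from the text. The helper names and the choice of all-zero label rows for padded positions are assumptions made for illustration, since the text does not say how labels at padded positions are handled. The matrices are stored row-wise here (one row per amino acid), matching the 700×42 description.

```python
MAX_LEN = 700      # L in the text
NUM_FEATURES = 42  # D in the text
NUM_CLASSES = 8    # C in the text


def pad_features(residues):
    """Zero-pad a list of per-residue feature vectors to MAX_LEN rows."""
    if len(residues) > MAX_LEN:
        raise ValueError("protein has more than MAX_LEN amino acids")
    for row in residues:
        if len(row) != NUM_FEATURES:
            raise ValueError("each amino acid needs NUM_FEATURES values")
    padding = [[0.0] * NUM_FEATURES for _ in range(MAX_LEN - len(residues))]
    return [list(row) for row in residues] + padding


def one_hot_labels(class_ids):
    """One-hot encode class indices 0..7; padded positions get all-zero rows
    (an assumption, not stated in the text)."""
    rows = []
    for class_id in class_ids:
        row = [0.0] * NUM_CLASSES
        row[class_id] = 1.0
        rows.append(row)
    rows += [[0.0] * NUM_CLASSES for _ in range(MAX_LEN - len(class_ids))]
    return rows


# Toy protein with 3 amino acids and made-up feature values.
toy_protein = [[1.0] * NUM_FEATURES for _ in range(3)]
X = pad_features(toy_protein)
Y = one_hot_labels([0, 7, 2])
print(len(X), len(X[0]), len(Y), len(Y[0]))  # 700 42 700 8
```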

Step 200: Multi-layer combined convolutional neural network feature extraction. Convolutional neural networks of several different depths each extract features, which are combined with the original features and passed to the next layer.

As shown in Fig. 2, the multi-layer combined convolutional neural network feature extraction comprises the following steps:

Step 210: 1-layer one-dimensional convolutional neural network feature extraction. As shown in Fig. 3, one-dimensional convolutional networks with kernel sizes of 3, 5, 7, 9, and 11 each convolve the amino acid sequence of the protein to extract features. Each network has 42 input channels, corresponding to the 42 features of an amino acid, and 50 output channels.

Take the one-dimensional convolutional network with kernel size 3 as an example. Its function is to extract features from the input data. It contains 50 convolution kernels, and each element of a convolution kernel has 3 weight coefficients and one bias, much like a neuron in a feed-forward neural network. For each row of features of the input X, the convolution kernel in turn multiplies each amino acid and its neighbouring amino acids element-wise by the kernel weights and sums the products; the results for every column of features are then combined in a weighted sum and the bias is added:

i ∈ {0, 1, …, L₁}

where b⁰ is the bias, h⁰ and h¹ denote the input and output of the convolution, and the k-th element of h⁰ equals the k-th amino acid of the input X. L₁ is the number of output channels, K₀ is the number of channels of the feature map, and f is the convolution kernel size. Here L₁ = 50, K₀ = 42, and f = 3.

The output of the convolutional network is then activated with the ReLU function.

Finally, batch normalization is applied to prevent the model from overfitting.

The 1-layer one-dimensional convolutional networks with kernel sizes of 5, 7, 9, and 11 have the same structure as above, with f set to 5, 7, 9, and 11 respectively. The features extracted by the 1-layer one-dimensional convolutional neural networks are the outputs of these five networks, 50×5 features in total.
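One round of step 210 (convolution, then ReLU, then normalization) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: zero padding at the sequence ends (so the output keeps the input length) is assumed because the text does not specify the padding, and `normalize_channels` is a simplified stand-in for batch normalization that standardizes each channel over the positions of a single sequence, without learned scale and shift parameters.

```python
import math


def conv1d_same(x, weights, bias):
    """1-D convolution over sequence positions with zero padding at both ends.

    x:       list of positions, each a list of in_channels values
    weights: weights[o][j][k] for output channel o, kernel offset j, input channel k
    bias:    one value per output channel
    """
    f = len(weights[0])
    half = f // 2
    out = []
    for i in range(len(x)):
        row = []
        for o in range(len(weights)):
            total = bias[o]
            for j in range(f):
                pos = i + j - half
                if 0 <= pos < len(x):  # positions outside the sequence count as zeros
                    total += sum(w * v for w, v in zip(weights[o][j], x[pos]))
            row.append(total)
        out.append(row)
    return out


def relu(h):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in h]


def normalize_channels(h, eps=1e-5):
    """Standardize each channel over positions (simplified stand-in for batch norm)."""
    length, channels = len(h), len(h[0])
    out = [[0.0] * channels for _ in range(length)]
    for c in range(channels):
        column = [row[c] for row in h]
        mean = sum(column) / length
        var = sum((v - mean) ** 2 for v in column) / length
        for i in range(length):
            out[i][c] = (column[i] - mean) / math.sqrt(var + eps)
    return out


# Toy input: 4 positions, 2 input channels, 1 output channel, kernel size 3.
x = [[1.0, 0.0], [2.0, 1.0], [0.0, -1.0], [3.0, 2.0]]
weights = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]]  # sums channel 0 over the window
bias = [0.0]
h = conv1d_same(x, weights, bias)
d = normalize_channels(relu(h))
print(h)  # [[3.0], [3.0], [5.0], [3.0]]
```

In the setting described in the text, `x` would have 700 positions and 42 channels, and `weights` would hold 50 kernels of size f.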

Step 220: 2-layer one-dimensional convolutional neural network feature extraction. Compared with the 1-layer case, this adds one more round of convolution, activation, and batch normalization. Taking the one-dimensional convolutional network with kernel size 3 as an example, the input X passes through two successive rounds of convolution, activation, and batch normalization:

i ∈ {0, 1, …, L₁}

i ∈ {0, 1, …, L₂}

where b¹ is the bias, d¹ is the output of the first round of convolution, activation, and batch normalization, and h² is the output of the second convolution, which takes d¹ as its input. L₂ is the number of output channels, K₁ is the number of channels of the feature map, and f is the convolution kernel size. Here L₂ = 50, K₁ = 50, and f = 3.

The 2-layer one-dimensional convolutional networks with kernel sizes of 5, 7, 9, and 11 have the same structure as above, with f set to 5, 7, 9, and 11 respectively. The features extracted by the 2-layer one-dimensional convolutional neural networks are the outputs of these five networks, 50×5 features in total.

Step 230: 3-layer one-dimensional convolutional neural network feature extraction. Compared with the 2-layer case, this adds one more round of convolution, activation, and batch normalization. The input X passes through three successive rounds of convolution, activation, and batch normalization, and the third convolution has L₃ = 50 output channels. The features extracted by the 3-layer one-dimensional convolutional neural networks are the outputs of the five 3-layer networks, 50×5 features in total.

Step 240: Combined output of the original features and the convolutional network features. The input X is combined with the features extracted in steps 210, 220, and 230, and the result is output to step 300. Steps 210, 220, and 230 each output 50×5 features, so a total of 42 + 50×5×3 = 792 features are passed to the bidirectional long short-term memory neural network for further processing.

Step 300: Bidirectional long short-term memory neural network feature extraction. A forward long short-term memory neural network and a backward long short-term memory neural network each extract features, and the two sets of features are combined.

Here, the forward feature is the representation that the long short-term memory neural network extracts at position t from the features of the first t−1 amino acids, and the backward feature is the representation that it extracts at position t from the features of the last L−t amino acids. The hidden layer of each long short-term memory neural network has 800 nodes, so the forward and backward networks together output 1600 features.
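How the forward and backward passes of step 300 combine can be sketched with a small pure-Python LSTM. The sketch assumes the standard LSTM cell equations (input, forget, and output gates plus a tanh candidate), which the text does not spell out, and uses random weights and tiny sizes for illustration. All function names are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_lstm(input_size, hidden_size, rng):
    """Random weights: 4 gates x hidden_size rows over the vector [x, h, 1]."""
    width = input_size + hidden_size + 1  # +1 for the bias term
    return [[rng.uniform(-0.1, 0.1) for _ in range(width)]
            for _ in range(4 * hidden_size)]


def lstm_step(params, x, h, c):
    """Standard LSTM cell update for one sequence position."""
    n = len(h)
    z = x + h + [1.0]
    pre = [sum(w * v for w, v in zip(row, z)) for row in params]
    i_gate = [sigmoid(v) for v in pre[0:n]]
    f_gate = [sigmoid(v) for v in pre[n:2 * n]]
    o_gate = [sigmoid(v) for v in pre[2 * n:3 * n]]
    cand = [math.tanh(v) for v in pre[3 * n:4 * n]]
    c_new = [f * cv + i * g for f, cv, i, g in zip(f_gate, c, i_gate, cand)]
    h_new = [o * math.tanh(cv) for o, cv in zip(o_gate, c_new)]
    return h_new, c_new


def run_lstm(params, xs, hidden_size):
    """Run one direction over the sequence and collect the hidden states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


def bidirectional_lstm(fwd_params, bwd_params, xs, hidden_size):
    """Concatenate forward and backward hidden states at every position."""
    forward = run_lstm(fwd_params, xs, hidden_size)
    backward = run_lstm(bwd_params, xs[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
hidden = 4  # 800 in the text
inputs = 6  # 792 in the text
xs = [[rng.uniform(-1.0, 1.0) for _ in range(inputs)] for _ in range(5)]
g = bidirectional_lstm(init_lstm(inputs, hidden, rng),
                       init_lstm(inputs, hidden, rng), xs, hidden)
print(len(g), len(g[0]))  # 5 8
```

With the sizes given in the text (792 inputs, 800 hidden nodes per direction), each position would yield 2 × 800 = 1600 features.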

Step 400: Fully connected layer feature mapping. A fully connected layer maps the features g_t extracted in step 300 to the 8 different protein classes.

where w_st is the weight connecting the s-th feature to the t-th class, and b_t is the bias of the t-th class. g_st is the t-th feature of the s-th amino acid output by step 300. N is the number of features of g_st. Here N = 1600.
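The mapping of step 400 for a single amino acid can be sketched as a weighted sum per class. The weight layout `weights[s][t]` follows the description of w_st above. The softmax that turns logits into class probabilities is an assumption, since the text only mentions logits and the cross-entropy error; the toy sizes and values are made up.

```python
import math

NUM_CLASSES = 8  # from the text


def fully_connected(g, weights, bias):
    """logits[t] = sum over s of weights[s][t] * g[s], plus bias[t]."""
    return [sum(weights[s][t] * g[s] for s in range(len(g))) + bias[t]
            for t in range(len(bias))]


def softmax(logits):
    """Numerically stable softmax (assumed; not spelled out in the text)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example with N = 2 features instead of the 1600 in the text.
g = [1.0, 2.0]
weights = [[0.1 * t for t in range(NUM_CLASSES)], [0.0] * NUM_CLASSES]
bias = [0.0] * NUM_CLASSES
logits = fully_connected(g, weights, bias)
probs = softmax(logits)
predicted = probs.index(max(probs))
print(predicted)  # 7
```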

Step 500: Train the deep neural network. The L labeled samples are used to optimize the parameter space W so that the deep architecture discriminates between classes better. This task can be formulated as an optimization problem:

where T denotes the loss function. In the invention, T is the cross-entropy error function; the RMSProp optimizer trains the deep neural network built in the preceding steps on T, and L2 regularization is used to prevent overfitting.
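The training rule of step 500 can be sketched on a toy problem. The update below is the usual RMSProp rule with an L2 penalty folded into the gradient. The learning rate, decay, epsilon, and L2 coefficient are placeholder values, since the text gives none. To keep the sketch short, the 8 logits themselves are treated as the trainable parameters rather than the weights of a full network.

```python
import math


def softmax(logits):
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy between a class index and the softmax of the logits."""
    return -math.log(softmax(logits)[label])


def rmsprop_step(params, grads, cache, lr=0.01, decay=0.9, eps=1e-8, l2=1e-4):
    """One in-place RMSProp update; the L2 penalty (l2/2)*w^2 adds l2*w to each gradient.
    All hyperparameter defaults are assumptions."""
    for i in range(len(params)):
        grad = grads[i] + l2 * params[i]
        cache[i] = decay * cache[i] + (1.0 - decay) * grad * grad
        params[i] -= lr * grad / (math.sqrt(cache[i]) + eps)


logits = [0.0] * 8   # toy parameters: one logit per class
cache = [0.0] * 8    # running average of squared gradients
label = 3            # made-up target class
initial_loss = cross_entropy(logits, label)
for _ in range(200):
    probs = softmax(logits)
    # Gradient of the cross-entropy with respect to the logits: probs - one_hot.
    grads = [p - (1.0 if t == label else 0.0) for t, p in enumerate(probs)]
    rmsprop_step(logits, grads, cache)
final_loss = cross_entropy(logits, label)
print(initial_loss > final_loss)  # True
```

In a full model the same update would be applied to every weight and bias, with gradients obtained by backpropagation.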

The invention provides a protein secondary structure prediction method based on a deep neural network. It combines multi-layer convolutional neural network feature extraction, the combination of features from several different depths, and bidirectional long short-term memory neural network feature extraction into a deep architecture for protein secondary structure prediction, which effectively improves prediction accuracy.

The above is a further detailed description of the invention with reference to specific preferred embodiments, and the implementation of the invention is not to be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims

1. A protein secondary structure prediction method based on a deep neural network, comprising the following steps:

extracting protein sequence features: the input features of the network are amino acid sequence and structure information, and the data are downloaded from the PDB and annotated with the DSSP system; there are 21 amino acid sequence features and 21 amino acid structure features, and the 42 features of each amino acid are used to predict its corresponding secondary structure;

multi-layer combined convolutional neural network feature extraction: extracting features with a plurality of convolutional neural networks of different depths, combining them with the original features, and passing the combined features to the next layer;

bidirectional long short-term memory neural network feature extraction: extracting features with a bidirectional long short-term memory neural network and passing them to the next layer; the hidden layer of each long short-term memory neural network has 800 nodes, and the forward and backward long short-term memory neural networks together output 1600 features;

fully connected layer feature mapping: mapping the features to the 8 different protein classes with a fully connected layer;

training the deep neural network: training the deep neural network built in the preceding steps with the RMSProp optimizer on the cross-entropy error between labels and logits, and using L2 regularization to prevent overfitting.

2. The protein secondary structure prediction method based on a deep neural network according to claim 1, wherein, in the multi-layer combined convolutional neural network feature extraction, a plurality of one-dimensional convolution kernels of different lengths emulate windows of different sizes to extract amino acid sequence features, and the combined convolutional neural network is constructed in the following steps:

1-layer one-dimensional convolutional neural network construction: one-dimensional convolutional networks with kernel sizes of 3, 5, 7, 9, and 11 each convolve the amino acid sequence of the protein to extract features, wherein each network has 42 input channels, corresponding to the 42 features of an amino acid, and 50 output channels; the output of the network is activated with the ReLU function, and batch normalization is then applied to prevent the model from overfitting;

2-layer one-dimensional convolutional neural network construction: compared with the 1-layer one-dimensional convolutional neural network feature extraction, one more round of convolution, activation, and batch normalization is performed; the input X undergoes convolution, activation, and batch normalization twice in succession;

3-layer one-dimensional convolutional neural network construction: compared with the 2-layer one-dimensional convolutional neural network feature extraction, one more round of convolution, activation, and batch normalization is performed; the input X undergoes convolution, activation, and batch normalization three times in succession;

combined output of the original features and the convolutional network features: 1-layer, 2-layer, and 3-layer one-dimensional convolutional neural networks each extract features from the amino acid sequence, and their outputs are combined with the original features, so that a total of 42 + 50×5×3 = 792 features are output to the bidirectional long short-term memory neural network for further processing.