CN109817276B - A Method for Protein Secondary Structure Prediction Based on Deep Neural Network - Google Patents

Publication number
CN109817276B
Authority
CN
China
Prior art keywords
neural network
features
layer
convolution
amino acid
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910085554.9A
Other languages
Chinese (zh)
Other versions
CN109817276A (en)
Inventor
周树森
邹海林
柳婵娟
臧睦君
刘通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Qixin Raincoat Manufacturing Co ltd
Original Assignee
Ludong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-29
Filing date
2019-01-29
Publication date
2023-05-23
Application filed by Ludong University
Priority to CN201910085554.9A
Publication of CN109817276A
Application granted
Publication of CN109817276B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a protein secondary structure prediction method based on a deep neural network. Several convolutional neural networks of different depths fuse the interdependent features of the protein sequence while the features of individual amino acids are extracted. These features are then fed into a recurrent neural network for further fusion, and a fully connected layer maps them to the 8 protein secondary structure classes. Finally, the RMSProp optimizer trains the deep neural network on the cross-entropy error between labels and logits, which effectively improves the accuracy of protein secondary structure prediction.

Description

A Method for Protein Secondary Structure Prediction Based on a Deep Neural Network

Technical Field

The invention relates to a protein secondary structure prediction method based on a deep neural network. It draws on convolutional neural networks, recurrent neural networks, and protein structure prediction.

Background Art

Proteins have four levels of structure. Secondary structure refers to the first level of folding: the common local structures formed when parts of the protein chain fold. The structure that a protein chain forms depends entirely on its amino acid sequence, but the rules by which protein sequences fold are still not fully understood. Protein structure plays an important role in functional analysis and pharmacological research, so predicting protein structure from the amino acid sequence is one of the challenges facing bioinformatics. Accurate prediction of protein structure and function depends in part on the accuracy of secondary structure prediction.

Protein secondary structure prediction has been studied extensively. Neural networks, hidden Markov models, support vector machines, and related methods have been applied successfully to predict the 3 states of secondary structure, while only a few methods predict its 8 states. Predicting the 8-state secondary structure provides more detailed structural information but also makes prediction harder.

In recent years, deep learning methods, which build multi-layer architectures on the idea of hierarchy in order to learn complex concepts, have achieved breakthroughs in computer vision, speech processing, natural language processing, bioinformatics, and other fields. Convolutional neural networks, which specialize in data with a grid-like structure, perform well on time series data and image data. Recurrent neural networks, which specialize in sequence data, perform well on time series data.

Summary of the Invention

The technical problem addressed by the invention is that existing methods for predicting the 8 states of protein secondary structure are few and their prediction accuracy is low, so they cannot meet the needs of everyday applications.

To solve this problem, the invention provides a protein secondary structure prediction method based on a deep neural network. Convolutional neural networks fuse the interdependent features of the protein sequence while the features of individual amino acids are extracted; these features are then fed into a recurrent neural network for further fusion; finally, a fully connected layer maps the result to the 8 protein secondary structure classes. The technical solution comprises the following steps:

Extracting protein sequence features: the input features of the network are amino acid sequence and structure information. The data are downloaded from the PDB and annotated with DSSP. The amino acid sequence information has 21 features and the amino acid structure information also has 21 features, so each amino acid has 42 features that are used to predict its secondary structure;

Multi-layer combined convolutional neural network feature extraction: several convolutional neural networks of different depths each extract features, which are combined with the original features and passed to the next layer;

Bidirectional long short-term memory (LSTM) neural network feature extraction: a bidirectional LSTM network extracts features and passes them to the next layer;

Fully connected layer feature mapping: a fully connected layer maps the features to the 8 protein secondary structure classes;

Training the deep neural network: the RMSProp optimizer trains the deep neural network built in the preceding steps on the cross-entropy error between labels and logits, and L2 regularization is used to prevent overfitting.

A further technical solution of the invention: in the multi-layer combined convolutional neural network feature extraction, several one-dimensional convolution kernels of different lengths emulate windows of different sizes to extract amino acid sequence features. The combined convolutional neural network is constructed in the following steps:

1-layer one-dimensional convolutional neural network construction: one-dimensional convolutional networks with kernel sizes of 3, 5, 7, 9, and 11 each convolve the amino acid sequence of the protein to extract features. Each network has 42 input channels, corresponding to the 42 features of one amino acid, and 50 output channels. The output is activated with the ReLU function, and batch normalization is then applied to prevent overfitting;

k-layer one-dimensional convolutional neural network construction: applying the 1-layer construction above k times in succession yields a k-layer one-dimensional convolutional neural network, in which, from the second layer onward, the numbers of input and output channels are both 50;

Combining the original features with the outputs of the convolutional neural networks: in the invention, 1-layer, 2-layer, and 3-layer one-dimensional convolutional neural networks each extract features from the amino acid sequence, and their outputs are combined with the original features. Each of the three depths outputs 50×5 features (50 channels for each of the 5 kernel sizes), so 42+50×5×3=792 features in total are output to the bidirectional LSTM network for further processing.
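The channel bookkeeping behind the 792-feature total can be checked with a short sketch. It assumes, as the detailed description of Step 240 states, that each of the three depths contributes 50 channels for each of the 5 kernel sizes; the constant and function names are illustrative and do not come from the patent.

```python
KERNEL_SIZES = (3, 5, 7, 9, 11)   # kernel sizes named in the text
INPUT_FEATURES = 42               # features per amino acid
CONV_CHANNELS = 50                # output channels of every convolution
DEPTHS = (1, 2, 3)                # 1-, 2-, and 3-layer branches


def layer_channels(depth, in_channels=INPUT_FEATURES, out_channels=CONV_CHANNELS):
    """Return the (input, output) channel pair of each layer in a branch."""
    pairs = []
    current = in_channels
    for _ in range(depth):
        pairs.append((current, out_channels))
        current = out_channels  # from the second layer on, input = 50
    return pairs


def combined_feature_count():
    """Original features plus the last-layer output of every branch."""
    conv_features = sum(
        layer_channels(depth)[-1][1] * len(KERNEL_SIZES) for depth in DEPTHS
    )
    return INPUT_FEATURES + conv_features


print(layer_channels(3))         # [(42, 50), (50, 50), (50, 50)]
print(combined_feature_count())  # 792
```

Only the last layer of each branch is counted, because the text combines the output of the 1-layer, 2-layer, and 3-layer networks rather than every intermediate layer.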

Technical effect of the invention: by combining multi-layer combined convolutional neural networks with a bidirectional LSTM network, the method lets the system predict protein secondary structure automatically and addresses the low accuracy of traditional prediction methods. In the multi-layer combined convolutional neural network, one-dimensional convolution kernels of different lengths emulate windows of different sizes to extract amino acid sequence features. This avoids the inability of classical methods to extract useful information between amino acid sequences, lets the system extract inter-sequence and intra-sequence features at the same time, and further improves the feature extraction capability of the deep architecture.

Brief Description of the Drawings

Fig. 1 is a flowchart of the invention.

Fig. 2 is a flowchart of the multi-layer combined convolutional neural network feature extraction of the invention.

Fig. 3 is a flowchart of the 1-layer one-dimensional convolutional neural network of the invention.

Detailed Description

The technical solution of the invention is described further below with reference to specific embodiments.

As shown in Fig. 1, a specific embodiment of the invention provides a protein secondary structure prediction method based on a deep neural network, comprising the following steps:

Step 100: Extracting protein sequence features. Among the input features of the network, the amino acid sequence information has 21 features and the amino acid structure information also has 21 features, so each amino acid has 42 features that are used to predict its secondary structure. Each protein sequence contains at most 700 amino acids, so each protein can be represented by a 700×42 matrix. If a protein has fewer than 700 amino acids, the features of the remaining positions are filled with 0. A protein sequence can be represented as:

$$X = [x_1, x_2, \ldots, x_L] \in \mathbb{R}^{D \times L}$$

where L is the length of the protein sequence and D is the number of features per amino acid. Each column of X is an amino acid x. An amino acid with all its features can be viewed as a vector in the space $\mathbb{R}^{D}$, whose j-th coordinate corresponds to the j-th feature. In the invention, L=700 and D=42.

Y is the label set corresponding to the L amino acids and can be represented as:

$$Y = [y_1, y_2, \ldots, y_L] \in \mathbb{R}^{C \times L}$$

where C is the number of classes in the data set. In the invention, C=8. Each column of Y is a vector in the space $\mathbb{R}^{C}$, whose j-th coordinate corresponds to the j-th class.

[Equation image: Figure GDA0001991677080000035]

The invention uses a deep architecture trained on the L amino acids to build the mapping function X→Y. After training, when a new amino acid x is input, the deep architecture uses the mapping function to determine the label y corresponding to x.
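The zero-padding described in Step 100 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `pad_protein` is hypothetical, and the matrix is stored with one residue per row (the 700×42 layout mentioned in the text, which is the transpose of the column-per-residue convention used for X).

```python
MAX_LENGTH = 700    # maximum number of amino acids per protein (L)
NUM_FEATURES = 42   # features per amino acid (D)


def pad_protein(residue_features, max_length=MAX_LENGTH, num_features=NUM_FEATURES):
    """Zero-pad a list of per-residue feature rows to max_length rows."""
    if len(residue_features) > max_length:
        raise ValueError("protein is longer than max_length")
    if any(len(row) != num_features for row in residue_features):
        raise ValueError("every residue needs num_features values")
    padded = [list(row) for row in residue_features]
    padding_rows = max_length - len(padded)
    # Positions past the end of the protein carry all-zero features.
    padded.extend([0.0] * num_features for _ in range(padding_rows))
    return padded


toy_protein = [[1.0] * NUM_FEATURES for _ in range(3)]  # a 3-residue toy input
matrix = pad_protein(toy_protein)
print(len(matrix), len(matrix[0]))  # 700 42
```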

Step 200: Multi-layer combined convolutional neural network feature extraction. Several convolutional neural networks of different depths each extract features, which are combined with the original features and passed to the next layer.

As shown in Fig. 2, the multi-layer combined convolutional neural network feature extraction comprises the following steps:

Step 210: 1-layer one-dimensional convolutional neural network feature extraction. As shown in Fig. 3, one-dimensional convolutional networks with kernel sizes of 3, 5, 7, 9, and 11 each convolve the amino acid sequence of the protein to extract features. Each network has 42 input channels, corresponding to the 42 features of one amino acid, and 50 output channels.

Take the one-dimensional convolutional network with kernel size 3 as an example. Its function is to extract features from the input data. It contains 50 convolution kernels, and each element of a kernel corresponds to 3 weight coefficients and one bias, much like a neuron in a feed-forward neural network. For each row of features of the input X, the kernel in turn takes each amino acid together with its neighbouring amino acids, multiplies element-wise, and sums. After the convolution, the results for all feature rows are combined in a weighted sum and the bias is added:

[Equation image: Figure GDA0001991677080000041]

i∈{0,1,…,L_1}

where b_0 is the bias, h_0 and h_1 denote the input and output of the convolution, and

[Equation image: Figure GDA0001991677080000049]

equals the k-th amino acid of the input X. L_1 is the number of output channels, K_0 is the number of channels of the input feature map, and f is the kernel size. Here L_1=50, K_0=42, f=3.

The output of the convolutional network is then activated with the ReLU function:

[Equation image: Figure GDA0001991677080000042]

Finally, batch normalization is applied to prevent overfitting:

[Equation image: Figure GDA0001991677080000043]

The 1-layer one-dimensional convolutional networks with kernel sizes of 5, 7, 9, and 11 have the same structure as above, with f set to 5, 7, 9, and 11 respectively. The features extracted by the 1-layer one-dimensional convolutional neural networks are

[Equation image: Figure GDA0001991677080000044]
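The convolution and activation of Step 210 can be illustrated with a small pure-Python sketch of a one-dimensional convolution followed by ReLU. Several details are assumptions rather than statements from the patent: zero ("same") padding is used so that the output keeps one row per amino acid, the weight layout `weights[o][m][k]` is hypothetical, and batch normalization is left out because it needs statistics over a batch of proteins.

```python
def conv1d_same(x, weights, bias):
    """1-D convolution over positions with zero padding at both ends.

    x: list of positions, each a list of input-channel values.
    weights[o][m][k]: output channel o, kernel offset m, input channel k.
    bias[o]: one bias per output channel.
    """
    length = len(x)
    kernel_size = len(weights[0])
    half = kernel_size // 2
    output = []
    for t in range(length):
        row = []
        for o, kernel in enumerate(weights):
            total = bias[o]
            for m in range(kernel_size):
                pos = t + m - half
                if 0 <= pos < length:  # positions outside the protein count as 0
                    for k, w in enumerate(kernel[m]):
                        total += w * x[pos][k]
            row.append(total)
        output.append(row)
    return output


def relu(h):
    """Element-wise max(0, value)."""
    return [[max(0.0, v) for v in row] for row in h]


# Toy input: 4 positions, 1 input channel, 1 output channel, kernel size 3.
x = [[1.0], [2.0], [3.0], [4.0]]
weights = [[[1.0], [1.0], [1.0]]]
bias = [-5.0]
print(relu(conv1d_same(x, weights, bias)))  # [[0.0], [1.0], [4.0], [2.0]]
```

With the dimensions given in the text, `x` would have 700 positions of 42 channels and `weights` would hold 50 kernels of size f, one call for each f in 3, 5, 7, 9, and 11.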

Step 220: 2-layer one-dimensional convolutional neural network feature extraction. Compared with the 1-layer case, this adds one more round of convolution, activation, and batch normalization. Taking the one-dimensional convolutional network with kernel size 3 as an example, two rounds of convolution, activation, and batch normalization are applied to the input X in sequence:

[Equation image: Figure GDA0001991677080000045]

i∈{0,1,…,L_1}

[Equation image: Figure GDA0001991677080000046]

[Equation image: Figure GDA0001991677080000047]

[Equation image: Figure GDA0001991677080000048]

i∈{0,1,…,L_2}

where b_1 is the bias. d_1, the output of the first round of convolution, activation, and batch normalization, and h_2 denote the input and output of the second convolution. L_2 is the number of output channels, K_1 is the number of channels of the input feature map, and f is the kernel size. Here L_2=50, K_1=50, f=3.

The output of the convolutional network is then activated with the ReLU function:

[Equation image: Figure GDA0001991677080000051]

Finally, batch normalization is applied to prevent overfitting:

[Equation image: Figure GDA0001991677080000052]

The 2-layer one-dimensional convolutional networks with kernel sizes of 5, 7, 9, and 11 have the same structure as above, with f set to 5, 7, 9, and 11 respectively. The features extracted by the 2-layer one-dimensional convolutional neural networks are

[Equation image: Figure GDA0001991677080000053]

Step 230: 3-layer one-dimensional convolutional neural network feature extraction. Compared with the 2-layer case, this adds one more round of convolution, activation, and batch normalization. Three rounds of convolution, activation, and batch normalization are applied to the input X in sequence, and the third convolution has L_3=50 output channels. The features extracted by the 3-layer one-dimensional convolutional neural networks are

[Equation image: Figure GDA0001991677080000054]

Step 240: Combining the original features with the outputs of the convolutional neural networks. The input X and the features extracted in Steps 210, 220, and 230

[Equation image: Figure GDA0001991677080000055]

are combined and output to Step 300. Steps 210, 220, and 230 each output 50×5 features, so 42+50×5×3=792 features in total are output to the bidirectional LSTM network for further processing.

Step 300: Bidirectional LSTM neural network feature extraction. A forward LSTM network and a backward LSTM network each extract features, and the two outputs are combined.

[Equation image: Figure GDA0001991677080000056]

where

[Equation image: Figure GDA0001991677080000057]

is the feature representation extracted at position t by the LSTM network from the features of the preceding t−1 amino acids, and

[Equation image: Figure GDA0001991677080000058]

is the feature representation extracted at position t by the LSTM network from the features of the following L−t amino acids. The hidden layer of each LSTM network has 800 nodes, so the combined output of the forward and backward LSTM networks has 1600 features.
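How the forward and backward passes of Step 300 are combined can be shown with a generic bidirectional scan. This sketch deliberately does not implement an LSTM cell: the `step` argument is a placeholder for any recurrent update, and the demonstration plugs in a running sum as a stand-in so that the data flow is easy to follow. All names are illustrative.

```python
def scan(sequence, step, initial_state):
    """Run a recurrent update over the sequence and keep every state."""
    state = initial_state
    outputs = []
    for x in sequence:
        state = step(state, x)
        outputs.append(state)
    return outputs


def bidirectional(sequence, step, initial_state):
    """Concatenate forward and backward states at each position."""
    forward = scan(sequence, step, initial_state)
    # Scan the reversed sequence, then restore the original order.
    backward = scan(sequence[::-1], step, initial_state)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # list concatenation


def running_sum(state, x):
    """Stand-in for an LSTM cell: adds the input to the state."""
    return [s + v for s, v in zip(state, x)]


sequence = [[1.0], [2.0], [3.0]]
print(bidirectional(sequence, running_sum, [0.0]))
# [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

Each output row has twice the width of a single state, which matches the text: two 800-node hidden layers give 1600 features per position.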

Step 400: Fully connected layer feature mapping. A fully connected layer maps the features g_t extracted in Step 300 to the 8 protein secondary structure classes.

[Equation image: Figure GDA0001991677080000059]

where w_st is the weight connecting the s-th feature to the t-th class, and b_t is the bias of the t-th class. g_st is the t-th feature of the s-th amino acid output by Step 300. N is the number of features of g_st. Here N=1600.
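The mapping in Step 400 is a weighted sum per class. The sketch below assumes the usual form of a fully connected layer, a score for class t equal to the bias b_t plus the sum over features s of w_st times the s-th feature; the equation image itself is not available, so this form and the function name are assumptions.

```python
def fully_connected(features, weights, biases):
    """Map N features of one amino acid to one score (logit) per class.

    weights[s][t] connects feature s to class t; biases[t] belongs to class t.
    """
    return [
        biases[t] + sum(features[s] * weights[s][t] for s in range(len(features)))
        for t in range(len(biases))
    ]


# Toy example with N=2 features and 3 classes.
features = [1.0, 2.0]
weights = [[1.0, 0.0, -1.0],
           [0.5, 1.0, 0.0]]
biases = [0.0, 0.5, 0.25]
scores = fully_connected(features, weights, biases)
print(scores)  # [2.0, 2.5, -0.75]

# The predicted class is the index of the largest score.
predicted = max(range(len(scores)), key=scores.__getitem__)
print(predicted)  # 1
```

With the dimensions in the text, `features` would hold N=1600 values and `biases` would hold 8 entries, one per secondary structure class.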

Step 500: Training the deep neural network. The L labelled data are used to optimize the parameter space W so that the deep architecture discriminates better. This task can be cast as an optimization problem:

[Equation image: Figure GDA0001991677080000061]

where

[Equation image: Figure GDA0001991677080000062]

T denotes the loss function. In the invention, T is the cross-entropy error function. The RMSProp optimizer trains the deep neural network built in the preceding steps on T, and L2 regularization is used to prevent overfitting.
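The three ingredients of Step 500 can be sketched in isolation: a cross-entropy loss computed from logits, an L2 penalty, and one RMSProp update. The patent gives no hyperparameters, so the learning rate, decay, epsilon, and penalty strength below are placeholder assumptions, and the function names are illustrative.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between a class index and unnormalized logits."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


def l2_penalty(weights, strength):
    """L2 regularization term added to the loss (strength is assumed)."""
    return strength * sum(w * w for w in weights)


def rmsprop_step(weights, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update: scale each gradient by a running RMS of its history."""
    new_cache = [decay * c + (1 - decay) * g * g for c, g in zip(cache, grads)]
    new_weights = [
        w - lr * g / (math.sqrt(c) + eps)
        for w, g, c in zip(weights, grads, new_cache)
    ]
    return new_weights, new_cache


# Uniform logits over 8 classes give a loss of ln(8), about 2.079.
print(softmax_cross_entropy([0.0] * 8, label=3))

# A positive gradient moves the weight downward.
weights, cache = rmsprop_step([1.0], [2.0], [0.0], lr=0.1)
print(weights[0] < 1.0)  # True
```

In a full training loop the gradient would come from backpropagating the sum of the cross-entropy and the L2 term through the network; that part is omitted here.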

The invention proposes a protein secondary structure prediction method based on a deep neural network. It combines feature extraction by multi-layer convolutional neural networks, the combination of features from several different depths, and feature extraction by a bidirectional LSTM network into a deep architecture for protein secondary structure prediction, which effectively improves prediction accuracy.

The above is a further detailed description of the invention with reference to specific preferred embodiments, and the implementation of the invention is not to be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims (2)

1. A protein secondary structure prediction method based on a deep neural network, comprising the following steps:
extracting protein sequence features: the input features of the network are amino acid sequence and structure information, and the data are downloaded from the PDB and annotated with DSSP; the amino acid sequence information has 21 features, the amino acid structure information has 21 features, and the 42 features of each amino acid are used to predict its secondary structure;
multi-layer combined convolutional neural network feature extraction: extracting features with several convolutional neural networks of different depths, combining them with the original features, and passing the combined features to the next layer;
bidirectional long short-term memory (LSTM) neural network feature extraction: extracting features with a bidirectional LSTM neural network and passing them to the next layer; the hidden layer of each LSTM neural network has 800 nodes, and the combined output of the forward and backward LSTM neural networks has 1600 features;
fully connected layer feature mapping: mapping the features to the 8 protein secondary structure classes using a fully connected layer;
training the deep neural network: training the deep neural network built in the preceding steps with the RMSProp optimizer on the cross-entropy error between labels and logits, and using L2 regularization to prevent overfitting.
2. The protein secondary structure prediction method based on a deep neural network according to claim 1, wherein in the multi-layer combined convolutional neural network feature extraction, several one-dimensional convolution kernels of different lengths emulate windows of different sizes to extract amino acid sequence features, and the combined convolutional neural network is constructed in the following steps:
1-layer one-dimensional convolutional neural network construction: one-dimensional convolutional networks with kernel sizes of 3, 5, 7, 9, and 11 each convolve the amino acid sequence of the protein to extract features; each network has 42 input channels, corresponding to the 42 features of one amino acid, and 50 output channels; the output of the network is activated with the ReLU function, and batch normalization is then applied to prevent overfitting;
2-layer one-dimensional convolutional neural network construction: compared with the 1-layer one-dimensional convolutional neural network feature extraction, one more round of convolution, activation, and batch normalization is added; two rounds of convolution, activation, and batch normalization are applied to the input X in sequence;
3-layer one-dimensional convolutional neural network construction: compared with the 2-layer one-dimensional convolutional neural network feature extraction, one more round of convolution, activation, and batch normalization is added; three rounds of convolution, activation, and batch normalization are applied to the input X in sequence;
combining the original features with the outputs of the convolutional neural networks: the 1-layer, 2-layer, and 3-layer one-dimensional convolutional neural networks each extract features from the amino acid sequence, and their outputs are combined with the original features, so that a total of 42+50×5×3=792 features are output to the bidirectional LSTM neural network for further processing.
CN201910085554.9A 2019-01-29 2019-01-29 A Method for Protein Secondary Structure Prediction Based on Deep Neural Network Active CN109817276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910085554.9A CN109817276B (en) 2019-01-29 2019-01-29 A Method for Protein Secondary Structure Prediction Based on Deep Neural Network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910085554.9A CN109817276B (en) 2019-01-29 2019-01-29 A Method for Protein Secondary Structure Prediction Based on Deep Neural Network

Publications (2)

Publication Number Publication Date
CN109817276A CN109817276A (en) 2019-05-28
CN109817276B true CN109817276B (en) 2023-05-23

Family

ID=66605588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910085554.9A Active CN109817276B (en) 2019-01-29 2019-01-29 A Method for Protein Secondary Structure Prediction Based on Deep Neural Network

Country Status (1)

Country Link
CN (1) CN109817276B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276113A (en) * 2019-06-11 2019-09-24 嘉兴深拓科技有限公司 A kind of network structure prediction technique
CN110310698A (en) * 2019-07-05 2019-10-08 齐鲁工业大学 Classification modeling method and system based on protein length and DCNN
CN110534160B (en) * 2019-09-02 2022-09-30 河南师范大学 Method for predicting protein solubility by convolutional neural network
CN110827923B (en) * 2019-11-06 2021-03-02 吉林大学 Prediction method of semen protein based on convolutional neural network
CN110827922B (en) * 2019-11-06 2021-04-16 吉林大学 Prediction method of amniotic fluid protein based on recurrent neural network
CN111210869B (en) * 2020-01-08 2023-06-20 中山大学 Protein refrigeration electron microscope structure analysis model training method and analysis method
CN113223620B (en) * 2021-05-13 2023-02-07 西安电子科技大学 Protein solubility prediction method based on multi-dimensional sequence embedding
CN115527613A (en) * 2021-09-13 2022-12-27 烟台双塔食品股份有限公司 Pea protein data feature coding and extracting method
CN113921086B (en) * 2021-09-14 2024-08-02 上海中科新生命生物科技有限公司 Protein de novo peptide sequencing method and system based on mass spectrometry
CN113851192B (en) * 2021-09-15 2023-06-30 安庆师范大学 Training method and device for amino acid one-dimensional attribute prediction model and attribute prediction method
CN115312119B (en) * 2022-10-09 2023-04-07 之江实验室 Method and system for identifying protein domains based on protein three-dimensional structure images
CN116304889A (en) * 2023-05-22 2023-06-23 鲁东大学 A Receptor Classification Method Based on Convolution and Transformer

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930686A (en) * 2016-07-05 2016-09-07 四川大学 Protein secondary structure prediction method based on deep neural network
CN106295139A (en) * 2016-07-29 2017-01-04 汤平 A tongue self-diagnosis health cloud service system based on deep convolutional neural networks
WO2017212802A1 (en) * 2016-06-06 2017-12-14 日立マクセル株式会社 Water discharge facility structure, shower head provided with water discharge facility structure, faucet water supply facility, waterfall bath facility
CN108197427A (en) * 2018-01-02 2018-06-22 山东师范大学 Protein subcellular localization method and apparatus based on deep convolutional neural networks
CN108363774A (en) * 2018-02-09 2018-08-03 西北大学 A drug relation classification method based on multilayer convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017212802A1 (en) * 2016-06-06 2017-12-14 日立マクセル株式会社 Water discharge facility structure, shower head provided with water discharge facility structure, faucet water supply facility, waterfall bath facility
CN105930686A (en) * 2016-07-05 2016-09-07 四川大学 Protein secondary structure prediction method based on deep neural network
CN106295139A (en) * 2016-07-29 2017-01-04 汤平 A tongue self-diagnosis health cloud service system based on deep convolutional neural networks
CN108197427A (en) * 2018-01-02 2018-06-22 山东师范大学 Protein subcellular localization method and apparatus based on deep convolutional neural networks
CN108363774A (en) * 2018-02-09 2018-08-03 西北大学 A drug relation classification method based on multilayer convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
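"A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction"; Matt Spencer et al.; 《IEEE/ACM Transactions on Computational Biology and Bioinformatics》; 20151231; 103-112 *
"Protein Secondary Structure Prediction Based on Convolutional Long Short-Term Memory Neural Networks" (基于卷积长短时记忆神经网络的蛋白质二级结构预测); 郭延哺 et al.; 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence); 20180630; Vol. 31, No. 6; 562-568 *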

Also Published As

Publication number Publication date
CN109817276A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
CN109817276B (en) A Method for Protein Secondary Structure Prediction Based on Deep Neural Network
US20230316699A1 (en) Image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation
CN109597891B (en) Text emotion analysis method based on bidirectional long short-term memory neural network
CN109902293B (en) A text classification method based on local and global mutual attention mechanism
CN106980683B (en) Blog text abstract generating method based on deep learning
CN113469119B (en) Cervical cell image classification method based on vision transformer and graph convolutional network
CN111046661B (en) Reading comprehension method based on graph convolution network
CN111291556B (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN113095415A (en) Cross-modal hashing method and system based on multi-modal attention mechanism
CN111476315B (en) An Image Multi-label Recognition Method Based on Statistical Correlation and Graph Convolution Technology
CN111897957B (en) Capsule neural network integrating multi-scale feature attention and text classification method
CN109934261A (en) A knowledge-driven parameter propagation model and its few-shot learning method
CN108154228A (en) An artificial neural network device and method
CN109766557B (en) A sentiment analysis method, device, storage medium and terminal equipment
CN111400494B (en) A sentiment analysis method based on GCN-Attention
CN112989835B (en) Extraction method of complex medical entities
CN109919175B (en) Entity multi-classification method combined with attribute information
CN112733768A (en) Natural scene text recognition method and device based on bidirectional characteristic language model
CN116402066A (en) Attribute-level text emotion joint extraction method and system for multi-network feature fusion
CN116502181A (en) Channel expansion and fusion-based cyclic capsule network multi-modal emotion recognition method
CN111199797A (en) Auxiliary diagnosis model establishing and auxiliary diagnosis method and device
CN115588122A (en) A News Classification Method Based on Multimodal Feature Fusion
CN115116548A (en) Data processing method, apparatus, computer equipment, medium and program product
CN111783688A (en) A classification method of remote sensing image scene based on convolutional neural network
CN116883723A (en) A compositional zero-shot image classification method based on parallel semantic embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20241001

Address after: Across from Zhujiang Township Central Primary School, Anfu County, Ji'an City, Jiangxi Province 343223

Patentee after: Jiangxi Qixin Raincoat Manufacturing Co.,Ltd.

Country or region after: China

Address before: No. 186 Hongqi Middle Road, Zhifu District, Yantai, Shandong 264025

Patentee before: Ludong University

Country or region before: China
