
CN114596912B - Short peptide histology identification method based on polypeptide length and application thereof - Google Patents

Info

Publication number
CN114596912B
Authority
CN
China
Prior art keywords
polypeptide
theoretical
mass
ion
charge ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210151752.2A
Other languages
Chinese (zh)
Other versions
CN114596912A (en)
Inventor
徐巨才
黄峻洪
陈雅君
严嘉慧
梁姚顺
陈静
刘万顺
范丽琪
江桂玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuyi University
Original Assignee
Wuyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuyi University
Priority to CN202210151752.2A
Publication of CN114596912A
Application granted
Publication of CN114596912B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a short-peptide omics identification method based on polypeptide length, and an application thereof. The method exhaustively enumerates all polypeptide fragments within a specified length range, generates the theoretical primary parent ion and the theoretical secondary child ions of each fragment one by one, and compares them against the measured ion spectra acquired for the substances in a sample. Polypeptides that meet the matching requirements are associated with the corresponding substances, which yields the polypeptide sequence of each substance in the sample. The method searches and identifies protein enzymolysis products directly, without blind spots or omissions and without relying on a protein sequence database. Because short peptides are evaluated by complete matching, the identification results are accurate and reliable, and the method can compensate for the shortcomings of current proteomic analysis methods and tools. The invention is significant for improving peptidomics analysis technology and promoting development of the industry.

Description

Short-peptide omics identification method based on polypeptide length and application thereof
Technical Field
The invention relates to the field of peptidomics identification, in particular to a short-peptide omics identification method based on polypeptide length and an application thereof.
Background
When food-borne protein is processed by modern enzymatic hydrolysis into food-borne polypeptides, its molecular weight decreases and its digestion and absorption characteristics improve greatly. The hydrolysate also acquires various biological activities, such as antioxidant, anti-aging and memory-improving effects, and has therefore attracted wide attention from researchers, consumers and food enterprises. Notably, because most food-grade proteases are crude enzymes with broad cleavage sites, the hydrolysates often contain a large amount of small-molecule short peptides. These short peptides give the hydrolysates their stronger digestion, absorption and functional characteristics. However, owing to the lack of efficient analytical methods and tools, the composition and distribution of short peptides in enzymatic hydrolysates have so far not been resolved.
In the related art, researchers mainly analyze the polypeptide composition of food-derived protein hydrolysates with proteomics methods and software such as Mascot, MaxQuant, SEQUEST and PEAKS. However, the main subject of these proteomic methods and software is proteins, not polypeptides. Short peptides occur with high frequency across protein sequences and generally lack the specificity required for protein sequence identification, so proteomic analysis methods and tools often ignore them. The identification results of Mascot are mainly distributed among peptides of 7 residues and longer, and its ability to identify short peptides is essentially absent. MaxQuant is mainly used for proteomic analysis of specific enzymatic digests and is relatively weak on nonspecific digestion products: an analysis can take about a month, and the results contain few short peptides. In addition, tools such as Mascot, MaxQuant and SEQUEST must rely on known protein databases for polypeptide analysis, while food protein raw materials often lack complete or accurate sequence databases, which greatly increases the difficulty of short peptide analysis. PEAKS, a de novo sequencing tool, can resolve polypeptides independently of a protein database, but in practice it returns few results and cannot meet the analysis requirements of food short-peptide omics. Moreover, the related art contains few studies on the identification and application of short-peptide omics, so a complete short-peptide omics identification method is urgently needed.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a short-peptide omics identification method based on polypeptide length, which can search and identify protein enzymolysis products directly, without blind spots or omissions and without relying on a protein sequence database.
The invention also provides an application of the short-peptide omics identification method based on polypeptide length in peptidomics.
In a first aspect of the invention, there is provided a short-peptide omics identification method based on polypeptide length, comprising the following steps:
S1, presetting a polypeptide fragment length range, an amino acid residue list and the secondary ion cluster types to be analyzed, and obtaining the mass spectrum data of the primary parent ions and secondary child ions of the substances acquired in the mass spectrum of a sample to be detected;
S2, presetting a primary parent ion signal response intensity threshold and a primary parent ion mass-to-charge ratio range, and screening the mass spectrum data obtained in step S1 accordingly to obtain a substance set C2 that meets the requirements in the sample to be detected;
S3, obtaining a polypeptide fragment length value L within the preset polypeptide fragment length range, calculating the number $N^L$ of all theoretical polypeptide fragments at each polypeptide fragment length from the number N of amino acid residue types in the amino acid residue list of step S1, and obtaining the sequence set of the theoretical polypeptide fragments together with the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical child ion cluster mass-to-charge ratios of each theoretical polypeptide fragment;
S4, screening candidate polypeptide fragments that meet the conditions in the substance set C2 of step S2 according to the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical child ion cluster mass-to-charge ratios of the theoretical polypeptide fragments;
S5, scoring the candidate polypeptide fragments and taking the candidate polypeptide fragment with the highest score as the polypeptide fragment sequence, within the preset length range, in the sample to be detected.
According to some embodiments of the invention, the predetermined polypeptide fragment length in step S1 is in any length range, preferably 2-7.
According to some embodiments of the invention, the amino acid residues in the list of amino acid residues are any non-repeating amino acid residues, and the amino acid residue numbers are numbered starting from 0 and are required to be unique and contiguous.
According to some embodiments of the invention, the amino acid residues may be common 20 amino acid residues, or may be unusual amino acid residues.
According to some embodiments of the invention, the unusual amino acid residues comprise at least one of hydroxyproline residues and selenocysteine residues.
According to some embodiments of the present invention, the secondary ion cluster type in step S1 may be any one or more of an a ion cluster, a b ion cluster, and a y ion cluster.
According to some embodiments of the invention, the threshold value in step S2, applied to the deviation of the primary parent ion mass-to-charge ratio, is 0-0.02 Da or 0-40 ppm.
According to some embodiments of the invention, the secondary sub-ion cluster mass to charge ratio deviation value in step S2 is in the range of 0-0.05Da, and/or the secondary sub-ion cluster mass to charge ratio deviation value is in the range of 0-80ppm.
According to some embodiments of the present invention, the mass-to-charge ratios of the secondary theoretical ion clusters in step S4 are calculated as follows:
wherein: $mz(a_k)$ is the mass-to-charge ratio of the $a_k$ ion; $mz(b_k)$ is the mass-to-charge ratio of the $b_k$ ion; $mz(y_k)$ is the mass-to-charge ratio of the $y_k$ ion; L is the length value of the polypeptide fragment to be searched; k is the ion number, an integer from 1 to L; $M(H^+)$ is the molar mass of the hydrogen ion; $M(A_j)$ is the molar mass of the j-th amino acid residue ($A_j$) in the polypeptide fragment; $M(CO)$ is the molar mass of CO (carbonyl); $M(H_2O)$ is the molar mass of a water molecule.
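The formula itself is not reproduced in the text above. As a sketch only, assuming the standard definitions of singly charged a, b and y fragment ions and using only the symbols defined above, the three mass-to-charge ratios can be written as:

```latex
\begin{aligned}
mz(b_k) &= \sum_{j=1}^{k} M(A_j) + M(H^+) \\
mz(a_k) &= \sum_{j=1}^{k} M(A_j) - M(CO) + M(H^+) \\
mz(y_k) &= \sum_{j=L-k+1}^{L} M(A_j) + M(H_2O) + M(H^+)
\end{aligned}
```

Under the same assumption, $mz(y_L)$ coincides with the mass-to-charge ratio of the singly protonated intact polypeptide, that is, the primary theoretical parent ion.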
According to some embodiments of the present invention, the specific steps of screening the sample to be tested to obtain the satisfactory substance set C2 in step S2 are as follows:
S11, acquiring charge information of each ion in the primary mass spectrum and the secondary mass spectrum according to the primary parent ion and the secondary child ion cluster mass spectrum data of the substance acquired by the sample to be detected in the mass spectrum in the step S1, and performing ion standardization treatment to acquire a primary parent ion mass-to-charge ratio and a secondary child ion cluster mass-to-charge ratio of the substance;
S12, filtering the substances in the step S11 according to the primary parent ion signal response intensity to obtain a substance set C1 higher than the primary parent ion signal response intensity threshold in the step S2;
and S13, filtering the substance set in the step S12 according to the primary parent ion mass-to-charge ratio range in the step S2 to obtain a substance set C2 with the primary parent ion mass-to-charge ratio meeting the range requirement.
According to some embodiments of the invention, the ion normalization process refers to converting multi-charge ions in a mass spectrum into ions with unit positive charges through mass-to-charge ratio calculation, wherein unknown charge ions default to single charge ions.
According to some embodiments of the invention, the computational transformation process in the ion normalization process is exemplified as follows:
a substance carrying z positive charges with a measured mass-to-charge ratio X is converted into an ion with a unit positive charge; the mass-to-charge ratio of the converted ion is (X × z − z × H + H)/1, where H is the molar mass of a hydrogen ion with a unit positive charge.
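The conversion above can be sketched in a few lines of Python. The function name `to_singly_charged` and the numeric value used for H are illustrative assumptions, not taken from the patent text; the arithmetic follows the expression (X × z − z × H + H)/1.

```python
from typing import Optional

# Mass of H+ in Da. The numeric value is an assumption made for this sketch;
# the source only names the quantity H.
PROTON_MASS = 1.007276


def to_singly_charged(mz: float, charge: Optional[int] = None) -> float:
    """Convert a measured m/z at charge z into the m/z of the unit-charge ion."""
    # Ions of unknown charge are treated as singly charged by default.
    z = 1 if charge is None else charge
    if z < 1:
        raise ValueError("charge must be a positive integer")
    # (X * z - z * H + H) / 1
    return mz * z - z * PROTON_MASS + PROTON_MASS


# A doubly charged ion measured at m/z 154.549176 maps to about 308.0911.
normalized = to_singly_charged(154.549176, 2)
```

With z = 1 the expression leaves the measured value unchanged, which is why treating unknown charges as single charges is a safe default.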
According to some embodiments of the invention, the primary parent ion signal response intensity threshold is 3 times or more the instrument noise intensity; the primary parent ion mass-to-charge ratio range is 70-2000Da.
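As a minimal sketch of the two filters in steps S12 and S13, the following Python assumes a hypothetical `Substance` record and a known instrument noise intensity. The names and the strict "greater than" comparison against the threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Substance:
    precursor_mz: float   # normalized (unit-charge) primary parent ion m/z
    intensity: float      # primary parent ion signal response intensity
    fragments: List[float] = field(default_factory=list)  # secondary ion m/z values


def screen_substances(
    substances: List[Substance],
    noise_intensity: float,
    noise_factor: float = 3.0,                       # threshold = 3 x noise
    mz_range: Tuple[float, float] = (70.0, 2000.0),  # range quoted in the text
) -> Tuple[List[Substance], List[Substance]]:
    """Return (C1, C2): the intensity-filtered set and the set also within the m/z range."""
    threshold = noise_factor * noise_intensity
    c1 = [s for s in substances if s.intensity > threshold]
    low, high = mz_range
    c2 = [s for s in c1 if low <= s.precursor_mz <= high]
    return c1, c2


example = [
    Substance(308.0911, 5000.0),
    Substance(65.0, 9000.0),    # intense, but below the m/z range
    Substance(500.0, 200.0),    # inside the range, but too weak
]
c1, c2 = screen_substances(example, noise_intensity=100.0)
```

Filtering on intensity first keeps the second pass small, which matches the order C1 then C2 described in the text.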
According to some embodiments of the invention, the screening of the candidate polypeptide fragments in step S4 comprises the steps of:
S21, acquiring an unsearched length value L within the polypeptide fragment length range preset in step S1;
S22, initializing the polypeptide coding value U = 0;
S23, converting the polypeptide coding value U to base N, where N is the number of amino acid residue types, and representing the result with L digits to obtain the sequence codon X;
S24, associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment P;
S25, obtaining the primary theoretical parent ion mass-to-charge ratio Z and the secondary theoretical child ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P;
S26, calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 and the primary theoretical parent ion mass-to-charge ratio Z of the theoretical polypeptide fragment P, denoted $E_1$, obtaining the substance set F1 whose absolute value meets the threshold, and judging whether the substance set F1 is empty:
if the substance set F1 is empty, acquiring the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments in step S3, that is, adding 1 to the polypeptide coding value;
if the substance set F1 is not empty, comparing the secondary sub-ion cluster mass-to-charge ratios of the substances in the substance set F1 one by one with the secondary theoretical sub-ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P, calculating the absolute value of the difference between them, obtaining the substance set F2 to be identified whose absolute values fall within the secondary sub-ion cluster mass-to-charge ratio deviation range, and judging whether the substance set F2 is empty:
if the substance set F2 is not empty, marking the theoretical polypeptide fragment P as a candidate polypeptide fragment of the substance set F2, namely a candidate identification result, and acquiring the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments in step S3, that is, adding 1 to the polypeptide coding value; if the substance set F2 is empty, proceeding directly to the next theoretical polypeptide fragment, that is, adding 1 to the polypeptide coding value;
S27, repeating steps S22-S26 until the polypeptide coding value U equals the number of all possible polypeptides of that length, i.e. $U = N^L$;
S28, obtaining the next unsearched polypeptide fragment length value, and repeating steps S21-S27 until all polypeptide fragment length values within the polypeptide fragment length range preset in step S1 have been searched.
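The coding scheme in steps S21-S24 and S27-S28 can be sketched as a base-N decoder wrapped in a lazy generator. The function names and the choice of most-significant digit first are assumptions for illustration; the residue list is any sequence of unique one-letter symbols numbered from 0.

```python
from typing import Iterator, Sequence


def decode_peptide(u: int, residues: Sequence[str], length: int) -> str:
    """Translate coding value U into a peptide: write U in base N using `length` digits."""
    n = len(residues)
    if not 0 <= u < n ** length:
        raise ValueError("coding value out of range for this length")
    digits = []
    for _ in range(length):
        u, digit = divmod(u, n)   # peel off the least significant base-N digit
        digits.append(digit)
    digits.reverse()              # most significant digit first (assumed order)
    return "".join(residues[d] for d in digits)


def enumerate_peptides(
    residues: Sequence[str], min_length: int, max_length: int
) -> Iterator[str]:
    """Yield every theoretical peptide, one at a time, for each length in the range."""
    n = len(residues)
    for length in range(min_length, max_length + 1):
        for u in range(n ** length):   # U runs from 0 to N^L - 1
            yield decode_peptide(u, residues, length)


STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"   # the 20 common residues, numbered 0..19
first = decode_peptide(0, STANDARD_RESIDUES, 2)
last = decode_peptide(20 ** 2 - 1, STANDARD_RESIDUES, 2)
```

Because the generator produces one peptide at a time, memory use stays constant regardless of $N^L$, and disjoint ranges of U can be handed to separate workers.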
According to some embodiments of the invention, the set threshold in step S26 is 0-0.02Da or the set threshold is 0-40ppm.
According to some embodiments of the invention, the secondary sub-ion cluster mass to charge ratio deviation value in step S26 is in the range of 0-0.05Da, and/or the secondary sub-ion cluster mass to charge ratio deviation value is in the range of 0-80ppm.
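The two-stage comparison of step S26 might look like the sketch below. Several parts are assumptions and not taken from the patent: the standard a/b/y ion relations, the monoisotopic masses in `RESIDUE_MASS` (only four residues are listed for brevity), and the hypothetical `min_hits` parameter that decides when a substance enters F2.

```python
from dataclasses import dataclass
from typing import List, Tuple

PROTON, WATER, CO = 1.007276, 18.010565, 27.994915   # Da, assumed constants
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "C": 103.00919, "E": 129.04259}


@dataclass
class Substance:
    precursor_mz: float      # normalized primary parent ion m/z
    fragments: List[float]   # normalized secondary ion m/z values


def theoretical_ions(peptide: str) -> Tuple[float, List[float]]:
    """Return (Z, T): parent ion m/z and the a/b/y ion m/z values for k = 1..L."""
    masses = [RESIDUE_MASS[r] for r in peptide]
    length = len(masses)
    parent = sum(masses) + WATER + PROTON
    ions: List[float] = []
    for k in range(1, length + 1):
        prefix = sum(masses[:k])              # first k residues
        suffix = sum(masses[length - k:])     # last k residues
        ions.append(prefix + PROTON)          # b_k
        ions.append(prefix - CO + PROTON)     # a_k
        ions.append(suffix + WATER + PROTON)  # y_k
    return parent, ions


def within(observed: float, theoretical: float, tol: float, unit: str = "Da") -> bool:
    """Check an absolute (Da) or relative (ppm) deviation against a tolerance."""
    error = abs(observed - theoretical)
    if unit == "ppm":
        error = error / theoretical * 1e6
    return error <= tol


def match_peptide(peptide: str, substances: List[Substance],
                  ms1_tol: float = 0.02, ms2_tol: float = 0.05,
                  min_hits: int = 1) -> List[Substance]:
    """Return F2: substances passing the parent-ion check (F1) and the fragment check."""
    parent, ions = theoretical_ions(peptide)
    f1 = [s for s in substances if within(s.precursor_mz, parent, ms1_tol)]
    f2 = []
    for s in f1:
        hits = sum(any(within(obs, t, ms2_tol) for obs in s.fragments) for t in ions)
        if hits >= min_hits:
            f2.append(s)
    return f2


spectrum = Substance(308.0911, [76.0393, 179.0485, 233.0590])
matches = match_peptide("ECG", [spectrum])
```

The isomeric sequence `"GCE"` has the same parent mass, so it passes the F1 check for this spectrum but fails the fragment check, which illustrates why the secondary comparison is needed.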
According to some embodiments of the invention, the scoring method in step S5 includes two methods:
The first scoring calculation method is as follows: comparing the candidate polypeptide fragments with the secondary ion cluster types of the actual substance in the sample to be detected, and counting 10 points for each complete ion cluster detected; the score is denoted $S_A$, and by default the candidate identification polypeptide with the highest score is output as the final identification result.
The second scoring calculation method is as follows: recording the number of secondary ion types of the substance in the sample to be detected that match the candidate polypeptide fragment, denoted $N_M$; calculating the mean of the absolute deviations between the secondary ion mass-to-charge ratios of the substance in the sample to be detected and the secondary theoretical ion mass-to-charge ratios of the corresponding candidate polypeptide fragment, denoted $E_2$; the second score $S_B$ is calculated as follows:
wherein C is the number of theoretical secondary ion cluster types, and L is the length value of the polypeptide.
According to some embodiments of the invention, the scoring output of step S5 is sorted in the priority order $S_A$ > $S_B$ > $E_1$ > $E_2$: the higher $S_A$ and $S_B$ are, the higher the confidence of the result; the smaller the errors $E_1$ and $E_2$ are, the higher the reliability of the result. By default, the top-ranked candidate identification polypeptide is selected as the final preferred identification result.
According to some embodiments of the invention, in the first score calculating method, the secondary ion cluster type may be any one of a, b or y ion clusters or a combination thereof.
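The priority ordering of the four quantities can be expressed as a single composite sort key. The sketch below assumes the scores and errors have already been computed for each candidate; the `Candidate` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    peptide: str
    s_a: float   # first score: higher is better
    s_b: float   # second score: higher is better
    e1: float    # primary parent ion deviation: lower is better
    e2: float    # mean secondary ion deviation: lower is better


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by S_A, then S_B (both descending), then E_1, then E_2 (both ascending)."""
    return sorted(candidates, key=lambda c: (-c.s_a, -c.s_b, c.e1, c.e2))


ranked = rank_candidates([
    Candidate("GCE", 10, 3.0, 0.001, 0.004),
    Candidate("ECG", 20, 5.0, 0.002, 0.003),
    Candidate("CEG", 20, 5.0, 0.001, 0.010),
])
best = ranked[0]   # the default preferred identification result
```

Negating the scores lets one ascending sort handle both directions, so ties on $S_A$ fall through to $S_B$, then $E_1$, then $E_2$ without extra branching.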
In a second aspect, the invention provides the use of the short-peptide omics identification method based on polypeptide length in peptidomics or proteomics.
According to some embodiments of the invention, there are at least the following advantages:
(1) The short-peptide omics identification method based on polypeptide length performs exhaustive enumeration and search matching within the specified polypeptide search range and does not depend on a protein database at all. It achieves indiscriminate, omission-free short peptide search matching, greatly improves the efficiency and throughput of short peptide identification, and significantly reduces the burden on researchers.
(2) The method not only searches short peptides efficiently but can also search long peptides when necessary, and it supports parallel computing, so it is flexible and extensible.
(3) The method searches short peptides exhaustively, one by one, which greatly reduces the requirements and load on computer hardware and improves working efficiency and simplicity.
(4) The method evaluates identification results by complete matching, so the identification results are reliable and accurate.
Drawings
The invention is further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of the identification of short peptides according to an embodiment of the present invention.
FIG. 2 is a mass spectrum of the identification result of the sample in example 1 of the present invention.
FIG. 3 is a statistical chart of the identification results of the sample in example 2 of the present invention.
FIG. 4 is a statistical chart of the identification results of the sample in comparative example 1 of the present invention.
Detailed Description
The conception and the technical effects produced by the present invention will be clearly and completely described in conjunction with the embodiments below to fully understand the objects, features and effects of the present invention. It is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and that other embodiments obtained by those skilled in the art without inventive effort are within the scope of the present invention based on the embodiments of the present invention.
To fully demonstrate the innovativeness and advancement of the present invention, the following examples and comparative examples use high-resolution LC-MS data of the same samples, namely a glutathione standard or a soy protein hydrolysate (the protease is Alcalase, a broad-spectrum alkaline protease). The data were acquired on an LC I-Class (Waters)-ESI-Q-TOF (Bruker, Germany) system, and the liquid phase consisted of mobile phase A (acetonitrile) and mobile phase B (0.1% formic acid by volume fraction). The gradient elution procedure was: at 0-60 min, the volume fraction of mobile phase B changed from 95% to 60%; at 60-64 min, it changed from 60% to 95%; at 64-70 min, it was held at 95%. The column was a 1.0 mm × 100 mm HSS T3 (1.8 μm, 100 Å, Waters, USA), the sample loading was 1 μL, the flow rate was 0.05 mL/min, and the column temperature was 40 °C. The ESI-Q-TOF mass spectrometer was operated in positive ion scan mode, with Auto MS/MS automatic secondary detection triggered for the top 3 parent ions. The mass spectrum ion detection range was 50-1200 m/z, and the spectrum acquisition rate of the mass spectrometer during two-dimensional liquid-phase analysis was 10 Hz.
Examples
This embodiment discloses a short-peptide omics identification method based on polypeptide length, as shown in FIG. 1, comprising the following specific steps:
(1) The length range of the polypeptide fragment, the amino acid residue list and the type of the secondary ion cluster are preset, and mass spectrum data of primary parent ions and secondary child ions of substances acquired in mass spectrum of a sample to be detected are obtained.
Wherein the length range of the preset polypeptide fragment is any length range, preferably 2-7; amino acid residues in the amino acid residue list are any non-repeated amino acid residues, such as the common 20 amino acid residues, the amino acid residue numbers are numbered from 0, and are required to be unique and continuous; the secondary ion cluster type includes any one or more of an a ion cluster, a b ion cluster, and a y ion cluster.
(2) Screening according to the preset primary parent ion signal response intensity threshold and primary parent ion mass-to-charge ratio range to obtain the substance set that meets the requirements in the sample to be detected (hereinafter referred to as substance set C2). The specific steps for obtaining substance set C2 are as follows:
1) From the primary parent ion and secondary ion mass spectrum data of the substances acquired from the sample to be detected in step (1), obtaining the charge information of each ion in the primary and secondary mass spectra and carrying out ion normalization to obtain the primary parent ion mass-to-charge ratio and the secondary child ion cluster mass-to-charge ratios of each substance. Ion normalization converts multiply charged ions in the mass spectrum into ions with a unit positive charge by mass-to-charge ratio calculation, with ions of unknown charge treated as singly charged by default. The conversion is exemplified as follows: a substance carrying z positive charges with a measured mass-to-charge ratio X is converted into an ion with a unit positive charge, whose mass-to-charge ratio is (X × z − z × H + H)/1, where H is the molar mass of a hydrogen ion with a unit positive charge.
2) Filtering the substances according to the primary parent ion signal response intensity, where the threshold is 3 times the instrument noise intensity or more, to obtain the substance set above the threshold (hereinafter referred to as substance set C1);
3) Filtering substance set C1 from step 2) according to the primary parent ion mass-to-charge ratio range, which is 70-2000 Da, to obtain the substance set C2 whose primary parent ion mass-to-charge ratios meet the range requirement.
(3) Obtaining a polypeptide fragment length value L within the preset polypeptide fragment length range, calculating the number $N^L$ of all theoretical polypeptide fragments at that length from the number N of amino acid residue types in the amino acid residue list of step (1), and obtaining the sequence set of theoretical polypeptide fragments together with the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical child ion cluster mass-to-charge ratios of each theoretical polypeptide fragment. The mass-to-charge ratios of the secondary theoretical ion clusters are calculated as follows:
wherein: $mz(a_k)$ is the mass-to-charge ratio of the $a_k$ ion; $mz(b_k)$ is the mass-to-charge ratio of the $b_k$ ion; $mz(y_k)$ is the mass-to-charge ratio of the $y_k$ ion; L is the length value of the polypeptide fragment to be searched; k is the ion number, an integer from 1 to L; $M(H^+)$ is the molar mass of the hydrogen ion; $M(A_j)$ is the molar mass of the j-th amino acid residue ($A_j$) in the polypeptide fragment; $M(CO)$ is the molar mass of CO (carbonyl); $M(H_2O)$ is the molar mass of a water molecule.
(4) Screening candidate polypeptide fragments meeting conditions in the substance set C2 in the step (2) according to the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical ion cluster mass-to-charge ratio of the theoretical polypeptide fragments, wherein the screening of the candidate polypeptide fragments comprises the following steps:
1) Acquiring an unsearched length value L within the length range of the preset polypeptide fragment in the step (1);
2) Initializing the polypeptide coding value U = 0 and enumerating the first polypeptide fragment;
3) Performing N-ary conversion on the polypeptide coding value U according to the number N of the amino acid residues, and representing the calculation result according to L bits to obtain a sequence codon X;
4) Correlating the number at each position in sequence codon X with the amino acid residue sequence number, translating sequence codon X into the theoretical polypeptide fragment P;
5) Obtaining a first-stage theoretical parent ion mass-to-charge ratio Z and a second-stage theoretical child ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P;
6) Calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in substance set C2 and the primary theoretical parent ion mass-to-charge ratio Z of the theoretical polypeptide fragment P, denoted $E_1$, and obtaining the substance set whose absolute value meets the threshold (hereinafter referred to as substance set F1), where the threshold is 0-0.02 Da or 0-40 ppm; then judging whether substance set F1 is empty:
if substance set F1 is empty, enumerating the next polypeptide fragment, that is, adding 1 to the polypeptide coding value;
if substance set F1 is not empty, comparing the secondary sub-ion cluster mass-to-charge ratios of the substances in substance set F1 one by one with the secondary theoretical sub-ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P, calculating the absolute value of the difference between them, and obtaining the substance set to be identified (hereinafter referred to as substance set F2) whose absolute values fall within the secondary sub-ion cluster mass-to-charge ratio deviation range, namely 0-0.05 Da or 0-80 ppm; then judging whether substance set F2 is empty:
if substance set F2 is not empty, marking the theoretical polypeptide fragment P as a candidate polypeptide fragment of substance set F2, namely a candidate identification result, and enumerating the next theoretical polypeptide fragment, that is, adding 1 to the polypeptide coding value; if substance set F2 is empty, proceeding directly to the next theoretical polypeptide fragment, that is, adding 1 to the polypeptide coding value;
7) Repeating steps 2)-6) until the polypeptide coding value U equals the number of all possible polypeptides of that length, i.e. $U = N^L$;
8) Acquiring the next unsearched length value, and repeating the steps 1) -7) until all the length values are searched.
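The encoding in steps 2) to 4) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and because Table 1 is not reproduced in this text, the residue order below (increasing monoisotopic residue mass, with L placed before I) is an assumption.

```python
# Assumed residue numbering 0..19 by increasing residue mass (Table 1 is not
# reproduced in this text, so the exact order, e.g. the L/I tie, is a guess).
RESIDUES = "GASPVTCLINDQKEMHFRYW"


def code_to_codon(u, length, n):
    """Write code value u in base n as `length` digits, most significant first."""
    if not 0 <= u < n ** length:
        raise ValueError("code value out of range for this length")
    digits = []
    for _ in range(length):
        u, d = divmod(u, n)
        digits.append(d)
    return digits[::-1]


def codon_to_peptide(codon, residues=RESIDUES):
    """Translate each digit of the codon into the residue with that number."""
    return "".join(residues[d] for d in codon)


def enumerate_peptides(length, residues=RESIDUES):
    """Yield every theoretical peptide of the given length, U = 0 .. N^L - 1."""
    n = len(residues)
    for u in range(n ** length):
        yield codon_to_peptide(code_to_codon(u, length, n), residues)
```

Because every code value maps to exactly one sequence, incrementing U visits each of the N^L fragments once without storing the whole set in memory.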
(5) Scoring the candidate polypeptide fragments according to the matching degree of the secondary child ion clusters, and selecting the candidate polypeptide fragment with the highest score as the identification result for the experimental spectrum of the sample to be detected, wherein two scoring methods are used:
The first scoring calculation method is as follows: the secondary ion cluster types of each candidate polypeptide fragment are compared with those of the actual substance in the sample to be detected, and 10 points are awarded for each complete ion cluster detected; the score is denoted S_A, and by default the candidate polypeptide with the highest score is output as the final identification result.
The second scoring calculation method is as follows: the number of secondary ion types of the substance in the sample to be detected that match the candidate polypeptide fragment is recorded as N_M; the average of the absolute deviations between the secondary ion mass-to-charge ratios of the substance and the corresponding secondary theoretical ion mass-to-charge ratios of the candidate polypeptide fragment is calculated and denoted E_2; and the second score S_B is calculated by the following formula:
wherein C is the number of theoretical secondary ion cluster types and L is the length value of the polypeptide;
The output results are sorted in the priority order S_A > S_B > E_1 > E_2: the higher the S_A and S_B scores, the higher the confidence of the result, and the smaller the E_1 and E_2 errors, the higher its reliability. By default, the candidate polypeptide ranked first is selected as the final preferred identification result.
Example 1
The analytical target sample in this example is a glutathione standard, and the specific steps for the identification of the short peptide group based on the length of the candidate polypeptide fragment are as follows:
(1) First, the polypeptide fragment length range is preset to 2-4 according to the length range of the short peptides to be measured, the amino acid residue list contains the 20 common residues, and the preset secondary ion clusters are the a, b and y ion clusters. The amino acid residues are numbered consecutively from 0 in order of increasing residue molecular weight, so that each residue is assigned a unique, consecutive number; the specific order is shown in Table 1.
Table 1:
Table 1 lists the common amino acid residues; when implementing the invention, specific amino acid residues, including uncommon ones, can be added according to the actual analysis requirements.
(2) Acquiring the raw mass spectrometry data of the substances in the glutathione standard sample to be detected, including the primary parent ion and secondary child ion mass spectrum data; further acquiring the charge information of each ion in the primary and secondary mass spectra, and performing ion normalization on each ion. Ion normalization converts multiply charged ions in the mass spectrum into singly positively charged ions by mass-to-charge ratio calculation; ions of unknown charge are treated as singly charged by default.
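As a hedged illustration of the ion normalization described above, the conversion of a multiply protonated ion to its singly protonated equivalent can be written as below. The function name and the use of the proton mass 1.007276 Da are assumptions of this sketch; the source only states that multiply charged ions are converted to unit positive charge and that unknown charges default to 1.

```python
PROTON = 1.007276  # mass of a proton in Da


def to_singly_charged(mz, charge=None):
    """Convert the m/z of an [M+zH]z+ ion to the m/z of the [M+H]+ ion.

    A missing (None) or zero charge is treated as unknown and defaults to 1,
    in which case the value is returned unchanged.
    """
    z = charge if charge else 1
    # Neutral mass M = z*mz - z*PROTON; adding one proton back gives [M+H]+.
    return mz * z - (z - 1) * PROTON
```

For example, a doubly charged ion observed at m/z 154.549178 corresponds to a singly charged ion at about m/z 308.0911.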
Filtering the substances according to the primary parent ion signal response intensity: the intensity threshold is 1000 counts (about 50 times the instrument noise), the response intensity being the absolute intensity, and the substance set C1 whose primary parent ion signal response intensity is above the threshold is retained. The substance set C1 is then filtered by primary parent ion mass-to-charge ratio; the range set in this embodiment is 70-1200 Da, and only substances whose primary parent ion mass-to-charge ratio falls within this range are retained, giving the substance set C2 of the sample to be detected.
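The two filtering stages can be sketched as follows. The `Substance` record and the function name are hypothetical; the default thresholds are the values stated for this embodiment (1000 counts, 70-1200 Da), and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Substance:
    precursor_mz: float   # normalized (singly charged) primary parent ion m/z
    intensity: float      # absolute primary parent ion signal intensity, counts
    fragments: list       # normalized secondary child ion m/z values


def filter_substances(substances, min_intensity=1000.0, mz_range=(70.0, 1200.0)):
    """Return (C1, C2): C1 passes the intensity filter, C2 also the m/z range."""
    c1 = [s for s in substances if s.intensity > min_intensity]
    low, high = mz_range
    c2 = [s for s in c1 if low <= s.precursor_mz <= high]
    return c1, c2
```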
(3) Obtaining an unsearched length value of 3 from the preset polypeptide fragment length range, and calculating, from the number of amino acid residue types (20) and the polypeptide fragment length value (3), the number of all polypeptide fragments of that length, 20^3 = 8000 in total; and obtaining the sequence set of the theoretical polypeptide fragments together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical child ion cluster mass-to-charge ratios.
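To make the size of the search space concrete, the count N^L can be tabulated per length. This is only an arithmetic sketch with a hypothetical helper name.

```python
def search_space(n_residues, lengths):
    """Number of theoretical polypeptide fragments, N^L, for each length L."""
    return {length: n_residues ** length for length in lengths}
```

With 20 residue types, lengths 2 to 4 give 400, 8000 and 160000 fragments respectively, so the number of candidates grows by a factor of 20 with each additional residue.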
(4) Screening candidate polypeptide fragments meeting conditions in a sample substance to be detected according to the primary theoretical parent ion mass-charge ratio and the secondary theoretical ion cluster mass-charge ratio of the theoretical polypeptide fragments, wherein the specific steps are as follows:
Step 1: obtaining an unsearched length value (e.g., a length value of 3) within a preset polypeptide fragment length range;
Step 2: initializing the polypeptide code value U = 0 and starting to enumerate polypeptide fragments;
Step 3: converting the polypeptide code value U to base 20 according to the number of amino acid residue types (20), and expressing the result with 3 digits to obtain the sequence codon X (e.g., "000");
Step 4: associating the digit at each position of the sequence codon X with the amino acid residue number and translating X into a theoretical polypeptide fragment (e.g., translating "000" into "G-G-G", i.e., "Gly-Gly-Gly");
Step 5: obtaining the primary theoretical parent ion mass-to-charge ratio and the set of secondary theoretical child ion cluster mass-to-charge ratios (a, b or y ion clusters) of the theoretical polypeptide fragment ("Gly-Gly-Gly");
Step 6: calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 from the mass spectrum of the glutathione standard sample to be detected and the primary theoretical parent ion mass-to-charge ratio of the theoretical polypeptide fragment ("Gly-Gly-Gly"), denoted E_1; obtaining the substance set F1 whose absolute difference is smaller than the threshold of 0.005 Da; and judging whether the substance set F1 is empty:
if it is empty, enumerating the next theoretical polypeptide fragment, i.e., incrementing the polypeptide code value by 1, so that U = 1;
if it is not empty, comparing one by one the secondary ion cluster mass-to-charge ratios of the substances in the substance set F1 with those of the theoretical polypeptide fragment ("Gly-Gly-Gly"); if the secondary ion mass-to-charge ratio deviation is smaller than 0.02 Da, marking the theoretical polypeptide fragment as a candidate identification result of the substance, i.e., a candidate polypeptide fragment; then enumerating the next theoretical polypeptide fragment by incrementing the polypeptide code value by 1, so that U = 1;
Step 7: repeating steps 2-6 until the polypeptide code value U equals the number of all possible polypeptides of that length, i.e., 20^3;
Step 8: and obtaining the next unsearched length value, and repeating the steps 1-7 until all the length values are searched.
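Steps 5 and 6 can be sketched as below, under stated assumptions. The residue masses are standard monoisotopic values and the a/b/y formulas are the usual singly protonated definitions; generating each cluster for positions 1 to L follows the ions mentioned for FIG. 2 (b1 to b3 for a tripeptide). The function names are hypothetical, and requiring at least one fragment match within tolerance is an assumption, since the source does not state how many secondary ions must match.

```python
PROTON, WATER, CO = 1.007276, 18.010565, 27.994915  # Da

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def precursor_mz(peptide):
    """Theoretical [M+H]+ m/z of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def fragment_ions(peptide):
    """Singly charged a, b and y ion m/z values keyed by names such as 'b2'."""
    ions, n = {}, len(peptide)
    prefix = suffix = 0.0
    for i in range(1, n + 1):
        prefix += RESIDUE_MASS[peptide[i - 1]]   # N-terminal residues 1..i
        suffix += RESIDUE_MASS[peptide[n - i]]   # C-terminal residues 1..i
        ions[f"b{i}"] = prefix + PROTON
        ions[f"a{i}"] = prefix + PROTON - CO
        ions[f"y{i}"] = suffix + WATER + PROTON
    return ions


def match_fragments(theoretical, observed, tol_da=0.02):
    """Map each matched theoretical ion name to its smallest absolute error."""
    matched = {}
    for name, mz in theoretical.items():
        errors = [abs(mz - o) for o in observed]
        if errors and min(errors) < tol_da:
            matched[name] = min(errors)
    return matched


def is_candidate(peptide, obs_precursor, obs_fragments,
                 ms1_tol=0.005, ms2_tol=0.02):
    """Precursor check (E_1) first, then the secondary ion comparison."""
    e1 = abs(precursor_mz(peptide) - obs_precursor)
    if e1 >= ms1_tol:
        return False
    # Assumption: one matching secondary ion is enough to keep the candidate.
    return bool(match_fragments(fragment_ions(peptide), obs_fragments, ms2_tol))
```

Checking the precursor first means the fragment comparison only runs for the small fraction of the enumerated sequences whose parent ion mass already agrees with a measured substance.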
(5) Scoring the candidate polypeptide fragments according to the matching degree of the secondary child ion clusters, and selecting the candidate polypeptide fragment with the highest score as the identification result for the experimental spectrum of the sample to be detected, wherein two scoring methods are used:
The first scoring calculation method is as follows: the secondary ion cluster types of each candidate polypeptide fragment are compared with those of the actual substance in the sample to be detected, where the secondary ion cluster type may be any one of the a, b or y ion clusters or a combination thereof; 10 points are awarded when one complete ion cluster is detected, 20 points when two complete ion clusters are detected, and so on. The score is denoted S_A, and by default the candidate polypeptide with the highest score is output as the final identification result.
The second scoring calculation method is as follows: the number of secondary ion types of the substance in the sample to be detected that match the candidate polypeptide fragment is recorded as N_M; the average of the absolute deviations between the secondary ion mass-to-charge ratios of the substance and the corresponding secondary theoretical ion mass-to-charge ratios of the candidate polypeptide fragment is calculated and denoted E_2; and the second score S_B is calculated by the following formula:
wherein C is the number of secondary theoretical ion cluster types and L is the length value of the polypeptide;
The output results are sorted in the priority order S_A > S_B > E_1 > E_2: the higher the S_A and S_B scores, the higher the confidence of the result, and the smaller the E_1 and E_2 errors, the higher its reliability. By default, the candidate polypeptide ranked first is selected as the final preferred identification result.
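The first score and the priority ordering can be sketched as follows. The function names and dictionary keys are hypothetical. A cluster is treated as complete when ions 1 to L of that type are all matched, an interpretation inferred from the FIG. 2 discussion (b1, b2 and b3 form a complete b cluster for a tripeptide). The S_B formula is not reproduced in this text, so S_B is taken here as a precomputed input rather than calculated.

```python
def score_sa(matched_names, length, cluster_types=("a", "b", "y")):
    """10 points per ion cluster whose ions 1..length are all matched."""
    matched = set(matched_names)
    complete = sum(
        all(f"{t}{i}" in matched for i in range(1, length + 1))
        for t in cluster_types
    )
    return 10 * complete


def rank_candidates(candidates):
    """Sort by S_A and S_B descending, then by E_1 and E_2 ascending."""
    return sorted(
        candidates,
        key=lambda c: (-c["s_a"], -c["s_b"], c["e1"], c["e2"]),
    )
```

Sorting on a tuple reproduces the stated priority: S_B only matters between candidates with equal S_A, and the errors E_1 and E_2 only break remaining ties.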
The specific output results are shown in table 2.
Table 2:
In Table 2, the substance source refers to the spectrum source number assigned during data acquisition by the mass spectrometer.
As can be seen from Table 2, the S_A score of the candidate polypeptide fragment "ECG" is the same as that of the other candidates, but its S_B score is the highest, at 78. The output results are sorted in the priority order S_A > S_B > E_1 > E_2, and because the S_A scores are tied in this embodiment, S_B decides the ranking: the candidate polypeptide fragment "ECG", with the highest S_B score of 78, is the final preferred identification result.
FIG. 2 shows the ion cluster characterization result, in which a1 and a2 are a ion cluster ions; b1, b2 and b3 are b ion cluster ions; and y2 and y3 are y ion cluster ions. As can be seen from FIG. 2, one complete b ion cluster (b1, b2 and b3 ions) is detected in this example, so S_A is 10 points and S_B is 78, and the candidate polypeptide fragment identified is ECG; that is, the amino acid sequence of the polypeptide in the sample to be tested is identified as glutamic acid-cysteine-glycine, which is consistent with the actual sequence of the sample. FIG. 2 thus shows that the polypeptide length-based identification method of the present invention can identify short peptides with high accuracy.
Example 2
The analysis target sample in this example is a soybean enzymolysis product, and the specific steps of short peptide group identification based on the length of the candidate polypeptide fragment are as follows:
(1) First, the polypeptide fragment length identification range is preset to 2-6, and the amino acid residue list contains the 20 common residues. The amino acid residues are numbered consecutively from 0 in order of increasing residue molecular weight, so that each residue is assigned a unique, consecutive number, as shown in Table 1 of Example 1.
(2) Acquiring the raw mass spectrometry data of the substances in the soybean enzymolysis product to be detected, including the primary parent ion and secondary child ion mass spectrum data; further acquiring the charge information of each ion in the primary and secondary mass spectra, and performing ion normalization on each ion. Ion normalization converts multiply charged ions in the mass spectrum into singly positively charged ions by mass-to-charge ratio calculation; ions of unknown charge are treated as singly charged by default.
Filtering the substances according to the primary parent ion signal response intensity: the intensity threshold is 500 counts (i.e., more than 3 times the instrument noise), the response intensity being the absolute intensity, and the substance set C1 whose primary parent ion signal response intensity is above the threshold is retained. The substance set C1 is then filtered by primary parent ion mass-to-charge ratio; the range set in this embodiment is 70-1500 Da, and only substances whose primary parent ion mass-to-charge ratio falls within this range are retained, giving the substance set C2 of the sample to be detected.
(3) Obtaining an unsearched length value of 2 from the preset polypeptide fragment length range, and calculating, from the number of amino acid residue types (20) and the polypeptide fragment length value (2), the number of all polypeptide fragments of that length, 20^2 = 400 in total; and obtaining the sequence set of the theoretical polypeptide fragments together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical child ion cluster mass-to-charge ratios.
(4) Screening candidate polypeptide fragments meeting conditions in a sample substance to be detected according to the primary theoretical parent ion mass-charge ratio and the secondary theoretical ion cluster mass-charge ratio of the theoretical polypeptide fragments, wherein the specific steps are as follows:
step 1: obtaining an unsearched length value (e.g., length value of 2) within a preset polypeptide fragment length range;
Step 2: initializing the polypeptide code value U = 0 and starting to enumerate polypeptide fragments;
Step 3: converting the polypeptide code value U to base 20 according to the number of amino acid residue types (20), and expressing the result with 2 digits to obtain the sequence codon X (e.g., "00");
Step 4: associating the digit at each position of the sequence codon X with the amino acid residue number and translating X into a theoretical polypeptide fragment (e.g., translating "00" into "G-G", i.e., "Gly-Gly");
Step 5: obtaining the primary theoretical parent ion mass-to-charge ratio and the set of secondary theoretical child ion cluster mass-to-charge ratios (a, b or y ion clusters) of the theoretical polypeptide fragment ("Gly-Gly");
Step 6: calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 from the mass spectrum of the soybean enzymolysis product to be detected and the primary theoretical parent ion mass-to-charge ratio of the theoretical polypeptide fragment ("Gly-Gly"), denoted E_1; obtaining the substance set F1 whose absolute difference is smaller than the threshold of 10 ppm; and judging whether the substance set F1 is empty:
if it is empty, enumerating the next theoretical polypeptide fragment, i.e., incrementing the polypeptide code value by 1, so that U = 1;
if it is not empty, comparing one by one the secondary ion cluster mass-to-charge ratios of the substances in the substance set F1 with those of the theoretical polypeptide fragment ("Gly-Gly"); if the secondary ion cluster mass-to-charge ratio deviation is smaller than 0.02 Da, marking the theoretical polypeptide fragment as a candidate polypeptide fragment of the substance, i.e., a candidate identification result; then enumerating the next theoretical polypeptide fragment by incrementing the polypeptide code value by 1, so that U = 1;
step 7: repeating the steps 2-6 until the polypeptide code value U is greater than the number of all possible polypeptides in the length, namely U is greater than 400; counting the number of polypeptides in the soybean enzymolysis product to be detected when the length of the polypeptide fragment is 2;
Step 8: the next unsearched length values (e.g., 3, 4, 5, and 6) are obtained, and steps 1-7 are repeated until all length values have been retrieved.
The detection results are shown in FIG. 3. When the length of the polypeptide fragment is 2, the detected number of the polypeptides in the soybean enzymolysis product to be detected is 163, and when the length of the polypeptide fragment is 3, the detected number of the polypeptides in the soybean enzymolysis product to be detected is 335; when the length of the polypeptide fragment is 4, the detected quantity of the polypeptides in the soybean enzymolysis product to be detected is 93; when the length of the polypeptide fragment is 5, the detected quantity of the polypeptides in the soybean enzymolysis product to be detected is 31; and when the length of the polypeptide fragment is 6, the detected quantity of the polypeptides in the soybean enzymolysis product to be detected is 23.
Comparative example 1
This comparative example used Proteome Discoverer software from Thermo (USA) (version 2.4, built-in Sequest engine). The sequence library was a soybean protein sequence library downloaded from UniProt, and the sample was the soybean protein enzymolysis product of Example 2. Enzymatic cleavage was set to non-specific, the parent ion response threshold to 500, the polypeptide identification length to 4-6 (the minimum length supported by this method is 4), the primary ion mass-to-charge ratio deviation to 10 ppm, the secondary ion mass-to-charge ratio deviation to 0.02 Da, the secondary ion cluster types to a, b and y ions, and the false discovery rate to 1%; all other parameters used the software defaults. The statistical distribution of the identification results is shown in FIG. 4.
As can be seen from fig. 4, when the length of the polypeptide fragment is 4, the detected number of polypeptides in the soybean enzymatic hydrolysate to be detected is 1; when the length of the polypeptide fragment is 5, the detected quantity of the polypeptides in the soybean enzymolysis product to be detected is 21; and when the length of the polypeptide fragment is 6, detecting the quantity of the polypeptides in the soybean enzymolysis product to be detected to be 24.
From the statistics of the identification results for the same soybean protein enzymolysis product sample, it can be seen that Example 2 outperforms Comparative Example 1. The polypeptide length-based short-peptide omics identification method of the present invention used in Example 2 covers the polypeptide fragment length range 2-6 and identifies more short peptides at lengths 2 to 5 (at length 6 the counts are comparable, 23 versus 24), whereas Comparative Example 1 can analyze short peptides only at lengths 4-6 and identifies far fewer short peptides in total.
More importantly, the polypeptide length-based short-peptide omics identification method in Example 2 loads no protein sequence database at any point in the analysis and directly performs exhaustive retrieval and identification of short peptides, which illustrates the advancement, inventiveness and broad application prospects of the method.
In summary, the invention discloses a polypeptide length-based short-peptide omics identification method and application thereof, in which the polypeptide fragments within a specified polypeptide length range are enumerated, theoretical primary parent ions and theoretical secondary child ions are generated for them one by one and compared with the actual ion spectrum data acquired for the substances in the sample, and the polypeptide fragments meeting the matching requirements are associated with the corresponding substances, thereby identifying the polypeptide sequence structure of the substances in the sample.
The polypeptide length-based short-peptide omics identification method can search and identify protein enzymolysis products exhaustively, without blind spots or omissions and without depending on a protein sequence database. The method also evaluates short peptides by complete matching, so the identification results are accurate and reliable, and it can make up for the shortcomings of current proteomics analysis methods and tools. In addition, the short peptide identification results obtained by the method can be widely applied to the qualitative and quantitative analysis of short peptides in food-derived protein enzymolysis products or other biological samples, which is of great significance for improving peptidomics analysis technology and promoting industry development.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present invention. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict.

Claims (10)

1. A method for identifying a short peptide group based on the length of a polypeptide, comprising the steps of:
S1, presetting a polypeptide fragment length range, an amino acid residue list and a secondary ion cluster type, and obtaining mass spectrum data of primary parent ions and secondary child ions of substances acquired in a mass spectrum of a sample to be detected;
S2, presetting a primary parent ion signal response intensity threshold and a primary parent ion mass-to-charge ratio range, and screening the mass spectrum data obtained in step S1 according to the preset threshold and range to obtain a substance set C2 meeting the requirements in the sample to be detected;
S3, obtaining a polypeptide fragment length value L within the preset polypeptide fragment length range, calculating the number N^L of all theoretical polypeptide fragments of each polypeptide fragment length from the number N of amino acid residue types in the amino acid residue list of step S1, and obtaining a sequence set of the theoretical polypeptide fragments together with the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical child ion cluster mass-to-charge ratios of the theoretical polypeptide fragments;
S4, screening candidate polypeptide fragments meeting the conditions in the substance set C2 of step S2 according to the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical child ion cluster mass-to-charge ratios of the theoretical polypeptide fragments;
S5, scoring the candidate polypeptide fragments, and determining the candidate polypeptide fragment with the highest score to be a polypeptide fragment sequence within the preset length range in the sample to be detected.
2. The method for identifying short peptide group as claimed in claim 1,
the length range of the preset polypeptide fragment in the step S1 is any length range, preferably 2-7;
the amino acid residues in the amino acid residue list in step S1 are any non-repeating amino acid residues, and the amino acid residue numbers start from 0 and must be unique and consecutive;
the secondary ion cluster type in step S1 includes any one or more of an a ion cluster, a b ion cluster, and a y ion cluster.
3. The method of claim 1, wherein the primary parent ion signal response intensity threshold is 3 times or more the instrument noise intensity; the primary parent ion mass-to-charge ratio range is 70-2000Da.
4. A method for identifying a short peptide group according to any one of claims 1 to 3, wherein the specific steps of screening the sample to be tested for a satisfactory collection of substances C2 in step S2 are as follows:
s11, acquiring charge information of each ion in the primary mass spectrum and the secondary mass spectrum according to the primary parent ion and the secondary child ion cluster mass spectrum data of the substance acquired by the sample to be detected in the mass spectrum in the step S1, and performing ion standardization treatment to acquire a primary parent ion mass-to-charge ratio and a secondary child ion cluster mass-to-charge ratio of the substance;
S12, filtering the substances in the step S11 according to the primary parent ion signal response intensity to obtain a substance set C1 higher than the primary parent ion signal response intensity threshold in the step S2;
and S13, filtering the substance set in the step S12 according to the range of the primary parent ion mass-to-charge ratio in the step S2 to obtain a substance set C2 with the primary parent ion mass-to-charge ratio meeting the range requirement, namely screening to obtain the substance set C2 meeting the requirement in the sample to be detected.
5. The method according to claim 4, wherein the ion normalization process is to convert multi-charged ions in a mass spectrum into ions with unit positive charges by mass-to-charge ratio calculation, wherein unknown charge ions are defined as single charge ions.
6. The method of claim 1, wherein the screening of the candidate polypeptide fragments in step S4 comprises the steps of:
S21, acquiring an unsearched length value L within the preset polypeptide fragment length range of step S1;
S22, initializing a polypeptide code value U = 0;
S23, converting the polypeptide code value U to base N according to the number N of amino acid residue types, and expressing the result with L digits to obtain a sequence codon X;
S24, associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment P;
S25, obtaining a primary theoretical parent ion mass-to-charge ratio Z and a secondary theoretical child ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P;
S26, calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 and the primary theoretical parent ion mass-to-charge ratio Z of the theoretical polypeptide fragment P, denoted E_1; obtaining a substance set F1 whose absolute difference meets a set threshold; and judging whether the substance set F1 is empty:
if it is empty, acquiring the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments of step S3, the polypeptide code value being incremented by 1;
if it is not empty, comparing one by one the secondary child ion cluster mass-to-charge ratios of the substances in the substance set F1 with the secondary theoretical child ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P, calculating the absolute value of each difference, obtaining a substance set F2 to be identified whose absolute differences fall within the secondary child ion cluster mass-to-charge ratio deviation range, and judging whether the substance set F2 is empty:
if the substance set F2 is not empty, marking the theoretical polypeptide fragment P as a candidate polypeptide fragment (i.e., a candidate identification result) for the substances in F2, and acquiring the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments of step S3, the polypeptide code value being incremented by 1; if the substance set F2 is empty, directly acquiring the next theoretical polypeptide fragment, the polypeptide code value being incremented by 1;
S27, repeating steps S22-S26 until the polypeptide code value U equals the number of all possible polypeptides of that length, i.e., U = N^L;
S28, acquiring a next non-searched polypeptide fragment length value in the preset polypeptide fragment length range in the step S1, and repeating the steps S21-S27 until the polypeptide fragment length values in the preset polypeptide fragment length range in the step S1 are searched.
7. The method according to claim 6, wherein the threshold value is 0-0.02Da or the threshold value is 0-40ppm in step S26.
8. The method according to claim 6, wherein the secondary ion cluster mass to charge ratio deviation value in step S26 is in the range of 0-0.05Da.
9. The method of claim 1, wherein the scoring in step S5 comprises two methods:
The first scoring calculation method comprises the following steps: by comparing the candidate polypeptide fragments with the type of the secondary ion cluster of the actual substance in the sample to be detected, 10 points are counted for each detected complete ion cluster, and the output result of the detection is defaultly selected as the candidate identification polypeptide with the highest scoreFor final evaluation result, it is denoted as S A
The second scoring calculation method is as follows: recording the number of secondary ion species of substances in the sample to be detected matched with the candidate polypeptide fragments under the analysis type of the secondary ion clusters preset in the step S1, and recording as N M Calculating the average value of the absolute value of the deviation of the mass-to-charge ratio of the secondary sub-ions of the substances in the sample to be detected and the mass-to-charge ratio of the secondary theoretical sub-ions of the corresponding candidate polypeptide fragments, and marking as E 2 Second score S B The calculation formula of (2) is as follows:
wherein C is the number of theoretical secondary ion cluster types and L is the polypeptide length value;
preferably, the scoring output results are ranked in the priority order S_A > S_B > E_1 > E_2; the higher the scores S_A and S_B, the higher the confidence of the result, and the smaller the errors E_1 and E_2, the higher the reliability of the result; the top-ranked candidate identification polypeptide in the sorted result is selected as the final preferred identification result.
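The scoring and ranking in claim 9 can be sketched as follows. This is a minimal sketch under stated assumptions: the `Candidate` record and the function names are hypothetical, S_B and E_1 are treated as precomputed inputs because their definitions are not reproduced in this text, and how a "complete" ion cluster is decided is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of one candidate identification polypeptide."""
    sequence: str
    s_a: float  # first score
    s_b: float  # second score (formula not reproduced here; supplied as input)
    e_1: float  # error term defined in earlier steps (supplied as input)
    e_2: float  # mean absolute secondary fragment-ion m/z deviation


def first_score(complete_cluster_count):
    """S_A: 10 points for each complete ion cluster detected."""
    return 10 * complete_cluster_count


def mean_abs_deviation(observed_mz, theoretical_mz):
    """E_2: mean absolute deviation over matched observed/theoretical pairs."""
    pairs = list(zip(observed_mz, theoretical_mz))
    if not pairs:
        raise ValueError("at least one matched ion pair is required")
    return sum(abs(o - t) for o, t in pairs) / len(pairs)


def rank_candidates(candidates):
    """Sort by S_A, then S_B (both descending), then E_1, then E_2 (ascending)."""
    return sorted(candidates, key=lambda c: (-c.s_a, -c.s_b, c.e_1, c.e_2))
```

Sorting on a single tuple key applies the S_A > S_B > E_1 > E_2 priority in one pass: a later field only matters when all earlier fields tie, and the first element of the sorted list is the preferred identification result.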
10. Use of the short peptidomics identification method according to any one of claims 1-9 in proteomics or peptidomics.
CN202210151752.2A 2022-02-18 2022-02-18 Short peptide histology identification method based on polypeptide length and application thereof Active CN114596912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151752.2A CN114596912B (en) 2022-02-18 2022-02-18 Short peptide histology identification method based on polypeptide length and application thereof

Publications (2)

Publication Number Publication Date
CN114596912A (en) 2022-06-07
CN114596912B (en) 2023-08-29

Family

ID=81805260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151752.2A Active CN114596912B (en) 2022-02-18 2022-02-18 Short peptide histology identification method based on polypeptide length and application thereof

Country Status (1)

Country Link
CN (1) CN114596912B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116217659B (en) * 2022-10-10 2024-02-27 上海市农业科学院 Stropharia rugoso-annulata mycelium flavor peptide and preparation method and application thereof
CN118298927B (en) * 2024-06-06 2024-09-20 五邑大学 AI polypeptide structure identification method based on Transformer model and application thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010256101A (en) * 2009-04-23 2010-11-11 Shimadzu Corp Method and device for analyzing glycopeptide structure
WO2012128580A1 (en) * 2011-03-22 2012-09-27 Korea Advanced Institute Of Science And Technology Water-soluble polypeptides comprised of repeat modules, method for preparing the same and method for a target-specific polypeptide and analysis of biological activity thereof
CN104034791A (en) * 2014-05-04 2014-09-10 北京大学 CID and ETD mass spectrogram fusion based polypeptide de novo sequencing method
CN104076115A (en) * 2014-06-26 2014-10-01 云南民族大学 Protein second-level mass spectrum identification method based on peak intensity recognition capability
CN106645437A (en) * 2015-10-30 2017-05-10 中国科学院大连化学物理研究所 Polypeptide amino acid sequence De novo sequencing method based on chemical modification and isotope labeling
CN110556162A (en) * 2019-08-20 2019-12-10 广州基迪奥生物科技有限公司 Detection and analysis method of cyclic RNA translation polypeptide based on translation group
CN112326769A (en) * 2020-11-04 2021-02-05 西北大学 Method for identifying N-sugar chain branch structure on complete glycopeptide

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829539B2 (en) * 2001-04-13 2004-12-07 The Institute For Systems Biology Methods for quantification and de novo polypeptide sequencing by mass spectrometry
JP4932833B2 (en) * 2005-06-03 2012-05-16 ウオーターズ・テクノロジーズ・コーポレイシヨン Method and apparatus for chemical analysis using fractionation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
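pepReap: a peptide identification algorithm based on support vector machines; Wang Haipeng, Fu Yan, Sun Ruixiang, He Simin, Zeng Rong, Gao Wen; Journal of Computer Research and Development (No. 09); 54-61 *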

Also Published As

Publication number Publication date
CN114596912A (en) 2022-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant