WO2003015001A2 - Method for identification of protein function - Google Patents
- Publication number
- WO2003015001A2 WO2003015001A2 PCT/GB2002/003244 GB0203244W WO03015001A2 WO 2003015001 A2 WO2003015001 A2 WO 2003015001A2 GB 0203244 W GB0203244 W GB 0203244W WO 03015001 A2 WO03015001 A2 WO 03015001A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- peptides
- sequence
- protein
- frameset
- sequences
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- the present invention relates to methods of determining functions of protein sequences using computational methods.
- the functional signatures may not map directly onto structure, so that dissimilar structures may have functional similarity (e.g. bacterial and mammalian serine proteases).
- three dimensional structural information may be insufficient and additional information, such as sequence motifs, common residue clusters or characteristic surface properties, will be required (see Orengo et al, 1999).
- PROSITE, a method for detecting active sites and patterns in protein sequences (Bairoch, 1991); PFAM/SCOP analysis, a protein domain assignment method (Murzin et al, 1995); COGs (Clusters of Orthologous Groups) analysis, a method of protein function prediction; Superfamily assignments; Functional categories assignment.
- fold assignment by sequence similarity to protein of known 3D structure.
- the reported fraction of fold assignments in the various genomes amounts only to about 10-20% of sequences.
- fold assignment includes the steps of:
- step 6 Checking the 'goodness' of the postulated model (e.g. using PROCHECK).
- the process may be repeated from step 1 one or more times until step 6 is acceptable.
- sequence profiling (Gribskov et al, 1990)
- sequence motif searching (Bork & Gibson, 1996)
- Sequence profile methods use evolutionary information from neighbouring sequences in the sequence database to build a profile.
- An iterative sequence profile method able to detect distant relationships is PSI-BLAST.
- a further method is fold assignment, or 'threading'. These methods explicitly incorporate structural information from available 3-D protein structures. In many cases these methods can detect distantly diverged proteins as well as unrelated proteins with a similar fold (see PROTEINS: 23(5), 1995; Suppl 1, 1997 and Suppl 3, 1999 for further details of methods and results). However, unless at least one 3D protein structure of the family is known, the method will be unable to assign the new sequence to a structural or functional family.
- Fold assignment methods are at least as successful as sequence profiling methods and, in addition, are able to assign another 10-15% of open reading frames (ORFs) from genome sequencing projects. Furthermore, some of the predictions from fold assignment methods are not detectable using sequence based methods. Conversely, sequence based methods sometimes identify distant relationships that fold assignment methods do not detect. This is because the sequence methods incorporate evolutionary information from neighbouring sequences whereas traditional fold methods typically do not.
- ORFs open reading frames
- Another problem in relation to protein structure is in providing an accurate and reliable theoretical method to identify peptides that bind with high affinity to HLA molecules.
- the identification of tumour and virus immunogenic epitopes is of great importance for the design of tumour and virus vaccines.
- the most common property of all the immunogenic peptides is their high affinity for the HLA molecule (Sette et al, 1994; Oukka et al, 1996; van den Burg et al, 1996; Tourdot et al, 1997).
- a reliable theoretical method to identify peptide sequences within a given antigen that bind strongly to HLA would, therefore, be of great utility for the selection of immunogenic peptides, provided it is both efficient and accurate.
- Affinity for HLA principally depends on the allele-specific pattern of conserved residues at particular positions in the peptide, the primary anchor motifs (Rammensee et al, 1995; Engelhard, 1994). Although the large majority of immunogenic epitopes possess the allele-specific primary anchor motifs, the presence of these motifs is not a sufficient condition for a peptide to show strong binding. Secondary anchors and deleterious residues at non-conserved positions also influence the peptide-HLA interaction (Ruppert et al, 1993).
- Grassy et al disclose a method for computer-assisted rational design of immunosuppressive compounds.
- the reference describes the analysis of a set of peptides for immunosuppressive activity.
- a learning set of inactive and active peptides were analysed by a range of topological descriptors, and a set of topological descriptors for the active set of peptides was defined.
- the descriptors were used to screen a virtual combinatorial library of peptides which was generated based on a partially randomised 10mer consensus sequence in which positions 1, 5 and 10 were fixed as arginine, arginine, and tyrosine respectively.
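The library construction described above can be pictured with a minimal Python sketch. It assumes that every non-fixed position is drawn uniformly from the 20 standard residues, which the text does not state ("partially randomised" may well mean restricted residue sets); the function name `random_library` and the seed are illustrative only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# Fixed positions (1-based) taken from the text: 1 and 5 arginine, 10 tyrosine.
FIXED = {1: "R", 5: "R", 10: "Y"}


def random_library(n_peptides, length=10, fixed=FIXED, seed=0):
    """Return n_peptides random peptides that respect the fixed positions.

    Assumption: free positions are sampled uniformly over all 20 residues.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    library = []
    for _ in range(n_peptides):
        residues = []
        for pos in range(1, length + 1):
            if pos in fixed:
                residues.append(fixed[pos])
            else:
                residues.append(rng.choice(AMINO_ACIDS))
        library.append("".join(residues))
    return library
```

Each generated peptide would then be encoded as descriptors and screened, rather than compared by sequence.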
- the method utilises a residue independent computational model (SCIPS, for Sequence Comparison In Property Space) whose inputs are not sequences but physicochemical and topological parameters derived from those sequences.
- SCIPS residue independent computational model
- the invention provides a method for determining whether a query protein sequence has a functional property of interest, which method comprises: (i) providing a dataset of proteins which share the functional property of interest;
- a further aspect of the invention provides a computer system which is operatively configured to implement the method of the first or second aspect.
- by "a computer system" we mean the hardware means, software means and data storage means used to determine whether a query protein sequence has a functional property of interest according to the present invention.
- the minimum hardware means of a computer-based system of the present invention typically comprises a central processing unit (CPU), working memory and data storage, and e.g. input means, output means etc.
- the data storage may comprise magnetic storage media such as floppy discs, hard disc storage medium and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. Examples of such systems are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix based, Windows NT or IBM OS/2 operating systems.
- Figure 1 shows a flow chart which schematically illustrates the invention.
- Figure 2 shows histograms and corresponding normal distributions of the measured relative affinity (RA) values for 120 immunogenic and 72 non immunogenic sequences.
- Figure 3A & B shows data relating to validation of the model
- Figure 4 shows the immunogenicity of peptides belonging to the external validation set.
- Figures 5a and b respectively show distributions of the BiMass and SFPEITHI scores for the high and low affinity peptides of the external validation set.
- the protein sequence may be of any species origin, including human, primate, mammalian, vertebrate, insect, yeast, eukaryotic or prokaryotic.
- the sequence may be a confirmed protein sequence (i.e. based on direct protein sequence or a translation of an mRNA) or a hypothetical protein sequence based upon an open reading frame of genomic DNA, or a de novo designed sequence.
- Dataset of Proteins
- the dataset of proteins may be a set of proteins associated with the functional property of interest.
- the proteins may also be of any species origin, including human, primate, mammalian, vertebrate, insect, yeast, eukaryotic or prokaryotic.
- the dataset may be limited to proteins of a single species or it may comprise sequences from different organisms.
- the dataset may further comprise synthetically generated sequences, for example sequences which have been synthesised in a random or semi-random manner and selected to have the property of interest.
- the proteins may be full length sequences, partial sequences or peptides, or mixtures thereof. Where the sequences are partial sequences or peptides, these will be of sufficient length to be associated with the functional property. This will usually be at least 5, more generally at least 8, such as at least 9 or 10 amino acids in length.
- the size of the dataset will be dependent upon a number of circumstances, including the number of sequences available in the art and the discrimination achieved by the topological parameters. Usually, it is desirable that the dataset comprises at least 5, preferably at least 10, more preferably at least 20 members. The maximum size of the dataset will be dependent only upon the numbers of sequences available and the computational power available for their analysis. However, a dataset of up to 500 members is feasible.
- the term "proteins" in relation to the dataset is used to mean any polypeptide sequence, including short sequences (e.g. 5 or more amino acids) which are often referred to in the art as "peptides" or "polypeptides".
- Functional Property of Interest: this is any property associated with a set of sequences and known to a person of skill in the art.
- Such properties include domains with enzymatic function, e.g. kinase, phosphatase or acetylase activity;
- domains which bind target molecules such as other proteins or DNA;
- domains which act as agonists or antagonists of biological function;
- proteins involved in signal transduction such as GPCRs, intracellular effector proteins and families as defined in the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop/).
- the dataset is one in which the proteins contain regions of sequence homology associated with said functional property of interest.
- by "sequence homology" it is meant that those of skill in the art operating available sequence alignment algorithms are able to determine at least one region in the members of the dataset whose sequences contain regions of alignment of statistical significance.
- the algorithm BLAST, mentioned above, may be used with default parameters to determine regions of sequence homology of at least 30% identity. This level of homology is often associated with a common evolutionary origin and hence related function of the protein.
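To make the 30% identity figure concrete, the following Python sketch computes percent identity over two already-aligned regions. It is not BLAST and does not reproduce its scoring or statistics; it also assumes one particular convention (gapped columns are skipped), whereas other tools divide by the full alignment length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences.

    Assumption: columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_homology_threshold(aligned_a, aligned_b, threshold=30.0):
    """True when the aligned region reaches the identity level named in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```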
- the region of homology may not be across the entire length of the protein, but within a frameset of the protein, as defined herein.
- Proteins which share a functional property of interest may do so because of a region of sequence within the protein which imparts such a property to the protein. This is well known as such by those of skill in the art. For example, DNA-binding domains, zinc-finger domains, transmembrane domains, signal sequences and the like are found in subdomains of proteins and often such a subdomain may be transferred ("donated") to other "recipient" protein sequences to transfer the property from donor to recipient.
- the property may be discontinuous within the protein.
- the target binding of an antibody variable domain is primarily determined by the properties of three hypervariable regions in each of a light and heavy chain variable domain which are separated by framework regions.
- the frameset may be continuous or discontinuous to cover 2 or 3 different regions.
- the frameset is the region from which the descriptor parameters are determined.
- the frameset will define sequences of substantially the same length, although some variation in sequence length is permitted. This is because functional domains shared by proteins will often differ in length to some extent.
- the actual size of the frameset will be determined by factors which will vary in each case, taking account of the functional property of interest and the capacity of the computational model.
- the frameset is a 9mer-10mer size.
- the frameset may be in this range, or greater, for example in the range of from 8 to 50 amino acids, such as from 8 to 40, for example from 10 to 25 amino acids.
- the frameset is the same size as the peptides of the proteins of the dataset.
- the frameset may be defined as a sequence within a longer protein sequence.
- a dataset of proteins which share a common property of interest may be aligned so that the region of each protein associated with the property is shown aligned, irrespective of where the region appears in relation to the rest of the protein in which it is located (see Figure 1 box 1). This region is then used to determine a frameset, for which descriptors are calculated.
- Physicochemical and Topological Descriptors: this includes any physical or chemical property of a molecule, including topological parameters of the molecule. It may include either properties which are "static" (at least in a time-averaged sense) such as the dipole moment of a molecule, and/or "dynamic", such as ones characterising the range of conformations through which the molecule may flex over a period of time. In the case of some molecules, the flexing of the molecule over time can be determined with high accuracy using modern molecular modelling techniques. In the present context, a frameset of a protein is also considered to be a molecule for which descriptor parameters can be measured or calculated.
- examples of descriptors include molar mass, ellipsoidal volume, molar volume, lipophilicity, dipole moment, total number of N or C or O atoms, number of methyl groups, and various topological, connectivity and shape indices, such as the Wiener or Balaban indices, the Kier Chi indices and the like. These and many other descriptors are defined in the "Tsar Reference Guide", published by Oxford Molecular (2000).
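As a rough illustration of encoding a frameset in property space, the Python sketch below computes a handful of simple descriptors from a peptide string. These are not the Tsar descriptors: the Kyte-Doolittle hydropathy scale is used here only as an assumed stand-in for lipophilicity, the masses are average residue masses rounded to two decimals, and the function name `frameset_descriptors` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in for a lipophilicity descriptor).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in g/mol (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}

WATER_MASS = 18.02  # added once for the free N- and C-termini


def frameset_descriptors(frameset):
    """Return a small descriptor vector (as a dict) for one frameset."""
    if not frameset:
        raise ValueError("frameset must contain at least one residue")
    n = len(frameset)
    return {
        "length": n,
        "molar_mass": round(sum(RESIDUE_MASS[r] for r in frameset) + WATER_MASS, 2),
        "mean_hydropathy": sum(HYDROPATHY[r] for r in frameset) / n,
        "n_aromatic": sum(1 for r in frameset if r in "FWY"),
    }
```

A real encoding would use many more descriptors, including the topological indices named above, which require the full molecular graph rather than the residue string.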
- Figure 1 outlines an embodiment of the invention.
- a set of protein sequences (shown by six parallel lines) each contain a region of homology (thick line) which is associated with a function common to all sequences.
- the region of homology defines a frameset.
- the parameters of the frameset are encoded to provide a plurality of descriptors (step 2 of Figure 1).
- in addition to the active sequences, a plurality of sequences which are not associated with the property (inactive sequences) are also encoded.
- Inactive sequences will be of substantially the same size as the frameset, and similar in number. For example, where the number of protein sequences having activity is a number x, then from x/2 to 2x inactive sequences will likewise be encoded.
- the descriptors are analysed in order to determine a set of descriptors and their values which describe the frameset (step 3 of Figure 1) .
- the descriptors and their values are not common to the set of inactive sequences.
- a way of analysing and selecting the descriptors is as follows:
- 1- Intercorrelated descriptors are first either removed by using standard statistical practices, or decorrelated through algorithms known in the art like principal component analysis (PCA) or Gram-Schmidt orthogonalisation.
- 2- From the descriptors obtained in the previous step, a set of descriptors is selected in whose space the set of active sequences is well separated from the set of inactive sequences. The ideal situation is when there is no overlap between the two regions of the space.
- a person of skill in the art has various ways to handle cases where there is a certain overlap and to determine the smallest possible set of descriptors exhibiting the smallest possible overlapping region.
- Such descriptor space is not necessarily a linear one and various computational techniques are available in the art to perform such an analysis, including neural networks, genetic algorithms, partial least squares (PLS), fuzzy logic and the like.
- the precise means of determination may be selected by a person of skill in the art, taking account of the nature and number of sequences to be analysed and personal preference. In the accompanying example we have used a neural network, and this is preferred.
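The first step of the descriptor analysis, removal of intercorrelated descriptors, can be sketched as a greedy correlation filter. This is one simple choice among the "standard statistical practices" the text leaves open, not the applicants' procedure; the 0.7 default is borrowed from the correlation cut-off quoted in the example, and the function names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant descriptor is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_intercorrelated(columns, threshold=0.7):
    """Keep descriptors, in insertion order, that are not strongly
    correlated (|r| >= threshold) with any descriptor already kept.

    columns: dict mapping descriptor name -> list of values, one per sequence.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the result depends on descriptor order; PCA or Gram-Schmidt orthogonalisation would instead transform the descriptors rather than discard them.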
- the process will provide a set of descriptor parameters for a frameset which are indicative of the property of interest.
- the number and nature of the descriptors selected will depend upon the frameset and property selected, and will differ on a case-by-case basis. Usually, about 15-40 descriptors will be selected, though this is not fixed.
- the method then comprises scanning a query protein sequence for the presence of a frameset which has parameters for the selected descriptors which match those which are indicative of the property.
- the frameset is shown as a box which is moved stepwise (e.g. 1 amino acid at a time) along the protein sequence, wherein the values of the selected descriptors are calculated for the residues within each box.
- the scanning means inspecting successive regions of the sequence.
- the scanning may be directed to pre-selected regions, or linearly along the protein sequence in steps of more than one amino acid.
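The scanning step lends itself to a short sliding-window sketch in Python. The frame size, step and the scoring callback `score_fn` are placeholders chosen for illustration; in the method described here the score would come from the trained descriptor model, which is not reproduced.

```python
def scan_framesets(sequence, frame_size=9, step=1):
    """Yield (start, window) pairs along the sequence; start is 1-based."""
    if frame_size <= 0 or step <= 0:
        raise ValueError("frame_size and step must be positive")
    for start in range(0, len(sequence) - frame_size + 1, step):
        yield start + 1, sequence[start:start + frame_size]


def find_matching_framesets(sequence, score_fn, threshold, frame_size=9, step=1):
    """Return the windows whose score reaches the threshold.

    score_fn is a hypothetical callable mapping a window string to a number.
    """
    return [
        (pos, window)
        for pos, window in scan_framesets(sequence, frame_size, step)
        if score_fn(window) >= threshold
    ]
```

Setting `step` above 1 corresponds to the coarser scan mentioned above, at the cost of possibly stepping over the best-aligned window.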
- the invention thus provides for the provision of novel peptides which have a property of interest, as well as for the identification of sequences which have a property of interest but which do not have sufficient sequence homology for this property to be identified by conventional methods.
- a further aspect of the invention is a peptide obtained by the process of the invention, as well as a composition comprising said peptide plus a pharmaceutically acceptable carrier or diluent.
- Peptides of particular interest which may be obtained in accordance with the invention include peptides which bind HLA Class I or Class II antigens, receptors and/or their cognate ligands, enzymes and protein-protein interaction inhibitors.
- This example illustrates the invention in relation to a method for the prediction of HLA-A*0201 affinity, the selection of immunogenic HLA-A*0201 bound peptides, and the experimental test of the predictions.
- a unique feature of this computational method is that it performs the selection in "property space" rather than sequence space and, as such, is capable of finding 'family' relationships among groups of peptides that are not identifiable using conventional sequence comparison methods.
- the model operates by the simulation of an artificial neural network whose inputs are not amino acid sequences but physicochemical and topological descriptors derived from those sequences.
- the model can identify 86.8% of high affinity peptides with a probability of correct prediction of 94.3%. More importantly, it is able to predict 88.6% of immunogenic peptides with an equally high probability (85.3%), leading to the possibility of creating an almost complete immunogenic epitope map of any tumour or virus antigen.
- the aim of this study was to create a computational model for the identification of peptides exhibiting affinities for HLA-A*0201 sufficiently high to ensure their immunogenicity. It was therefore necessary to define the affinity threshold discriminating immunogenic from nonimmunogenic peptides under the experimental conditions of the HLA-A*0201 affinity measurements. 192 peptides with various HLA-A*0201 affinities were tested for their capacity to elicit a CTL response in HHD mice.
- it has been reported that peptides immunogenic in HLA-A*0201 transgenic mice such as HHD and A2/Kb mice are also immunogenic in humans and, conversely, that peptides nonimmunogenic in HLA-A*0201 transgenic mice are nonimmunogenic in humans.
- Each peptide was tested in more than twelve HHD mice in several independent experiments.
- Peptides were considered immunogenic when a) the specific lysis of induced CTL was at least 15% above the nonspecific lysis and b) specific CTL were generated in more than 20% of primed mice.
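The two immunogenicity criteria can be written as a small decision rule. The sketch below rests on two readings that the text does not settle: "15% above" is taken as 15 percentage points of lysis, and a mouse counts as having generated specific CTL when it meets criterion a). The function name and input layout are illustrative.

```python
def is_immunogenic(lysis_per_mouse, margin=15.0, min_fraction=0.20):
    """Apply criteria a) and b) to one peptide.

    lysis_per_mouse: list of (specific_lysis, nonspecific_lysis) percentages,
    one pair per primed mouse.
    """
    if not lysis_per_mouse:
        raise ValueError("at least one primed mouse is required")
    # Criterion a): specific lysis at least `margin` points above nonspecific.
    responders = sum(
        1 for specific, nonspecific in lysis_per_mouse
        if specific - nonspecific >= margin
    )
    # Criterion b): responders in strictly more than 20% of primed mice.
    return responders / len(lysis_per_mouse) > min_fraction
```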
- 172 peptides extracted from 18 antigens were included in the database for the training of the neural network.
- 110 peptides were 9mers and 62 peptides were 10mers.
- Their sequence characteristics are illustrated in Table 1. Except at the anchor positions P2 and P9/10, occupied by any of L/M/I/V/A in the large majority of peptides (91.3%), there was a fair representation of all 20 amino acids.
- the architecture of the back propagation neural network, the transfer parameters and the convergence RMS, necessary to obtain good generalised performance, were optimised by trial and error with the help of the internal validation set formed by a random choice of 30% of the database. Numerous combinations of 60 descriptors were tested and an iterative selection procedure was followed by displaying the dependencies of the output variables on each input (descriptor) variable. For each descriptor combination, particular attention was paid to exclude combinations exhibiting a correlation of 0.7 or higher. Moreover, care was taken to keep the network sufficiently small in terms of the number of weights to be computed. In practical terms, the ratio p = Number of input samples / Number of weights to be evaluated was kept within the range 1.8 < p < 2.2 (Tetko et al., 1993).
- a network comprising three neurones in the hidden layer, a noise level equal to 0.03 and a convergence rate of 0.03 was selected as the final model since the predictions allowed us to obtain the best simulation results.
- 120 peptides were used as the learning set and the remaining 52 peptides (39 × 9mers and 13 × 10mers) were used as the internal validation set.
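The sample-to-weight ratio p constrains how many descriptors a network of a given size can take. The Python sketch below works this through under assumptions that the text does not confirm: that the 120 learning-set peptides are the "input samples", that there is a single output node, and that bias terms count as weights.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1, include_biases=True):
    """Number of adjustable weights in a fully connected three-layer network."""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    if include_biases:
        weights += n_hidden + n_outputs  # one bias per hidden and output node
    return weights


def sample_to_weight_ratio(n_samples, n_inputs, n_hidden, n_outputs=1,
                           include_biases=True):
    """The ratio p = number of input samples / number of weights."""
    return n_samples / count_weights(n_inputs, n_hidden, n_outputs, include_biases)


def admissible_input_counts(n_samples, n_hidden, n_outputs=1,
                            low=1.8, high=2.2, max_inputs=60):
    """Input (descriptor) counts for which low < p < high."""
    return [
        k for k in range(1, max_inputs + 1)
        if low < sample_to_weight_ratio(n_samples, k, n_hidden, n_outputs) < high
    ]
```

With 120 samples and three hidden neurones, `admissible_input_counts(120, 3)` gives 16 to 19 descriptors under these assumptions; dropping the bias terms or counting all 172 peptides would shift that window.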
- SEN sensitivity
- SPE specificity
- NPV negative predictive value
- the ideal theoretical model for the identification of high affinity peptides must combine a high sensitivity (detection of a high percentage of strong binders), a high specificity (good discrimination between strong and intermediate/weak binders), a high PPV (low percentage of false positive peptides) and a high NPV (low percentage of false negative peptides).
- HLA-A*0201 motifs because these peptides are likely to be HLA-A*0201 epitopes. From a total of 556 peptides having the HLA-A*0201 motifs, 135 peptides were predicted to have a high affinity (50 hTERT, 45 HER-2/neu, 24 PSMA, 16 NPM/ALK) and 421 peptides were predicted to have an intermediate/low affinity (185 hTERT, 158 HER-2/neu, 41 PSMA, 37 NPM/ALK). 48 of the 556 peptides were randomly selected to be tested for their affinity in blind experiments (Table 2).
- Figure 3B shows the plots of the experimental and predicted RA values of the 48 × 9/10mer peptides (B1), divided into 37 × 9mers (B2) and 11 × 10mers (B3).
- SPE = TN/(TN+FP)
- PPV = TP/(TP+FP)
- NPV = TN/(TN+FN), where TP (true positive) corresponds to strong binders well predicted
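The four performance measures follow directly from the confusion counts. The sketch below uses the standard definition SEN = TP/(TP+FN) for sensitivity, which is not spelled out in the surviving text; the function name is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return SEN, SPE, PPV and NPV from confusion-matrix counts.

    A ratio with an empty denominator is reported as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "SEN": ratio(tp, tp + fn),  # strong binders that were detected
        "SPE": ratio(tn, tn + fp),  # weak binders correctly rejected
        "PPV": ratio(tp, tp + fp),  # predicted strong binders that are real
        "NPV": ratio(tn, tn + fn),  # predicted weak binders that are real
    }
```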
- using a predicted RA of 1.0 as a new threshold for the identification of strong binders, the sensitivity increased to 96% of the high affinity peptides. It was expected that this higher discrimination threshold would also result in more false positives, but the PPV was nevertheless high at 77%.
- the efficiency of the SCIPS model to identify strong binding sequences can be enhanced by raising the predicted RA threshold.
- the aim of this work was to create an HLA-A*0201 affinity prediction model capable of selecting strong HLA-A*0201 binders that, according to current immunological dogma, should also be immunogenic.
- the SCIPS model we describe allows the identification of almost all the high affinity but also a significant percentage of intermediate/low affinity immunogenic peptides. It represents, therefore, a powerful tool for the identification of immunogenic virus and tumour epitopes that could be used for specific vaccination. It is now well documented that virus and tumour antigens contain a large number of immunogenic epitopes (Menendez-Arias et al., 1998; Cibotti et al., 1992).
- the establishment of the complete immunogenic epitope map of these antigens could be of great immunotherapeutic interest for two reasons.
- First, tumour antigens are non-mutated self-proteins and their specific CTL repertoire is strongly influenced by the mechanisms of negative selection (Disis et al., 1996; Coletta et al., 2000; Kast et al., 1994).
- Second, the identification of a large number of immunogenic epitopes will allow a polyspecific vaccination that has been demonstrated to be more efficient than a monospecific vaccination (Oukka et al , 1996) .
- HLA-A*0201 epitopes have the specific anchor and strong residues (L/M/V/I/A) in P2 and C-terminal P (primary anchor motifs).
- only 30% of peptides with primary anchor motifs exhibit a high affinity. This is due to the presence of secondary anchor motifs which are also involved either favorably or unfavorably in the peptide-HLA-A*0201 interaction.
- Extended motifs (primary and secondary anchor motifs) and statistical binding matrices have already been used to perform a search of high affinity immunogenic peptides (Parker et al., 1994; Brusic et al., 1994).
- ANN Neural networks
- the SCIPS model of the present invention allows for the first time the creation of complete immunogenic epitope maps of tumour and virus antigens (antigen CTL epitope Biomap™).
- a similar approach is currently being developed for peptides presented by HLA-B*0702 and HLA-A*0301.
- together with HLA-A*0201, these three HLA molecules cover 80% of the Caucasian population.
- the long-term benefits of this strategy would be that a reliable prediction of immunogenicity could be generated from genome data.
- the sequences from the human genome could be translated to "antigen CTL epitope Biomaps" of potential self-reactivities of autoimmune and antitumor relevance whereas the sequences from the various microbial and virus genomes could be translated to "antigen CTL epitope Biomaps" of potential interest in vaccine development.
- the SCIPS method may also be applied to the analysis of polypeptide sequences. Using a scanning frame of sequences (e.g. 10-15 residues) encoded in property space, any new sequence may be assigned to its correct functional family.
- HLA-A*0201 transgenic, β2m-/-, Db-/- HHD mice (Pascolo et al, 1997) were injected sc with 100 µg of peptide emulsified in incomplete Freund's adjuvant (IFA) in the presence of 140 µg of the I-Ab restricted HBVcore 128-140 T-helper epitope.
- IFA incomplete Freund's adjuvant
- spleen cells (5x10^7 cells in 10 ml) were stimulated in vitro with peptide (10 µM).
- the bulk responder populations were tested for specific cytotoxicity by using uncoated or peptide coated HLA-A*0201 expressing RMAS-HHD murine tumour cells.
- T2 cells (3x10^5 cells/ml) were incubated with various concentrations of peptides in serum-free RPMI 1640 medium supplemented with 100 ng/ml of human β2m at 37°C for 16 hrs. Cells were then washed twice and stained with the BB7.2 mAb followed by FITC conjugated goat anti mouse Ig mAb to quantify the expression of HLA-A*0201. For each peptide concentration, the HLA-A*0201 specific staining was calculated as the % of the staining obtained with 100 µM of the reference peptide HIVpol 589 (IVGAETFYV).
- RA = (Concentration of peptide that induces 20% of HLA-A*0201 expression / Concentration of the reference peptide that induces 20% of HLA-A*0201 expression) and is expressed as log10.
- the mean RA value for each peptide was determined from at least three independent experiments. In all experiments, 20% of HLA-A*0201 expression using the reference peptide was obtained at 1-3 µM.
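The relative affinity calculation can be sketched in a few lines of Python. The code takes the formula at face value, as the log10 of the concentration ratio; on that reading a lower RA means less peptide is needed and hence stronger binding, though the sign convention is not stated explicitly in the text. Function names and the micromolar units are illustrative.

```python
import math


def relative_affinity(peptide_conc_um, reference_conc_um):
    """log10 of (peptide concentration / reference concentration), where each
    concentration is the one inducing 20% of HLA-A*0201 expression."""
    if peptide_conc_um <= 0 or reference_conc_um <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(peptide_conc_um / reference_conc_um)


def mean_relative_affinity(measurements):
    """Average RA over independent experiments.

    measurements: list of (peptide_conc_um, reference_conc_um) pairs;
    the text calls for at least three experiments.
    """
    if len(measurements) < 3:
        raise ValueError("at least three independent experiments are expected")
    values = [relative_affinity(p, r) for p, r in measurements]
    return sum(values) / len(values)
```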
- Multi-layered feed-forward networks are highly non-linear tools for function approximation. A summation of the combined inputs is used to predict the output values via a transfer function. In this study we used a three-layer, fully connected architecture. The parametric model represented by this network can be mathematically formulated as:
- K is the number of input nodes, N the number of hidden nodes and M the number of output nodes.
- x_k is the output of the input node k
- θ_n is the bias of the input of hidden node n
- W_kn is the weight connecting input node k to hidden node n
- w_nm is the weight connecting hidden node n to output node m
- f is the activation function.
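The formula itself did not survive in this text. A standard three-layer, fully connected form that uses only the symbols defined above would read as follows; it is offered as an assumption rather than the applicants' exact expression, with y_m introduced here for the value of output node m and θ_n assumed as the bias symbol.

```latex
y_m = f\!\left( \sum_{n=1}^{N} w_{nm}\, f\!\left( \sum_{k=1}^{K} W_{kn}\, x_k + \theta_n \right) \right),
\qquad m = 1, \dots, M
```

No output-node bias appears because none is defined in the symbol list; the original may have included one.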
- the neural network implementation in TSAR ® uses an identity activation function. For our set of peptides, the artificial neural network calculated the difference between the predicted RA and the experimental values. This difference is used to adjust the weights in the hidden layers and to minimize the overall error. For testing the predictive ability of the SCIPS model, 30% of the input data were excluded from the learning set and used as an internal validation set.
- Table 2 List of peptides selected by the SCIPS model. Each peptide shows a predicted relative affinity (see Methods) and its BiMass score.
Antigen | Sequence | RA experimental | RA predicted | BiMass score |
---|---|---|---|---|
hTERT 540 (10) | ILAKFLHWLM | 0.5 | 0.7 | 63 |
hTERT 122 | YLPNTVTDA | 0.0 | 0.5 | 52 |
hTERT 122 (10) | YLPNTVTDAL | 0.3 | 0.9 | 48 |
hTERT 381 | RLPQRYWQM | 0.8 | 0.9 | 56 |
hTERT 407 | VLLKTHCPL | 0.3 | -0.2 | 134 |
hTERT 496 (10) | SLGKHAKLSL | 1.7 | 1.8 | 21 |
hTERT 511 (10) | KMSVRGCAWL | 1.4 | 1.4 | 297 |
hTERT 540 | ILAKFLHWL | -0.4 | 1.3 | 1745 |
hTERT 544 (10) | FLHWLMSVYV | -0.7 | -0.1 | 1796 |
hTERT 547 (10) | WLMSVYWEL | -0.5 | -0.2 | 835 |
hTERT 548 | LMSVYWEL | -1.0 | -0.1 | 60 |
hTERTsie | LLTSRLRFI | 0.9 | 1.0 | 45 |
hTERT 7 |
- the 161 peptides tested for their immunogenicity belong to the training and the validation (internal and external) sets. The chi-square test was used to compare immunogenicity between TP and FN and between TN and FP.
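A chi-square comparison of two groups reduces to a 2x2 contingency table (for example TP versus FN against immunogenic versus nonimmunogenic). The Python sketch below computes the Pearson statistic without continuity correction and the p-value for one degree of freedom; whether the original analysis applied a correction is not stated, and the counts used in any call are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (chi2, p_value) for one degree of freedom, with no
    Yates continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With small expected counts an exact test would usually be preferred over this approximation.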
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Evolutionary Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Peptides Or Proteins (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01402106.7 | 2001-08-03 | ||
EP01402106 | 2001-08-03 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2003015001A2 (en) | 2003-02-20 |
WO2003015001A3 (en) | 2004-08-19 |
Family
ID=8182843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2002/003244 WO2003015001A2 (en) | 2001-08-03 | 2002-07-15 | Method for identification of protein function |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2003015001A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003054770A1 (en) * | 2001-12-21 | 2003-07-03 | Janssen Pharmaceutica N.V. | A method of clustering transmembrane proteins |
CN103177198A (en) * | 2011-12-26 | 2013-06-26 | 深圳华大基因科技有限公司 | Protein identification method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000079263A2 (en) * | 1999-06-18 | 2000-12-28 | Synt:Em S.A. | Identifying active molecules using physico-chemical parameters |
WO2001031579A2 (en) * | 1999-10-27 | 2001-05-03 | Barnhill Technologies, Llc | Methods and devices for identifying patterns in biological patterns |
-
2002
- 2002-07-15 WO PCT/GB2002/003244 patent/WO2003015001A2/en not_active Application Discontinuation
Non-Patent Citations (4)
Title |
---|
ADAMS H-P ET AL: "Prediction of binding to MHC class I molecules" JOURNAL OF IMMUNOLOGICAL METHODS, ELSEVIER SCIENCE PUBLISHERS B.V.,AMSTERDAM, NL, vol. 185, no. 2, 25 September 1995 (1995-09-25), pages 181-190, XP004021192 ISSN: 0022-1759 * |
BRUSIC V ET AL: "PREDICTION OF MHC BINDING PEPTIDES USING ARTIFICIAL NEURAL NETWORKS" COMPLEX SYSTEMS: MECHANISM OF ADAPTATION, IOS PRESS, AMSTERDAM,, NL, 1994, pages 253-260, XP000933707 cited in the application * |
BRUSIC V ET AL: "PREDICTION OF MHC CLASS II-BINDING PEPTIDES USING AN EVOLUTIONARY ALGORITHM AND ARTIFICIAL NEURAL NETWORK" BIOINFORMATICS, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 14, no. 2, 1998, pages 121-130, XP000929180 ISSN: 1367-4803 * |
GRASSY G ET AL: "COMPUTER-ASSISTED RATIONAL DESIGN OF IMMUNOSUPPRESSIVE COMPOUNDS" NATURE BIOTECHNOLOGY, NATURE PUBLISHING, US, vol. 16, August 1998 (1998-08), pages 748-752, XP000981977 ISSN: 1087-0156 cited in the application * |
Also Published As
Publication number | Publication date |
---|---|
WO2003015001A3 (en) | 2004-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | An introduction to epitope prediction methods and software | |
Janin et al. | Macromolecular recognition in the protein data bank | |
Athanasios et al. | Protein-protein interaction (PPI) network: recent advances in drug discovery | |
Schuler et al. | SYFPEITHI: database for searching and T-cell epitope prediction | |
Frishman et al. | Seventy‐five percent accuracy in protein secondary structure prediction | |
Tong et al. | Methods and protocols for prediction of immunogenic epitopes | |
Brusic et al. | Prediction of promiscuous peptides that bind HLA class I molecules | |
Pontén et al. | The Human Protein Atlas—a tool for pathology | |
Buus | Description and prediction of peptide-MHC binding: the ‘human MHC project’ | |
Sun et al. | Advances in in-silico B-cell epitope prediction | |
Singh et al. | Structural interaction fingerprints: a new approach to organizing, mining, analyzing, and designing protein–small molecule complexes | |
Hattotuwagama et al. | Quantitative online prediction of peptide binding to the major histocompatibility complex | |
Nimrod et al. | In silico identification of functional regions in proteins | |
Sathiamurthy et al. | Population of the HLA ligand database | |
AU2001245011B2 (en) | System and method for systematic prediction of ligand/receptor activity | |
Ramensky et al. | A novel approach to local similarity of protein binding sites substantially improves computational drug design results | |
Verkhivker et al. | Monte Carlo simulations of the peptide recognition at the consensus binding site of the constant fragment of human immunoglobulin G: the energy landscape analysis of a hot spot at the intermolecular interface | |
Stoddard et al. | Molecular recognition analyzed by docking simulations: the aspartate receptor and isocitrate dehydrogenase from Escherichia coli. | |
JP2002533477A (en) | Systems and methods for structure-based drug design including accurate prediction of binding free energy | |
Lauria et al. | Drugs polypharmacology by in silico methods: new opportunities in drug discovery | |
Brusic et al. | Bioinformatics tools for identifying T-cell epitopes | |
Flower et al. | Computational vaccinology: quantitative approaches | |
JP2003524831A (en) | System and method for exploring combinatorial space | |
WO2003015001A2 (en) | Method for identification of protein function | |
EP1373887A1 (en) | Computer-based strategy for peptide and protein conformational ensemble enumeration and ligand affinity analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VN YU ZA ZM Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |