WO2005069188A1 - System for predicting interactions between compounds and proteins - Google Patents
System for predicting interactions between compounds and proteins
- Publication number
- WO2005069188A1 (PCT/JP2004/019404)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- protein
- class
- information
- compound
- proteins
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
Definitions
- The present invention relates to a system and method for analyzing and/or predicting interactions between compounds and proteins, and to a system for predicting similar proteins or similar compounds. More specifically, it is a method for analyzing and/or predicting interactions between arbitrary compounds and proteins, similar proteins, or similar compounds, based on data in which protein amino acid sequence information, compound structural information, and the like are correlated with compound-protein interaction information.
- Methods currently known for analyzing and predicting interactions between proteins, which are in vivo factors, and chemical substances include docking studies, which analyze the complementarity between a compound and a protein, and informatics techniques, which perform information-science-based analysis using structural descriptors of both as explanatory variables.
- Of these, docking study technology is the most advanced. It searches for a model in which a given compound binds favorably near the active site of a protein.
- However, this method presupposes that the three-dimensional structural coordinates of the protein are known, and it also requires a large amount of computation time because it must search for an optimal solution. Its computing speed is therefore unsuitable for exhaustive analysis, and its accuracy can hardly be called sufficient either.
- As a method for improving calculation accuracy, a technique has been proposed that annotates ligands of four major classes and applies the annotation to in silico screening and library design (Non-Patent Document 1). The annotation in this document is a hierarchy based on ligand function and existing classifications, and the ligand database is then searched on the basis of that annotation.
- A method has also been proposed for searching for ligands that bind not only to the same target as a reference ligand but also to similar targets (Non-Patent Document 2).
- The search method described in this document uses molecular descriptors that reflect both the molecular structure and the molecule's ability to interact with the target protein.
- Non-Patent Document 1: Ansgar Schuffenhauer et al., "Ontology for drug ligands and application of the ontology to in silico screening and library design" (An
- Non-Patent Document 2: Ansgar Schuffenhauer et al., "Similarity Metrics for Ligands Reflecting the Similarity of the Target Proteins," J. Chem. Inf. Comput. Sci., 2003, vol. 43, pp. 391-405.
- The problems addressed by the present invention fall broadly into two: building a database and establishing an informatics method. Specifically, they are (1) constructing a database that integrates chemical and biological information on the interactions between multiple compounds and multiple proteins, and (2) using that database to establish a method for comprehensive analysis of compound-protein interactions that achieves both computational speed and accuracy. As a result of intensive studies to solve these problems, the present inventors found that one feature of the apparatus and method of the present invention is the use of data in which protein amino acid sequence information, compound structural information, and protein-compound interaction information are correlated with one another.
- The present invention has the following features.
- The method according to the present invention predicts the interaction between a given protein and a given compound based on data in which protein amino acid sequence information, amino acid sequence information of proteins systematically classified by similarity of function and/or structure, compound structural information, and protein-compound interaction information are correlated with one another.
- The method for predicting the interaction between an arbitrary protein and an arbitrary compound according to the present invention makes its prediction by combining (a) a structure-activity relationship model that can discriminate, from an arbitrary compound group, the compound group interacting with the protein and with a group of proteins similar to it in function and/or structure, and (b) a structure-activity relationship model that can discriminate, from the compound group interacting with the protein and its functionally and/or structurally similar protein group, the compound group that interacts with the protein itself.
- The method for predicting the interaction between an arbitrary protein and an arbitrary compound according to the present invention also uses a systematic classification of amino acid sequence information based on similarity of function and/or structure, and makes its prediction by combining structure-activity relationship models, including one that can discriminate the compound group interacting with the proteins in the classification item to which the protein belongs from the compound groups interacting with the proteins in the other child classification items that share the same parent classification item.
- The system according to the present invention is a prediction system for predicting proteins with similar function and/or structure, comprising:
  - (a) first recording means for recording information on first-class proteins, which belong to a first class indicating a classification of functional and/or structural features of proteins, and information on non-first-class proteins, which do not belong to the first class;
  - (b) second recording means for recording information on second-class proteins, which are first-class proteins belonging to a second class indicating a classification of functional and/or structural features narrower than the first class, and on non-second-class proteins, which belong to the first class but not to the second class;
  - (c) acquisition means for acquiring prediction-target information indicating the functional and/or structural features of the protein to be predicted;
  - (d) first analysis means for analyzing, based on the prediction-target information acquired by the acquisition means and the information recorded in the first recording means, the similarity of the protein to be predicted to the first-class proteins as compared with the non-first-class proteins;
  - (e) second analysis means for further analyzing, when the first analysis means finds the protein to be predicted similar to the first-class proteins, its similarity to the second-class proteins as compared with the non-second-class proteins, based on the prediction-target information and the information recorded in the second recording means; and
  - (f) output means for outputting, based on the analysis results of the first and second analysis means, information on proteins whose functional and/or structural features are similar to those of the protein to be predicted.
- The prediction system further comprises (g) interaction information analyzing means for analyzing information on compounds predicted to interact with the protein to be predicted, based on information on compounds that interact with proteins and on information about the proteins that (d) the first analysis means and/or (e) the second analysis means found similar to the protein to be predicted. The output means (f) then outputs information on proteins with functional and/or structural features similar to the protein to be predicted, and/or information on the compounds analyzed by the interaction information analyzing means.
- The output means (f) of the present invention may output information on proteins and/or compounds interacting with them as the analysis result of the second analysis means and, as a broader concept, information on proteins and/or compounds interacting with them as the analysis result of the first analysis means.
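The two-stage analysis performed by means (d) and (e) can be pictured as a gated cascade. The sketch below is a minimal illustration under assumptions that are not in the source: each analysis means is represented by a hypothetical scoring function returning a similarity score in [0, 1], and a single fixed threshold decides "similar". It is not the claimed implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer type: maps a feature vector to a similarity score in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def cascade_predict(query: Sequence[float],
                    first_class: str,
                    first_scorer: Scorer,
                    second_scorers: Dict[str, Scorer],
                    threshold: float = 0.5) -> List[str]:
    """Return the classes reached, from broad to narrow (empty if none).

    Stage (d): is the query closer to the first class than to non-first-class?
    Stage (e): only if so, which narrower second class fits best?
    """
    path: List[str] = []
    if first_scorer(query) < threshold:
        return path  # not similar to the first class; stage (e) is skipped
    path.append(first_class)
    # Score the query against every second class recorded under the first class.
    scores = {name: scorer(query) for name, scorer in second_scorers.items()}
    passing = {name: s for name, s in scores.items() if s >= threshold}
    if passing:
        path.append(max(passing, key=lambda name: passing[name]))
    return path


# Usage with toy scorers (illustrative only).
first = lambda q: q[0]
seconds = {"PDE4": lambda q: q[1], "PDE5": lambda q: 1.0 - q[1]}
print(cascade_predict([0.9, 0.3], "PDE", first, seconds))  # ['PDE', 'PDE5']
```

Gating stage (e) on stage (d) means the narrower comparison is only made among proteins already judged to share the broader features, which mirrors the "if analyzed as similar" condition in means (e).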
- The system according to the present invention is also a prediction system for predicting proteins (or compounds) with similar function and/or structure, comprising:
  - (a) first recording means for recording information on first-class proteins (or first-class compounds), which belong to a first class indicating a classification of functional and/or structural features of proteins (or compounds), and information on non-first-class proteins (or non-first-class compounds), which do not belong to the first class;
  - (b) second recording means for recording information on second-class proteins (or second-class compounds), which belong to a second class narrower than the first class, and on non-second-class proteins (or non-second-class compounds), which belong to the first class but not to the second class;
  - (c) acquisition means for acquiring prediction-target information indicating the functional and/or structural features of the protein (or compound) to be predicted;
  - (d) first analysis means for analyzing, based on the prediction-target information acquired by the acquisition means and the information recorded in the first recording means, the similarity of the protein (or compound) to be predicted to the first-class proteins (or first-class compounds) as compared with the non-first-class proteins (or non-first-class compounds);
  - (e) second analysis means for analyzing, when the first analysis means finds the protein (or compound) to be predicted similar to the first-class proteins (or first-class compounds), its similarity to the second-class proteins (or second-class compounds) as compared with the non-second-class proteins (or non-second-class compounds), based on the prediction-target information and the information recorded in the second recording means; and
  - (f) output means for outputting, based on the analysis results of the first and second analysis means, information on proteins (or compounds) with functional and/or structural features similar to those of the protein (or compound) to be predicted.
- FIG. 1 is a flowchart showing a procedure of a process according to an embodiment of the present invention.
- FIG. 2 is a diagram showing the concept of a category hierarchy when proteins are hierarchically classified in the embodiment.
- FIG. 3 is an example of a configuration of a screen displaying an analysis result according to the embodiment.
- FIG. 4 shows, among the models obtained for the active group data set in Example 1, the global model of phosphodiesterase (hereinafter "PDE").
- FIG. 5 is a graph showing, for the inactive group data set used in the global model in Example 1, the difference in discrimination ability between applying the "PDE global model + local model of each subtype" and applying the "global model of each subtype". The vertical axis indicates the "rate recognized as active".
- FIG. 6 is a graph showing, for the inactive group data set used in the local model in Example 1, the difference in discrimination ability between applying the "PDE global model + local model of each subtype" and applying the "global model of each subtype". The vertical axis indicates the "rate recognized as active".
- FIG. 7 is a graph comparing, for the active group data set, the discrimination ability obtained by the CART method (equal prior probabilities) using the global model of PDE and the local model of each subtype with the results of Bayesian network analysis. The vertical axis indicates the "rate recognized as active".
- FIG. 8 is a graph comparing, for the inactive group data set, the discrimination ability obtained by the CART method (equal prior probabilities) using the global model of PDE and the local model of each subtype with the results of Bayesian network analysis. The vertical axis indicates the "rate recognized as active".
- FIG. 9 is a graph comparing, for the active group data set, the CART method (equal probabilities) with Bayesian network analysis, using the global model of PDE and the local model of each subtype. The vertical axis indicates the "rate recognized as active".
- FIG. 10 is a graph comparing, for the inactive group data set, the CART method (equal probabilities) with Bayesian network analysis, using the global model of PDE and the local model of each subtype. The vertical axis indicates the "rate recognized as active".
- FIG. 11A is a distribution graph obtained by applying the combination of the global model of PDE and the local model of subtype PDE1 to three data sets: the active group, the inactive group used in the global model, and the inactive group used in the local model. The vertical axis indicates the "rate recognized as active".
- FIG. 11B is the corresponding distribution graph for the combination of the global model of PDE and the local model of subtype PDE2, applied to the same three data sets.
- FIG. 11C is the corresponding distribution graph for the combination of the global model of PDE and the local model of subtype PDE3, applied to the same three data sets.
- FIG. 11D is the corresponding distribution graph for the combination of the global model of PDE and the local model of subtype PDE4, applied to the same three data sets.
- FIG. 11E is the corresponding distribution graph for the combination of the global model of PDE and the local model of subtype PDE5, applied to the same three data sets.
- FIG. 12 is a functional block diagram of the interaction analysis device.
- FIG. 13 is an example of a hardware configuration of an analyzer.
- FIG. 14 is a conceptual diagram of an interaction analysis process as an embodiment of the present invention.
- FIG. 15 is a diagram showing an example of the structure of a protein database in the embodiment.
- FIG. 16 is a diagram showing a structural example of a compound database in the embodiment.
- FIG. 17 is a diagram showing an example of the structure of an interaction database in the embodiment.
- FIG. 18 is a diagram showing a structural example of a systematic classification database in the embodiment.
- FIG. 19 is a schematic diagram illustrating an evaluation function according to the embodiment.
- FIG. 20 is a flowchart of an interaction analysis processing program according to the first embodiment.
- FIG. 21 is a flowchart of an interaction analysis processing program according to the second embodiment.
- FIG. 22A and FIG. 22B are screen display examples output according to the second embodiment.
- The "amino acid information of a protein" in the present invention includes, for example, its sequence, function, or three-dimensional structure.
- Sequence and function information includes known information, information estimated by informatics, multiple kinds of annotation information, and ontology information oriented toward systematic functional classification.
- For three-dimensional structures, known information includes public databases such as the PDB (Protein Data Bank) as well as commercial or in-house databases constructed by homology modeling.
- Commercial homology modeling databases include FAMSBASE sold by SGI.
- The "structural information of a compound" in the present invention includes, for example, information describing the structural formula and drug information, such as the presence or absence and/or strength of a compound's pharmacological activity and development-stage information ranging from Biological Testing to Launched. An example is MDL's MDDR (MDL Drug Data Report).
- The "protein amino acid information" used in the present invention may be a database that combines and integrates the amino acid information portions of proteins, or a database containing all of the protein amino acid information described above.
- For a database containing all of the protein amino acid information, there is no restriction on the data source, such as commercial or in-house data, as long as the information is provided.
- Examples of information systematically classified by similarity of function and/or structure include Gene Ontology (registered trademark) information, which is published on the Gene Ontology Consortium website (http://www.geneontology.org).
- "Interaction" refers to the relationship between a protein and a compound that is active against the protein, or between a compound and a protein that is complementary to the compound.
- Interactions also include those of multiple compounds with multiple proteins and of multiple proteins with multiple compounds.
- The "structure-activity relationship model" of the present invention includes, for example, (i) an evaluation function obtained by a predetermined analysis method using, as a data set, the structural feature information of proteins (or compounds) belonging to a predetermined classification and of arbitrary proteins (or compounds), with that information as explanatory variables (hereinafter, the global model), and (ii) an evaluation function obtained by a predetermined analysis method using, as a data set, the structural feature information of proteins (or compounds) belonging to a predetermined classification and of proteins (or compounds) related to them, with that information as explanatory variables (hereinafter, the local model).
- As explanatory variables, for example, pharmacophore descriptors used in quantitative structure-activity relationship analysis, topological indices used in similarity searches, or ADMET-related indices can be used.
- As the analysis method, for example, multiple regression analysis, linear/nonlinear discriminant analysis, logistic regression analysis, neural networks, decision tree analysis, Bayesian networks, or support vector machines can be used.
- The functions of the device according to the embodiment of the present invention can be divided into "search", "browsing", and "analysis", and an existing environment can be used as-is for "search" and "analysis".
- Such existing environments include systems that provide list input/output for compounds and proteins and allow browsing in a format linking compound information with biological information, for example client-server systems and Web-based systems.
- The system of the present invention can access multiple databases, display input lists reflected in the view, and let the user specify output targets individually.
- Program description languages include C, C++, JAVA (registered trademark), HTML, and XML. Existing programs such as Chime, which MDL provides free of charge for viewing structural formulas on the Web, can also be used.
- FIG. 1 is a flowchart showing the concept of one embodiment of the present invention.
- Reference numeral 101 denotes a database of protein-side information.
- The protein-side information to be integrated includes the "amino acid sequence" and the "three-dimensional structure (including modeled structures)", for example information obtained from SwissProt.
- Reference numeral 102 denotes a database of information on the compound side.
- The compound-side information to be integrated includes the "structural formula" and "conformation", for example information obtained from CAS.
- Reference numeral 103 denotes amino acid sequence information of proteins systematically classified by similarity of function and/or structure.
- Examples include categories from the hierarchical classification of proteins, such as ontology information including Gene Ontology GO numbers.
- The protein information in 101 is linked to the systematic classification information in 103.
- Reference numeral 104 denotes an interaction database linked to the protein and compound information. Examples include commercial databases such as MDL's MDDR (MDL Drug Data Report), pharmacological activity test data, and information such as reverse proteomics.
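A minimal sketch of how protein records (101), compound records (102), and interaction records (104) might be linked so that the compounds active against a classification can be retrieved in one query. The table names, column names, placeholder classification IDs ("GO:A", "GO:B"), and toy rows are hypothetical; the actual database structures are those shown in FIGS. 15 to 17 and are not reproduced here.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables and toy rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE protein (protein_id TEXT PRIMARY KEY, go_id TEXT, sequence TEXT);
        CREATE TABLE compound (compound_id TEXT PRIMARY KEY, smiles TEXT);
        CREATE TABLE interaction (
            protein_id TEXT REFERENCES protein(protein_id),
            compound_id TEXT REFERENCES compound(compound_id),
            active INTEGER  -- 1 = interacts, 0 = tested and inactive
        );
    """)
    con.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                    [("P1", "GO:A", "MKTAYIAK"), ("P2", "GO:B", "MSEQLTRG")])
    con.executemany("INSERT INTO compound VALUES (?, ?)",
                    [("C1", "CCO"), ("C2", "c1ccccc1"), ("C3", "CC(=O)O")])
    con.executemany("INSERT INTO interaction VALUES (?, ?, ?)",
                    [("P1", "C1", 1), ("P1", "C2", 0), ("P2", "C2", 1), ("P2", "C3", 1)])
    return con


def active_compounds_for_class(con: sqlite3.Connection, go_id: str) -> List[str]:
    """Compounds recorded as active against any protein in the given class."""
    rows = con.execute("""
        SELECT DISTINCT c.compound_id
        FROM interaction AS i
        JOIN protein AS p ON p.protein_id = i.protein_id
        JOIN compound AS c ON c.compound_id = i.compound_id
        WHERE p.go_id = ? AND i.active = 1
        ORDER BY c.compound_id
    """, (go_id,))
    return [row[0] for row in rows]


con = build_demo_db()
print(active_compounds_for_class(con, "GO:B"))  # ['C2', 'C3']
```

Keeping inactive results as rows with `active = 0`, rather than omitting them, leaves room for the "inactive group" data sets used later in Example 1.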
- Reference numeral 105 denotes the structure-activity relationship analysis function.
- Based on the data in the integrated database (104 in FIG. 1), in which protein amino acid sequence information (101 and 103 in FIG. 1), compound structural information (102 in FIG. 1), and protein-compound interaction information are associated with one another, a comprehensive interaction analysis is performed that takes into account both the commonalities and the differences in the functional and/or structural features being analyzed. Specifically, this uses the amino acid sequence information of proteins systematically classified by similarity of function and/or structure (103), for example Gene Ontology (registered trademark).
- Here, a node denotes a category in the hierarchical classification of proteins, for example a Gene Ontology GO number.
- Figure 2 shows an image of the node hierarchy.
- The analysis model at each node is the combination (203) of the "global model of the node one level higher" (201) and the "local model among sibling nodes" (202). Overall, it is a combination of the global model of the top node (204) and the local model of each node (205).
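One way to read this combination, consistent with the conditional-probability formulation described later for Example 1, is to multiply the top-node global probability by the local conditional probabilities along the path down to the node. The sketch assumes each model is a hypothetical callable returning a probability, and the multiplicative combination is an assumption rather than a rule stated in the source; the PDE/PDE4/PDE4A hierarchy is likewise illustrative.

```python
from typing import Callable, Dict, Optional

# Hypothetical model type: compound identifier -> probability in [0, 1].
Model = Callable[[str], float]


def node_probability(compound: str,
                     node: str,
                     parent: Dict[str, Optional[str]],
                     top_global: Model,
                     local: Dict[str, Model]) -> float:
    """P(compound belongs to `node`), assuming the factorisation
    P(top) * product of P(child | parent) along the path to `node`."""
    path = []
    while parent[node] is not None:  # climb until the top node (parent is None)
        path.append(node)
        node = parent[node]
    prob = top_global(compound)  # global model of the top node (204)
    for n in reversed(path):     # local model of each node (205), top-down
        prob *= local[n](compound)
    return prob


# Illustrative hierarchy and constant toy models.
parent = {"PDE": None, "PDE4": "PDE", "PDE4A": "PDE4"}
local = {"PDE4": lambda c: 0.5, "PDE4A": lambda c: 0.25}
print(node_probability("cmpd-1", "PDE4A", parent, lambda c: 0.8, local))
```

Because each local model only has to separate siblings, it can be trained on a narrow, well-matched compound set, while the single global model handles separation from arbitrary compounds.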
- Model construction at each node is performed by informatics analysis using various structural descriptors as explanatory variables.
- As the structure descriptors, pharmacophore descriptors used as explanatory variables in quantitative structure-activity relationship analysis, topological indices used in similarity searches, ADMET-related indices, and the like can be used.
- The "global model of the top node" in FIG. 2 denotes a model that can significantly distinguish the compound group belonging to the top node from any other compound group.
- The "local model of each node" denotes a model that can distinguish the compound group belonging to a given node from the compound groups belonging to the other nodes that share the same parent node.
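The positive and negative training sets implied by these two definitions can be sketched as follows. The function names, the dictionary-based hierarchy, and the choice to exclude a node's own compounds from its local negatives (when a compound also appears under a sibling) are illustrative assumptions, not details given in the source.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Labelled = List[Tuple[str, int]]  # (compound id, 1 = positive / 0 = negative)


def global_dataset(top: str, members: Dict[str, Set[str]],
                   all_compounds: Iterable[str]) -> Labelled:
    """Top-node compounds are positives; every other compound is a negative."""
    positives = members[top]
    return sorted((c, int(c in positives)) for c in set(all_compounds))


def local_dataset(node: str, parent: Dict[str, Optional[str]],
                  members: Dict[str, Set[str]]) -> Labelled:
    """Compounds of `node` are positives; compounds of its siblings
    (other nodes with the same parent) are negatives."""
    siblings = [n for n, p in parent.items() if p == parent[node] and n != node]
    positives = members[node]
    # Assumption: a compound shared with a sibling is kept as a positive only.
    negatives = set().union(*(members[s] for s in siblings)) - positives
    return sorted([(c, 1) for c in positives] + [(c, 0) for c in negatives])


# Toy hierarchy: PDE with three subtypes (compound IDs are placeholders).
members = {"PDE": {"a", "b", "c", "d"}, "PDE1": {"a", "b"},
           "PDE2": {"c"}, "PDE3": {"b", "d"}}
parent = {"PDE": None, "PDE1": "PDE", "PDE2": "PDE", "PDE3": "PDE"}
print(local_dataset("PDE1", parent, members))  # [('a', 1), ('b', 1), ('c', 0), ('d', 0)]
```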
- FIG. 3 shows a screen display image of the interaction analysis system according to the embodiment. The details of the interaction analysis process will be described later.
- Numeral 301 in FIG. 3 shows a tree diagram of the functional classification (for example, Gene Ontology) of amino acid sequence information of proteins systematically classified by similarity of function and/or structure. The nodes of 301 are linked to the corresponding compound numbers.
- The tree is expanded only at nodes containing the specified amino acid sequence information number (for example, a GO number in Gene Ontology) or compound number; all other nodes are shown collapsed.
- The tree on the right side of FIG. 3 shows the expanded state. For example, GO numbers corresponding to the analysis result are displayed in a different text color.
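The expand-only-matching-nodes behavior can be sketched as a post-order traversal that marks every node whose subtree contains the specified number. The `children` and `items` dictionaries below are hypothetical stand-ins for the classification tree and the numbers linked to each node.

```python
from typing import Dict, List, Set


def nodes_to_expand(children: Dict[str, List[str]],
                    items: Dict[str, Set[str]],
                    root: str, target: str) -> Set[str]:
    """Nodes whose subtree holds `target`; all other nodes stay collapsed."""
    expanded: Set[str] = set()

    def visit(node: str) -> bool:
        hit = target in items.get(node, set())
        for child in children.get(node, []):
            # Visit every child (no short-circuit) so all matching branches are found.
            if visit(child):
                hit = True
        if hit:
            expanded.add(node)
        return hit

    visit(root)
    return expanded


children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
items = {"A2": {"C7"}, "B1": {"C9"}}
print(sorted(nodes_to_expand(children, items, "root", "C7")))  # ['A', 'A2', 'root']
```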
- Each node displays the lower-level amino acid sequence information numbers (for example, "GO numbers"; the same applies below), the number of amino acids, and the total number of compounds, and this display changes when the display format is changed. Clicking any compound number displays the corresponding structural formula and its accompanying data.
- Buttons with list input/output functions are provided (302 to 305 in FIG. 3). Input is assumed to be amino acid sequence information numbers and compound numbers; output is assumed to be protein sequence information, protein coordinate data (PDB format), and compound numbers.
- A check box is provided for each amino acid sequence information number corresponding to a terminal node and for each compound number, and a list is output for the checked items.
- Multiple amino acid sequence information numbers in 301 can be specified by clicking nodes or by inputting a list of amino acid sequence information numbers.
- Pressing the Run button (304) calculates scores for the combinations of the specified amino acid sequence information numbers × structural formulas. If a specific amino acid sequence information number is specified in 301, or one of the compounds displayed in 306 is selected, the corresponding scores are displayed in the other.
- Pressing the Filter button (305) after specifying a threshold extracts the records (nodes) at or above the threshold.
- The filter operation in 305 can be executed multiple times with AND/OR/NOT specified, and the result can be output to a delimited text file such as CSV (Comma Separated Values) format.
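A minimal sketch of the threshold filter and its AND/OR/NOT combination, followed by CSV export. Each filter result is treated as a set of record IDs so that repeated filters combine with set operators; the record layout and the `score_A`/`score_B` column names are hypothetical.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]


def threshold_filter(records: List[Record], column: str, threshold: float) -> Set[str]:
    """IDs of records whose score in `column` is at or above `threshold`."""
    return {str(r["id"]) for r in records if float(r[column]) >= threshold}


def to_csv(records: List[Record], ids: Iterable[str]) -> str:
    """Write the selected records, in their original order, as CSV text."""
    keep = set(ids)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    for r in records:
        if r["id"] in keep:
            writer.writerow(r)
    return buf.getvalue()


records = [
    {"id": "GO:1", "score_A": 0.9, "score_B": 0.2},
    {"id": "GO:2", "score_A": 0.6, "score_B": 0.7},
    {"id": "GO:3", "score_A": 0.1, "score_B": 0.8},
]
a = threshold_filter(records, "score_A", 0.5)
b = threshold_filter(records, "score_B", 0.5)
# Repeated filters combine as set operations: AND = &, OR = |, NOT = difference.
print(to_csv(records, a & b), end="")
```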
- Next, verification of the analysis processing method used by the interaction analysis system of the embodiment is described.
- Verification results for multiple analysis processing methods are described using example interaction information for given compounds and proteins.
- The interaction analysis processing of the system uses the CART method, with pharmacophore descriptors and the like as explanatory variables, to identify whether a compound has pharmacological activity.
- The interaction analysis is realized by a "global model", which uses diverse compound sets including an "inactive group" of compounds that do not interact with the target protein (or interact only weakly; the same applies below), and a "local model", which uses compound sets close to the active group. The global and local models are described in detail later.
- Table 1 compares, for each lower node of a one-level tree structure (the basic unit of the Gene Ontology), the discriminating power of the "global model of each node" with that of the "global model of the upper node combined with the local model of each node".
- As an example, phosphodiesterase (hereinafter "PDE") was used as the upper node, with its five subtypes PDE1 to PDE5 as lower nodes.
- PDE is a general term for enzymes that hydrolyze a phosphoric diester into a phosphoric monoester.
- The upper node, PDE, has 2871 compounds.
- The number of compounds belonging to each lower node varies widely, from a minimum of 29 (PDE2) to a maximum of 1699 (PDE4).
- The upper part of Table 1 shows the discriminating power of the global model, and the lower part shows that of the local model.
- Each column has two numerical values.
- The left value is the discriminating ability on the data set used to construct the model (training data), the model being, for example, the evaluation function for identifying compounds; the right value is the discriminating ability on the data set used to validate the constructed model (validation data).
- Overall, the results show that better models are generally obtained with equal prior probabilities, so equal prior probabilities were used in the subsequent studies.
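In CART, prior probabilities enter through the within-node class probabilities: p(j|t) is proportional to π_j · N_j(t) / N_j, where N_j is the number of class-j cases at the root and N_j(t) the number in node t. The standard-library sketch below is an illustration of that rule only, not the software used in the example; it shows how equal priors change a node's class probabilities and Gini impurity when actives are rare. The class labels and counts are toy values.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def class_probs(node_labels: Iterable[Hashable],
                root_counts: Dict[Hashable, int],
                priors: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Within-node class probabilities p(j|t) under CART priors:
    p(j|t) is proportional to prior_j * N_j(t) / N_j. The node must be non-empty."""
    node_counts = Counter(node_labels)
    weights = {j: priors[j] * node_counts.get(j, 0) / root_counts[j] for j in root_counts}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def gini(node_labels: Iterable[Hashable],
         root_counts: Dict[Hashable, int],
         priors: Dict[Hashable, float]) -> float:
    """Gini impurity 1 - sum_j p(j|t)^2 using the prior-adjusted p(j|t)."""
    p = class_probs(node_labels, root_counts, priors)
    return 1.0 - sum(v * v for v in p.values())


root = {"active": 10, "inactive": 90}         # 10 actives among 100 compounds
node = ["active"] * 5 + ["inactive"] * 5      # a node holding half of the actives
empirical = {"active": 0.1, "inactive": 0.9}  # priors = training frequencies
equal = {"active": 0.5, "inactive": 0.5}      # equal prior probabilities
print(class_probs(node, root, empirical)["active"])  # about 0.5
print(class_probs(node, root, equal)["active"])      # about 0.9
```

Under equal priors the same node looks mostly active, because it captures half of the actives but only about 6% of the inactives; splits that isolate a rare active class are therefore rewarded instead of being swamped by the majority class.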
- FIGS. 4 to 6 are graphs showing the difference in discrimination ability on three data sets, (1) the group of compounds interacting with PDE (active group), (2) the inactive group (compounds not interacting with PDE) used in the global model, and (3) the inactive group used in the local model, when (a) the global model of PDE + the local model of each subtype (shown as "Global_PDE & Local_PDEx" in the figures) and (b) the global model of each subtype (shown as "Global_PDEx" in the figures) are applied. As shown in FIGS.
- The Naive Bayes, Markov Blanket, and Augmented Markov Blanket methods, which complete their calculations in a short time, showed similar tendencies, and their discrimination ability was hardly sufficient.
- The Sons & Spouses method requires a relatively long calculation time compared with those three methods but shows discrimination ability close to that of the CART method. However, its discrimination ability drops sharply when the number of active compounds is extremely small.
- The Augmented Naive Bayes method requires a comparable calculation time but shows high discrimination ability even when the number of active compounds is small.
- The Augmented Naive Bayes method showed discrimination power equal to that of the CART method but was clearly overtrained. The Augmented Naive Bayes and Sons & Spouses methods therefore each have advantages and disadvantages.
- The discrimination rate for the active group improved, while the misrecognition rate for the inactive groups rose. A marked improvement in the discrimination rate was seen especially for PDE-1 and PDE-2, which have few data.
- Because the Augmented Naive Bayes method was overtrained, taking the prior probabilities into account made no difference to its results.
- the distribution threshold of the score value may be used to determine the classification threshold.
- the Sons & Spouses method can be adopted in consideration of the balance between discrimination ability and overtraining. The results are shown in Figs.
- the classification is usually determined as a binary value, but here it is expressed as the probability of belonging at each node.
- this probability is expressed, for example, as a conditional probability given the upper node, and the classification threshold is determined from the distribution of the probability values.
- FIGS. 11A, 11B, 11C, 11D, and 11E show the probability distributions of the three data sets for each subtype (PDE-1 to PDE-5). In the figures, “1” represents the active group, “0” represents the inactive group used in the local model, and “11” represents the inactive group used in the global model.
- the horizontal axis represents the conditional probability when the global model of PDE and the local model of each subtype are applied.
- the vertical axis of the graph indicates the “rate recognized as active”. As shown in FIG. 11, as with the CART method, both the inactive group of the global model and the inactive group of the local model are well separated from the active group. In addition, because the results are expressed as conditional probabilities, some values fall between 0 and 1.
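The text says the classification threshold is read off the distribution of the probability (score) values but does not say which criterion is used. The following is a minimal sketch, assuming the threshold is chosen to maximize balanced accuracy (the mean of the rate at which actives are recognized as active and the rate at which inactives are rejected). The criterion and the function name `choose_threshold` are illustrative assumptions, not the method of the source.

```python
def choose_threshold(active_scores, inactive_scores):
    """Pick the cut-off that maximizes balanced accuracy (assumed criterion).

    Scores >= threshold are called "active". Candidate thresholds are the
    observed score values themselves; ties keep the lowest such value.
    """
    candidates = sorted(set(active_scores) | set(inactive_scores))
    best_t, best_bacc = None, -1.0
    for t in candidates:
        # Fraction of actives recognized as active.
        tpr = sum(s >= t for s in active_scores) / len(active_scores)
        # Fraction of inactives correctly rejected.
        tnr = sum(s < t for s in inactive_scores) / len(inactive_scores)
        bacc = (tpr + tnr) / 2
        if bacc > best_bacc:
            best_t, best_bacc = t, bacc
    return best_t, best_bacc
```

Both inactive groups (local and global) could be pooled into `inactive_scores`, or each could be passed separately to see how well the active group is separated from it.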
- an analysis model may be constructed in which inactivity information is treated as a missing value and only data with known pharmacological activity are used as the data set. Accordingly, prediction models of inhibitory activity against PDE and each PDE subtype were constructed using a Support Vector Machine (hereinafter “SVM”), and 4-fold cross-validation was performed. The SVM parameters were fixed, the explanatory variables were standardized, and a Gaussian kernel was used. The software used was LIBSVM. The concept of SVM is described, for example, in “Vapnik, Statistical Learning Theory, Wiley, 1998”. Tables 4 and 5 show the cross-validation results.
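The evaluation protocol described here (4-fold cross-validation, standardized explanatory variables, Gaussian kernel) can be sketched with the Python standard library alone. This sketch does not reimplement LIBSVM: `kernel_mean_predict` is a hypothetical stand-in classifier that compares average Gaussian-kernel similarity to each class, used only to show where standardization (fitted on the training folds only) and the kernel enter. The value `gamma=0.5` and the 0/1 labels are assumptions.

```python
import math
import random


def kfold_indices(n, k=4, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_standardizer(rows):
    """Column means and (population) standard deviations of the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would give sd = 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def standardize(row, means, sds):
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def gaussian_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_mean_predict(train_x, train_y, x, gamma=0.5):
    """Stand-in classifier, NOT an SVM: pick the class with higher mean kernel similarity."""
    def mean_sim(label):
        sims = [gaussian_kernel(x, t, gamma) for t, y in zip(train_x, train_y) if y == label]
        return sum(sims) / len(sims) if sims else 0.0
    return 1 if mean_sim(1) >= mean_sim(0) else 0


def cross_validate(X, y, k=4, gamma=0.5, seed=0):
    """Fraction of samples predicted correctly across k folds."""
    correct = 0
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        # Fit standardization on the training folds only, then apply it to both parts.
        means, sds = fit_standardizer([X[i] for i in train])
        tx = [standardize(X[i], means, sds) for i in train]
        ty = [y[i] for i in train]
        for i in fold:
            pred = kernel_mean_predict(tx, ty, standardize(X[i], means, sds), gamma)
            correct += int(pred == y[i])
    return correct / len(X)
```

With a real SVM (for example one trained through LIBSVM), only `kernel_mean_predict` would change; the fold loop and the per-fold standardization would stay the same.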
- OCSVM: One-Class SVM
- an OCSVM model was constructed for PDE-1 to PDE-5 inhibitors, cross-validation was performed within the active group, and the discrimination ability for 3000 randomly sampled compounds was verified.
- the SVM parameters were fixed, the explanatory variables were standardized, and the RBF kernel (Gaussian kernel) was used.
- the software used was LIBSVM.
- the concept of OCSVM is described, for example, in “B. Schölkopf et al., Estimating the support of a high-dimensional distribution, Neural Computation, 13, 2001, 1443-1471”.
- the verification results of the analysis processing method have been described above using several general statistical methods as examples.
- the analysis processing according to the present invention can be realized by any of the above methods, by modifications or combinations of these methods, or by methods known to those skilled in the art.
- the following mainly describes, as an embodiment of the present invention, an apparatus that implements the above-described analysis processing method, together with details of that method.
- FIG. 12 shows a functional block diagram of an interaction analyzer 500 as an embodiment of the system or method of the present invention.
- the interaction analyzer 500 includes (a) first recording means 72, (b) second recording means 74, (c) acquisition means 70, (d) first analysis means 76, (e) second analysis means 78, (f) output means 82, and (g) interaction information analysis means 80.
- FIG. 13 shows an example of a hardware configuration in which the interaction analysis device 500 shown in FIG. 12 is realized using a CPU.
- the interaction analysis device 500 includes a CPU 10, a memory 12, a speaker 14, a communication circuit 16, a keyboard/mouse 18, a display (display device) 20, and a hard disk 22.
- the CPU 10 executes an interaction analysis process described later and controls the entire interaction analysis device 500.
- the hard disk 22 records the protein database 600, the compound database 700, the interaction database 800, the systematic classification database 900, and a program (for example, an interaction analysis processing program) that controls the interaction analyzer 500.
- the memory 12 is used as a work area of the CPU 10 and as a storage area for acquired data. Information entered by operating the keyboard/mouse 18 is processed by the CPU 10.
- OS: operating system (for example, NT, 2000, or the like)
- the computer program of the embodiment implements each function shown in FIG. 12 in cooperation with the OS; however, this is not limiting, and the computer program may implement each function on its own.
- FIG. 14 is a conceptual diagram of the interaction analysis processing as an embodiment of the present invention.
- the interaction analyzer 500 as an embodiment includes a protein database 600, a compound database 700, an interaction database 800, and a systematic classification database 900.
- the device 500 has, for example, a function of predicting a protein that interacts with a compound to be analyzed and a function of predicting a compound that interacts with a protein to be analyzed.
- the protein database 600 records information on a plurality of proteins.
- the compound database 700 records information on a plurality of compounds.
- the interaction database 800 records information about the interactions between proteins and compounds (symbol 1000), so that each protein recorded in the protein database 600 is associated with the compounds recorded in the compound database 700 that interact with it.
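The association between the three databases can be pictured as records keyed by ID plus a set of (protein ID, compound ID) pairs. The sketch below is a minimal in-memory illustration; all class and field names are hypothetical. Looking the pairs up in both directions corresponds to the device's two prediction directions (proteins for a given compound, compounds for a given protein).

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    protein_id: str          # e.g. "P001"
    structure_index: list    # numeric features (cf. protein database 600)


@dataclass
class Compound:
    compound_id: str         # e.g. "C005"
    descriptors: dict        # e.g. keys "LogP", "HBA", "HBD", "MW"


@dataclass
class InteractionStore:
    """Hypothetical stand-in for interaction database 800: known (protein, compound) pairs."""
    pairs: set = field(default_factory=set)

    def add(self, protein_id: str, compound_id: str) -> None:
        self.pairs.add((protein_id, compound_id))

    def compounds_for(self, protein_id: str) -> list:
        """Compounds recorded as interacting with the given protein."""
        return sorted(c for p, c in self.pairs if p == protein_id)

    def proteins_for(self, compound_id: str) -> list:
        """Proteins recorded as interacting with the given compound."""
        return sorted(p for p, c in self.pairs if c == compound_id)
```

A relational database with a join table would serve the same purpose at scale; the set of pairs simply keeps the sketch self-contained.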
- the protein information recorded in the protein database 600 is systematically classified according to the information in the systematic classification database 900.
- the systematic classification database 900 may systematically classify the compound information recorded in the compound database 700.
- the systematic classification database 900 may systematically classify information combining proteins (included in database 600) and compounds that interact with the protein (included in database 700).
- the systematic classification database 900 according to the embodiment contains information in which proteins are systematically classified by similarity of function and/or structure; more specifically, it contains information in which proteins are hierarchically classified by Gene Ontology information, including GO numbers.
- the information on interactions between proteins and compounds is thereby also systematically classified based on the information in the database 900 (symbol 1002).
- the systematic classification of proteins and/or compounds is not limited to that described in the embodiment; for example, similarity of function and/or structure derived from physical properties, molecular structures, structural formulas, amino acid sequences, structural annotation information, ligand functions, or functional annotation information can be used.
- the tree structure 1004 shown in FIG. 14 shows the relationships among the proteins and/or compounds systematically classified by the systematic classification database 900.
- the upper classification node 1008 includes a plurality of proteins and/or compounds.
- each of the lower classification nodes 1006 and 1010 includes those proteins and/or compounds of the upper classification node 1008 that have predetermined functional and/or structural characteristics.
- Figure 14 shows a total of three classification nodes in two layers for convenience of explanation. The number of layers in the systematic classification and the number of classification nodes in each layer can be chosen freely according to the content of the systematic classification used.
- the interaction analyzer 500 uses the information of each node of the tree structure 1004, in which proteins and/or compounds are systematically classified. Specifically, the device 500 analyzes whether the analysis target belongs to a higher classification node (step S101). Next, the device 500 analyzes whether the analysis target belongs to a lower classification node (S103). In this way, the device 500 analyzes whether the analysis target belongs to each classification node, that is, to a group of proteins and/or compounds with similar function and/or structure whose interaction information is known in the database, and outputs information about the proteins and/or compounds that interact with the analysis target.
- each component of the interaction analysis apparatus 500 shown in FIG. 12 and the corresponding functions in the embodiment include the following, for example.
- the first recording means 72 corresponds to the information on node A recorded in the systematic classification database 900 (see FIG. 18 and Table 66 in FIG. 19).
- the second recording means 74 corresponds to the information on the node A-1 (or A-2) recorded in the systematic classification database 900 (see Table 62 or 68 in FIG. 19).
- the obtaining means 70 corresponds to the CPU 10 of the device 500 that executes the processing of step S201 in FIG.
- the first analysis means 76 corresponds to the CPU 10 executing the process of step S203 in FIG.
- the second analysis means 78 corresponds to the CPU 10 executing the processing of step S205 in FIG.
- the output means 82 corresponds to the CPU 10 that executes the processing of step S211 in FIG. 20 or step S307 in FIG.
- the interaction information analysis means 80 corresponds to the CPU 10 executing the processing of step S305 in FIG.
- FIG. 15 shows the recorded contents of the protein database 600 as the embodiment.
- the protein database 600 records information on a plurality of proteins. More specifically, it includes columns for a “Protein ID” that identifies each protein and a “Structure Index” as an example of the structural and/or functional characteristics of the protein. The information on each protein in the protein database 600 is based on information from general public databases.
- the “Structure Index” is, for example, a value obtained by numerically converting the amino acid sequence and/or three-dimensional structure information of the protein by means known to those skilled in the art.
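The source leaves the numerical conversion of the amino acid sequence open ("means known to those skilled in the art"). One simple, widely used option is the amino acid composition, the fraction of each of the 20 standard residues in the sequence. The sketch below uses it purely as an assumed example of a "Structure Index", not as the conversion used in the embodiment; the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_vector(sequence):
    """Fraction of each standard amino acid in the sequence (20 values).

    Non-standard letters (e.g. 'X') are ignored in both the counts and the length.
    """
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
```

Composition ignores residue order; richer encodings (dipeptide composition, profile or structure-derived features) would slot into the same place as explanatory variables.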
- FIG. 16 shows the recorded contents of the compound database 700 as the embodiment.
- the compound database 700 records information on a plurality of compounds.
- the compound database 700 includes columns for a “Compound ID” that identifies each compound and for information indicating the structural and/or functional characteristics of the compound.
- the information indicating the structural and/or functional characteristics of the compound includes, for example, structural characteristics of the compound (including physical properties) and/or structural characteristics based on its structural formula.
- in FIG. 16, the structural characteristic information includes, for example, LogP (oil-water partition coefficient; n-octanol/water partition coefficient), hydrogen bond acceptors (HBA), hydrogen bond donors (HBD), and molecular weight (MW).
- FIG. 17 shows the recorded contents of the interaction database 800 as the embodiment.
- the interaction database 800 records “Activity”, which is information on the interaction between the proteins in the protein database 600 (identified by “Protein ID”) and the compounds in the compound database 700 (identified by “Compound ID”), for example information on compounds that exhibit pharmacological activity on a protein.
- as the activity information, for example, the MDL Drug Data Report (MDDR) from MDL, information from general public databases, and/or experimentally confirmed information can be used.
- information on this interaction can also be created based on the correspondence between the names (including synonyms) of proteins and compounds showing pharmacological activity.
- a numerical value serving as an index of the interaction (including a score value indicating a probability) can also be recorded.
- information on the interaction between the protein and the compound is recorded in the interaction database 800.
- alternatively, the interaction information may be recorded in the protein database 600 and/or the compound database 700 so that combinations of interacting proteins and compounds can be associated.
- the device 500 can analyze the interaction between proteins.
- in this case, the protein database 600 and/or the interaction database 800 record combinations of interacting proteins.
- FIG. 18 shows the recorded contents of the systematic classification database 900 as the embodiment.
- the systematic classification database 900 includes information for systematically classifying the plurality of proteins recorded in the protein database 600 by similarity of function and/or structure.
- proteins are hierarchically classified according to functional classification information for amino acid sequences (for example, Gene Ontology GO numbers).
- the systematic classification database 900 records the systematic classification information of proteins by similarity of function and/or structure, for example, in an XML (Extensible Markup Language) tree structure 50.
- Each node of the XML tree structure 50 is associated with a node number based on the GO number of the gene ontology and an evaluation function.
- the table data 52 recorded in the systematic classification database 900 records the correspondence between the protein ID included in the XML node and the node number.
- the table data 54 recorded in the systematic classification database 900 records a correspondence between a node number and an evaluation function for determining belonging to the node.
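The source does not give the XML schema of tree structure 50. Assuming nested `<node number="...">` elements with `<protein id="..."/>` children (a hypothetical layout, as are the function and variable names), the correspondence held by table data 52 (protein ID to node numbers) and the parent-to-child links of the tree can be recovered with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for tree structure 50: nested <node> elements give the
# hierarchy, and <protein> children list the proteins assigned to each node.
TREE_XML = """
<node number="A">
  <protein id="P001"/><protein id="P002"/><protein id="P003"/>
  <node number="A-1"><protein id="P001"/><protein id="P002"/></node>
  <node number="A-2"><protein id="P003"/></node>
</node>
"""


def build_tables(xml_text):
    """Return (protein ID -> node numbers, node number -> child node numbers)."""
    root = ET.fromstring(xml_text.strip())
    protein_nodes = {}   # plays the role of table data 52
    children = {}
    for node in root.iter("node"):          # document order, root included
        number = node.get("number")
        children[number] = [c.get("number") for c in node.findall("node")]
        for prot in node.findall("protein"):
            protein_nodes.setdefault(prot.get("id"), []).append(number)
    return protein_nodes, children
```

Table data 54 (node number to evaluation function) could then be a plain dict mapping each node number to a trained model or a callable.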
- FIG. 19 is a schematic diagram illustrating an evaluation function according to the embodiment.
- the evaluation functions comprise a global model and local models.
- by obtaining the functional and/or structural characteristic information of the protein (or compound) to be analyzed, the compounds (or proteins) that interact with it can be analyzed.
- when the analysis target is a protein, an evaluation function that uses the functional and/or structural characteristic information of the protein as explanatory variables is used; when the analysis target is a compound, an evaluation function that uses the functional and/or structural characteristic information of the compound as explanatory variables is used.
- FIG. 19 illustrates, as an example, an evaluation function that uses structural characteristic information of a protein as an explanatory variable.
- the tree structure 60 is the systematic classification information of proteins recorded in the systematic classification database 900 shown in FIG.
- the table 66 includes the proteins belonging to node A, which is an upper node (“P001” to “P006”, indicated by symbol 67), and arbitrary proteins (“P007” onward).
- the tables 62 and 68 show proteins belonging to node A (“P001” to “P006”).
- the evaluation function can be obtained by applying a predetermined analysis method to information on proteins (or compounds) whose structural characteristic information is known and which are included in the protein database 600 (or the compound database 700).
- the function of the apparatus 500 generating the evaluation function is referred to as a “learning function”.
- the evaluation function of classification node A is a function that, given the structural characteristic information of a protein as the explanatory variable X, makes it possible to distinguish proteins belonging to classification node A from arbitrary proteins not belonging to it.
- the evaluation function indicated by symbol 69 and the evaluation functions indicated by symbols 64 and 65 differ in the data set used to obtain them.
- an evaluation function is obtained using information of a protein belonging to a predetermined upper classification (node A) and information of an arbitrary protein as a data set.
- for the lower node (node A-1), an evaluation function is obtained using, as the data set, information on proteins belonging to the lower classification (node A-1) and information on related proteins (those belonging to node A but not to node A-1).
- the evaluation function included in the symbol 69 is expressed as a global model
- the evaluation functions included in the symbols 64 and 65 are expressed as a local model.
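The difference between the global and local data sets can be made concrete. A minimal sketch, assuming membership lists per node such as those recovered from table data 52: `global_dataset` labels the members of an upper node 1 and every other protein 0, while `local_dataset` labels the members of a lower node 1 and the remaining members of its parent 0. The function names and the A-1 membership below are hypothetical, and in practice the "arbitrary proteins" may be a random sample rather than the whole database.

```python
def global_dataset(node, members, all_ids):
    """Global model data: proteins in `node` -> 1, every other protein -> 0."""
    inside = set(members[node])
    return [(pid, int(pid in inside)) for pid in sorted(all_ids)]


def local_dataset(child, parent, members):
    """Local model data: proteins in `child` -> 1, other proteins of `parent` -> 0."""
    inside = set(members[child])
    return [(pid, int(pid in inside)) for pid in sorted(members[parent])]


# Illustrative membership: node A holds P001..P006 as in table 66;
# the split of A-1 is hypothetical.
members = {
    "A": ["P001", "P002", "P003", "P004", "P005", "P006"],
    "A-1": ["P001", "P002", "P003"],
}
all_ids = [f"P{i:03d}" for i in range(1, 10)]  # P001..P009; P007 onward lie outside node A
```

The local data set is much smaller and its negatives are close relatives of the positives, which is what lets the local model focus on the "differences" between neighboring nodes.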
- One feature of the interaction analysis processing described below is that a global model in an upper node and a local model in a lower node are executed in combination. More specifically, the global model and the local model use different data sets. Therefore, it is possible to narrow down the classification nodes to be analyzed in a wide comparison target range by the global model, and to compare nearby nodes by the local model.
- the classification node of the analysis target can thus be identified after differences from nearby comparison targets have been clearly distinguished.
- One feature is that comprehensive interaction analysis is performed using both an analysis that considers the “commonality” of the functional and/or structural features of the analysis target (global model) and an analysis that considers the “differences” in those features (local model).
- the results of verifying the effectiveness of the analysis processing by combining the global model at the upper node and the local model at the lower node are as described in the item “2. Verification of the analysis processing method” above, for example.
- FIG. 20 is a flowchart of an interaction analysis processing program according to the first embodiment, which is executed by the CPU 10 of the interaction analysis apparatus 500.
- the device 500 can perform the following processes: (1) prediction of proteins interacting with a compound, (2) prediction of compounds interacting with a protein, and (3) prediction of the interaction between a compound and a protein.
- (2) prediction of compounds interacting with a protein will be described as an example.
- The other processes, (1) prediction of proteins interacting with a compound and (3) prediction of the interaction between a compound and a protein, can be executed by the same processing.
- when the user of the device operates the keyboard/mouse 18, the CPU 10 of the device 500 receives input of data on the functional and/or structural characteristics of the protein to be analyzed (step S201 in FIG. 20).
- structural feature data obtained by numerically converting an amino acid sequence is input.
- the evaluation function is, for example, the evaluation function (global model) of the node A shown in FIGS. 18 and 19 (see a symbol 69 in FIG. 19). If the analysis target does not belong to the higher classification node, the CPU 10 ends the processing.
- the CPU 10 analyzes whether the analysis target belongs to each of the classification nodes in layer N+1 (S205). Specifically, the CPU 10 applies the input structural feature data as the explanatory variable (X) to the evaluation function of each lower classification node and calculates whether the target belongs to that node (Y).
- the evaluation function is, for example, the evaluation function (local model) of nodes A-1, A-2, … shown in FIGS. 18 and 19 (see symbols 64 and 65 in FIG. 19).
- the CPU 10 determines whether layer N+1 is the lowest layer (lowest classification node) (S207).
- if it is determined in step S207 that the layer is not the lowest layer, the CPU 10 sets N to N+1 (S209) and repeats the processing from step S205 for the classification nodes below the node to which the analysis target was determined to belong. If it is determined in step S207 that the layer is the lowest layer, the CPU 10 outputs the analysis result for the classification nodes to the display 20 and ends the processing (S211).
- the CPU 10 applies the structural characteristic data of the analysis target to the global model of the upper node, and then to the local models of the lower nodes in order. As a result, the CPU 10 outputs the classification nodes to which the analysis target belongs, that is, proteins (or protein groups) whose structural and/or functional characteristics are similar to those of the protein to be analyzed.
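Steps S203 to S211 amount to a top-down walk of the classification tree. The sketch below assumes each node's evaluation function (table data 54) is available as a Python callable returning True or False for the input feature vector; the names `descend`, `children`, and `evaluate` are hypothetical. Branches may stop at different depths, consistent with the tree display described for FIG. 3.

```python
def descend(x, root, children, evaluate):
    """Apply the global model at `root`, then local models layer by layer.

    children: node -> list of child nodes (nodes in the lowest layer may be absent)
    evaluate: node -> callable(x) returning True if x belongs to that node
    Returns the branches (root-to-deepest paths) the target was assigned to;
    an empty list means it does not belong to the root node at all.
    """
    if not evaluate[root](x):              # global model; stop if no match
        return []
    branches, frontier = [], [[root]]
    while frontier:                        # one layer per iteration (N -> N+1)
        next_frontier = []
        for path in frontier:
            hits = [c for c in children.get(path[-1], []) if evaluate[c](x)]
            if hits:
                next_frontier.extend(path + [c] for c in hits)
            else:
                branches.append(path)      # deepest node reached on this branch
        frontier = next_frontier
    return branches
```

Keeping every matching child, rather than only the best one, lets a target fall into several branches; choosing only the top-scoring child would be an equally valid design if exclusive assignment is wanted.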
- the analysis result of the attribution to the classification node is expressed by a binary value of 0 or 1 (see FIG. 19).
- the analysis result of the attribution to the classification node may be represented by a score value.
- when score values are used, information reflecting not only the score value of the lowest classification node to which the analysis target belongs but also the score values of all (or some) of the classification nodes it belongs to, from the higher to the lower classification nodes, may be output. For example, the analysis result can show the average of the score values of all the classification nodes to which the target belongs, or the product of those score values.
- the CPU 10 can also extract and output records (classification nodes or corresponding trees) whose score values are at or above a predetermined threshold (for example, 0.5).
- FIG. 3 shows an example of the screen configuration of the analysis result output by the process of step S211. The contents of FIG. 3 were explained in the item “1 2. Interface” above. As shown in Fig.
- the analysis result for the classification nodes is displayed as a tree structure that includes not only the lowest classification node predicted to contain the analysis target but also the higher classification nodes that contain it. The user of the device can therefore see to which classification nodes the analysis target belongs as positions in the whole tree structure (or part of it). For example, when the lowest node reached on one branch is in the third layer from the bottom of the systematic classification database 900 while the lowest node reached on another branch is in the second layer from the bottom, displaying the tree structure makes this difference in hierarchy level easy to grasp. In another embodiment, the CPU 10 can also display the above-described score values attached to the classification nodes (not shown).
- the CPU 10 can output information on compounds that interact with the protein (or protein group) based on the information on the protein (or protein group) having similar structural and/or functional characteristics. An example of outputting such interaction information is described below as a second embodiment.
- FIG. 21 is a flowchart of an interaction analysis processing program according to the second embodiment, which is executed by the CPU 10 of the interaction analysis apparatus 500.
- the second embodiment and the first embodiment are common up to the processing of step S211 in FIG.
- the CPU 10 analyzes the information on the classification nodes of each layer to which the analysis target belongs and records it in the memory 12 or the like as the “classification node analysis result” (step S301).
- the analysis of the information of the classification nodes includes, for example, the above-described assignment of the score values, extraction of the classification nodes by using a threshold value, and the like.
- the CPU 10 records, in the memory 12 or the like, the IDs of the proteins belonging to the classification node at the lowest layer of the branch to which the analysis target is determined to belong, as “candidate IDs” (S303).
- the CPU 10 refers to the systematic classification database 900 illustrated in FIG.
- the CPU 10 refers to the protein database 600, the interaction database 800, and the compound database 700 to obtain information on compounds that interact with the protein identified by the “candidate ID” as “interaction candidate information”.
- for example, when the candidate ID is “P001”, the CPU 10 acquires the compound “C005” that interacts with “P001” based on the interaction database 800 (see FIG. 17), and acquires the information on “C005” from the compound database 700 (see FIG. 16) as “interaction candidate information”.
- the CPU 10 outputs the interaction candidate information to the display 20, and ends the processing (S307).
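Steps S303 to S305 chain three lookups: the lowest classification nodes give candidate protein IDs, the interaction database gives the compounds for each candidate, and the compound database gives their details. A minimal dict-based sketch; the function name and the dict layouts are hypothetical stand-ins for databases 900, 800, and 700.

```python
def interaction_candidates(lowest_nodes, members, interactions, compounds):
    """Collect interaction candidate information (hypothetical names throughout).

    lowest_nodes: lowest classification node of each branch the target belongs to
    members:      node -> protein IDs        (systematic classification database 900)
    interactions: protein ID -> compound IDs (interaction database 800)
    compounds:    compound ID -> details     (compound database 700)
    """
    # Candidate IDs are the proteins in the lowest nodes reached (cf. S303).
    candidate_ids = sorted({pid for node in lowest_nodes for pid in members.get(node, [])})
    # Follow each candidate to its interacting compounds and their records (cf. S305).
    results = []
    for pid in candidate_ids:
        for cid in interactions.get(pid, []):
            results.append({"protein_id": pid, "compound_id": cid,
                            "details": compounds.get(cid, {})})
    return results
```

With `members = {"A-1": ["P001"]}` and `interactions = {"P001": ["C005"]}`, the result pairs “P001” with “C005” together with whatever record the compound database holds for “C005”, mirroring the example in the text.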
- FIG. 22A and FIG. 22B are screen display examples output according to the second embodiment.
- FIG. 22A is an example of a screen displaying information on a protein predicted to interact with the compound when information on the compound is input as an analysis target.
- FIG. 22B is an example of a screen displaying information on compounds predicted to interact with a protein when information on the protein is input as an analysis target.
- the CPU 10 executes the output process of step S307 in FIG. 21 (see FIG. 22 for an output example) in addition to the output process of step S211 in FIG. 20 (see FIG. 3 for an output example).
- the output processing in step S211 in FIG. 20 can be omitted.
- the display of the interaction candidate information associated with the tree structure illustrated in FIG. 3 described above can be adopted. Specifically, a compound (or protein) that interacts with a protein (or compound) belonging to the classification node is also displayed near the classification node in the tree structure illustrated in FIG.
- although the interaction analysis device 500 is illustrated as an embodiment of the system or method of the present invention, the method of the present invention can also be used as ordinary stand-alone application software.
- other embodiments include the following examples. (1) Client-server type
- a client-server configuration may be adopted that combines a server device executing the same processing as the interaction analysis device 500 with a client computer executing the processes of transmitting data on the analysis target and receiving the analysis result (see steps S201 and S211 in FIG. 20).
- client-server type includes, for example, a system connected by a local area network (LAN) and a system by an ASP (Application Service Provider) service.
- the system or method of the present invention can be adopted as a module for adding functions to amino acid sequence analysis software and chemical structure analysis software.
- the system or method of the present invention can also be applied as a module for adding functions to a protein database (for example, PDB, FAMSBASE) or a structural database (for example, ISISBase (trademark) or Accord for Excel (trademark)).
- although the interaction analysis device 500 is illustrated as an embodiment of the system or method of the present invention, other devices such as a Personal Digital Assistant (PDA) may be used.
- in the embodiment, the program for operating the CPU 10 is stored on the hard disk 22, but the program may instead be read from a CD-ROM on which it is stored and installed on a hard disk or the like.
- the program may also be installed from another computer-readable recording medium such as a DVD-ROM, a flexible disk (FD), or an IC card.
- the program can be downloaded using a communication line.
- instead of being installed and executed indirectly, the program stored on the CD-ROM may be executed directly by the computer.
- programs executable by a computer include programs that can be executed directly once installed as they are, programs that must first be converted into another form (for example, by decompressing compressed data), and programs that can be executed in combination with other module components.
- although each function in FIG. 12 is realized by a CPU and a program in the embodiment, some or all of the functions may be implemented in hardware logic (logic circuits).
- the series of operations can be automated; as the databases are expanded (commercial DBs, in-house pharmacological evaluation results, reverse proteomics information, etc.), the models are updated as needed, improving the quality of the fine copy database and the accuracy of prediction.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioethics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005516985A JPWO2005069188A1 (ja) | 2003-12-26 | 2004-12-24 | System for predicting interactions between compounds and proteins, system for predicting similar proteins or similar compounds, and methods thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003-435659 | 2003-12-26 | ||
JP2003435659 | 2003-12-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005069188A1 (ja) | 2005-07-28 |
Family
ID=34791758
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/019404 WO2005069188A1 (ja) | System for predicting interactions between compounds and proteins | 2003-12-26 | 2004-12-24 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2005069188A1 (ja) |
WO (1) | WO2005069188A1 (ja) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
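WO2007105794A1 (ja) * | 2006-03-15 | 2007-09-20 | Nec Corporation | Molecular structure prediction system, method, and program |
JP5448447B2 (ja) * | 2006-05-26 | 2014-03-19 | 国立大学法人京都大学 | Prediction of protein-compound interactions and rational design of compound libraries based on chemical genomic information |
WO2014054526A1 (ja) * | 2012-10-01 | 2014-04-10 | 独立行政法人科学技術振興機構 | Approval prediction device, approval prediction method, and program |
CN107977548A (zh) * | 2017-12-05 | 2018-05-01 | 东软集团股份有限公司 | Method, device, medium, and electronic apparatus for predicting protein-protein interactions |
CN110070909A (zh) * | 2019-03-21 | 2019-07-30 | 中南大学 | Deep-learning-based protein function prediction method fusing multiple features |
CN113851195A (zh) * | 2020-06-28 | 2021-12-28 | 中国中医科学院中医临床基础医学研究所 | Method for predicting compound-target protein binding |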
US12099003B2 (en) | 2018-01-26 | 2024-09-24 | Viavi Solutions Inc. | Reduced false positive identification for spectroscopic classification |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002033596A2 (en) * | 2000-10-17 | 2002-04-25 | Applied Research Systems Ars Holding N.V. | Method of operating a computer system to perform a discrete substructural analysis |
WO2003058499A1 (fr) * | 2001-12-28 | 2003-07-17 | Celestar Lexico-Sciences, Inc. | Appareil et procede de recherche de connaissance, programme et support d'enregistrement associes |
2004
- 2004-12-24: WO PCT/JP2004/019404, published as WO2005069188A1 (ja); status: active application filing
- 2004-12-24: JP JP2005516985A, published as JPWO2005069188A1 (ja); status: active, pending
Non-Patent Citations (2)
Title |
---|
Roche O. et al.: "Ligand-Protein DataBase: Linking Protein-Ligand Complex Structures to Binding Data", J. Med. Chem., vol. 44, no. 22, 2001, pp. 3592-3598, XP002987717 * |
Xue L. et al.: "Molecular Descriptors for Effective Classification of Biologically Active Compounds Based on Principal Component Analysis Identified by a Genetic Algorithm", J. Chem. Inf. Comput. Sci., vol. 40, no. 3, 2000, pp. 801-809, XP002987718 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
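WO2007105794A1 (ja) * | 2006-03-15 | 2007-09-20 | NEC Corporation | Molecular structure prediction system, method, and program |
JP5448447B2 (ja) * | 2006-05-26 | 2014-03-19 | Kyoto University | Prediction of protein-compound interactions and rational design of compound libraries based on chemical genomic information |
WO2014054526A1 (ja) * | 2012-10-01 | 2014-04-10 | Japan Science and Technology Agency | Approval prediction device, approval prediction method, and program |
JP2014071836A (ja) * | 2012-10-01 | 2014-04-21 | Japan Science & Technology Agency | 承認予測装置、承認予測方法、および、プログラム |
CN107977548A (zh) * | 2017-12-05 | 2018-05-01 | Neusoft Corporation | Method, apparatus, medium, and electronic device for predicting protein-protein interactions |
CN107977548B (zh) * | 2017-12-05 | 2020-04-07 | Neusoft Corporation | Method, apparatus, medium, and electronic device for predicting protein-protein interactions |
US12099003B2 (en) | 2018-01-26 | 2024-09-24 | Viavi Solutions Inc. | Reduced false positive identification for spectroscopic classification |
CN110070909A (zh) * | 2019-03-21 | 2019-07-30 | Central South University | Deep-learning-based protein function prediction method fusing multiple features |
CN110070909B (zh) * | 2019-03-21 | 2022-12-09 | Central South University | Deep-learning-based protein function prediction method fusing multiple features |
CN113851195A (zh) * | 2020-06-28 | Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences | Compound-target protein binding prediction method |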
Also Published As
Publication number | Publication date |
---|---|
JPWO2005069188A1 (ja) | 2007-07-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Muzio et al. | Biological network analysis with deep learning | |
Ehrlich et al. | Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review | |
Vanhaelen et al. | Design of efficient computational workflows for in silico drug repurposing | |
Jónsdóttir et al. | Prediction methods and databases within chemoinformatics: emphasis on drugs and drug candidates | |
Aittokallio et al. | Graph-based methods for analysing networks in cell biology | |
Sarkar et al. | CAOS software for use in character‐based DNA barcoding | |
Boulesteix et al. | Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics | |
Trevino et al. | GALGO: an R package for multivariate variable selection using genetic algorithms | |
US8949157B2 (en) | Estimation of protein-compound interaction and rational design of compound library based on chemical genomic information | |
Priya et al. | Machine learning approaches and their applications in drug discovery and design | |
EP2600269A2 (en) | Microarray sampling and network modeling for drug toxicity prediction | |
Lin et al. | Clustering methods in protein-protein interaction network | |
JP2009520278A (ja) | System and method for scientific information knowledge management |
JP2006323846A (ja) | Network-based method for identifying significant molecules using high-throughput data analysis |
Srinivasan et al. | Current progress in network research: toward reference networks for key model organisms | |
Cannataro et al. | Data management of protein interaction networks | |
Bender | Bayesian methods in virtual screening and chemical biology | |
Chen et al. | PubChem BioAssays as a data source for predictive models | |
R Andersson et al. | Quantitative chemogenomics: machine-learning models of protein-ligand interaction | |
Rapicavoli et al. | Computational methods for drug repurposing | |
Rodin et al. | Systems biology data analysis methodology in pharmacogenomics | |
Juan et al. | Bioinformatics: microarray data clustering and functional classification | |
WO2005069188A1 (ja) | System for predicting interactions between compounds and proteins |
Guo et al. | TRScore: a 3D RepVGG-based scoring method for ranking protein docking models | |
Wang et al. | Multitask CapsNet: an imbalanced data deep learning method for predicting toxicants | |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2005516985; country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWW | WIPO information: withdrawn in national office | Country of ref document: DE |
122 | EP: PCT application non-entry in European phase | |