Machine Learning Framework for Conotoxin Class and Molecular Target Prediction
Figure 1. Example structures of the alpha, mu, kappa, and omega classes (pharmacological families) of conotoxins. The backbone structures are shown in a pink cartoon representation, and disulfide bridges are shown in yellow. Class, toxin name, and mass are given below each structure. PDB references are 1MXN [7], 7SAV [16], 1DW4 [17], and 1AV3 [18], clockwise from top left.
Figure 2. Samples of different conotoxin classes bound to their target receptors. (a) Alpha conotoxin PnIA (PDB: 2BR8 [19]) bound to the acetylcholine-binding protein (AChBP). On the left, the complex structure shows the toxin in pink, its disulfide bonds in yellow, and the AChBP in silver. On the right, two zoomed-in circles show the same binding site; in the bottom circle the receptor is rendered transparent so that the conotoxin conformation is easier to see. (b) Mu conotoxin KIIIA (PDB: 6J8E [20]) bound to the voltage-gated sodium channel Nav1.2-beta2, with zoomed-in views on the right similar to those in (a). (c) Omega conotoxin MVIIA (PDB: 7MIX [21]), marketed as ziconotide, in complex with the voltage-gated calcium channel Cav2.2. The center structure is the conotoxin/ion channel complex; the left inset is a zoomed-in view of the bound toxin in ribbon representation, and the right inset shows the receptor (transparent) and the conotoxin in surface representation to illustrate the tight, key-like fit of the toxin in its binding site.
Figure 3. Comparison of f1 scores obtained from different ML models across feature sets and feature-set combinations in predicting the alpha, mu, and omega conotoxin classes.
Figure 4. (a) A cartoon representation of how SMOTE and Tomek links work together to handle imbalanced datasets. Top left: a mixture of two classes, orange squares and blue circles, in which the squares are underrepresented relative to the circles. Top right: SMOTE produces additional square entries, shown in green, by interpolating between the existing squares. Bottom left: Tomek links identify square and circle pairs (circled in red) that lie at the boundary between the two classes. Bottom right: the entries in the Tomek pairs that belong to the more represented class are removed, producing a more evenly balanced and more clearly separated training set. (b) Overall ML pipeline describing the process of using a dataset to train and cross-validate a classifier.
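
The two steps described in the Figure 4a caption can be sketched in a few lines of plain Python. This is a minimal illustration only, not the authors' implementation: the function names (`smote`, `drop_tomek_majority`), the toy coordinates, the neighbour count `k`, and the seed are all assumptions made for the example, and the page does not say which library the paper used.

```python
import math
import random


def _nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def smote(minority, n_new, k=2, seed=0):
    """SMOTE step: build n_new synthetic samples, each placed at a random
    position on the segment between a minority sample and one of its
    k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[i], minority[j]),
        )[:k]
        j = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def drop_tomek_majority(X, y, majority):
    """Tomek step: a Tomek link is a pair of mutual nearest neighbours with
    different labels; remove the majority-class member of each link."""
    drop = set()
    for i in range(len(X)):
        j = _nearest(i, X)
        if y[i] != y[j] and _nearest(j, X) == i:
            drop.update(m for m in (i, j) if y[m] == majority)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): circles are the majority, squares the minority.
circles = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (2.0, 0.0), (2.0, 1.0), (2.6, 2.6)]
squares = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]

# Oversample the squares until both classes have the same size.
synthetic = smote(squares, n_new=len(circles) - len(squares))
X = circles + squares + synthetic
y = ["circle"] * len(circles) + ["square"] * (len(squares) + len(synthetic))

# Clean the class boundary by removing majority members of Tomek links.
X_bal, y_bal = drop_tomek_majority(X, y, majority="circle")
print(y_bal.count("circle"), y_bal.count("square"))
```

Whether the boundary circle at (2.6, 2.6) ends up in a Tomek link after oversampling depends on where the synthetic squares land, so the final class counts can vary with the seed. In practice a maintained implementation would normally be preferred over a hand-rolled one like this.
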
Abstract
1. Introduction
2. Results
2.1. Construction of Datasets
2.2. Feature Extraction and Selection
2.3. Conotoxin Class Prediction
2.4. Prediction of Conotoxins That Target nAChRs
3. Discussion
4. Materials and Methods
4.1. Construction of Datasets
4.2. Feature Extraction
4.3. Dimensionality Reduction Procedures
4.3.1. F-Score
4.3.2. Redundant Feature Elimination
4.3.3. Principal Component Analysis
4.3.4. Regularization
4.4. Classifiers
4.5. SMOTE-Tomek
4.6. Performance Evaluation
4.7. Machine Learning Pipeline
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lewis, R.J. Conotoxins as Selective Inhibitors of Neuronal Ion Channels, Receptors and Transporters. IUBMB Life 2004, 56, 89–93.
- Akondi, K.B.; Muttenthaler, M.; Dutertre, S.; Kaas, Q.; Craik, D.J.; Lewis, R.J.; Alewood, P.F. Discovery, Synthesis, and Structure–Activity Relationships of Conotoxins. Chem. Rev. 2014, 114, 5815–5847.
- Olivera, B.M. Conus Peptides: Biodiversity-Based Discovery and Exogenomics. J. Biol. Chem. 2006, 281, 31173–31177.
- McGivern, J.G. Ziconotide: A Review of Its Pharmacology and Use in the Treatment of Pain. Neuropsychiatr. Dis. Treat. 2007, 3, 69–85.
- Krewski, D.; Acosta, D., Jr.; Andersen, M.; Anderson, H.; Bailar, J.C., III; Boekelheide, K.; Brent, R.; Charnley, G.; Cheung, V.G.; Green, S., Jr.; et al. Toxicity Testing in the 21st Century: A Vision and a Strategy. J. Toxicol. Environ. Health B Crit. Rev. 2010, 13, 51–138.
- Monroe, L.K.; Truong, D.P.; Miner, J.C.; Adikari, S.; Sasiene, Z.J.; Fenimore, P.W.; Alexandrov, B.; Williams, R.F.; Nguyen, H.B. Conotoxin Prediction: New Features to Increase Prediction Accuracy. Toxins 2023, 15, 641.
- Dutton, J.L.; Bansal, P.S.; Hogg, R.C.; Adams, D.J.; Alewood, P.F.; Craik, D.J. A New Level of Conotoxin Diversity, a Non-Native Disulfide Bond Connectivity in α-Conotoxin AuIB Reduces Structural Definition but Increases Biological Activity. J. Biol. Chem. 2002, 277, 48849–48857.
- Chi, S.W.; Kim, D.H.; Olivera, B.M.; McIntosh, J.M.; Han, K.H. NMR Structure Determination of Alpha-Conotoxin BuIA, a Novel Neuronal Nicotinic Acetylcholine Receptor Antagonist with an Unusual 4/4 Disulfide Scaffold. Biochem. Biophys. Res. Commun. 2006, 349, 1228–1234.
- Jin, A.H.; Brandstaetter, H.; Nevin, S.T.; Tan, C.C.; Clark, R.J.; Adams, D.J.; Alewood, P.F.; Craik, D.J.; Daly, N.L. Structure of α-Conotoxin BuIA: Influences of Disulfide Connectivity on Structural Dynamics. BMC Struct. Biol. 2007, 7, 28.
- Gehrmann, J.; Alewood, P.F.; Craik, D.J. Structure Determination of the Three Disulfide Bond Isomers of α-Conotoxin GI: A Model for the Role of Disulfide Bonds in Structural Stability. J. Mol. Biol. 1998, 278, 401–415.
- Kaas, Q.; Westermann, J.C.; Craik, D.J. Conopeptide Characterization and Classifications: An Analysis Using ConoServer. Toxicon 2010, 55, 1491–1509.
- Yuan, L.F.; Ding, C.; Guo, S.H.; Ding, H.; Chen, W.; Lin, H. Prediction of the Types of Ion Channel-Targeted Conotoxins Based on Radial Basis Function Network. Toxicol. Vitr. 2013, 27, 852–856.
- Wang, X.; Wang, J.; Wang, X.; Zhang, Y. Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model. BioMed Res. Int. 2017, 2017, 2929807.
- Vu, T.T.D.; Jung, J. Protein Function Prediction with Gene Ontology: From Traditional to Deep Learning Models. PeerJ 2021, 9, e12019.
- Dao, F.Y.; Yang, H.; Su, Z.D.; Yang, W.; Wu, Y.; Ding, H.; Chen, W.; Tang, H.; Lin, H. Recent Advances in Conotoxin Classification by Using Machine Learning Methods. Molecules 2017, 22, 1057.
- Ho Thanh Lam, L.; Le, N.H.; Van Tuan, L.; Tran Ban, H.; Nguyen Khanh Hung, T.; Nguyen, N.T.K.; Huu Dang, L.; Le, N.Q.K. Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences. Biology 2020, 9, 325.
- Atkinson, R.A.; Kieffer, B.; Dejaegere, A.; Sirockin, F.; Lefèvre, J. Structural and Dynamic Characterization of ω-Conotoxin MVIIA: The Binding Loop Exhibits Slow Conformational Exchange. Biochemistry 2000, 39, 3908–3919.
- Scanlon, M.J.; Naranjo, D.; Thomas, L.; Alewood, P.F.; Lewis, R.J.; Craik, D.J. Solution Structure and Proposed Binding Mechanism of a Novel Potassium Channel Toxin κ-Conotoxin PVIIA. Structure 1997, 5, 1585–1597.
- Celie, P.H.; Kasheverov, I.E.; Mordvintsev, D.Y.; Hogg, R.C.; van Nierop, P.; van Elk, R.; van Rossum-Fikkert, S.E.; Zhmak, M.N.; Bertrand, D.; Tsetlin, V.; et al. Crystal Structure of Nicotinic Acetylcholine Receptor Homolog AChBP in Complex with an α-Conotoxin PnIA Variant. Nat. Struct. Mol. Biol. 2005, 12, 582–588.
- Pan, X.; Li, Z.; Huang, X.; Huang, G.; Gao, S.; Shen, H.; Liu, L.; Lei, J.; Yan, N. Molecular Basis for Pore Blockade of Human Na+ Channel Nav1.2 by the μ-Conotoxin KIIIA. Science 2019, 363, 1309–1313.
- Gao, S.; Yao, X.; Yan, N. Structure of Human Cav2.2 Channel Blocked by the Painkiller Ziconotide. Nature 2021, 596, 143–147.
- Bro, R.; Smilde, A.K. Principal Component Analysis. Anal. Meth. 2014, 6, 2812–2831.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Zeng, M.; Zou, B.; Wei, F.; Liu, X.; Wang, L. Effective Prediction of Three Common Diseases by Combining SMOTE with Tomek Links Technique for Imbalanced Medical Data. In Proceedings of the 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS), Chongqing, China, 28–29 May 2016; pp. 225–228.
- Berman, H.M.; Henrick, K.; Nakamura, H. Announcing the Worldwide Protein Data Bank. Nat. Struct. Mol. Biol. 2003, 10, 980.
- Hoch, J.C.; Baskaran, K.; Burr, H.; Chin, J.; Eghbalnia, H.R.; Fujiwara, T.; Gryk, M.R.; Iwata, T.; Kojima, C.; Kurisu, G.; et al. Biological Magnetic Resonance Data Bank. Nucleic Acids Res. 2023, 51, D368–D376.
- Touw, W.G.; Baakman, C.; Black, J.; Te Beek, T.A.; Krieger, E.; Joosten, R.P.; Vriend, G. A Series of PDB-Related Databanks for Everyday Needs. Nucleic Acids Res. 2015, 43, D364–D368.
- Kabsch, W.; Sander, C. Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22, 2577–2637.
- Heerdt, G.; Zanotto, L.; Souza, P.C.; Araujo, G.; Skaf, M.S. Collision Cross Section Calculations Using HPCCS. In Ion Mobility-Mass Spectrometry: Methods and Protocols; Paglia, G., Astarita, G., Eds.; Methods in Molecular Biology; Humana: New York, NY, USA, 2020; pp. 297–310.
- Li, Y.; Wu, F.X.; Ngom, A. A Review on Machine Learning Principles for Multi-View Biological Data Integration. Brief. Bioinform. 2018, 19, 325–340.
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
- Vapnik, V.; Suykens, J.A.K.; Vandewalle, J. Nonlinear Modeling: Advanced Black-Box Techniques; Springer: Berlin/Heidelberg, Germany, 1998.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Le, K.; Radović, J.R.; MacCallum, J.L.; Larter, S.R.; Van Humbeck, J.F. Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Data Sets. J. Am. Chem. Soc. 2024, 146, 22563–22569.
- Coudert, E.; Gehant, S.; de Castro, E.; Pozzato, M.; Baratin, D.; Neto, T.; Sigrist, C.J.; Redaschi, N.; Bridge, A. Annotation of Biologically Relevant Ligands in UniProtKB Using ChEBI. Bioinformatics 2023, 39, btac793.
- Reddy, G.T.; Reddy, M.P.K.; Lakshmanna, K.; Kaluri, R.; Rajput, D.S.; Srivastava, G.; Baker, T. Analysis of Dimensionality Reduction Techniques on Big Data. IEEE Access 2020, 8, 54776–54788.
- Appice, A.; Ceci, M.; Rawles, S.; Flach, P. Redundant Feature Elimination for Multi-Class Problems. In Proceedings of the 21st International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 5.

Datasets | Sample Sizes |
---|---|
alpha/mu/omega | 98 alpha/29 mu/21 omega |
nAChRs/non-nAChRs | 102 nAChR binders */82 non-nAChR binders |

Feature set | F-Score PLR | F-Score SVM | SMOTE PLR | SMOTE-Tomek PLR | SMOTE-Tomek PCA PLR | SMOTE-Tomek PCA RF | SMOTE-Tomek PCA xGB |
---|---|---|---|---|---|---|---|
P | 0.8920 | 0.8988 | 0.9520 | 0.9524 | 0.9520 | 0.9071 | 0.9391 |
SS | 0.8732 | 0.8934 | 0.8757 | 0.8902 | 0.8981 | 0.8948 | 0.8407 |
SS + CCS | 0.8598 | 0.8818 | 0.8621 | 0.8964 | 0.8823 | 0.8895 | 0.8902 |
P + CCS | 0.8897 | 0.8814 | 0.9492 | 0.9528 | 0.9459 | 0.9083 | 0.9311 |
P + SS | 0.9307 | 0.9112 | 0.9459 | 0.9524 | 0.9590 | 0.9083 | 0.9455 |
P + SS + CCS | 0.8965 | 0.8965 | 0.9519 | 0.9520 | 0.9453 | 0.9013 | 0.9237 |
P + P2 | 0.8976 | 0.8816 | 0.9449 | 0.9311 | 0.9377 | 0.9016 | 0.9131 |
SS + P2 | 0.9376 | 0.9244 | 0.9098 | 0.9100 | 0.9116 | 0.8935 | 0.8746 |
CCS + SS + P2 | 0.9379 | 0.9447 | 0.9173 | 0.9022 | 0.9177 | 0.8941 | 0.8912 |
P + SS + CCS + P2 | 0.9107 | 0.9306 | 0.9421 | 0.9449 | 0.9377 | 0.9149 | 0.8976 |

Feature set | OA | AA | Sn-Alpha | Sn-Mu | Sn-Omega | f1 |
---|---|---|---|---|---|---|
P | 0.9527 | 0.9189 | 0.9898 | 0.8621 | 0.9048 | 0.9520 |
SS | 0.8986 | 0.8674 | 0.9388 | 0.7586 | 0.9048 | 0.8981 |
SS + CCS | 0.8851 | 0.8363 | 0.9490 | 0.6552 | 0.9048 | 0.8823 |
P + CCS | 0.9459 | 0.9236 | 0.9694 | 0.8966 | 0.9048 | 0.9459 |
P + SS | 0.9595 | 0.9304 | 0.9898 | 0.8966 | 0.9048 | 0.9590 |
P + SS + CCS | 0.9459 | 0.9155 | 0.9796 | 0.8621 | 0.9048 | 0.9453 |
P + P2 | 0.9392 | 0.8959 | 0.9898 | 0.7931 | 0.9048 | 0.9377 |
SS + P2 | 0.9122 | 0.8742 | 0.9592 | 0.7586 | 0.9048 | 0.9116 |
CCS + SS + P2 | 0.9189 | 0.8776 | 0.9694 | 0.7586 | 0.9048 | 0.9177 |
P + SS + CCS + P2 | 0.9392 | 0.8959 | 0.9898 | 0.7931 | 0.9048 | 0.9377 |
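
The OA and AA columns above can be cross-checked against the per-class sensitivities. As a rough sketch, assuming OA is the fraction of all samples classified correctly and AA is the unweighted mean of the per-class sensitivities (these definitions are inferred from the numbers, not stated on this page), the row for feature set P can be reproduced from the class sizes in the sample-size table (98 alpha, 29 mu, 21 omega). The variable names are illustrative only.

```python
# Class sizes from the sample-size table; sensitivities from row "P" above.
class_sizes = {"alpha": 98, "mu": 29, "omega": 21}
sensitivity = {"alpha": 0.9898, "mu": 0.8621, "omega": 0.9048}

# Recover the integer number of correctly classified samples per class.
correct = {c: round(sensitivity[c] * n) for c, n in class_sizes.items()}

# Assumed definitions: OA = total correct / total samples,
# AA = unweighted mean of the per-class sensitivities.
overall_accuracy = sum(correct.values()) / sum(class_sizes.values())
average_accuracy = (
    sum(correct[c] / n for c, n in class_sizes.items()) / len(class_sizes)
)

print(correct)  # {'alpha': 97, 'mu': 25, 'omega': 19}
print(f"OA = {overall_accuracy:.4f}, AA = {average_accuracy:.4f}")
# OA = 0.9527, AA = 0.9189
```

Both values match the OA and AA entries in row P. Under these assumed definitions, AA weights the three classes equally, so it responds more strongly than OA to errors on the smaller mu and omega classes.
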

Feature set | SMOTE-Tomek PLR | SMOTE-Tomek PCA PLR | SMOTE-Tomek SVM | SMOTE-Tomek PCA SVM | SMOTE-Tomek PCA RF | SMOTE-Tomek PCA xGB |
---|---|---|---|---|---|---|
P | 0.9024 | 0.9078 | 0.9077 | 0.8968 | 0.8241 | 0.8970 |
SS | 0.8474 | 0.8743 | 0.8527 | 0.8580 | 0.8860 | 0.8747 |
SS + CCS | 0.8743 | 0.8743 | 0.8690 | 0.8636 | 0.8697 | 0.8751 |
P + CCS | 0.9078 | 0.8969 | 0.9077 | 0.9131 | 0.8796 | 0.8862 |
P + SS | 0.8807 | 0.8860 | 0.9022 | 0.9022 | 0.8393 | 0.8970 |
P + SS + CCS | 0.8915 | 0.8915 | 0.8967 | 0.9076 | 0.8341 | 0.8807 |
P + P2 | 0.9023 | 0.8914 | 0.8965 | 0.9076 | 0.8912 | 0.9133 |
SS + P2 | 0.8743 | 0.8577 | 0.8524 | 0.8631 | 0.8570 | 0.8858 |
CCS + SS + P2 | 0.8468 | 0.8690 | 0.8524 | 0.8687 | 0.8796 | 0.8588 |
P + SS + CCS + P2 | 0.8859 | 0.8914 | 0.8856 | 0.8856 | 0.8634 | 0.8916 |

Feature set | OA | AA | Sn-nAChR Binders | Sn-nAChR Non Binders | f1 |
---|---|---|---|---|---|
P | 0.8967 | 0.8961 | 0.8902 | 0.9020 | 0.8968 |
SS | 0.8587 | 0.8534 | 0.8049 | 0.9020 | 0.8580 |
SS + CCS | 0.8641 | 0.8595 | 0.8171 | 0.9020 | 0.8636 |
P + CCS | 0.9130 | 0.9132 | 0.9146 | 0.9118 | 0.9131 |
P + SS | 0.9022 | 0.9010 | 0.8902 | 0.9118 | 0.9022 |
P + SS + CCS | 0.9076 | 0.9059 | 0.8902 | 0.9216 | 0.9076 |
P + P2 | 0.8967 | 0.8937 | 0.8659 | 0.9216 | 0.8965 |
SS + P2 | 0.8641 | 0.8571 | 0.7927 | 0.9216 | 0.8631 |
CCS + SS + P2 | 0.8696 | 0.8632 | 0.8049 | 0.9216 | 0.8687 |
P + SS + CCS + P2 | 0.8859 | 0.8827 | 0.8537 | 0.9118 | 0.8856 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Truong, D.P.; Monroe, L.K.; Williams, R.F.; Nguyen, H.B. Machine Learning Framework for Conotoxin Class and Molecular Target Prediction. Toxins 2024, 16, 475. https://doi.org/10.3390/toxins16110475