High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures
Figure 1. Scheme of the model construction.
Figure 2. Charged and uncharged forms. A total of 100 random forest (RF) models were constructed for the charged, uncharged, and both forms of each descriptor. All models were then used to predict the activities of the estrogen receptor ligand-binding domain for the compounds in the final evaluation set. The 100 ROC_AUC values are plotted for each group. Green lines denote the averages and their 95% confidence intervals.
Figure 3. Number of descriptors. A total of 100 RF models were constructed for each of the two numbers of descriptors. All models were then used to predict the activities of the estrogen receptor ligand-binding domain for the compounds in the final evaluation set. The 100 ROC_AUC values are plotted for each group. Green lines denote the averages and their 95% confidence intervals.
Figure 4. Relationship between ROC_AUC values in models constructed from the test set (50%) and the final evaluation set. Each point denotes the performance of one model. This figure is taken from [9].
Figure 5. Effects of the hyperparameter Number of Terms on the RF modeling. A total of 190 RF models were constructed in each group, and all models were then used to predict the activities of the estrogen receptor ligand-binding domain for the compounds in the final evaluation set. The ROC_AUC values for the final evaluation set are plotted for each group. Green lines denote the averages and their 95% confidence intervals.
Figure 6. Effects of the hyperparameter Maximum Splits per Tree on the RF modeling. ROC_AUC values for the training set (50%) and the final evaluation set are plotted as closed and open circles, respectively. Large values of Maximum Splits per Tree led to model overfitting. Predictive ability was best at Maximum Splits per Tree = 6.
Figure 7. ROC curves for predicting ER-LBD-activating compounds with the newly proposed model (left) and the best model of the Tox21 Data Challenge 2014. ROC-AUCs and hyperparameter values of the models are also shown.
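The figure captions report ROC_AUC values for repeated RF models together with their averages and 95% confidence intervals. The following is a minimal sketch of how such quantities can be computed, not the authors' procedure: the function names `roc_auc` and `mean_ci95`, the toy labels and scores, and the normal-approximation interval (mean ± 1.96·s/√n) are all assumptions made for illustration.

```python
import math
import statistics


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (active, inactive) pairs in which the
    active compound receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval of the mean.
    Assumption: 1.96 * s / sqrt(n); reasonable for ~100 models."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical toy data: 1 = active, 0 = inactive.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Hypothetical AUC values from three repeated models.
toy_mean, toy_low, toy_high = mean_ci95([0.80, 0.82, 0.84])
```

In practice each RF model would contribute one ROC_AUC value on the final evaluation set, and `mean_ci95` would be applied to the 100 (or 190) values in each group.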
Abstract
1. Introduction
2. Methods
2.1. Conformations and Descriptors
2.2. Construction of Predictive Models
2.3. Effects of Descriptors
Number of Descriptors
2.4. Effects of Hyperparameters
2.5. Statistical Treatment
3. Results and Discussion
3.1. Effects of Descriptors
3.2. Effects of Hyperparameters
3.3. Discrimination Potential of Improved Models
3.4. Most Important Descriptors
4. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Katzenellenbogen, B.S.; Montano, M.M.; Ediger, T.R.; Sun, J.; Ekena, K.; Lazennec, G.; Martini, P.G.; McInerney, E.M.; Delage-Mourroux, R.; Weis, K.; et al. Estrogen receptors: selective ligands, partners, and distinctive pharmacology. Recent Prog. Horm. Res. 2000, 55, 163–193.
- Setchell, K.D. Soy isoflavones—Benefits and risks from nature's selective estrogen receptor modulators (SERMs). J. Am. Coll. Nutr. 2001, 20, 354S–362S.
- Zhang, Y.; Dong, S.; Wang, H.; Tao, S.; Kiyama, R. Biological Impact of Environmental Polycyclic Aromatic Hydrocarbons (ePAHs) as Endocrine Disruptors. Environ. Pollut. 2016, 213, 809–824.
- Hsieh, J.H.; Sedykh, A.; Huang, R.; Xia, M.; Tice, R.R. A Data Analysis Pipeline Accounting for Artifacts in Tox21 Quantitative High-Throughput Screening Assays. J. Biomol. Screen 2015, 20, 887–897.
- United States Environmental Protection Agency. Toxicology Testing in the 21st Century (Tox21). Available online: http://www.epa.gov/chemical-research/toxicology-testing-21st-century-tox21 (accessed on 16 April 2017).
- Attene-Ramos, M.S.; Miller, N.; Huang, R.; Michael, S.; Itkin, M.; Kavlock, R.J.; Austin, C.P.; Shinn, P.; Simeonov, A.; Tice, R.R.; et al. The Tox21 Robotic Platform for the Assessment of Environmental Chemicals-From Vision to Reality. Drug Discov. Today 2013, 18, 716–723.
- Gohlke, J.M.; Thomas, R.; Zhang, Y.; Rosenstein, M.C.; Davis, A.P.; Murphy, C.; Becker, K.G.; Mattingly, C.J.; Portier, C.J. Genetic and environmental pathways to complex diseases. BMC Syst. Biol. 2009, 3, 46.
- National Center for Advancing Translational Sciences. Tox21 Data Challenge 2014. Available online: https://tripod.nih.gov/tox21/challenge/index.jsp (accessed on 16 April 2017).
- Uesawa, Y. Rigorous Selection of Random Forest Models for Identifying Compounds that Activate Toxicity-Related Pathways. Front. Environ. Sci. 2016, 4.
- Mansouri, K.; Abdelaziz, A.; Rybacka, A.; Roncaglioni, A.; Tropsha, A.; Varnek, A.; Zakharov, A.; Worth, A.; Richard, A.M.; Grulke, C.M.; et al. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ. Health Perspect. 2016, 124, 1023–1033.
- Chou, K.C. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 2011, 273, 236–247.
- Zhu, P.P.; Li, W.C.; Zhong, Z.J.; Deng, E.Z.; Ding, H.; Chen, W.; Lin, H. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol. Biosyst. 2015, 11, 558–563.
- Ding, C.; Yuan, L.F.; Guo, S.H.; Lin, H.; Chen, W. Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. J. Proteom. 2012, 77, 321–328.
- Lin, H.; Chen, W.; Yuan, L.F.; Li, Z.Q.; Ding, H. Using over-represented tetrapeptides to predict protein submitochondria locations. Acta Biotheor. 2013, 61, 259–268.
- Tang, H.; Su, Z.D.; Wei, H.H.; Chen, W.; Lin, H. Prediction of cell-penetrating peptides with feature selection techniques. Biochem. Biophys. Res. Commun. 2016, 477, 150–154.
- Lin, H.; Li, Q.Z. Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci. 2011, 130, 91–100.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Zhao, X.; Zou, Q.; Liu, B.; Liu, X. Exploratory predicting protein folding model with random forest and hybrid features. Curr. Proteom. 2014, 11, 289–299.
- Liao, Z.; Ju, Y.; Zou, Q. Prediction of G-protein-coupled receptors with SVM-Prot features and random forest. Scientifica 2016.
- Chemical Computing Group. MOE: Molecular Operating Environment. Available online: http://www.chemcomp.com/ (accessed on 16 April 2017).
- ChemAxon Kft. Budapest, Hungary. Available online: http://www.chemaxon.com (accessed on 16 April 2017).
- SAS. JMP. Available online: http://www.jmp.com/ja_jp/home.html (accessed on 16 April 2017).
- Yang, H.; Tang, H.; Chen, X.X.; Zhang, C.J.; Zhu, P.P.; Ding, H.; Chen, W.; Lin, H. Identification of Secretory Proteins in Mycobacterium tuberculosis Using Pseudo Amino Acid Composition. Biomed. Res. Int. 2016, 2016, 5413903.
- Zhang, C.J.; Tang, H.; Li, W.C.; Lin, H.; Chen, W.; Chou, K.C. iOri-Human: Identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2016, 7, 69783–69793.
- Ding, H.; Li, D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids 2015, 47, 329–333.
- Lin, H.; Ding, H.; Guo, F.B.; Zhang, A.Y.; Huang, J. Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition. Protein Pept. Lett. 2008, 15, 739–744.
- Lin, H.; Ding, C.; Song, Q.; Yang, P.; Ding, H.; Deng, K.J.; Chen, W. The prediction of protein structural class using averaged chemical shifts. J. Biomol. Struct. Dyn. 2012, 29, 643–649.
- Chou, K.C.; Zhang, C.T. Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol. 1995, 30, 275–349.
- Lin, H.; Ding, H.; Guo, F.B.; Huang, J. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Mol. Divers. 2010, 14, 667–671.
- Lin, H.; Chen, W. Prediction of thermophilic proteins using feature selection technique. J. Microbiol. Methods 2011, 84, 67–70.
- Yuan, L.F.; Ding, C.; Guo, S.H.; Ding, H.; Chen, W.; Lin, H. Prediction of the types of ion channel-targeted conotoxins based on radial basis function network. Toxicol. In Vitro 2013, 27, 852–856.
- Ding, H.; Feng, P.M.; Chen, W.; Lin, H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Mol. Biosyst. 2014, 10, 2229–2235.
- Chen, X.X.; Tang, H.; Li, W.C.; Wu, H.; Chen, W.; Ding, H.; Lin, H. Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition. Biomed. Res. Int. 2016, 2016, 1654623.
- Ding, H.; Luo, L.; Lin, H. Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition. Protein Pept. Lett. 2009, 16, 351–355.
- Rao, J.N.; Scott, A.J. On Chi-Squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data. Ann. Stat. 1984, 12, 46–60.
- List of Molecular Descriptors Calculated by Dragon. Available online: http://www.talete.mi.it/products/dragon_molecular_descriptor_list.pdf (accessed on 16 April 2017).
Sample Availability: Not Available.
Descriptor | Meaning | Software | State | Split | G2 | Split Ranking | G2 Ranking |
---|---|---|---|---|---|---|---|
SpMin1_Bh(m) | smallest eigenvalue n. 1 of Burden matrix weighted by mass | Dragon | Uncharged | 28.1 | 38.5 | 1 | 1 |
SpMin1_Bh(m) | smallest eigenvalue n. 1 of Burden matrix weighted by mass | Dragon | Charged | 17.2 | 21.1 | 2 | 2 |
SpMin1_Bh(s) | smallest eigenvalue n. 1 of Burden matrix weighted by I-state | Dragon | Uncharged | 9.3 | 11.4 | 6 | 3 |
SpMin1_Bh(i) | smallest eigenvalue n. 1 of Burden matrix weighted by ionization potential | Dragon | Uncharged | 5.0 | 5.7 | 13 | 9 |
nArOH | number of aromatic hydroxyls | Dragon | Charged | 17.0 | 10.5 | 3 | 4 |
nArOH | number of aromatic hydroxyls | Dragon | Uncharged | 13.3 | 7.7 | 4 | 6 |
O-057 | phenol / enol / carboxyl OH | Dragon | Charged | 13.1 | 7.8 | 5 | 5 |
Chi_Dt | Randic-like index from detour matrix | Dragon | Charged | 5.8 | 6.1 | 8 | 7 |
CATS2D_03_LL | CATS2D Lipophilic-Lipophilic at lag 03 | Dragon | Charged | 4.8 | 6.0 | 14 | 8 |
CATS2D_05_LL | CATS2D Lipophilic-Lipophilic at lag 05 | Dragon | Charged | 5.6 | 2.7 | 9 | 19 |
logd(pH = 5.5) | Lipophilicity under pH = 5.5 condition | Marvin | - | 5.5 | 5.7 | 10 | 10 |
vsurf_HB7 | H-bond donor capacity 7 | MOE | Charged | 6.5 | 3.2 | 7 | 17 |
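The table gives two importance measures per descriptor and a separate ranking for each. As an illustrative sketch only, the rows can be combined into a single ordering by averaging the two ranks. The mean-rank aggregation, the tuple layout, and the names `ROWS` and `mean_rank` are assumptions introduced here and are not part of the authors' analysis; the numbers are copied from the table.

```python
# Rows copied from the table: (descriptor, state, first measure, G2,
# rank by first measure, rank by G2).
ROWS = [
    ("SpMin1_Bh(m)", "Uncharged", 28.1, 38.5, 1, 1),
    ("SpMin1_Bh(m)", "Charged", 17.2, 21.1, 2, 2),
    ("SpMin1_Bh(s)", "Uncharged", 9.3, 11.4, 6, 3),
    ("SpMin1_Bh(i)", "Uncharged", 5.0, 5.7, 13, 9),
    ("nArOH", "Charged", 17.0, 10.5, 3, 4),
    ("nArOH", "Uncharged", 13.3, 7.7, 4, 6),
    ("O-057", "Charged", 13.1, 7.8, 5, 5),
    ("Chi_Dt", "Charged", 5.8, 6.1, 8, 7),
    ("CATS2D_03_LL", "Charged", 4.8, 6.0, 14, 8),
    ("CATS2D_05_LL", "Charged", 5.6, 2.7, 9, 19),
    ("logd(pH = 5.5)", "-", 5.5, 5.7, 10, 10),
    ("vsurf_HB7", "Charged", 6.5, 3.2, 7, 17),
]


def mean_rank(row):
    """Average of the two rankings (hypothetical aggregation)."""
    return (row[4] + row[5]) / 2


# sorted() is stable, so rows with equal mean rank keep their table order.
ordered = sorted(ROWS, key=mean_rank)

for name, state, _, _, r1, r2 in ordered:
    print(f"{name:16s} {state:10s} mean rank = {(r1 + r2) / 2:.1f}")
```

Under this aggregation the mass-weighted Burden eigenvalue descriptor stays at the top in both its uncharged and charged forms, while descriptors whose two rankings disagree strongly (for example CATS2D_05_LL) move toward the bottom.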
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Asako, Y.; Uesawa, Y. High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures. Molecules 2017, 22, 675. https://doi.org/10.3390/molecules22040675