DOI: 10.1145/3233547.3233607
Research article (public access)

Graphic Encoding of Macromolecules for Efficient High-Throughput Analysis

Published: 15 August 2018

Abstract

The function of a protein depends on its three-dimensional structure. Current homology-based approaches to predicting a protein's function do not work well at scale. In this work, we propose a representation of proteins that explicitly encodes secondary and tertiary structure into fixed-size images. In addition, we present a neural network architecture that exploits this representation to perform protein function prediction. We validate the effectiveness of our encoding method and the strength of our neural network architecture through 5-fold cross-validation over roughly 63 thousand images, achieving an accuracy of 80% across 8 distinct classes. Our novel approach to encoding and classifying proteins is suitable for real-time processing, enabling high-throughput analysis.
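
The 5-fold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the abstract does not say whether the folds were stratified by class, so stratification is an assumption here, and the names `stratified_k_fold`, `cross_validate`, and the `fit_predict` callback are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds.

    Stratification is an assumption; the abstract only states "5-fold".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def cross_validate(labels, fit_predict, k=5, seed=0):
    """Return the mean test accuracy over k folds.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns one predicted label per entry of test_idx.
    """
    accuracies = []
    for train_idx, test_idx in stratified_k_fold(labels, k, seed):
        predicted = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

In practice `fit_predict` would wrap the training and inference of an image classifier; each image is used for testing exactly once, and the reported accuracy is the average over the five held-out folds.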

Published In

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural networks
  2. protein function prediction
  3. protein representation
  4. structural biology

Qualifiers

  • Research-article

Conference

BCB '18
Acceptance Rates

BCB '18 Paper Acceptance Rate: 46 of 148 submissions, 31%
Overall Acceptance Rate: 254 of 885 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 91
  • Downloads (last 6 weeks): 17
Reflects downloads up to 24 Dec 2024.

Cited By

  • (2022) Cronus: Computer Vision-based Machine Intelligent Hybrid Memory Management. Proceedings of the 2022 International Symposium on Memory Systems, 1-11. DOI: 10.1145/3565053.3565063. Online publication date: 3-Oct-2022.
  • (2022) Geometric Deep Learning for Protein–Protein Interaction Predictions. IEEE Access, Vol. 10, 90045-90055. DOI: 10.1109/ACCESS.2022.3201543. Online publication date: 2022.
  • (2021) A Graphic Encoding Method for Quantitative Classification of Protein Structure and Representation of Conformational Changes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 18, 4, 1336-1349. DOI: 10.1109/TCBB.2019.2945291. Online publication date: 1-Jul-2021.
  • (2020) A survey of algorithms for transforming molecular dynamics data into metadata for in situ analytics based on machine learning methods. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, 2166, Article 20190063. DOI: 10.1098/rsta.2019.0063. Online publication date: 20-Jan-2020.
