A parallel classification framework for protein fold recognition

Elham Hekmatnia¹,
Hedieh Sajedi² &
Ali Habib Agahi³


Abstract

A protein's tertiary structure, which is determined by its amino acid sequence through the protein folding process, plays an essential role in the protein's function. Protein fold recognition is an active area of study in bioinformatics. In this paper, to address this problem, we propose a Feature Selection (FS) method based on the MapReduce framework and the Vortex Search Algorithm (VSA). FS is one of the most important data pre-processing steps; it aims to select a subset of relevant features. For conventional data processed serially, hundreds of feature selection and dimensionality reduction algorithms have been proposed, such as Principal Component Analysis and Linear Discriminant Analysis. Nevertheless, these algorithms are not designed for real-world applications in which data grow along three dimensions, namely volume, velocity and variety, collectively known as Big Data. Applying earlier feature selection methods to Big Data requires a large amount of complex computation. VSA is inspired by the vortex pattern created by the vortical flow of stirred fluids. In the MapReduce framework, the Map and Reduce functions are executed in parallel. In the proposed method, a VSA is employed in each step of the Map function to find an optimized subset of features and to shrink the feature search space. We evaluate the proposed method on the classification of a benchmark dataset for protein fold recognition. The experimental results indicate that the proposed method improves prediction accuracy considerably.
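The map-then-reduce idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fitness function, the reducer, or the VSA parameters, so everything below is an assumption made for illustration. The data are synthetic, the fitness is a nearest-centroid accuracy with a small penalty per selected feature, the search is a simplified vortex-style loop (Gaussian candidates around a center with a geometrically shrinking radius, which is not the radius schedule of the published VSA), and the reducer is a majority vote. All function names are hypothetical, and the map step runs sequentially here where a real MapReduce job would run it in parallel.

```python
import random

def make_partition(rng, n_rows, n_features, informative):
    """Synthetic two-class rows; only the `informative` columns carry signal."""
    rows = []
    for _ in range(n_rows):
        label = rng.randrange(2)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            x[j] += 3.0 * label  # shift informative features by class
        rows.append((x, label))
    return rows

def fitness(mask, rows):
    """Assumed wrapper fitness: nearest-centroid accuracy minus a size penalty."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    half = len(rows) // 2
    train, valid = rows[:half], rows[half:]
    centroids = {}
    for label in (0, 1):
        members = [x for x, y in train if y == label]
        if not members:
            return 0.0
        centroids[label] = [sum(x[j] for x in members) / len(members) for j in cols]
    correct = 0
    for x, y in valid:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, mu))
                for c, mu in centroids.items()}
        if min(dist, key=dist.get) == y:
            correct += 1
    return correct / len(valid) - 0.01 * len(cols)

def vortex_style_search(rows, n_features, seed, iters=30, candidates=20):
    """Simplified vortex-style search over feature masks (illustrative only)."""
    rng = random.Random(seed)
    center = [0.5] * n_features
    radius = 0.5
    best_point, best_mask, best_fit = center, None, float("-inf")
    for _ in range(iters):
        for _ in range(candidates):
            # Sample a candidate around the current center, clipped to [0, 1].
            point = [min(1.0, max(0.0, rng.gauss(c, radius))) for c in center]
            mask = tuple(p > 0.5 for p in point)  # threshold to a binary mask
            fit = fitness(mask, rows)
            if fit > best_fit:
                best_point, best_mask, best_fit = point, mask, fit
        center = best_point  # move the center to the best solution so far
        radius *= 0.9        # shrink the search radius (assumed schedule)
    return best_mask

def map_partition(index, rows, n_features):
    """Map step: one independent search per data partition."""
    return vortex_style_search(rows, n_features, seed=index)

def reduce_masks(masks):
    """Reduce step (assumed): keep features chosen by more than half the mappers."""
    n_features = len(masks[0])
    votes = [sum(mask[j] for mask in masks) for j in range(n_features)]
    return [j for j, v in enumerate(votes) if v > len(masks) / 2]

N_FEATURES = 8
INFORMATIVE = (1, 4)
data_rng = random.Random(0)
partitions = [make_partition(data_rng, 80, N_FEATURES, INFORMATIVE) for _ in range(4)]
masks = [map_partition(i, part, N_FEATURES) for i, part in enumerate(partitions)]
selected = reduce_masks(masks)
print("selected feature indices:", selected)
```

Because each call to `map_partition` touches only its own partition, the list comprehension that builds `masks` could be handed to a process pool or a cluster framework without changing the reducer.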



Author information

Authors and Affiliations

Department of Computer Engineering, School of Engineering, Azad University, Science and Research Branch, Tehran, Iran
Elham Hekmatnia
Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
Hedieh Sajedi
Department of Computer Engineering, School of Engineering, Azad University, South Tehran Branch, Tehran, Iran
Ali Habib Agahi

Corresponding author

Correspondence to Hedieh Sajedi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hekmatnia, E., Sajedi, H. & Habib Agahi, A. A parallel classification framework for protein fold recognition. Evol. Intel. 13, 525–535 (2020). https://doi.org/10.1007/s12065-020-00350-7

Received: 17 May 2019
Revised: 15 December 2019
Accepted: 06 January 2020
Published: 13 January 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s12065-020-00350-7
