A parallel classification framework for protein fold recognition

  • Research Paper
Evolutionary Intelligence

Abstract

A protein's tertiary structure, which is determined by its amino acid sequence through the protein folding process, plays an essential role in the protein's function. Protein fold recognition is therefore an important problem in bioinformatics. To address it, we propose a Feature Selection (FS) method based on the MapReduce framework and the Vortex Search Algorithm (VSA). FS is one of the most important data pre-processing steps; it aims to select a subset of relevant features. For non-parallel settings and data of ordinary size, hundreds of feature selection and dimension reduction algorithms have been proposed, such as Principal Component Analysis and Linear Discriminant Analysis. However, these algorithms are not suited to real-world applications in which the data grow along three dimensions (volume, velocity, and variety), the setting known as Big Data; applying earlier feature selection methods to Big Data requires a large volume of complex computation. VSA is inspired by the vortex pattern created by the vortical flow of stirred fluids. In the MapReduce framework, the Map and Reduce functions are executed in parallel. In the proposed method, a VSA is employed in each step of the Map function to find an optimized subset of features and shrink the feature search space. We evaluate the proposed method on the classification of a benchmark dataset for protein fold recognition. The experimental results indicate that the proposed method improves prediction accuracy considerably.
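The idea of running a vortex-style search inside each Map call can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the linear radius shrink (the published VSA uses a different radius schedule), the nearest-centroid fitness with a size penalty, the majority-vote Reduce step, the toy data, and all function names. The Map calls run sequentially here; on a real cluster they would be distributed.

```python
import random
from collections import Counter

def to_mask(vec):
    """Threshold a continuous position in [0, 1]^d into a feature mask."""
    return [v >= 0.5 for v in vec]

def fitness(rows, labels, mask, penalty=0.01):
    """Nearest-centroid training accuracy on kept features, minus a size penalty."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    counts, sums = Counter(labels), {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(idx))
        for k, j in enumerate(idx):
            acc[k] += x[j]
    cents = {y: [s / counts[y] for s in v] for y, v in sums.items()}

    def predict(x):
        return min(cents, key=lambda y: sum((x[j] - c) ** 2
                                            for j, c in zip(idx, cents[y])))

    hits = sum(predict(x) == y for x, y in zip(rows, labels))
    return hits / len(rows) - penalty * len(idx)

def vortex_search_mask(rows, labels, n_iter=30, n_cand=20, seed=0):
    """Simplified vortex-style search: sample around a center, shrink the radius."""
    rng = random.Random(seed)
    d = len(rows[0])
    center = [0.5] * d                      # start in the middle of the search space
    best, best_fit = center[:], fitness(rows, labels, to_mask(center))
    for t in range(n_iter):
        radius = 0.5 * (1 - t / n_iter)     # assumption: linear shrink, not the original schedule
        for _ in range(n_cand):
            cand = [min(1.0, max(0.0, rng.gauss(c, radius))) for c in center]
            f = fitness(rows, labels, to_mask(cand))
            if f > best_fit:
                best, best_fit = cand, f
        center = best[:]                    # the best solution so far becomes the new center
    return to_mask(best)

def map_partition(part_id, rows, labels):
    """Map: search one data partition and emit (feature_index, 1) per kept feature."""
    mask = vortex_search_mask(rows, labels, seed=part_id)
    return [(j, 1) for j, keep in enumerate(mask) if keep]

def reduce_votes(pairs, n_parts):
    """Reduce (assumed rule): keep features chosen by a strict majority of partitions."""
    votes = Counter()
    for j, v in pairs:
        votes[j] += v
    return sorted(j for j, v in votes.items() if 2 * v > n_parts)

def make_toy_data(n, d, seed=0):
    """Hypothetical data: feature 0 tracks the label, the rest are uniform noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    rows = [[y + rng.gauss(0, 0.1)] + [rng.random() for _ in range(d - 1)]
            for y in labels]
    return rows, labels

def run_job(rows, labels, n_parts=3):
    """Split rows into partitions, run Map on each, then Reduce the votes."""
    pairs = []
    for p in range(n_parts):
        pairs += map_partition(p, rows[p::n_parts], labels[p::n_parts])
    return reduce_votes(pairs, n_parts)

if __name__ == "__main__":
    rows, labels = make_toy_data(60, 5, seed=1)
    print("selected feature indices:", run_job(rows, labels))
```

Because each partition only ever scores candidate subsets on its own rows, the expensive wrapper evaluation is spread across Map calls, and the Reduce step only has to merge small lists of feature indices.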

Notes

  1. https://archive.ics.uci.edu/ml/datasets/Audit+Data

  2. https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

Author information

Corresponding author

Correspondence to Hedieh Sajedi.

About this article

Cite this article

Hekmatnia, E., Sajedi, H. & Habib Agahi, A. A parallel classification framework for protein fold recognition. Evol. Intel. 13, 525–535 (2020). https://doi.org/10.1007/s12065-020-00350-7

  • DOI: https://doi.org/10.1007/s12065-020-00350-7
