Abstract
Gene clustering is one of the most important problems in bioinformatics. In sequential data clustering, hidden Markov models (HMMs) have been widely used to measure similarity between sequences because they can handle sequence patterns of varying lengths. This paper introduces a novel gene clustering scheme based on HMMs optimized by the particle swarm optimization (PSO) algorithm. In this approach, each gene sequence is described by its own HMM, and for each model the probability that it generates each individual sequence is evaluated. A hierarchical clustering algorithm based on a new distance measure is then applied to find the best clusters. Experiments on a dataset of lung cancer-related genes show that the proposed approach can be successfully used for gene clustering.
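The pipeline outlined in the abstract (one HMM per sequence, cross-likelihood evaluation, hierarchical clustering on a distance matrix) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the paper's distance measure, so a symmetrized, length-normalized log-likelihood difference is used as a stand-in; the two-state models, the PSO constants, the softmax parameterization, average linkage, and all function names are hypothetical choices; and the toy strings are made up rather than taken from the lung cancer dataset.

```python
import math
import random

ALPHABET = "ACGT"

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def unpack(params, n_states):
    """Map a flat real vector to (pi, A, B); softmax makes every row sum to 1."""
    k = len(ALPHABET)
    pi = softmax(params[:n_states])
    off = n_states
    A = [softmax(params[off + i * n_states: off + (i + 1) * n_states]) for i in range(n_states)]
    off += n_states * n_states
    B = [softmax(params[off + i * k: off + (i + 1) * k]) for i in range(n_states)]
    return pi, A, B

def log_likelihood(seq, model):
    """Scaled forward algorithm: returns log P(seq | model)."""
    pi, A, B = model
    n = len(pi)
    obs = [ALPHABET.index(c) for c in seq]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def pso_train(seq, n_states=2, n_particles=10, n_iters=30, seed=0):
    """Basic global-best PSO that maximizes log P(seq | model); constants are assumptions."""
    rng = random.Random(seed)
    dim = n_states + n_states * n_states + n_states * len(ALPHABET)
    fitness = lambda p: log_likelihood(seq, unpack(p, n_states))
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (best_pos[k][d] - pos[k][d])
                             + c2 * rng.random() * (g_pos[d] - pos[k][d]))
                # Clamp positions so softmax never yields a zero probability.
                pos[k][d] = max(-5.0, min(5.0, pos[k][d] + vel[k][d]))
            f = fitness(pos[k])
            if f > best_fit[k]:
                best_pos[k], best_fit[k] = pos[k][:], f
                if f > g_fit:
                    g_pos, g_fit = pos[k][:], f
    return unpack(g_pos, n_states)

def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)
    return clusters

def cluster_sequences(seqs, n_clusters):
    models = [pso_train(s, seed=i) for i, s in enumerate(seqs)]
    # L[i][j]: per-symbol log-likelihood of sequence j under the model of sequence i.
    L = [[log_likelihood(s, m) / len(s) for s in seqs] for m in models]
    n = len(seqs)
    # Assumed stand-in distance: how much worse each model explains the other's sequence.
    dist = [[0.5 * ((L[i][i] - L[j][i]) + (L[j][j] - L[i][j])) for j in range(n)]
            for i in range(n)]
    return dist, agglomerate(dist, n_clusters)

toy = ["ACACACACACAC", "CACACACACA", "GGGTTTGGGTTT", "TTTGGGTTTGGG"]  # made-up data
dist, clusters = cluster_sequences(toy, n_clusters=2)
print(clusters)
```

Because the distance only needs likelihoods, sequences of different lengths can be compared directly, which is the property of HMMs the abstract emphasizes. Note that this stand-in distance is symmetric and zero on the diagonal, but it is not guaranteed to be non-negative when PSO stops short of a good optimum.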
Cite this article
Soruri, M., Sadri, J. & Zahiri, S.H. Gene clustering with hidden Markov model optimized by PSO algorithm. Pattern Anal Applic 21, 1121–1126 (2018). https://doi.org/10.1007/s10044-018-0680-9
DOI: https://doi.org/10.1007/s10044-018-0680-9