A Fast and Memory-Efficient Hierarchical Graph Clustering Algorithm

László Szilágyi²⁰,
Sándor Miklós Szilágyi²¹ &
Béat Hirsbrunner²²

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8834))

Included in the following conference series:

International Conference on Neural Information Processing

4927 Accesses
1 Citations

Abstract

In this paper we propose a quick and memory-efficient implementation of the TRIBE-MCL clustering algorithm, suitable for accurate classification of large-scale protein sequence data sets. A symmetric sparse matrix structure is introduced that can efficiently handle most operations of the main loop. The reduction of memory requirements is achieved by regrouping the operations performed during the expansion matrix squaring. The proposed algorithm is tested on synthetic protein sequence data sets of up to 250 thousand items. The validation process revealed that the proposed method performs in 30% less time than previous efficient Markov clustering algorithms, without losing anything from the partition quality. This novel implementation makes it possible for the user of an ordinary PC to process protein sequences sets of 100,000 items in reasonable time.

Research supported by the Hungarian National Research Funds (OTKA), Project no. PD103921. The work of S.M. Szilágyi was supported by the TÁMOP-4.2.2.C-11/1/KONV-2012-0001 project, which is funded by the European Union, co-financed by the European Social Fund.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm

Fast Implementations of Markov Clustering for Protein Sequence Grouping

Clustering of Biological Sequences

References

Altschul, S.F., Madden, T.L., Schaffen, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: A new generation of protein database search program. Nucl. Acids Res. 25, 3389–3402 (1997)
Article Google Scholar
Andreeva, A., Howorth, D., Chadonia, J.M., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G.: Data growth and its impact on the SCOP database: New developments. Nucl. Acids Res. 36, D419–D425 (2008)
Google Scholar
Enright, A.J., van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucl. Acids Res. 30, 1575–1584 (2002)
Article Google Scholar
Lo Conte, L., Ailey, B., Hubbard, T.J., Brenner, S.E., Murzin, A.G., Chothia, C.: SCOP: A structural classification of protein database. Nucl. Acids Res. 28, 257–259 (2000)
Article Google Scholar
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Article Google Scholar
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Article Google Scholar
Structural Classification of Proteins database, http://scop.mrc-lmb.cam.ac.uk/scop
Szilágyi, L., Kovács, L., Szilágyi, S.M.: Synthetic test data generation for hierarchical graph clustering methods. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds.) ICONIP 2014, Part II. LNCS, vol. 8835, pp. 303–310. Springer, Heidelberg (2014)
Google Scholar
Szilágyi, L., Medvés, L., Szilágyi, S.M.: A modified Markov clustering approach to unsupervised classification of protein sequences. Neurocomput. 73, 2332–2345 (2010)
Article Google Scholar
Szilágyi, S.M., Szilágyi, L.: A fast hierarchical clustering algorithm for large-scale protein sequence data sets. Comput. Biol. Med. 48, 94–101 (2014)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Dept. of Control Engineering and Information Technology, Budapest University of Technology and Economics, Hungary
László Szilágyi
Dept. of Informatics, Petru Maior University of Tîrgu-Mureş, Romania
Sándor Miklós Szilágyi
University of Fribourg, Switzerland
Béat Hirsbrunner

Authors

László Szilágyi
View author publications
You can also search for this author in PubMed Google Scholar
Sándor Miklós Szilágyi
View author publications
You can also search for this author in PubMed Google Scholar
Béat Hirsbrunner
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology Building, University of Malaya, 50603, Kuala Lumpur, Malaysia
Chu Kiong Loo
Department of Electronics and Communication Engineering,College of Engineering, Jalan IKRAM-UNITEN, Universiti Tenaga Nasional, 43009, Kajang, Selangor, Malaysia
Keem Siah Yap
School of Engineering and Information Technology, Murdoch University, South St, 6150, Murdoch, Western Australia, Australia
Kok Wai Wong
Department of Electrical and Electronics Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, 120-749, Seoul, South Korea
Andrew Teoh
Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Ren’ai Road 111, SIP 215123, Suzhou, Jiangsu Province, China
Kaizhu Huang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Szilágyi, L., Szilágyi, S.M., Hirsbrunner, B. (2014). A Fast and Memory-Efficient Hierarchical Graph Clustering Algorithm. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8834. Springer, Cham. https://doi.org/10.1007/978-3-319-12637-1_31

Download citation

DOI: https://doi.org/10.1007/978-3-319-12637-1_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12636-4
Online ISBN: 978-3-319-12637-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Fast and Memory-Efficient Hierarchical Graph Clustering Algorithm

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm

Fast Implementations of Markov Clustering for Protein Sequence Grouping

Clustering of Biological Sequences

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Fast and Memory-Efficient Hierarchical Graph Clustering Algorithm

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm

Fast Implementations of Markov Clustering for Protein Sequence Grouping

Clustering of Biological Sequences

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation