Abstract
This paper investigates the similarity of two sequences, one of the main issues in fragment clustering and classification when sequencing the genomes of microbial communities sampled directly from natural environments. We use relative entropy as a criterion for the similarity of two sequences and discuss its characteristics in DNA sequences. A method for evaluating the relative entropy is presented and applied to the comparison of two sequences. By combining the relative entropy with the length of variables defined in this paper, the similarity of sequences is easily obtained. A self-organizing map (SOM) and principal component analysis (PCA) are applied to cluster subsequences from different genomes. Computer simulations verify that the method works well.
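The abstract does not say how the relative entropy is estimated, so the following is only a minimal sketch of one common alignment-free approach, not the authors' method. It assumes each sequence is summarized by its k-mer (word) frequencies over the alphabet ACGT, that a pseudocount keeps every probability nonzero, and that the two directed divergences are averaged to obtain a symmetric score. The function names, the default k = 2, and the pseudocount value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k=2, pseudocount=1.0):
    """Return smoothed k-mer probabilities over the alphabet ACGT.

    Assumption: words containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Both arguments must be distributions over the same set of k-mers.
    """
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p)


def symmetric_relative_entropy(seq_a, seq_b, k=2):
    """Average of D(p || q) and D(q || p); smaller means more similar."""
    p = kmer_distribution(seq_a, k)
    q = kmer_distribution(seq_b, k)
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))


if __name__ == "__main__":
    a = "ACGTACGTACGTACGT"
    b = "AAAAAAAACCCCCCCC"
    print(symmetric_relative_entropy(a, a))  # identical inputs give zero
    print(symmetric_relative_entropy(a, b))  # dissimilar inputs give a positive score
```

Under these assumptions a score of zero means the two k-mer distributions are identical, and larger values indicate greater dissimilarity. The pseudocount matters most for short fragments, where many k-mers are never observed and the unsmoothed divergence would be undefined.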
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, W., Pi, X., Zhang, L. (2005). Similarity Analysis of DNA Sequences Based on the Relative Entropy. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3610. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539087_137
DOI: https://doi.org/10.1007/11539087_137
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28323-2
Online ISBN: 978-3-540-31853-8
eBook Packages: Computer Science, Computer Science (R0)