Abstract
Previous algorithms for motif discovery and protein alignment have used a variety of scoring functions, each specialized to find certain types of similarity in preference to others. Here we present a novel scoring function that combines the relative entropy score with a sensitivity to amino acid similarities, producing a score that is highly sensitive to the weakly conserved patterns typically seen in proteins. We compare the performance of the hybrid score with that of existing scoring functions. We conclude that the hybrid is more sensitive than previous protein scoring functions, both in the initial detection of a weakly conserved region of similarity and, given such a region, in the detection of weakly conserved instances.
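For orientation, the relative entropy component mentioned above can be sketched as follows. This is only a minimal illustration of the standard relative entropy score of an ungapped alignment, not the hybrid score itself, whose definition is not given in the abstract. The function names and the uniform background distribution are assumptions made for the example.

```python
import math
from collections import Counter

# Assumption: a uniform background over the 20 standard amino acids.
# Real applications would normally use observed background frequencies.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}


def column_relative_entropy(column, background):
    """Relative entropy (in bits) of one alignment column versus the background."""
    counts = Counter(column)
    total = len(column)
    score = 0.0
    for residue, count in counts.items():
        p = count / total  # observed frequency of this residue in the column
        score += p * math.log2(p / background[residue])
    return score


def alignment_relative_entropy(rows, background):
    """Sum of column scores for an ungapped alignment given as equal-length strings."""
    return sum(column_relative_entropy(col, background) for col in zip(*rows))


# A fully conserved column scores highest; mixed columns score lower.
conserved = column_relative_entropy("IIII", UNIFORM)
similar_mix = column_relative_entropy("IILL", UNIFORM)     # I and L are chemically similar
dissimilar_mix = column_relative_entropy("IIWW", UNIFORM)  # I and W are not
```

Under the uniform background assumed here, `similar_mix` and `dissimilar_mix` come out equal, because relative entropy depends only on residue frequencies and not on which residues are involved. That blindness to amino acid similarity is the gap a similarity-sensitive score is meant to address.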
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rocke, E. (2002). A Hybrid Scoring Function for Protein Multiple Alignment. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_19
DOI: https://doi.org/10.1007/3-540-45784-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive