Abstract
This paper presents a generalization of the notion of longest repeats with a block of k don’t care symbols introduced by [8] (for k fixed) to longest motifs composed of three parts: a first and last that parameterize match (that is, match via some symbol renaming, initially unknown), and a functionally equivalent central block. Such three-part motifs are called longest block motifs. Different types of functional equivalence, and thus of matching criteria for the central block are considered, which include as a subcase the one treated in [8] and extend to the case of regular expressions with no Kleene closure or complement operation. We show that a single general algorithmic tool that is a non-trivial extension of the ideas introduced in [8] can handle all the various kinds of longest block motifs defined in this paper. The algorithm complexity is, in all cases, in O(n log n).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Amir, A., Aumann, Y., Cole, R., Lewenstein, M., Porat, E.: Function matching: Algorithms, applications, and a lower bound. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 929–942. Springer, Heidelberg (2003)
Baker, B.S.: Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci. 52(1), 28–42 (1996)
Baker, B.S.: Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM J. Computing 26(5), 1343–1362 (1997)
Brazma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences. J. of Computational Biology 5, 277–304 (1997)
Brodal, G.S., Lyngsø, R.B., Pederson, C.N.S., Stoye, J.: Finding maximal pairs with bounded gaps. J. of Discrete Algorithms 1(1), 1–27 (2000)
Chan, H.S., Dill, K.A.: Compact polymers. Macromolecules 22, 4559–4573 (1989)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (1998)
Crochemore, M., Iliopoulos, C.S., Mohamed, M., Sagot, M.-F.: Longest repeated motif with a block of don’t cares. In: Farach-Colton, M. (ed.) LATIN 2004. LNCS, vol. 2976, pp. 271–278. Springer, Heidelberg (2004)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. on Computing 13, 338–355 (1984)
Karlin, S., Ghandour, G.: Multiple-alphabet amino acid sequence comparisons of the immunoglobulin kappa-chain constant domain. Proc. Natl. Acad. Sci. USA 82(24), 8597–8601 (1985)
Kolpakov, R., Kucherov, G.: Finding repeats with fixed gaps. In: Proc. of SPIRE 2002, pp. 162–168 (2002)
Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Eng. (5), 323–330 (2003)
Liu, X., Liu, D., Qi, J., Zheng, W.M.: Simplified amino acid alphabets based on deviation of conditional probability from random background. Phys. Rev. E 66, 1–9 (2002)
Lothaire, M.: Combinatorics on Words. Cambridge University Press, Cambridge (1997)
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002)
Lothaire, M.: Applied Combinatorics on Words. in preparation (2004), http://igm.univ-mlv.fr/~berstel/Lothaire/index.html
McCreight, E.M.: A space economical suffix tree construction algorithm. J. of ACM 23, 262–272 (1976)
Murphy, L.R., Wallqvist, A., Levy, R.M.: Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein. Eng. 13, 149–152 (2000)
Schieber, B., Vishkin, U.: On finding lowest common ancestors: Simplification and parallelization. Siam J. on Computing 17, 1253–1262 (1988)
Spitzer, M., Fuellen, G., Cullen, P., Lorkowsk, S.: Viscose: Visualisation and comparison of consensus sequences. Bioinformatics (2004) (to appear)
Wang, J., Wang, W.: A computational approach to simplifying the protein folding alphabet. Nat. Struct. Biol. 11, 1033–1038 (1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crochemore, M., Giancarlo, R., Sagot, MF. (2004). Longest Motifs with a Functionally Equivalent Central Block. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_42
Download citation
DOI: https://doi.org/10.1007/978-3-540-30213-1_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive