[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2506583.2506638acmconferencesArticle/Chapter ViewAbstractPublication PagesbcbConference Proceedingsconference-collections
tutorial

The Forward Stem Matrix: An Efficient Data Structure for Finding Hairpins in RNA Secondary Structures

Published: 22 September 2013 Publication History

Abstract

With the rapid growth in available genomic data, robust and efficient methods for identifying RNA secondary structure elements, such as hairpins, have become a significant challenge in computational biology, with potential applications in prediction of RNA secondary and tertiary structures, functional classification of RNA structures, micro RNA target prediction, and discovery of RNA structure motifs. In this work, we propose the Forward Stem Matrix (FSM), a data structure to efficiently represent all k-length stem options, for k ∈ K, within an n-length RNA sequence T. We show that the FSM structure is of size O(n|K|) and still permits efficient access to stems. In this paper, we provide a linear O(n|K|) construction for the FSM using suffix arrays and data structures related to the Longest Previous Factor (LPF), namely, the Furthest Previous Non-Overlapping Factor (FPnF) and Furthest Previous Factor (FPF) arrays. We also provide new constructions for the FPnF and FPF via a novel application of parameterized string (p-string) theory and suffix trees. As an application of the FSM, we show how to efficiently find all hairpin structures in an RNA sequence. Experimental results show the practical performance of the proposed data structures.

References

[1]
Adjeroh, D., Bell, T., Mukherjee, A.: The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer, New York (2008)
[2]
Baker, B.: Parameterized pattern matching: Algorithms and applications. Journal of Computer and System Sciences. 52(1), 28--42 (1996)
[3]
Batey, R., Rambo, R., Doudna, J.: Tertiary motifs in RNA structure and folding. Angewandte Chemie International Edition. 38, 2326--2343 (1999)
[4]
Beal, R., Adjeroh, D.: Border array for structural strings. In: IWOCA'12, pp. 189--205 (2012)
[5]
Beal, R., Adjeroh, D.: Parameterized longest previous factor. Theoretical Computer Science. 437, 21--34 (2012)
[6]
Beal, R., Adjeroh, D.: Variations of the parameterized longest previous factor. Journal of Discrete Algorithms. 16, 129--150 (2012)
[7]
Capriotti, E., Marti-Renom, M.: Computational RNA structure prediction. Current Bioinformatics. 3, 32--45 (2008)
[8]
Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms. McGraw-Hill, Boston (2002)
[9]
Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. Information Processing Letters. 106(2), 75--80 (2008)
[10]
Crochemore, M., Ilie, L., Smyth, W.: A simple algorithm for computing the Lempel Ziv factorization. In: DCC'08, pp. 482--488 (2008)
[11]
Crochemore, M., Iliopoulos, C., Kubica, M., Rytter, W., Waleń, T.: Efficient algorithms for three variants of the LPF table. Journal of Discrete Algorithms. 11, 51--61 (2012)
[12]
Crochemore, M., Tischler, G.: Computing longest previous non-overlapping factors. Information Processing Letters. 111(6), 291--295 (2011)
[13]
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.: Rfam: An RNA family database. Nucleic Acids Research. 31(1), 439--441 (2003)
[14]
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK (1997)
[15]
Han, B., Dost, B., Bafna, V., Zhang, S.: Structural alignment of pseudoknotted RNA. Journal of Computational Biology. 15(5), 489--504 (2008)
[16]
Hofacker, I., Schuster, P., Stadler, P.: Combinatorics of RNA secondary structures. Discrete Applied Mathematics. 88(1-3), 207--237 (1998)
[17]
Laing, C., Schlick, T.: Computational approaches to RNA structure prediction, analysis, and design. Current Opinion in Structural Biology. 21(3), 306--318 (2011)
[18]
Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing. 22, 935--948 (1993)
[19]
Mauri, G., Pavesi, G.: Algorithms for pattern matching and discovery in RNA secondary structure. Theoretical Computer Science. 335(1), 29--51 (2005)
[20]
Meyer, F., Kurtz, S., Backofen, R., Will, S., Beckstette, M.: Structator: Fast index-based search for RNA sequence-structure patterns. BMC Bioinformatics. 12, 214 (2011)
[21]
Novikova, I., Hennelly, S., Tung, C., Sanbonmatsu, K.: Rise of the RNA machines: Exploring the structure of long non-coding RNAs. Journal of Molecular Biology. (In Press)
[22]
Pedersen, J., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad-Toh, K., Lander, E., Kent, J., Miller, W., Haussler, D.: Identification and classification of conserved RNA secondary structures in the human genome. PLoS Computational Biology. 2(4), 0251--0262 (2006)
[23]
Penner, R., Waterman, M.: Spaces of RNA secondary structures. Advances in Mathematics. 101(1), 31--49 (1993)
[24]
Rabani, M., Kertesz, M., Segal, E.: Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes. PNAS. 105(39), 14885--14890 (2008)
[25]
Reidys, C., Huang, F., Andersen, J., Penner, R., Stadler, P., Nebel, M.: Topology and prediction of RNA pseudoknots. Bioinformatics. 27(8), 1076--1085 (2011)
[26]
Rinn, J., Chang, H.: Genome regulation by long noncoding RNAs. Annual Review of Biochemistry. 81, 145--166 (2012)
[27]
Rivas, E., Eddy, S.: A dynamic programming algorithm for RNA structure prediction including pseudoknots. Journal of Molecular Biology. 285(5), 2053--68 (1999)
[28]
Shibuya, T.: Generalization of a suffix tree for RNA structural pattern matching. Algorithmica. 39(1), 1--19 (2004)
[29]
Smyth, W.: Computing Patterns in Strings. Pearson, New York (2003)
[30]
Strothmann, D.: The affix array data structure and its applications to RNA secondary structure analysis. Theoretical Computer Science. 389(1-2), 278--294 (2007)
[31]
Svoboda, P., Di Cara, A.: Hairpin RNA: A secondary structure of primary importance. Cellular and Molecular Life Sciences. 63, 901--918 (2006)
[32]
Wang, X.: Computational prediction of microRNA targets. Methods in Molecular Biology. 667, 283--295 (2010)
[33]
Zeidman, B. Software v. software. IEEE Spectrum. 47, 32--53 (Oct. 2010)

Index Terms

  1. The Forward Stem Matrix: An Efficient Data Structure for Finding Hairpins in RNA Secondary Structures

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
      September 2013
      987 pages
      ISBN:9781450324342
      DOI:10.1145/2506583
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 September 2013

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. FPF
      2. FPnF
      3. FSM
      4. Forward Stem Matrix
      5. Furthest Previous Factor
      6. Furthest Previous Non-Overlapping Factor
      7. RNA secondary structures
      8. hairpins
      9. p-match
      10. parameterized pattern matching
      11. pattern matching

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      BCB'13
      Sponsor:
      BCB'13: ACM-BCB2013
      September 22 - 25, 2013
      Wshington DC, USA

      Acceptance Rates

      BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
      Overall Acceptance Rate 254 of 885 submissions, 29%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 55
        Total Downloads
      • Downloads (Last 12 months)1
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 31 Dec 2024

      Other Metrics

      Citations

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media