Designing Filters for Fast-Known NcRNA Identification

Published: 01 May 2012

Abstract

Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, so that the CM is applied only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be scanned efficiently against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/
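To make the filtering idea concrete, the following is a minimal sketch of scanning a sequence with a profile that also rewards base pairing between designated columns. It is not the authors' implementation: the function names, the toy 6-column hairpin profile, the default mismatch score, and the pair bonus and penalty values are all assumptions chosen for illustration, and the RNA alphabet (U rather than T) is used only for readability.

```python
# Illustrative sketch only: a toy "profile plus base-pair term" window scanner.
# All names, scores, and the example profile below are hypothetical.

# Base pairs treated as compatible (Watson-Crick plus G-U wobble).
COMPLEMENT = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Toy profile: one dict of per-base scores per column; unlisted bases score -1.0.
COLUMNS = [{"G": 1.0}, {"C": 1.0}, {"A": 1.0}, {"A": 1.0}, {"G": 1.0}, {"C": 1.0}]

# Column index pairs assumed to form a stem (0 pairs with 5, 1 pairs with 4).
PAIRS = [(0, 5), (1, 4)]


def ssp_score(window, columns, pairs, pair_bonus=1.0, pair_penalty=-1.0):
    """Sum per-column base scores, then add a bonus or penalty per paired column."""
    score = sum(col.get(base, -1.0) for col, base in zip(columns, window))
    for i, j in pairs:
        if (window[i], window[j]) in COMPLEMENT:
            score += pair_bonus
        else:
            score += pair_penalty
    return score


def scan(sequence, columns, pairs, threshold):
    """Return start positions of windows whose score reaches the threshold."""
    width = len(columns)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if ssp_score(window, columns, pairs) >= threshold:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Only windows that pass the cheap filter would be handed to the costly CM.
    print(scan("UUGCAAGCUU", COLUMNS, PAIRS, threshold=5.0))
```

In a filtering pipeline of the kind the abstract describes, only the windows returned by such a scan would be passed on to the more expensive CM search; the threshold controls the trade-off between sensitivity and the fraction of the sequence that is excluded.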

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 9, Issue 3
May 2012
302 pages

Publisher

IEEE Computer Society Press
Washington, DC, United States

Author Tags

1. Algorithms for data and knowledge
2. bioinformatics (genome or protein)
3. feature extraction or construction
4. formal languages
