DOI:10.1145/2506583.2506617
tutorial

glu-RNA: aliGn highLy strUctured ncRNAs using only sequence similarity

Published: 22 September 2013

Abstract

Generating reliable alignments for ncRNAs is an important step in ncRNA secondary structure prediction and ncRNA gene finding. Existing sequence alignment programs can generate reliable alignments for ncRNAs with high sequence conservation. For highly structured ncRNAs that may lack strong sequence similarity, structural alignment programs are required. However, conducting reliable structural alignment is much more expensive than sequence alignment and is not ideal for large-scale input such as whole genomes or next-generation sequencing data.
In this paper, we propose an accurate ncRNA alignment approach that aligns highly structured ncRNAs using only sequence similarity. By incorporating posterior probability and a machine learning approach, we can generate accurate alignments of highly structured ncRNAs without using structural information. We tested our approach on more than three hundred pairs of highly structured ncRNAs from BRAliBase 2.1. The experimental results show that our approach can achieve more accurate alignments than commonly used sequence alignment programs and a popular structural alignment tool.
The source code of glu-RNA can be downloaded at http://sourceforge.net/projects/glu-rna/.
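The abstract does not describe how the posterior probabilities are computed, so the following is only a minimal sketch of one standard way to obtain match posteriors for a pair of sequences: a forward-backward pass over a partition function of alignment paths. It is not the glu-RNA implementation. The function name `match_posteriors`, the linear gap model, and the score values (`match`, `mismatch`, `gap`, `temperature`) are all assumptions chosen for illustration.

```python
import math


def match_posteriors(a, b, match=2.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Return (P, Z) for sequences a and b.

    P[i][j] is the posterior probability that a[i] is aligned to b[j],
    summed over all global alignment paths weighted by exp(score / temperature).
    Z is the partition function (total weight of all paths).
    Scores and the linear gap model are illustrative assumptions.
    """
    n, m = len(a), len(b)
    g = math.exp(gap / temperature)  # Boltzmann weight of one gap column

    def w(i, j):
        # Boltzmann weight of aligning a[i] with b[j] (0-based indices)
        s = match if a[i] == b[j] else mismatch
        return math.exp(s / temperature)

    # Forward table: F[i][j] = total weight of all paths aligning a[:i] with b[:j]
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                F[i][j] += F[i - 1][j - 1] * w(i - 1, j - 1)
            if i > 0:
                F[i][j] += F[i - 1][j] * g
            if j > 0:
                F[i][j] += F[i][j - 1] * g

    # Backward table: B[i][j] = total weight of all paths aligning a[i:] with b[j:]
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += B[i + 1][j + 1] * w(i, j)
            if i < n:
                B[i][j] += B[i + 1][j] * g
            if j < m:
                B[i][j] += B[i][j + 1] * g

    Z = F[n][m]
    # A path that matches a[i] to b[j] splits into a prefix part, the match
    # itself, and a suffix part, so its total weight factorizes.
    P = [[F[i][j] * w(i, j) * B[i + 1][j + 1] / Z for j in range(m)]
         for i in range(n)]
    return P, Z


if __name__ == "__main__":
    P, Z = match_posteriors("GGCAUCG", "GGAUUCG")
    for row in P:
        print(" ".join(f"{p:.3f}" for p in row))
```

Each row of `P` sums to at most 1, because a position in the first sequence is matched at most once on any path; the remainder is the probability that the position is aligned to a gap. A posterior matrix of this kind could serve as per-position input to a downstream classifier, though how glu-RNA combines the two is not stated in the abstract.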



    Information & Contributors

    Information

    Published In

    BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
    September 2013
    987 pages
    ISBN:9781450324342
    DOI:10.1145/2506583

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. machine learning methods
    2. ncRNAs
    3. sequence alignment

    Qualifiers

    • Tutorial
    • Research
    • Refereed limited

    Conference

    BCB'13
    Sponsor:
    BCB'13: ACM-BCB2013
    September 22 - 25, 2013
    Washington DC, USA

    Acceptance Rates

    BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
    Overall Acceptance Rate 254 of 885 submissions, 29%
