DOI: 10.1145/2649387.2649427
research-article

FStitch: a fast and simple algorithm for detecting nascent RNA transcripts

Published: 20 September 2014

Abstract

We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq) data. GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct readout of bona fide transcription. Most traditional assays, such as RNA-seq, instead measure steady-state RNA levels, which are affected by transcription, post-transcriptional processing, and RNA stability. A detailed study of GRO-seq data has the potential to inform on many aspects of the transcription process. GRO-seq data, however, present unique analysis challenges that are only beginning to be addressed. Here we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, a hidden Markov model (HMM) and logistic regression, to robustly classify which regions of the genome are transcribed. Our algorithm builds on the strengths of previous approaches while being accurate, dependent on very little training data, robust to varying read depth, annotation agnostic, and fast.
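The abstract names the two ingredients (an HMM and logistic regression) without giving details, so the following is only a minimal sketch of how such a combination could look, not the authors' implementation. It assumes a two-state HMM (0 = inactive, 1 = transcribed) over fixed-size genomic bins, a single per-bin feature (read coverage), and a logistic function that turns that feature into a per-bin probability of being transcribed. The function name `label_bins` and the parameters `weight`, `bias`, and `stay_prob` are hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def label_bins(coverage, weight, bias, stay_prob=0.9):
    """Viterbi decoding of a two-state HMM with logistic per-bin scores.

    coverage  : read coverage per genomic bin (hypothetical feature)
    weight    : logistic regression weight (assumed already trained)
    bias      : logistic regression intercept (assumed already trained)
    stay_prob : probability of remaining in the same state, in (0, 1)
    Returns one label per bin: 0 = inactive, 1 = transcribed.
    """
    if not coverage:
        return []
    eps = 1e-12  # guards against log(0)
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    # score[s] = best log-probability of any state path ending in state s
    score = [math.log(0.5), math.log(0.5)]
    backpointers = []
    for x in coverage:
        p_on = sigmoid(weight * x + bias)
        emit = [math.log(1.0 - p_on + eps), math.log(p_on + eps)]
        new_score, ptr = [0.0, 0.0], [0, 0]
        for s in (0, 1):
            cands = [score[r] + (log_stay if r == s else log_switch)
                     for r in (0, 1)]
            ptr[s] = 0 if cands[0] >= cands[1] else 1
            new_score[s] = cands[ptr[s]] + emit[s]
        score = new_score
        backpointers.append(ptr)

    # Trace the best path back from the last bin to the first.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backpointers[1:]):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy coverage track with a one-bin gap (index 4) inside a covered region.
coverage = [0, 0, 5, 6, 0, 7, 5, 0, 0, 0]
print(label_bins(coverage, weight=1.0, bias=-2.5, stay_prob=0.9))
# -> [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```

With `stay_prob=0.9` the transition penalty outweighs the single low-coverage bin, so the gap is bridged into one contiguous region. Setting `stay_prob=0.5` removes that smoothing and reduces the result to an independent per-bin classification, which leaves the gap unlabeled.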


    Information & Contributors

    Information

    Published In

    BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
    September 2014
    851 pages
    ISBN:9781450328944
    DOI:10.1145/2649387
    General Chairs: Pierre Baldi, Wei Wang
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hidden Markov models
    2. logistic regression
    3. nascent transcription

    Qualifiers

    • Research-article

    Conference

    BCB '14: ACM-BCB '14
    September 20-23, 2014
    Newport Beach, California

    Acceptance Rates

    Overall Acceptance Rate 254 of 885 submissions, 29%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 12
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 13 Dec 2024

    Citations

    Cited By

    • (2022) Integrated genomics approaches identify transcriptional mediators and epigenetic responses to Afghan desert particulate matter in small airway epithelial cells. Physiological Genomics, 54:10 (389-401). DOI: 10.1152/physiolgenomics.00063.2022. Online publication date: 1-Oct-2022
    • (2020) Combining signal and sequence to detect RNA polymerase initiation in ATAC-seq data. PLOS ONE, 15:4 (e0232332). DOI: 10.1371/journal.pone.0232332. Online publication date: 30-Apr-2020
    • (2018) Detecting Differential Transcription Factor Activity from ATAC-Seq Data. Molecules, 23:5 (1136). DOI: 10.3390/molecules23051136. Online publication date: 10-May-2018
    • (2017) An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14:5 (1070-1081). DOI: 10.1109/TCBB.2016.2520919. Online publication date: 1-Sep-2017
    • (2016) A generative model for the behavior of RNA polymerase. Bioinformatics, 33:2 (227-234). DOI: 10.1093/bioinformatics/btw599. Online publication date: 23-Sep-2016
    • (2016) RNA Pol II transcription model and interpretation of GRO-seq data. Journal of Mathematical Biology, 74:1-2 (77-97). DOI: 10.1007/s00285-016-1014-4. Online publication date: 3-May-2016
