Abstract
Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5′ to 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.
Accession codes
Change history
25 November 2013
In the version of this article initially published, the accession code for data was left out. The error has been corrected in the HTML and PDF versions of the article.
Acknowledgements
We thank J. Eid and L. Hickey at Pacific Biosciences for providing alignment statistics of reads to reference genomes. We thank J. Kelley, C. Araya, D. Phanstiel, S. Shringarpure and M. Sikora at Stanford as well as J. Korlach at Pacific Biosciences for comments on this manuscript. We also thank T. Daley and A. Smith at USC for advice on modeling library complexity. This work was supported by US National Institutes of Health grants 5P01GM099130-02, 5U54HG00699602-02 and 5U01HL107393-03, and by the US National Institutes of Health training grant 5 T32 HD07149.
Author information
Contributions
All authors proposed the project. D.S. devised and performed experiments and wrote the first version of the introduction. H.T. devised and performed analysis, prepared figures and wrote the first version of results and discussion. All authors discussed experiments and analysis and collaborated on the final version.
Ethics declarations
Competing interests
M.S. is on the scientific advisory board of Personalis and GenapSys. All other authors declare no competing interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–6 (PDF 904 kb)
About this article
Cite this article
Sharon, D., Tilgner, H., Grubert, F. et al. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 31, 1009–1014 (2013). https://doi.org/10.1038/nbt.2705
DOI: https://doi.org/10.1038/nbt.2705