Abstract
There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts1,2,3,4. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise5,6. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified ∼1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFκB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
References
Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004)
Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005)
Kapranov, P. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919 (2002)
Rinn, J. L. et al. The transcriptional activity of human chromosome 22. Genes Dev. 17, 529–540 (2003)
Ponjavic, J., Ponting, C. P. & Lunter, G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17, 556–565 (2007)
Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007)
Brannan, C. I., Dees, E. C., Ingram, R. S. & Tilghman, S. M. The product of the H19 gene may function as an RNA. Mol. Cell. Biol. 10, 28–36 (1990)
Brown, C. J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991)
Lee, J. T., Davidow, L. S. & Warshawsky, D. Tsix, a gene antisense to Xist at the X-inactivation centre. Nature Genet. 21, 400–404 (1999)
Sotomaru, Y. et al. Unregulated expression of the imprinted genes H19 and Igf2r in mouse uniparental fetuses. J. Biol. Chem. 277, 12474–12478 (2002)
Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007)
Willingham, A. T. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005)
Wang, J. et al. Mouse transcriptome: neutral evolution of ‘non-coding’ complementary DNAs. Nature 431, 1–2 (2004). https://doi.org/10.1038/nature03016
Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)
Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140–D144 (2006)
Tam, O. H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534–538 (2008)
Watanabe, T. et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539–543 (2008)
Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl Acad. Sci. USA 104, 19428–19433 (2007)
Lin, M. F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007)
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)
Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006)
Su, A. I. et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA 99, 4465–4470 (2002)
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)
Tanay, A., Sharan, R. & Shamir, R. Discovering statistically significant biclusters in gene expression data. Bioinformatics 18 (Suppl. 1), S136–S144 (2002)
Chang, H. Y. et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl Acad. Sci. USA 102, 3738–3743 (2005)
Carrio, M., Arderiu, G., Myers, C. & Boudreau, N. J. Homeobox D10 induces phenotypic reversion of breast tumor cells in a three-dimensional culture model. Cancer Res. 65, 7177–7185 (2005)
Ventura, A. et al. Cre-lox-regulated conditional RNA interference from transgenes. Proc. Natl Acad. Sci. USA 101, 10380–10385 (2004)
Loh, Y. H. et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nature Genet. 38, 431–440 (2006)
Ivanova, N. et al. Dissecting self-renewal in stem cells with RNA interference. Nature 442, 533–538 (2006)
Zhao, J., Sun, B. K., Erwin, J. A., Song, J. J. & Lee, J. T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008)
Acknowledgements
We would like to thank our colleagues at the Broad Institute, especially J. P. Mesirov for discussions and statistical insights, X. Xie for statistical help with conservation analyses, J. Robinson for visualization help, M. Ku, E. Mendenhall and X. Zhang for help generating ChIP samples, and N. Novershtern and A. Levy for providing transcription factor lists. M. Guttman is a Vertex scholar; I.A. acknowledges the support of the Human Frontier Science Program Organization. This work was funded by Beth Israel Deaconess Medical Center, the National Human Genome Research Institute, and the Broad Institute of MIT and Harvard.
Author Contributions J.L.R., E.S.L., A.R. and M. Guttman conceived and designed experiments. The manuscript was written by M. Guttman, A.R., J.L.R. and E.S.L. J.L.R., I.A., C.F., D.F., M.H., B.W.C., J.P.C. and M. Guttman performed molecular biology experiments. All data analyses were performed by M. Guttman in conjunction with M. Garber (conservation analyses), M.F.L. (codon substitution frequency), T.S.M. (ChIP-seq data), O.Z. (motif analysis) and M.N.C. (lincRNA genomic location analysis). Reagents were provided by M. Garber (pre-published conservation analysis tools); T.J. and D.F. (p53 wild-type and knockout MEFs); N.H., A.R. and I.A. (dendritic cell stimulated time course); B.E.B. (ChIP data); R.J., B.W.C. and J.P.C. (luciferase assays); and M.K. and M.F.L. (codon substitution frequency code).
Supplementary information
Supplementary Figures
This file contains Supplementary Figures 1-11 with Legends (PDF 2081 kb)
Supplementary Information
This file contains Supplementary Methods and Supplementary References (PDF 147 kb)
Supplementary Table 1
In Supplementary Table 1 the K4-K36 domain coordinates are shown and the K4-K36 enriched domains in the 4 mouse cell types are listed. Coordinates are indicated in mouse genome build MM8. (XLS 107 kb)
Supplementary Table 2
In Supplementary Table 2 the lincRNA Exon Coordinates and Pi LOD Enrichment Score are shown. lincRNA exons defined by NimbleGen tiling microarrays are listed in mouse genome build MM9. Each exon is reported with its associated Pi LOD Enrichment Score (Methods). (XLS 174 kb)
Supplementary Table 3
In Supplementary Table 3 the characteristic properties of lincRNAs are shown. (DOC 36 kb)
Supplementary Table 4
In Supplementary Table 4 the PCR validation primer sequences are shown. Primer sequences used for validation of lincRNA expression by PCR and qPCR are reported. (XLS 31 kb)
Supplementary Table 5
In Supplementary Table 5 the Northern blot analysis probe sequences and primers are shown. Primers and amplicons for Northern blot analyses are provided. The correct file for Supplementary Table 5 was uploaded on 4th March, 2009. (XLS 27 kb)
Supplementary Table 6
In Supplementary Table 6 the Codon Substitution Frequency (CSF) Scores are shown. The CSF score for each K4-K36 domain is provided. Coordinates are reported in mouse genome build MM9. An updated version of Supplementary Table 6 was uploaded on 4th March, 2009. (XLS 122 kb)
Supplementary Table 7
In Supplementary Table 7 the Exon conservation for lincRNAs and other annotations is shown. Pi LOD Enrichment scores are provided for lincRNA exons and other annotations compared in the text. The coordinates are provided in mouse genome build MM9, and the max 12-mer LOD score and the randomized average max 12-mer LOD score are indicated along with the enrichment score. (XLS 836 kb)
Supplementary Table 8
In Supplementary Table 8 the lincRNA Promoter Conservation is shown. Pi LOD Enrichment scores are provided for each lincRNA promoter region, protein coding gene promoters, and random intergenic regions. Coordinates are provided in Mouse genome build MM9. (XLS 634 kb)
Supplementary Table 9
In Supplementary Table 9 the Human and Mouse orthologous lincRNAs are shown. lincRNAs defined in Human Lung Fibroblasts were lifted into the mouse genome (MM8) and enrichment statistics were computed for Mouse Lung Fibroblasts (Methods). The enrichment p-values and fold enrichments are indicated. (XLS 28 kb)
Supplementary Table 10
In Supplementary Table 10 the lincRNA expression across mouse tissue compendium is shown. lincRNA expression levels across various mouse cell types, tissues, and conditions are provided. The values are log values of the relative expression of each lincRNA. (XLS 420 kb)
Supplementary Table 11
In Supplementary Table 11 the Gene Set Enrichment Analysis (GSEA) association matrix is shown. Functional associations between lincRNAs (columns) and MSigDB terms (rows) are indicated. Positive association is indicated by a 1, negative association by a -1, and no association by a 0. (TXT 6203 kb)
Supplementary Table 12
In Supplementary Table 12 the P53 regulated lincRNAs upon DNA Damage Induction are shown. lincRNAs that temporally increase in P53 wild-type cells compared with P53 knockout cells upon stimulation with DNA damage are indicated along with their expression levels across the DNA damage time course. (XLS 26 kb)
Supplementary Table 13
In Supplementary Table 13 the P53 Motif Enrichments in induced lincRNAs are shown. P53 motif scores are provided for each lincRNA promoter along with the sequence of the best motif hit and its conservation. P53 induced lincRNAs are indicated in the last column. (XLS 347 kb)
Supplementary Table 14
In Supplementary Table 14 the NFKB regulated lincRNAs are shown. lincRNAs that are differentially expressed in TLR4 stimulation of BMDC cells compared with unstimulated BMDC cells are provided. (XLS 23 kb)
Supplementary Table 15
In Supplementary Table 15 the ES cell lincRNAs bound by Oct4 and/or Nanog are shown. The coordinates of the lincRNAs bound by Oct4/Nanog in ES cells are provided. (XLS 17 kb)
Supplementary Table 16
In Supplementary Table 16 the functional association of lincENC1 is shown. GSEA results for lincENC1 are provided for both profiled exons in the transcript. (XLS 23 kb)
Supplementary Table 17
In Supplementary Table 17 the Enrichment of Gene Ontology (GO) terms for lincRNA neighbors is shown. Significant GO terms (FDR < 0.05) are indicated along with their associated p-values. (XLS 22 kb)
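The association matrix described for Supplementary Table 11 (MSigDB terms as rows, lincRNAs as columns, entries of 1, -1 or 0) could be read along the following lines. This is only a minimal sketch: the tab delimiter, the header layout, and every identifier in the toy data (`lincRNA_A`, `GENESET_X`, `load_associations`, and so on) are assumptions made for illustration, not details taken from the actual file.

```python
import csv
import io

# Toy stand-in for the matrix. The real file's delimiter and header layout are
# assumptions here: tab-separated, first row = lincRNA IDs, first column = term names.
TOY_MATRIX = (
    "term\tlincRNA_A\tlincRNA_B\n"
    "GENESET_X\t1\t0\n"
    "GENESET_Y\t-1\t1\n"
    "GENESET_Z\t0\t0\n"
)


def load_associations(handle):
    """Return {lincRNA: {"positive": [terms], "negative": [terms]}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    lincrnas = header[1:]  # skip the label of the term column
    result = {name: {"positive": [], "negative": []} for name in lincrnas}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        term, values = row[0], row[1:]
        for name, value in zip(lincrnas, values):
            code = int(value)
            if code == 1:
                result[name]["positive"].append(term)
            elif code == -1:
                result[name]["negative"].append(term)
            # 0 means no association, so nothing is recorded
    return result


associations = load_associations(io.StringIO(TOY_MATRIX))
```

For the real table, the same function would presumably be called on an opened file handle instead of the in-memory toy string, after checking the delimiter and header assumptions against the file itself.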
Cite this article
Guttman, M., Amit, I., Garber, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009). https://doi.org/10.1038/nature07672
DOI: https://doi.org/10.1038/nature07672