Abstract
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph–based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
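The seed-selection step and the accuracy figure above can be made concrete with a minimal sketch. This is not the HGAP implementation. The function names, the `target_seed_coverage` parameter, and the rule of taking the longest reads until they cover the genome a chosen number of times are all assumptions made for illustration. The second helper applies the standard Phred relation to an accuracy value, under which 99.999% corresponds to roughly Q50.

```python
import math
from typing import Dict, List, Tuple


def select_seed_reads(
    read_lengths: Dict[str, int],
    genome_size: int,
    target_seed_coverage: float,
) -> Tuple[List[str], int]:
    """Pick the longest reads until their total length reaches
    genome_size * target_seed_coverage (an assumed stopping rule).

    Returns the chosen read IDs and the length of the shortest chosen read,
    which acts as the implied seed-length cutoff.
    """
    if genome_size <= 0 or target_seed_coverage <= 0:
        raise ValueError("genome_size and target_seed_coverage must be positive")

    needed_bases = genome_size * target_seed_coverage
    seeds: List[str] = []
    total_bases = 0
    cutoff = 0
    # Longest first; ties are broken by read ID so the result is deterministic.
    ordered = sorted(read_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for read_id, length in ordered:
        if total_bases >= needed_bases:
            break
        seeds.append(read_id)
        total_bases += length
        cutoff = length
    return seeds, cutoff


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Hypothetical toy data: read ID -> read length in bases.
    toy_reads = {"a": 1000, "b": 5000, "c": 3000, "d": 200}
    print(select_seed_reads(toy_reads, genome_size=1000, target_seed_coverage=7))
    print(round(accuracy_to_phred(0.99999), 3))
```

In a workflow of the kind the abstract describes, the remaining reads would then be aligned to these seeds and a consensus built per seed; that step is outside the scope of this sketch.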
Accession codes
Primary accessions
Sequence Read Archive
Referenced accessions
NCBI Reference Sequence
Acknowledgements
We thank S. Clingenpeel (Joint Genome Institute) for growing cultures and performing DNA extraction for M. ruber and P. heparinus; B. Munson and F. Antonacci for assistance with the BAC library construction; and K. Travers, S. McCalmon, M. Wang, U. Nguyen, S. Ranade, M. Ashby, L. Hon and L. Hickey (Pacific Biosciences) for assistance in sample preparation, sequencing and data analysis. The authors acknowledge the ATCC for providing the E. coli K-12 MG1655 strain. We thank S. Koren and A. Phillippy for pointing out to us the SMRT sequencing–based gap-filling functionality development in the Celera Assembler. The work conducted by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.
Author information
Contributions
C.-S.C., A. Copeland, E.E.E., S.W.T. and J.K. designed the experiments; C.-S.C., D.H.A., P.M., A.A.K., J.D., A. Clum and J.H. analyzed data; C.H. performed the validation sequencing; and C.-S.C., D.H.A., P.M., A.A.K., A. Copeland, E.E.E. and J.K. wrote the manuscript.
Ethics declarations
Competing interests
C.-S.C., D.H.A., P.M., A.A.K., J.D., C.H., S.W.T. and J.K. are employees of Pacific Biosciences, a company commercializing DNA sequencing technologies.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–15, Supplementary Tables 1–5 and Supplementary Notes 1 and 2 (PDF 3942 kb)
About this article
Cite this article
Chin, CS., Alexander, D., Marks, P. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569 (2013). https://doi.org/10.1038/nmeth.2474
DOI: https://doi.org/10.1038/nmeth.2474