Review Article
Published: 23 September 2014

Sequencing pools of individuals — mining genome-wide polymorphism data without big funding

Christian Schlötterer¹,
Raymond Tobler¹,²,
Robert Kofler¹ &
…
Viola Nolte¹

Nature Reviews Genetics volume 15, pages 749–763 (2014)

Key Points

Whole-genome sequencing of pools of individuals (Pool-seq) is a cost-effective approach to determine genome-wide allele frequencies in an unbiased manner from a large number of individuals.
Once minimum quality criteria have been met, Pool-seq-based allele frequency estimates are accurate and reliable.
Typical issues with Pool-seq are alignment problems caused by copy number variation or by errors in the reference genome. Calling low-frequency alleles is also challenging because they are difficult to distinguish from sequencing errors.
Pool-seq has been successfully applied to a wide range of applications, including bulk segregant analyses, evolve and resequence studies, evolutionary genome analyses, analyses of time-series data and cancer genomics.
Owing to its cost-effectiveness, Pool-seq will continue to be a powerful tool for studies that require genome-wide allele frequency data in a large number of population samples. New technological and analytical advances will facilitate the extraction of haplotype information from Pool-seq data.
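
As an illustration of the allele frequency estimates mentioned in the Key Points, the following is a minimal sketch and is not taken from the Review. It assumes an idealized two-stage sampling model: n chromosomes are drawn at random from the population, every individual contributes equally to the pool, reads are drawn independently from the pool, and there are no sequencing errors. Under those assumptions the variance of the estimate is p(1 − p)(n + c − 1)/(nc) for coverage c, which corresponds to an effective sample size of nc/(n + c − 1). The function names are hypothetical.

```python
import math


def pool_allele_frequency(alt_reads: int, coverage: int) -> float:
    """Estimate the allele frequency as the fraction of reads carrying the allele."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    if not 0 <= alt_reads <= coverage:
        raise ValueError("alt_reads must lie between 0 and coverage")
    return alt_reads / coverage


def effective_sample_size(n_chromosomes: int, coverage: int) -> float:
    """Effective number of independent chromosomes behind the estimate.

    Assumes binomial sampling of chromosomes into the pool, followed by
    binomial sampling of reads from the pool (idealized, error-free model).
    """
    if n_chromosomes <= 0 or coverage <= 0:
        raise ValueError("n_chromosomes and coverage must be positive")
    return n_chromosomes * coverage / (n_chromosomes + coverage - 1)


def standard_error(p: float, n_chromosomes: int, coverage: int) -> float:
    """Standard error of the pooled frequency estimate under the same model."""
    n_eff = effective_sample_size(n_chromosomes, coverage)
    return math.sqrt(p * (1.0 - p) / n_eff)


if __name__ == "__main__":
    # Hypothetical site: 30 of 100 reads carry the allele; pool of 50 diploids.
    p_hat = pool_allele_frequency(30, 100)
    print(p_hat, standard_error(p_hat, n_chromosomes=100, coverage=100))
```

In this idealized model, raising coverage far above the number of pooled chromosomes yields little extra precision, because the effective sample size can never exceed the pool size. Real data add further variance from unequal DNA contributions and sequencing errors, which this sketch ignores.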

Abstract

The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. Nevertheless, despite plunging sequencing costs, genomic sequencing of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is being increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.

**Figure 1: Cost-effectiveness of Pool-seq.**

**Figure 2: Comparison of sequencing strategies.**

References

Ellegren, H. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29, 51–63 (2014).
PubMed Google Scholar
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
CAS PubMed PubMed Central Google Scholar
International HapMap, C. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Google Scholar
1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Weigel, D. & Mott, R. The 1001 genomes project for Arabidopsis thaliana. Genome Biol. 10, 107 (2009).
PubMed PubMed Central Google Scholar
Daetwyler, H. D. et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genet. 46, 858–865 (2014).
CAS PubMed Google Scholar
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
CAS PubMed PubMed Central Google Scholar
The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
Sheridan, C. Illumina claims $1,000 genome win. Nature Biotech. 32, 115 (2014).
Google Scholar
Weinstock, G. M. Genomic approaches to studying the human microbiota. Nature 489, 250–256 (2012).
CAS PubMed PubMed Central Google Scholar
Futschik, A. & Schlötterer, C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186, 207–218 (2010). This study is the first to provide a statistical framework for the analysis of Pool-seq data in population genetics.
CAS PubMed PubMed Central Google Scholar
Gautier, M. et al. Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping. Mol. Ecol. 22, 3766–3779 (2013).
CAS PubMed Google Scholar
Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Rev. Genet. 12, 745–755 (2011).
CAS PubMed Google Scholar
Gilissen, C., Hoischen, A., Brunner, H. G. & Veltman, J. A. Disease gene identification strategies for exome sequencing. Eur. J. Hum. Genet. 20, 490–497 (2012).
CAS PubMed PubMed Central Google Scholar
Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).
CAS PubMed Google Scholar
Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Rev. Genet. 12, 499–510 (2011).
CAS PubMed Google Scholar
Pihlstrom, L., Rengmark, A., Bjornara, K. A. & Toft, M. Effective variant detection by targeted deep sequencing of DNA pools: an example from Parkinson's disease. Ann. Hum. Genet. 78, 243–252 (2014).
CAS PubMed Google Scholar
Suvorov, A. et al. Intra-specific regulatory variation in Drosophila pseudoobscura. PLoS ONE 8, e83547 (2013).
PubMed PubMed Central Google Scholar
Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression differences within and between Drosophila species. Nature Genet. 40, 346–350 (2008).
CAS PubMed Google Scholar
Konczal, M., Koteja, P., Stuglik, M. T., Radwan, J. & Babik, W. Accuracy of allele frequency estimation using pooled RNA-seq. Mol. Ecol. Resour. 14, 381–392 (2014).
CAS PubMed Google Scholar
Gross, J. B., Furterer, A., Carlson, B. M. & Stahl, B. A. An integrated transcriptome-wide analysis of cave and surface dwelling Astyanax mexicanus. PLoS ONE 8, e55659 (2013).
CAS PubMed PubMed Central Google Scholar
Kozak, G. M., Brennan, R. S., Berdan, E. L., Fuller, R. C. & Whitehead, A. Functional and population genomic divergence within and between two species of killifish adapted to different osmotic niches. Evolution 68, 63–80 (2014).
CAS PubMed Google Scholar
Sloan, D. B. et al. De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol. Ecol. Resour. 12, 333–343 (2012).
CAS PubMed Google Scholar
Gautier, M. et al. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Mol. Ecol. 22, 3165–3178 (2013).
CAS PubMed Google Scholar
Arnold, B., Corbett-Detig, R. B., Hartl, D. & Bomblies, K. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol. Ecol. 22, 3179–3190 (2013).
CAS PubMed Google Scholar
Karczewski, K. J. et al. Systematic functional regulatory assessment of disease-associated variants. Proc. Natl Acad. Sci. USA 110, 9607–9612 (2013).
CAS PubMed PubMed Central Google Scholar
Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).
PubMed PubMed Central Google Scholar
Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
CAS PubMed PubMed Central Google Scholar
Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010).
CAS PubMed Google Scholar
Qanbari, S. et al. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10, e1004148 (2014).
PubMed PubMed Central Google Scholar
Pasaniuc, B. et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nature Genet. 44, 631–635 (2012).
CAS PubMed Google Scholar
Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc. Natl Acad. Sci. USA 110, 19872–19877 (2013).
CAS PubMed PubMed Central Google Scholar
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43, 491–498 (2011).
CAS PubMed Google Scholar
Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011).
CAS PubMed PubMed Central Google Scholar
Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
CAS PubMed PubMed Central Google Scholar
Robasky, K., Lewis, N. E. & Church, G. M. The role of replicates for error mitigation in next-generation sequencing. Nature Rev. Genet. 15, 56–62 (2014).
CAS PubMed Google Scholar
Sham, P., Bader, J. S., Craig, I., O'Donovan, M. & Owen, M. DNA pooling: a tool for large-scale association studies. Nature Rev. Genet. 3, 862–871 (2002). This is a comprehensive review of pooling strategies.
CAS PubMed Google Scholar
Zhu, Y., Bergland, A. O., Gonzalez, J. & Petrov, D. A. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS ONE 7, e41901 (2012).
CAS PubMed PubMed Central Google Scholar
Kofler, R. et al. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS ONE 6, e15925 (2011).
CAS PubMed PubMed Central Google Scholar
Schrider, D. R., Begun, D. J. & Hahn, M. W. Detecting highly differentiated copy-number variants from pooled population sequencing. Pac. Symp. Biocomput 1, 344–344 (2013).
Google Scholar
Kapun, M., van Schalkwyk, H., McAllister, B., Flatt, T. & Schlötterer, C. Inference of chromosomal inversion dynamics from Pool-seq data in natural and laboratory populations of Drosophila melanogaster. Mol. Ecol. 23, 1813–1827 (2014).
CAS PubMed Google Scholar
Kofler, R., Betancourt, A. J. & Schlötterer, C. Sequencing of pooled DNA samples (Pool-seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLoS Genet. 8, e1002487 (2012). This study is the first to infer TE insertion sites and the population frequency of TE insertions from Pool-seq data.
CAS PubMed PubMed Central Google Scholar
Sax, K. The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8, 552–560 (1923).
CAS PubMed PubMed Central Google Scholar
Schneeberger, K. et al. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550–551 (2009). This paper is the first to show that Pool-seq can be used to map induced mutations.
CAS PubMed Google Scholar
Schneeberger, K. Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nature Rev. Genet. 15, 662–676 (2014).
CAS PubMed Google Scholar
Hill, J. T. et al. MMAPPR: mutation mapping analysis pipeline for pooled RNA-seq. Genome Res. 23, 687–697 (2013).
CAS PubMed PubMed Central Google Scholar
Miller, A. C., Obholzer, N. D., Shah, A. N., Megason, S. G. & Moens, C. B. RNA-seq-based mapping and candidate identification of mutations from forward genetic screens. Genome Res. 23, 679–686 (2013).
CAS PubMed PubMed Central Google Scholar
Galvao, V. C. et al. Synteny-based mapping-by-sequencing enabled by targeted enrichment. Plant J. 71, 517–526 (2012).
CAS PubMed Google Scholar
Ehrenreich, I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042 (2010). This study provides proof that Pool-seq provides enough power to map complex traits.
CAS PubMed PubMed Central Google Scholar
Wenger, J. W., Schwartz, K. & Sherlock, G. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet. 6, e1000942 (2010).
PubMed PubMed Central Google Scholar
Swinnen, S. et al. Identification of novel causative genes determining the complex trait of high ethanol tolerance in yeast using pooled-segregant whole-genome sequence analysis. Genome Res. 22, 975–984 (2012).
CAS PubMed PubMed Central Google Scholar
Wade, M. J. Epistasis, complex traits, and mapping genes. Genetica 112–113, 59–69 (2001).
Earley, E. J. & Jones, C. D. Next-generation mapping of complex traits with phenotype-based selection and introgression. Genetics 189, 1203–1209 (2011).
CAS PubMed PubMed Central Google Scholar
Bastide, H. et al. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9, e1003534 (2013). This papershows that Pool-seq allows highly accurate fine mapping using natural population samples.
CAS PubMed PubMed Central Google Scholar
Jeong, S. et al. The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132, 783–793 (2008).
CAS PubMed Google Scholar
Kelly, J. K., Koseva, B. & Mojica, J. P. The genomic signal of partial sweeps in Mimulus guttatus. Genome Biol. Evol. 5, 1457–1469 (2013).
CAS PubMed PubMed Central Google Scholar
Beissinger, T. M. et al. A genome-wide scan for evidence of selection in a maize population under long-term artificial selection for ear number. Genetics 196, 829–840 (2014).
CAS PubMed Google Scholar
Johansson, A. M., Pettersson, M. E., Siegel, P. B. & Carlborg, O. Genome-wide effects of long-term divergent selection. PLoS Genet. 6, e1001188 (2010).
PubMed PubMed Central Google Scholar
Rubin, C. J. et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464, 587–591 (2010). This is a particularly nice demonstration of the power of Pool-seq to detect selected loci in population samples.
CAS PubMed Google Scholar
Burke, M. K. et al. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467, 587–590 (2010). The is the first experimental evolution study measuring allele frequency changes using Pool-seq.
CAS PubMed Google Scholar
Remolina, S. C., Chang, P. L., Leips, J., Nuzhdin, S. V. & Hughes, K. A. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66, 3390–3403 (2012).
PubMed PubMed Central Google Scholar
Turner, T. L., Stewart, A. D., Fields, A. T., Rice, W. R. & Tarone, A. M. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7, e1001336 (2011).
CAS PubMed PubMed Central Google Scholar
Zhou, D. et al. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc. Natl Acad. Sci. USA 108, 2349–2354 (2011).
CAS PubMed PubMed Central Google Scholar
Turner, T. L. & Miller, P. M. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191, 633–642 (2012).
PubMed PubMed Central Google Scholar
Tobler, R. et al. Massive habitat-specific genomic response in D. melanogaster populations during experimental evolution in hot and cold environments. Mol. Biol. Evol. 31, 364–375 (2013).
PubMed PubMed Central Google Scholar
Orozco-terWengel, P. et al. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21, 4931–4941 (2012).
PubMed PubMed Central Google Scholar
Reed, L. K. et al. Systems genomics of metabolic phenotypes in wild-type Drosophila melanogaster. Genetics 197, 781–793 (2014).
CAS PubMed PubMed Central Google Scholar
Martins, N. et al. Host adaptation to viruses relies on few genes with different cross-resistance properties. Proc. Natl Acad. Sci. USA 111, 5938–5943 (2014).
CAS PubMed PubMed Central Google Scholar
Jalvingh, K. M., Chang, P. L., Nuzhdin, S. V. & Wertheim, B. Genomic changes under rapid evolution: selection for parasitoid resistance. Proc. Biol. Sci. 281, 20132303 (2014).
PubMed PubMed Central Google Scholar
Magwire, M. M. et al. Genome-wide association studies reveal a simple genetic basis of resistance to naturally coevolving viruses in Drosophila melanogaster. PLoS Genet. 8, e1003057 (2012).
CAS PubMed PubMed Central Google Scholar
Turner, T. L., Bourne, E. C., Von Wettberg, E. J., Hu, T. T. & Nuzhdin, S. V. Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nature Genet. 42, 260–263 (2010). The study is the first to show that ecologically important traits can be mapped with Pool-seq by comparing two functionally diverged populations.
CAS PubMed Google Scholar
Lamichhaney, S. et al. Population-scale sequencing reveals genetic differentiation due to local adaptation in Atlantic herring. Proc. Natl Acad. Sci. USA 109, 19345–19350 (2012).
CAS PubMed PubMed Central Google Scholar
Fabian, D. K. et al. Genome-wide patterns of latitudinal differentiation among populations of Drosophila melanogaster from North America. Mol. Ecol. 21, 4748–4769 (2012).
PubMed PubMed Central Google Scholar
Kolaczkowski, B., Kern, A. D., Holloway, A. K. & Begun, D. J. Genomic differentiation between temperate and tropical Australian populations of Drosophila melanogaster. Genetics 187, 245–260 (2011).
CAS PubMed PubMed Central Google Scholar
Cheng, C. et al. Ecological genomics of Anopheles gambiae along a latitudinal cline: a population-resequencing approach. Genetics 190, 1417–1432 (2012).
PubMed PubMed Central Google Scholar
Hancock, A. M. et al. Adaptations to climate in candidate genes for common metabolic disorders. PLoS Genet. 4, e32 (2008).
PubMed PubMed Central Google Scholar
Hancock, A. M. et al. Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86 (2011).
CAS PubMed Google Scholar
Fischer, M. C. et al. Population genomic footprints of selection and associations with climate in natural populations of Arabidopsis halleri from the Alps. Mol. Ecol. 22, 5594–5607 (2013). This is a nice application of Pool-seq to find selected loci in a non-model organism.
CAS PubMed PubMed Central Google Scholar
Günther, T. & Coop, G. Robust identification of local adaptation from allele frequencies. Genetics 195, 205–220 (2013). This paper presents the first statistical framework to identify significant associations of a given locus with one or more environmental variables using Pool-seq data.
PubMed PubMed Central Google Scholar
Rubin, C. J. et al. Strong signatures of selection in the domestic pig genome. Proc. Natl Acad. Sci. USA 109, 19529–19536 (2012).
CAS PubMed PubMed Central Google Scholar
Axelsson, E. et al. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature 495, 360–364 (2013).
CAS PubMed Google Scholar
He, Z. et al. Two evolutionary histories in the genome of rice: the roles of domestication genes. PLoS Genet. 7, e1002100 (2011).
CAS PubMed PubMed Central Google Scholar
Nolte, V., Pandey, R. V., Kofler, R. & Schlötterer, C. Genome-wide patterns of natural variation reveal strong selective sweeps and ongoing genomic conflict in Drosophila mauritiana. Genome Res. 23, 99–110 (2013).
CAS PubMed PubMed Central Google Scholar
True, J. R., Mercer, J. M. & Laurie, C. C. Differences in crossover frequency and distribution among three sibling species of Drosophila. Genetics 142, 507–523 (1996).
CAS PubMed PubMed Central Google Scholar
Casacuberta, E. & Gonzalez, J. The impact of transposable elements in environmental adaptation. Mol. Ecol. 22, 1503–1517 (2013).
CAS PubMed Google Scholar
Kazazian, H. H. Jr Mobile elements: drivers of genome evolution. Science 303, 1626–1632 (2004).
CAS PubMed Google Scholar
Boitard, S., Schlötterer, C., Nolte, V., Pandey, R. V. & Futschik, A. Detecting selective sweeps from pooled next-generation sequencing samples. Mol. Biol. Evol. 29, 2177–2186 (2012).
CAS PubMed PubMed Central Google Scholar
Clément, J. A. et al. Private selective sweeps identified from next-generation pool-sequencing reveal convergent pathways under selection in two inbred Schistosoma mansoni strains. PLoS Negl Trop. Dis. 7, e2591 (2013).
PubMed PubMed Central Google Scholar
Foll, M. et al. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10, e1004185 (2014).
PubMed PubMed Central Google Scholar
Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013).
CAS PubMed PubMed Central Google Scholar
Barrick, J. E. & Lenski, R. E. Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harb. Symp. Quant. Biol. 74, 119–129 (2009).
CAS PubMed PubMed Central Google Scholar
Kvitek, D. J. & Sherlock, G. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet. 9, e1003972 (2013).
PubMed PubMed Central Google Scholar
Parts, L. et al. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21, 1131–1138 (2011).
CAS PubMed PubMed Central Google Scholar
Illingworth, C. J., Parts, L., Schiffels, S., Liti, G. & Mustonen, V. Quantifying selection acting on a complex trait using allele frequency time series data. Mol. Biol. Evol. 29, 1187–1197 (2012).
CAS PubMed Google Scholar
Bergland, A. O., Behrman, E. L., O'Brien, K. R., Schmidt, P. S. & Petrov, D. A. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. arXiv 1303.5044 (2014).
Traverse, C. C., Mayo-Smith, L. M., Poltak, S. R. & Cooper, V. S. Tangled bank of experimentally evolved Burkholderia biofilms reflects selection during chronic infections. Proc. Natl Acad. Sci. USA 110, E250–E259 (2013).
CAS PubMed Google Scholar
Versace, E., Nolte, V., Pandey, R. V., Tobler, R. & Schlötterer, C. Experimental evolution reveals habitat-specific fitness dynamics among Wolbachia clades in Drosophila melanogaster. Mol. Ecol. 23, 802–814 (2014).
PubMed PubMed Central Google Scholar
Barcellos-Hoff, M. H., Lyden, D. & Wang, T. C. The evolution of the cancer niche during multistage carcinogenesis. Nature Rev. Cancer 13, 511–518 (2013).
CAS Google Scholar
Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nature Rev. Cancer 6, 924–935 (2006).
CAS Google Scholar
Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).
CAS PubMed PubMed Central Google Scholar
Newburger, D. E. et al. Genome evolution during progression to breast cancer. Genome Res. 23, 1097–1108 (2013).
CAS PubMed PubMed Central Google Scholar
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
CAS PubMed PubMed Central Google Scholar
Aparicio, S. & Caldas, C. The implications of clonal genome evolution for cancer medicine. New Engl. J. Med. 368, 842–851 (2013).
CAS PubMed Google Scholar
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
CAS PubMed PubMed Central Google Scholar
Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011).
PubMed PubMed Central Google Scholar
Long, Q. et al. PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing. PLoS ONE 6, e15292 (2011).
CAS PubMed PubMed Central Google Scholar
Kessner, D., Turner, T. L. & Novembre, J. Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data. Mol. Biol. Evol. 30, 1145–1158 (2013).
CAS PubMed PubMed Central Google Scholar
Burke, M. K., King, E. G., Shahrestani, P., Rose, M. R. & Long, A. D. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol. Evol. 6, 1–11 (2014).
PubMed Google Scholar
Eskin, I. et al. eALPS: estimating abundance levels in pooled sequencing using available genotyping data. J. Computat. Biol. 20, 861–877 (2013).
CAS Google Scholar
Kofler, R. & Schlötterer, C. A guide for the design of evolve and resequencing studies. Mol. Biol. Evol. 31, 474–483 (2014).
CAS PubMed Google Scholar
Imsland, F. et al. The Rose-comb mutation in chickens constitutes a structural rearrangement causing both altered comb morphology and defective sperm motility. Plos Genetics 8, e1002775 (2012).
CAS PubMed PubMed Central Google Scholar
Del Fabbro, C., Scalabrin, S., Morgante, M. & Giorgi, F. M. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS ONE 8, e85024 (2013).
PubMed PubMed Central Google Scholar
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Google Scholar
Nevado, B., Ramos-Onsins, S. E. & Perez-Enciso, M. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics. Mol. Ecol. 23, 1764–1779 (2014).
CAS PubMed Google Scholar
Degner, J. F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).
CAS PubMed PubMed Central Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).
CAS PubMed PubMed Central Google Scholar
Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).
CAS PubMed PubMed Central Google Scholar
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
CAS PubMed PubMed Central Google Scholar
Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
CAS PubMed PubMed Central Google Scholar
Raineri, E. et al. SNP calling by sequencing pooled samples. BMC Bioinformatics 13, 239 (2012).
PubMed PubMed Central Google Scholar
Bansal, V. A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 26, i318–i324 (2010).
CAS PubMed PubMed Central Google Scholar
Altmann, A. et al. vipR: variant identification in pooled DNA using R. Bioinformatics 27, I77–I84 (2011).
CAS PubMed PubMed Central Google Scholar
Zhou, B. Y. An empirical Bayes mixture model for SNP detection in pooled sequencing data. Bioinformatics 28, 2569–2575 (2012).
CAS PubMed Google Scholar
Chen, Q. & Sun, F. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics 14 (Suppl. 1), S1 (2013).
PubMed PubMed Central Google Scholar
Druley, T. E. et al. Quantification of rare allelic variants from pooled genomic DNA. Nature Methods 6, 263–265 (2009).
CAS PubMed PubMed Central Google Scholar
Vallania, F. L. et al. High-throughput discovery of rare insertions and deletions in large cohorts. Genome Res. 20, 1711–1718 (2010).
CAS PubMed PubMed Central Google Scholar
Wei, Z., Wang, W., Hu, P., Lyon, G. J. & Hakonarson, H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 39, e132 (2011).
CAS PubMed PubMed Central Google Scholar
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv 1207.3907 (2012).
Calvo, S. E. et al. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency. Nature Genet. 42, 851–858 (2010).
CAS PubMed Google Scholar
Fiston-Lavier, A.-S., Barron, M. G., Petrov, D. A. & González, J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. bioRxiv http://dx.doi.org/10.1101/002964 (2014).
Zhuang, J., Wang, J., Theurkauf, W. & Weng, Z. TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res. 42, 6826–6838 (2014).
CAS PubMed PubMed Central Google Scholar
Kofler, R., Pandey, R. V. & Schlötterer, C. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-seq). Bioinformatics 27, 3435–3436 (2011).
CAS PubMed PubMed Central Google Scholar
Boitard, S. et al. Pool-HMM: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples. Mol. Ecol. Resour. 13, 337–340 (2013).
PubMed PubMed Central Google Scholar
Ferretti, L., Ramos-Onsins, S. E. & Perez-Enciso, M. Population genomics from pool sequencing. Mol. Ecol. 22, 5561–5576 (2013).
PubMed Google Scholar
Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140 (2013).
PubMed PubMed Central Google Scholar
Vitalis, R., Gautier, M., Dawson, K. J. & Beaumont, M. A. Detecting and measuring selection from gene frequency data. Genetics 196, 799–817 (2014).
PubMed Google Scholar
Gautier, M. & Vitalis, R. Inferring population histories using genome-wide allele frequency data. Mol. Biol. Evol. 30, 654–668 (2013).
CAS PubMed Google Scholar
Feder, A. F., Petrov, D. A. & Bergland, A. O. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS ONE 7, e48588 (2012).
CAS PubMed PubMed Central Google Scholar
Minevich, G., Park, D. S., Blankenberg, D., Poole, R. J. & Hobert, O. CloudMap: a cloud-based pipeline for analysis of mutant genome sequences. Genetics 192, 1249–1269 (2012).
CAS PubMed PubMed Central Google Scholar
Edwards, M. D. & Gifford, D. K. High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 (Suppl. 6), S8 (2012).
PubMed PubMed Central Google Scholar
Bowen, M. E., Henke, K., Siegfried, K. R., Warman, M. L. & Harris, M. P. Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. Genetics 190, 1017–1024 (2012).
CAS PubMed PubMed Central Google Scholar
Austin, R. S. et al. Next-generation mapping of Arabidopsis genes. Plant J. 67, 715–725 (2011).
CAS PubMed Google Scholar
Leshchiner, I. et al. Mutation mapping and identification by whole-genome sequencing. Genome Res. 22, 1541–1548 (2012).
CAS PubMed PubMed Central Google Scholar
Prosperi, M. C. & Salemi, M. QuRe: software for viral quasispecies reconstruction from next-generation sequencing data. Bioinformatics 28, 132–133 (2012).
CAS PubMed Google Scholar
Zagordi, O., Bhattacharya, A., Eriksson, N. & Beerenwinkel, N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119 (2011).
PubMed PubMed Central Google Scholar
Eyre, D. W. et al. Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput. Biol. 9, e1003059 (2013).
CAS PubMed PubMed Central Google Scholar
Astrovskaya, I. et al. Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 12 (Suppl. 6), S1 (2011).
PubMed PubMed Central Google Scholar
Yang, X., Charlebois, P., Macalalad, A., Henn, M. R. & Zody, M. C. V-Phaser 2: variant inference for viral populations. BMC Genomics 14, 674 (2013).
CAS PubMed PubMed Central Google Scholar
Töpfer, A. et al. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput. Biol. 10, e1003515 (2014).
Töpfer, A. et al. Probabilistic inference of viral quasispecies subject to recombination. J. Comput. Biol. 20, 113–123 (2013).
Prabhakaran, S., Rey, M., Zagordi, O., Beerenwinkel, N. & Roth, V. HIV haplotype inference using a constraint-based Dirichlet process mixture model. Machine Learning in Computational Biology NIPS Workshop (2010).
Pandey, R. V., Kofler, R., Orozco-terWengel, P., Nolte, V. & Schlötterer, C. PoPoolation DB: a user-friendly web-based database for the retrieval of natural polymorphisms in Drosophila. BMC Genet. 12, 27 (2011).
Chen, X., Listman, J. B., Slack, F. J., Gelernter, J. & Zhao, H. Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples. Genet. Epidemiol. 36, 549–560 (2012).
Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nature Methods 11, 396–398 (2014).

Acknowledgements

The authors apologize to all colleagues who were not cited owing to space limitations. They are grateful to all colleagues who shared unpublished manuscripts, especially D. Kessner, Q. Long, M. Pérez Enciso, A. S. Fiston-Lavier and K. Schneeberger for comments and discussions. They thank members of the Institut für Populationsgenetik, in particular A. Betancourt, M. Dolezal, A. Futschik and A. Kalinka for discussion and comments on earlier versions of the manuscript. This work has been supported by the ERC (ArchAdapt) and the Austrian Science Funds (FWF, W1225).

Author information

Authors and Affiliations

Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Vienna, 1210, Austria
Christian Schlötterer, Raymond Tobler, Robert Kofler & Viola Nolte
Vienna Graduate School of Population Genetics,
Raymond Tobler

Corresponding author

Correspondence to Christian Schlötterer.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Next-generation sequencing: (NGS; also known as second-generation sequencing). An umbrella term for different sequencing platforms delivering millions of short DNA sequence reads.
Reads: DNA sequences that are generated by next-generation sequencing.
Pool-seq: A sequencing technique in which sequencing libraries are not prepared from DNA of a single individual or cell but from a mixture of DNA fragments originating from different individuals or cells. In the context of this Review, Pool-seq is used to describe the unbiased sequencing of the entire genome.
Coverage: The number of reads that span a given genomic position.
Sequencing libraries: Sets of fragmented DNA extracted from one or more individuals that serve as the template for subsequent sequencing.
Exome sequencing: A sequencing approach in which the complexity of the genome is reduced through hybridization to exonic sequences, which results in a higher sequence coverage of protein-coding regions.
Restriction-site-associated DNA markers: Sequence polymorphisms in close proximity to a restriction enzyme recognition site.
Linkage disequilibrium: (LD). Nonrandom association between alleles at two loci. In outcrossing diploid individuals, the genotypes need to be sorted into haplotypes in a statistical procedure called phasing.
Genetic markers: Polymorphic loci that can be scored with a genotyping technique.
F₂ analysis: Analysis of mapping populations generated by the F₂ design. The F₁ progeny from crossing two phenotypically different parental strains are themselves crossed to produce an F₂ population that is segregating for the phenotype of interest. The F₂ mapping population may carry up to three genotypes at every marker and therefore allows the detection of additive and dominance effects, as well as interactions between loci.
Phased genomic sequences: Genome sequences for which the haplotype phase (that is, the combination of alleles or genetic markers that coexist on a single chromosome) has been determined.
Imputation: In statistics, the replacement of missing data with estimated values. In genomics, the use of haplotype sequences to fill in missing sequence information.
Haplotypes: The combination of alleles or genetic markers that coexist on a single chromosome. Chromosomal regions carrying a haplotype are inherited as intact physical units until they are broken up by recombination.
Pool genome-wide association studies: (Pool-GWASs). Genotype–phenotype mapping studies in which phenotypically extreme individuals are grouped and sequenced as pools. Causative variants are identified by contrasting the allele frequencies between the pools.
Evolve and resequence studies: Studies that combine experimental evolution with next-generation sequencing. They make use of controlled environmental, demographic and selective variables to facilitate genotype–phenotype mapping.
Forward genetics: An approach in which mutations induced by random mutagenesis that lead to the disruption of gene function are identified based on their phenotypes. The causative mutation is traditionally identified by positional cloning or by a candidate-gene approach.
Bulk segregant analysis: (BSA). Analysis in which offspring from diverged parents are phenotyped and the DNA of individuals from opposing tails of the phenotypic distribution is combined (pooled). Causative variants are identified by contrasting allele frequency differences among the pools.
Epistatic interactions: Non-additive interactions between genes in which the effect of an allele at one locus is modified by the genotypes at other loci in the genome. The resulting phenotype is different from that expected by summing the independent effects of the individual loci.
Introgress: To introduce a genomic region from one strain or species into another by repeated backcrossing. By selecting for the phenotype of interest, the genomes become isogenic except for the chromosomal regions causing the selected phenotype.
Paired-end reads: DNA fragments that were sequenced from both ends, yielding pairs of reads that are separated by a defined distance that is dependent on the library preparation protocol.
Soft clipping: The masking of substrings at either end of a read that a local alignment algorithm could not align; the clipped bases are excluded from subsequent analysis.
Proper pairs: Paired-end reads in which both reads of a pair map to the same chromosome within a distance pre-specified by the insert size chosen during library preparation.
Broken pairs: Paired-end reads that do not map as proper pairs.
Mapping quality: Phred-scaled measure of the probability that a read is incorrectly mapped, that is, the negative log (base 10) of that probability multiplied by 10.
Base quality: Phred-scaled measure of the probability that a given base call is incorrect, that is, the negative log (base 10) of that probability multiplied by 10.
Insertions and deletions: (Indels). DNA sequences that have been inserted into or deleted from a genomic region. As only phylogenetic analysis allows the distinction between insertions and deletions, indel is used as a neutral term.
Strand bias: The situation in which a variant is significantly more likely to occur within reads that originate from one of the two strands of DNA.
GWASs: Trait mapping studies that rely on a statistical test to determine associations between sequence variants and a given phenotype in natural populations.
Cline: The gradual change in phenotypes or allele frequencies along a geographical or environmental gradient.
Hitchhiking: The population genetic mechanism by which a neutral, or in some cases slightly deleterious, mutation increases in population frequency solely as a result of physical linkage with a positively selected mutation.
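
The glossary entries for mapping quality and base quality both describe Phred-scaled scores, Q = -10 × log10(p), where p is the probability of an error. The following is a minimal Python sketch of that conversion. The function names and the default thresholds of 20 are illustrative assumptions, not values or code taken from this Review.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q into an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) into Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def passes_filters(mapq: float, baseq: float,
                   min_mapq: float = 20, min_baseq: float = 20) -> bool:
    """Hypothetical filter: keep a base call only if both qualities reach a threshold.

    The defaults of 20 (error probability 0.01) are placeholders for
    illustration; suitable thresholds depend on the data set.
    """
    return mapq >= min_mapq and baseq >= min_baseq


if __name__ == "__main__":
    # A quality of 20 corresponds to a 1 in 100 chance of error,
    # and a quality of 30 to a 1 in 1,000 chance.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Because the scale is logarithmic, each increase of 10 in the quality score corresponds to a tenfold decrease in the error probability.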

About this article

Cite this article

Schlötterer, C., Tobler, R., Kofler, R. et al. Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. Nat Rev Genet 15, 749–763 (2014). https://doi.org/10.1038/nrg3803

Published: 23 September 2014
Issue Date: November 2014
DOI: https://doi.org/10.1038/nrg3803
