Key Points
- Whole-genome sequencing of pools of individuals (Pool-seq) is a cost-effective approach to determine genome-wide allele frequencies in an unbiased manner from a large number of individuals.
- Once minimum quality criteria have been met, Pool-seq-based allele frequency estimates are accurate and reliable.
- Typical problems with Pool-seq are alignment errors caused by copy number variation or by deficiencies in the reference genome. Calling low-frequency alleles is challenging because they are difficult to distinguish from sequencing errors.
- Pool-seq has been successfully applied to a wide range of questions, including bulk segregant analyses, evolve and resequence studies, evolutionary genome analyses, analyses of time-series data and cancer genomics.
- Owing to its cost-effectiveness, Pool-seq will continue to be a powerful tool for studies that require genome-wide allele frequency data in a large number of population samples. New technological and analytical advances will facilitate the extraction of haplotype information from Pool-seq data.
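The accuracy of a Pool-seq allele frequency estimate depends on two sampling steps: chromosomes are sampled into the pool, and reads are sampled from the pooled DNA. The sketch below assumes an idealized two-step binomial model (no sequencing errors, equal DNA contribution from every chromosome) and compares the analytic variance of the estimate, p(1 - p)(1/n + 1/c - 1/(nc)) for n pooled chromosomes and coverage c, with a simulation. The function names are illustrative and do not come from any Pool-seq tool.

```python
import random


def poolseq_af_variance(p: float, n_chrom: int, coverage: int) -> float:
    """Expected variance of a Pool-seq allele frequency estimate.

    Assumed model: n_chrom chromosomes are drawn binomially from a population
    with allele frequency p, then `coverage` reads are drawn binomially from
    the pool. Sequencing errors and unequal DNA contributions are ignored.
    """
    return p * (1 - p) * (1 / n_chrom + 1 / coverage - 1 / (n_chrom * coverage))


def simulate_poolseq_af(p: float, n_chrom: int, coverage: int, rng: random.Random) -> float:
    # Step 1: sample chromosomes into the pool
    carriers = sum(rng.random() < p for _ in range(n_chrom))
    q = carriers / n_chrom  # allele frequency within the pool
    # Step 2: each read comes from a randomly chosen pooled chromosome
    alt_reads = sum(rng.random() < q for _ in range(coverage))
    return alt_reads / coverage


if __name__ == "__main__":
    rng = random.Random(42)
    p, n_chrom, coverage = 0.3, 100, 50
    estimates = [simulate_poolseq_af(p, n_chrom, coverage, rng) for _ in range(5000)]
    mean = sum(estimates) / len(estimates)
    var = sum((x - mean) ** 2 for x in estimates) / (len(estimates) - 1)
    print(f"simulated variance: {var:.5f}")
    print(f"analytic variance:  {poolseq_af_variance(p, n_chrom, coverage):.5f}")
```

In this model, with 100 chromosomes and 50-fold coverage the 1/c term (0.02) is larger than the 1/n term (0.01), so doubling coverage reduces the variance more than doubling the pool size; real data add further noise from errors and unequal DNA contributions.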
Abstract
The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. Nevertheless, despite plunging sequencing costs, genomic sequencing of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is being increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.
References
Ellegren, H. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29, 51–63 (2014).
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
International HapMap 3 Consortium et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Weigel, D. & Mott, R. The 1001 genomes project for Arabidopsis thaliana. Genome Biol. 10, 107 (2009).
Daetwyler, H. D. et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genet. 46, 858–865 (2014).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
Sheridan, C. Illumina claims $1,000 genome win. Nature Biotech. 32, 115 (2014).
Weinstock, G. M. Genomic approaches to studying the human microbiota. Nature 489, 250–256 (2012).
Futschik, A. & Schlötterer, C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186, 207–218 (2010). This study is the first to provide a statistical framework for the analysis of Pool-seq data in population genetics.
Gautier, M. et al. Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping. Mol. Ecol. 22, 3766–3779 (2013).
Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Rev. Genet. 12, 745–755 (2011).
Gilissen, C., Hoischen, A., Brunner, H. G. & Veltman, J. A. Disease gene identification strategies for exome sequencing. Eur. J. Hum. Genet. 20, 490–497 (2012).
Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).
Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Rev. Genet. 12, 499–510 (2011).
Pihlstrom, L., Rengmark, A., Bjornara, K. A. & Toft, M. Effective variant detection by targeted deep sequencing of DNA pools: an example from Parkinson's disease. Ann. Hum. Genet. 78, 243–252 (2014).
Suvorov, A. et al. Intra-specific regulatory variation in Drosophila pseudoobscura. PLoS ONE 8, e83547 (2013).
Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression differences within and between Drosophila species. Nature Genet. 40, 346–350 (2008).
Konczal, M., Koteja, P., Stuglik, M. T., Radwan, J. & Babik, W. Accuracy of allele frequency estimation using pooled RNA-seq. Mol. Ecol. Resour. 14, 381–392 (2014).
Gross, J. B., Furterer, A., Carlson, B. M. & Stahl, B. A. An integrated transcriptome-wide analysis of cave and surface dwelling Astyanax mexicanus. PLoS ONE 8, e55659 (2013).
Kozak, G. M., Brennan, R. S., Berdan, E. L., Fuller, R. C. & Whitehead, A. Functional and population genomic divergence within and between two species of killifish adapted to different osmotic niches. Evolution 68, 63–80 (2014).
Sloan, D. B. et al. De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol. Ecol. Resour. 12, 333–343 (2012).
Gautier, M. et al. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Mol. Ecol. 22, 3165–3178 (2013).
Arnold, B., Corbett-Detig, R. B., Hartl, D. & Bomblies, K. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol. Ecol. 22, 3179–3190 (2013).
Karczewski, K. J. et al. Systematic functional regulatory assessment of disease-associated variants. Proc. Natl Acad. Sci. USA 110, 9607–9612 (2013).
Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).
Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010).
Qanbari, S. et al. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10, e1004148 (2014).
Pasaniuc, B. et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nature Genet. 44, 631–635 (2012).
Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc. Natl Acad. Sci. USA 110, 19872–19877 (2013).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43, 491–498 (2011).
Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011).
Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
Robasky, K., Lewis, N. E. & Church, G. M. The role of replicates for error mitigation in next-generation sequencing. Nature Rev. Genet. 15, 56–62 (2014).
Sham, P., Bader, J. S., Craig, I., O'Donovan, M. & Owen, M. DNA pooling: a tool for large-scale association studies. Nature Rev. Genet. 3, 862–871 (2002). This is a comprehensive review of pooling strategies.
Zhu, Y., Bergland, A. O., Gonzalez, J. & Petrov, D. A. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS ONE 7, e41901 (2012).
Kofler, R. et al. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS ONE 6, e15925 (2011).
Schrider, D. R., Begun, D. J. & Hahn, M. W. Detecting highly differentiated copy-number variants from pooled population sequencing. Pac. Symp. Biocomput 1, 344–344 (2013).
Kapun, M., van Schalkwyk, H., McAllister, B., Flatt, T. & Schlötterer, C. Inference of chromosomal inversion dynamics from Pool-seq data in natural and laboratory populations of Drosophila melanogaster. Mol. Ecol. 23, 1813–1827 (2014).
Kofler, R., Betancourt, A. J. & Schlötterer, C. Sequencing of pooled DNA samples (Pool-seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLoS Genet. 8, e1002487 (2012). This study is the first to infer TE insertion sites and the population frequency of TE insertions from Pool-seq data.
Sax, K. The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8, 552–560 (1923).
Schneeberger, K. et al. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550–551 (2009). This paper is the first to show that Pool-seq can be used to map induced mutations.
Schneeberger, K. Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nature Rev. Genet. 15, 662–676 (2014).
Hill, J. T. et al. MMAPPR: mutation mapping analysis pipeline for pooled RNA-seq. Genome Res. 23, 687–697 (2013).
Miller, A. C., Obholzer, N. D., Shah, A. N., Megason, S. G. & Moens, C. B. RNA-seq-based mapping and candidate identification of mutations from forward genetic screens. Genome Res. 23, 679–686 (2013).
Galvao, V. C. et al. Synteny-based mapping-by-sequencing enabled by targeted enrichment. Plant J. 71, 517–526 (2012).
Ehrenreich, I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042 (2010). This study provides proof that Pool-seq provides enough power to map complex traits.
Wenger, J. W., Schwartz, K. & Sherlock, G. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet. 6, e1000942 (2010).
Swinnen, S. et al. Identification of novel causative genes determining the complex trait of high ethanol tolerance in yeast using pooled-segregant whole-genome sequence analysis. Genome Res. 22, 975–984 (2012).
Wade, M. J. Epistasis, complex traits, and mapping genes. Genetica 112–113, 59–69 (2001).
Earley, E. J. & Jones, C. D. Next-generation mapping of complex traits with phenotype-based selection and introgression. Genetics 189, 1203–1209 (2011).
Bastide, H. et al. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9, e1003534 (2013). This paper shows that Pool-seq allows highly accurate fine mapping using natural population samples.
Jeong, S. et al. The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132, 783–793 (2008).
Kelly, J. K., Koseva, B. & Mojica, J. P. The genomic signal of partial sweeps in Mimulus guttatus. Genome Biol. Evol. 5, 1457–1469 (2013).
Beissinger, T. M. et al. A genome-wide scan for evidence of selection in a maize population under long-term artificial selection for ear number. Genetics 196, 829–840 (2014).
Johansson, A. M., Pettersson, M. E., Siegel, P. B. & Carlborg, O. Genome-wide effects of long-term divergent selection. PLoS Genet. 6, e1001188 (2010).
Rubin, C. J. et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464, 587–591 (2010). This is a particularly nice demonstration of the power of Pool-seq to detect selected loci in population samples.
Burke, M. K. et al. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467, 587–590 (2010). This is the first experimental evolution study measuring allele frequency changes using Pool-seq.
Remolina, S. C., Chang, P. L., Leips, J., Nuzhdin, S. V. & Hughes, K. A. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66, 3390–3403 (2012).
Turner, T. L., Stewart, A. D., Fields, A. T., Rice, W. R. & Tarone, A. M. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7, e1001336 (2011).
Zhou, D. et al. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc. Natl Acad. Sci. USA 108, 2349–2354 (2011).
Turner, T. L. & Miller, P. M. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191, 633–642 (2012).
Tobler, R. et al. Massive habitat-specific genomic response in D. melanogaster populations during experimental evolution in hot and cold environments. Mol. Biol. Evol. 31, 364–375 (2013).
Orozco-terWengel, P. et al. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21, 4931–4941 (2012).
Reed, L. K. et al. Systems genomics of metabolic phenotypes in wild-type Drosophila melanogaster. Genetics 197, 781–793 (2014).
Martins, N. et al. Host adaptation to viruses relies on few genes with different cross-resistance properties. Proc. Natl Acad. Sci. USA 111, 5938–5943 (2014).
Jalvingh, K. M., Chang, P. L., Nuzhdin, S. V. & Wertheim, B. Genomic changes under rapid evolution: selection for parasitoid resistance. Proc. Biol. Sci. 281, 20132303 (2014).
Magwire, M. M. et al. Genome-wide association studies reveal a simple genetic basis of resistance to naturally coevolving viruses in Drosophila melanogaster. PLoS Genet. 8, e1003057 (2012).
Turner, T. L., Bourne, E. C., Von Wettberg, E. J., Hu, T. T. & Nuzhdin, S. V. Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nature Genet. 42, 260–263 (2010). This study is the first to show that ecologically important traits can be mapped with Pool-seq by comparing two functionally diverged populations.
Lamichhaney, S. et al. Population-scale sequencing reveals genetic differentiation due to local adaptation in Atlantic herring. Proc. Natl Acad. Sci. USA 109, 19345–19350 (2012).
Fabian, D. K. et al. Genome-wide patterns of latitudinal differentiation among populations of Drosophila melanogaster from North America. Mol. Ecol. 21, 4748–4769 (2012).
Kolaczkowski, B., Kern, A. D., Holloway, A. K. & Begun, D. J. Genomic differentiation between temperate and tropical Australian populations of Drosophila melanogaster. Genetics 187, 245–260 (2011).
Cheng, C. et al. Ecological genomics of Anopheles gambiae along a latitudinal cline: a population-resequencing approach. Genetics 190, 1417–1432 (2012).
Hancock, A. M. et al. Adaptations to climate in candidate genes for common metabolic disorders. PLoS Genet. 4, e32 (2008).
Hancock, A. M. et al. Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86 (2011).
Fischer, M. C. et al. Population genomic footprints of selection and associations with climate in natural populations of Arabidopsis halleri from the Alps. Mol. Ecol. 22, 5594–5607 (2013). This is a nice application of Pool-seq to find selected loci in a non-model organism.
Günther, T. & Coop, G. Robust identification of local adaptation from allele frequencies. Genetics 195, 205–220 (2013). This paper presents the first statistical framework to identify significant associations of a given locus with one or more environmental variables using Pool-seq data.
Rubin, C. J. et al. Strong signatures of selection in the domestic pig genome. Proc. Natl Acad. Sci. USA 109, 19529–19536 (2012).
Axelsson, E. et al. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature 495, 360–364 (2013).
He, Z. et al. Two evolutionary histories in the genome of rice: the roles of domestication genes. PLoS Genet. 7, e1002100 (2011).
Nolte, V., Pandey, R. V., Kofler, R. & Schlötterer, C. Genome-wide patterns of natural variation reveal strong selective sweeps and ongoing genomic conflict in Drosophila mauritiana. Genome Res. 23, 99–110 (2013).
True, J. R., Mercer, J. M. & Laurie, C. C. Differences in crossover frequency and distribution among three sibling species of Drosophila. Genetics 142, 507–523 (1996).
Casacuberta, E. & Gonzalez, J. The impact of transposable elements in environmental adaptation. Mol. Ecol. 22, 1503–1517 (2013).
Kazazian, H. H. Jr Mobile elements: drivers of genome evolution. Science 303, 1626–1632 (2004).
Boitard, S., Schlötterer, C., Nolte, V., Pandey, R. V. & Futschik, A. Detecting selective sweeps from pooled next-generation sequencing samples. Mol. Biol. Evol. 29, 2177–2186 (2012).
Clément, J. A. et al. Private selective sweeps identified from next-generation pool-sequencing reveal convergent pathways under selection in two inbred Schistosoma mansoni strains. PLoS Negl. Trop. Dis. 7, e2591 (2013).
Foll, M. et al. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10, e1004185 (2014).
Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013).
Barrick, J. E. & Lenski, R. E. Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harb. Symp. Quant. Biol. 74, 119–129 (2009).
Kvitek, D. J. & Sherlock, G. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet. 9, e1003972 (2013).
Parts, L. et al. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21, 1131–1138 (2011).
Illingworth, C. J., Parts, L., Schiffels, S., Liti, G. & Mustonen, V. Quantifying selection acting on a complex trait using allele frequency time series data. Mol. Biol. Evol. 29, 1187–1197 (2012).
Bergland, A. O., Behrman, E. L., O'Brien, K. R., Schmidt, P. S. & Petrov, D. A. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. arXiv 1303.5044 (2014).
Traverse, C. C., Mayo-Smith, L. M., Poltak, S. R. & Cooper, V. S. Tangled bank of experimentally evolved Burkholderia biofilms reflects selection during chronic infections. Proc. Natl Acad. Sci. USA 110, E250–E259 (2013).
Versace, E., Nolte, V., Pandey, R. V., Tobler, R. & Schlötterer, C. Experimental evolution reveals habitat-specific fitness dynamics among Wolbachia clades in Drosophila melanogaster. Mol. Ecol. 23, 802–814 (2014).
Barcellos-Hoff, M. H., Lyden, D. & Wang, T. C. The evolution of the cancer niche during multistage carcinogenesis. Nature Rev. Cancer 13, 511–518 (2013).
Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nature Rev. Cancer 6, 924–935 (2006).
Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).
Newburger, D. E. et al. Genome evolution during progression to breast cancer. Genome Res. 23, 1097–1108 (2013).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Aparicio, S. & Caldas, C. The implications of clonal genome evolution for cancer medicine. New Engl. J. Med. 368, 842–851 (2013).
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011).
Long, Q. et al. PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing. PLoS ONE 6, e15292 (2011).
Kessner, D., Turner, T. L. & Novembre, J. Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data. Mol. Biol. Evol. 30, 1145–1158 (2013).
Burke, M. K., King, E. G., Shahrestani, P., Rose, M. R. & Long, A. D. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol. Evol. 6, 1–11 (2014).
Eskin, I. et al. eALPS: estimating abundance levels in pooled sequencing using available genotyping data. J. Comput. Biol. 20, 861–877 (2013).
Kofler, R. & Schlötterer, C. A guide for the design of evolve and resequencing studies. Mol. Biol. Evol. 31, 474–483 (2014).
Imsland, F. et al. The Rose-comb mutation in chickens constitutes a structural rearrangement causing both altered comb morphology and defective sperm motility. PLoS Genet. 8, e1002775 (2012).
Del Fabbro, C., Scalabrin, S., Morgante, M. & Giorgi, F. M. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS ONE 8, e85024 (2013).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Nevado, B., Ramos-Onsins, S. E. & Perez-Enciso, M. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics. Mol. Ecol. 23, 1764–1779 (2014).
Degner, J. F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).
Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
Raineri, E. et al. SNP calling by sequencing pooled samples. BMC Bioinformatics 13, 239 (2012).
Bansal, V. A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 26, i318–i324 (2010).
Altmann, A. et al. vipR: variant identification in pooled DNA using R. Bioinformatics 27, i77–i84 (2011).
Zhou, B. Y. An empirical Bayes mixture model for SNP detection in pooled sequencing data. Bioinformatics 28, 2569–2575 (2012).
Chen, Q. & Sun, F. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics 14 (Suppl. 1), S1 (2013).
Druley, T. E. et al. Quantification of rare allelic variants from pooled genomic DNA. Nature Methods 6, 263–265 (2009).
Vallania, F. L. et al. High-throughput discovery of rare insertions and deletions in large cohorts. Genome Res. 20, 1711–1718 (2010).
Wei, Z., Wang, W., Hu, P., Lyon, G. J. & Hakonarson, H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 39, e132 (2011).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv 1207.3907 (2012).
Calvo, S. E. et al. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency. Nature Genet. 42, 851–858 (2010).
Fiston-Lavier, A.-S., Barron, M. G., Petrov, D. A. & González, J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. bioRxiv http://dx.doi.org/10.1101/002964 (2014).
Zhuang, J., Wang, J., Theurkauf, W. & Weng, Z. TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res. 42, 6826–6838 (2014).
Kofler, R., Pandey, R. V. & Schlötterer, C. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-seq). Bioinformatics 27, 3435–3436 (2011).
Boitard, S. et al. Pool-HMM: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples. Mol. Ecol. Resour. 13, 337–340 (2013).
Ferretti, L., Ramos-Onsins, S. E. & Perez-Enciso, M. Population genomics from pool sequencing. Mol. Ecol. 22, 5561–5576 (2013).
Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140 (2013).
Vitalis, R., Gautier, M., Dawson, K. J. & Beaumont, M. A. Detecting and measuring selection from gene frequency data. Genetics 196, 799–817 (2014).
Gautier, M. & Vitalis, R. Inferring population histories using genome-wide allele frequency data. Mol. Biol. Evol. 30, 654–668 (2013).
Feder, A. F., Petrov, D. A. & Bergland, A. O. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS ONE 7, e48588 (2012).
Minevich, G., Park, D. S., Blankenberg, D., Poole, R. J. & Hobert, O. CloudMap: a cloud-based pipeline for analysis of mutant genome sequences. Genetics 192, 1249–1269 (2012).
Edwards, M. D. & Gifford, D. K. High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 (Suppl. 6), S8 (2012).
Bowen, M. E., Henke, K., Siegfried, K. R., Warman, M. L. & Harris, M. P. Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. Genetics 190, 1017–1024 (2012).
Austin, R. S. et al. Next-generation mapping of Arabidopsis genes. Plant J. 67, 715–725 (2011).
Leshchiner, I. et al. Mutation mapping and identification by whole-genome sequencing. Genome Res. 22, 1541–1548 (2012).
Prosperi, M. C. & Salemi, M. QuRe: software for viral quasispecies reconstruction from next-generation sequencing data. Bioinformatics 28, 132–133 (2012).
Zagordi, O., Bhattacharya, A., Eriksson, N. & Beerenwinkel, N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119 (2011).
Eyre, D. W. et al. Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput. Biol. 9, e1003059 (2013).
Astrovskaya, I. et al. Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 12 (Suppl. 6), S1 (2011).
Yang, X., Charlebois, P., Macalalad, A., Henn, M. R. & Zody, M. C. V-Phaser 2: variant inference for viral populations. BMC Genomics 14, 674 (2013).
Töpfer, A. et al. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput. Biol. 10, e1003515 (2014).
Töpfer, A. et al. Probabilistic inference of viral quasispecies subject to recombination. J. Comput. Biol. 20, 113–123 (2013).
Prabhakaran, S., Rey, M., Zagordi, O., Beerenwinkel, N. & Roth, V. HIV haplotype inference using a constraint-based Dirichlet process mixture model. Machine Learning in Computational Biology NIPS Workshop (2010).
Pandey, R. V., Kofler, R., Orozco-terWengel, P., Nolte, V. & Schlötterer, C. PoPoolation DB: a user-friendly web-based database for the retrieval of natural polymorphisms in Drosophila. BMC Genet. 12, 27 (2011).
Chen, X., Listman, J. B., Slack, F. J., Gelernter, J. & Zhao, H. Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples. Genet. Epidemiol. 36, 549–560 (2012).
Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nature Methods 11, 396–398 (2014).
Acknowledgements
The authors apologize to all colleagues who were not cited owing to space limitations. They are grateful to all colleagues who shared unpublished manuscripts, especially D. Kessner, Q. Long, M. Pérez Enciso, A. S. Fiston-Lavier and K. Schneeberger for comments and discussions. They thank members of the Institut für Populationsgenetik, in particular A. Betancourt, M. Dolezal, A. Futschik and A. Kalinka, for discussion and comments on earlier versions of the manuscript. This work has been supported by the ERC (ArchAdapt) and the Austrian Science Fund (FWF, W1225).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
Next-generation sequencing (NGS; also known as second-generation sequencing): An umbrella term for different sequencing platforms that deliver millions of short DNA sequence reads.
Reads: DNA sequences that are generated by next-generation sequencing.
Pool-seq: A sequencing technique in which sequencing libraries are prepared not from the DNA of a single individual or cell but from a mixture of DNA fragments originating from different individuals or cells. In the context of this Review, Pool-seq refers to the unbiased sequencing of the entire genome.
Coverage: The number of reads that span a given genomic position.
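Coverage at every position can be computed from read alignment coordinates. The minimal sketch below assumes reads are given as hypothetical 0-based, half-open (start, end) intervals, as in BED files, and uses a difference array so that each read costs two updates rather than one update per covered base.

```python
from itertools import accumulate


def coverage_profile(reads, region_length):
    """Per-position coverage for a region of `region_length` bases.

    `reads` holds (start, end) alignment coordinates, 0-based and half-open;
    a read covers positions start .. end - 1. Reads are clipped to the region.
    """
    diff = [0] * (region_length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, region_length)
        if start < end:
            diff[start] += 1  # read starts covering here
            diff[end] -= 1    # read no longer covers from here on
    # The running sum of the difference array is the depth at each position
    return list(accumulate(diff[:region_length]))


if __name__ == "__main__":
    reads = [(0, 5), (2, 8), (4, 10)]
    print(coverage_profile(reads, 10))
```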
Sequencing libraries: Sets of fragmented DNA extracted from one or more individuals that serve as the template for subsequent sequencing.
Exome sequencing: A sequencing approach in which the complexity of the genome is reduced through hybridization to exonic sequences, which results in higher sequence coverage of protein-coding regions.
Restriction-site-associated DNA markers: Sequence polymorphisms in close proximity to a restriction enzyme recognition site.
Linkage disequilibrium (LD): Nonrandom association between alleles at two loci. In outcrossing diploid individuals, genotypes must first be resolved into haplotypes, in a statistical procedure called phasing, before LD can be measured.
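Once haplotypes are available, LD between two biallelic loci is commonly summarized as r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. The sketch below assumes phased haplotypes with alleles coded 0/1. With Pool-seq data such haplotypes are generally not available, and LD is typically estimated only for SNPs spanned by the same read or read pair.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is a pair (allele at locus A, allele at locus B), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # D: departure from random association of alleles
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    haps = [(1, 1), (1, 1), (1, 0), (0, 0)]
    print(f"r^2 = {ld_r2(haps):.3f}")
```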
Genetic markers: Polymorphic loci that can be scored with a genotyping technique.
F2 analysis: Analysis of mapping populations generated by the F2 design. The F1 progeny from a cross between two phenotypically different parental strains are crossed with each other to produce an F2 population that segregates for the phenotype of interest. The F2 mapping population can carry up to three genotypes at every marker and therefore allows the detection of additive and dominance effects, as well as interactions between loci.
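For a quantitative trait, the three F2 genotype class means at a marker can be decomposed into an additive effect a and a dominance effect d using the classical midparent parameterization: a is half the difference between the two homozygote means, and d is the deviation of the heterozygote mean from their midpoint. The sketch assumes phenotypes have already been grouped by marker genotype; the labels and values are hypothetical.

```python
from statistics import mean


def f2_effects(phenotypes_by_genotype):
    """Additive (a) and dominance (d) effects at one marker in an F2 population.

    Keys are genotype labels 'AA', 'Aa' and 'aa'; values are lists of phenotypes.
      a = (mean_AA - mean_aa) / 2
      d = mean_Aa - (mean_AA + mean_aa) / 2
    """
    m_hom1 = mean(phenotypes_by_genotype["AA"])
    m_het = mean(phenotypes_by_genotype["Aa"])
    m_hom2 = mean(phenotypes_by_genotype["aa"])
    midpoint = (m_hom1 + m_hom2) / 2
    return (m_hom1 - m_hom2) / 2, m_het - midpoint


if __name__ == "__main__":
    data = {"AA": [12, 14], "Aa": [12, 12], "aa": [6, 8]}
    a, d = f2_effects(data)
    print(f"additive effect a = {a}, dominance effect d = {d}")
```

A dominance effect of zero means the heterozygote lies exactly midway between the homozygotes, which only the three-genotype F2 design can reveal.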
Phased genomic sequences: Genome sequences for which the haplotype phase (that is, the combination of alleles or genetic markers that coexist on a single chromosome) has been determined.
Imputation: In statistics, the replacement of missing data with substituted values. In genomics, the use of haplotype sequences to fill in missing sequence information.
Haplotypes: The combination of alleles or genetic markers that coexist on a single chromosome. Chromosomal regions carrying a haplotype are inherited as intact physical units until they are broken up by recombination.
Pool genome-wide association studies (Pool-GWASs): Genotype–phenotype mapping studies in which phenotypically extreme individuals are grouped and sequenced as pools. Causative variants are identified by contrasting the allele frequencies between the pools.
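Contrasting allele frequencies between two pools is often done with a test on allele read counts. The sketch below implements a two-sided Fisher's exact test using only the standard library; the counts are hypothetical. Treating every read as an independent draw ignores the extra variance from sampling individuals into each pool, so p-values from this naive approach can be anticonservative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are pools (for example, high and low phenotypic tail); columns are
    read counts supporting allele 1 and allele 2. The p-value sums the
    hypergeometric probabilities of all tables with the observed margins that
    are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance treats near-identical probabilities as ties
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical read counts (allele 1, allele 2) in each pool
    high_pool = (40, 10)
    low_pool = (15, 35)
    p = fisher_exact_two_sided(*high_pool, *low_pool)
    print(f"allele 1 frequency: {40 / 50:.2f} vs {15 / 50:.2f}, p = {p:.2e}")
```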
Evolve and resequence studies: Studies that combine experimental evolution with next-generation sequencing. They make use of controlled environmental, demographic and selective variables to facilitate genotype–phenotype mapping.
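Evolve and resequence studies follow allele frequencies over generations. As a minimal sketch, assuming a deterministic haploid selection model without drift, p' = p(1 + s) / (1 + sp), the log-odds of the favoured allele increase by exactly ln(1 + s) per generation, so s can be recovered from the starting and final frequencies. Real data also contain drift and pool sampling noise, which this toy estimator ignores; the function names are illustrative.

```python
from math import exp, log


def logit(p: float) -> float:
    return log(p / (1 - p))


def haploid_selection_coefficient(p0: float, pt: float, generations: int) -> float:
    """Estimate s from the frequency p0 at the start and pt after `generations`.

    Assumes p' = p(1 + s) / (1 + s p) with no drift, under which
    logit(p) increases by ln(1 + s) every generation.
    """
    if not (0 < p0 < 1 and 0 < pt < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if generations <= 0:
        raise ValueError("generations must be positive")
    return exp((logit(pt) - logit(p0)) / generations) - 1


def simulate_trajectory(p0: float, s: float, generations: int) -> list:
    # Forward iteration of the same model, useful for checking the estimator
    freqs = [p0]
    for _ in range(generations):
        p = freqs[-1]
        freqs.append(p * (1 + s) / (1 + s * p))
    return freqs


if __name__ == "__main__":
    traj = simulate_trajectory(0.1, 0.05, 30)
    print(f"frequency after 30 generations: {traj[-1]:.3f}")
    print(f"estimated s: {haploid_selection_coefficient(traj[0], traj[-1], 30):.4f}")
```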
Forward genetics: An approach in which mutations that are induced by random mutagenesis and disrupt gene function are identified on the basis of their phenotypes. The causative mutation is traditionally identified by positional cloning or by a candidate-gene approach.
Bulk segregant analysis (BSA): Analysis in which offspring from diverged parents are phenotyped and the DNA of individuals from opposing tails of the phenotypic distribution is combined (pooled). Causative variants are identified by contrasting allele frequencies between the pools.
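Because markers linked to a causative variant also differ in frequency between the pools, a BSA signal usually appears as a region of elevated allele frequency difference rather than at a single SNP. One simple way to highlight such a region, sketched below with hypothetical SNP positions and frequencies, is to average |Δp| in sliding windows along a chromosome; window and step sizes are arbitrary choices here.

```python
def sliding_window_delta(positions, freq_high, freq_low, window, step):
    """Mean absolute allele frequency difference between two pools per window.

    positions: SNP coordinates; freq_high / freq_low: frequency of the same
    allele in the two pools. Returns (window_start, mean |delta p|, n_snps)
    tuples for windows that contain at least one SNP.
    """
    if not positions:
        return []
    results = []
    for start in range(0, max(positions) + 1, step):
        end = start + window
        deltas = [abs(h - l) for pos, h, l in zip(positions, freq_high, freq_low)
                  if start <= pos < end]
        if deltas:
            results.append((start, sum(deltas) / len(deltas), len(deltas)))
    return results


if __name__ == "__main__":
    positions = [100, 250, 400, 1200, 1300, 1450]
    high = [0.50, 0.52, 0.48, 0.90, 0.95, 0.92]
    low = [0.50, 0.49, 0.50, 0.10, 0.05, 0.12]
    for start, mean_delta, n in sliding_window_delta(positions, high, low, 500, 500):
        print(f"window {start}-{start + 499}: mean |delta p| = {mean_delta:.3f} ({n} SNPs)")
```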
Epistatic interactions: Non-additive interactions between genes in which the effect of an allele at one locus is modified by the genotypes at other loci in the genome. The resulting phenotype differs from that expected by summing the independent effects of the individual loci.
Introgress: To introduce a genomic region from one strain or species into another by repeated backcrossing. Through selection for the phenotype of interest, the genomes become isogenic except for the chromosomal regions causing the selected phenotype.
Paired-end reads: Reads generated by sequencing DNA fragments from both ends; the two reads of a pair are separated by a defined distance that depends on the library preparation protocol.
Soft clipping: Substrings at either end of a read that are not aligned by a local alignment algorithm and are therefore excluded from subsequent analysis.
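In SAM/BAM alignment files, soft-clipped bases are recorded with the S operation of the CIGAR string. The sketch below follows the CIGAR operation codes of the SAM specification and returns the number of soft-clipped bases at each end of a read; it does not read files, and the function name is illustrative.

```python
import re

_CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> tuple:
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string.

    Per the SAM specification, S operations occur only at the ends of a CIGAR,
    optionally flanked by hard clips (H), which are skipped here.
    """
    if not _CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed or unavailable CIGAR: {cigar!r}")
    ops = [(int(n), op) for n, op in _OP_RE.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing


if __name__ == "__main__":
    for cigar in ("5S90M5S", "10H3S97M", "100M"):
        print(cigar, soft_clipped_bases(cigar))
```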
Proper pairs: Paired-end reads for which both reads of the pair map to the same chromosome within the distance expected from the insert size chosen during library preparation.
Broken pairs: Paired-end reads that do not map as proper pairs.
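The proper-pair criterion can be sketched as a small check. This version assumes a forward/reverse (FR) paired-end library, in which the leftmost read maps to the forward strand and its mate to the reverse strand, and uses a hypothetical Alignment record with 0-based, half-open coordinates. Read aligners set the SAM proper-pair flag using their own, more detailed rules.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical minimal alignment record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    reverse: bool  # True if the read aligned to the reverse strand


def is_proper_pair(r1: Alignment, r2: Alignment, min_insert: int, max_insert: int) -> bool:
    # Both reads must map to the same chromosome
    if r1.chrom != r2.chrom:
        return False
    left, right = sorted((r1, r2), key=lambda r: r.start)
    # Assumed FR library: leftmost read on the forward strand, mate on the reverse
    if left.reverse or not right.reverse:
        return False
    # Insert size: span from the leftmost to the rightmost aligned base
    insert = max(left.end, right.end) - left.start
    return min_insert <= insert <= max_insert


if __name__ == "__main__":
    r1 = Alignment("2L", 1000, 1100, reverse=False)
    r2 = Alignment("2L", 1250, 1350, reverse=True)
    print("proper" if is_proper_pair(r1, r2, 200, 500) else "broken")
```

Any pair failing the check counts as a broken pair, which is why clusters of broken pairs can point to structural variants or reference errors.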
Mapping quality: The probability P that a read is incorrectly mapped, expressed on the Phred scale as -10 × log10(P); for example, a mapping quality of 20 corresponds to P = 0.01.
Base quality: The probability P that a given base call is incorrect, expressed on the Phred scale as -10 × log10(P); for example, a base quality of 30 corresponds to P = 0.001.
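Both scores use the Phred scale, Q = -10 log10(P). The sketch below converts between Q and P and, under the simplifying assumptions that errors are independent across reads and produce each of the three alternative bases equally often, computes the probability that sequencing errors alone yield at least k reads carrying the same non-reference base. This illustrates why low-frequency alleles are hard to separate from errors in deeply sequenced pools; the function names are illustrative.

```python
from math import comb, log10


def phred_to_error(q: float) -> float:
    """Error probability for a Phred-scaled quality: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 log10(P)."""
    return -10 * log10(p)


def prob_at_least_k_errors(coverage: int, k: int, error_rate: float) -> float:
    """P(at least k of `coverage` reads show the same erroneous base).

    Assumes independent errors and that an error produces each of the three
    alternative bases equally often, so a specific wrong base appears with
    probability error_rate / 3 per read (binomial upper tail).
    """
    e = error_rate / 3
    return sum(comb(coverage, i) * e**i * (1 - e) ** (coverage - i)
               for i in range(k, coverage + 1))


if __name__ == "__main__":
    err = phred_to_error(20)  # Q20 base calls
    for k in (1, 2, 3, 5):
        print(f"coverage 200, >= {k} identical error reads: "
              f"{prob_at_least_k_errors(200, k, err):.2e}")
```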
Insertions and deletions (indels): DNA sequences that have been inserted into or deleted from a genomic region. Because insertions and deletions can be distinguished only by phylogenetic analysis, 'indel' is used as a neutral term covering both.
Strand bias: The situation in which a variant is significantly more likely to be observed in reads originating from one of the two DNA strands than in reads from the other.
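One simple way to quantify strand bias, sketched below under the assumption that reads supporting the reference allele give the expected forward-strand fraction, is an exact binomial test on the strand distribution of the reads supporting the variant. Fisher's exact test on the 2x2 table of allele by strand is another common choice; the read counts here are hypothetical.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test p-value for k successes in n trials.

    Intended for the modest read counts seen at a single site; very large n
    would need log-space arithmetic to avoid float overflow.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Sum probabilities of outcomes no more likely than the observed one
    return min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-7)))


def strand_bias_pvalue(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Test whether alt-supporting reads come from the forward strand at the
    same rate as reference-supporting reads."""
    expected_fwd = ref_fwd / (ref_fwd + ref_rev)
    return binom_two_sided(alt_fwd, alt_fwd + alt_rev, expected_fwd)


if __name__ == "__main__":
    # Hypothetical site: all 12 alt reads on the forward strand
    print(f"biased site:   p = {strand_bias_pvalue(50, 50, 12, 0):.2e}")
    print(f"balanced site: p = {strand_bias_pvalue(50, 50, 6, 6):.2f}")
```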
Genome-wide association studies (GWASs): Trait mapping studies that rely on a statistical test to determine associations between sequence variants and a given phenotype in natural populations.
Cline: The gradual change in phenotypes or allele frequencies along a geographical or environmental gradient.
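A cline in allele frequency can be summarized by regressing population allele frequencies on the geographical or environmental variable. The sketch below fits an ordinary least-squares line of allele frequency on latitude for hypothetical populations and reports Pearson's r; it ignores shared population history and sampling noise, which dedicated methods for detecting environmental associations model explicitly.

```python
from math import sqrt


def cline_fit(latitudes, freqs):
    """OLS slope, intercept and Pearson r of allele frequency on latitude."""
    n = len(latitudes)
    if n < 2:
        raise ValueError("need at least two populations")
    mx = sum(latitudes) / n
    my = sum(freqs) / n
    sxx = sum((x - mx) ** 2 for x in latitudes)
    syy = sum((y - my) ** 2 for y in freqs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(latitudes, freqs))
    slope = sxy / sxx  # change in allele frequency per degree of latitude
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    lat = [25, 30, 35, 40, 45]
    freq = [0.2, 0.3, 0.4, 0.5, 0.6]
    slope, intercept, r = cline_fit(lat, freq)
    print(f"slope = {slope:.3f} per degree, r = {r:.2f}")
```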
Hitchhiking: The population genetic mechanism by which a neutral, or in some cases slightly deleterious, mutation increases in population frequency solely as a result of physical linkage with a positively selected mutation.
Cite this article
Schlötterer, C., Tobler, R., Kofler, R. et al. Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. Nat Rev Genet 15, 749–763 (2014). https://doi.org/10.1038/nrg3803