Article
Open access
Published: 05 July 2017

Genome-wide association study and genomic prediction in citrus: Potential of genomics-assisted breeding for fruit quality traits

Mai F. Minamikawa¹,
Keisuke Nonaka²,
Eli Kaminuma³,
Hiromi Kajiya-Kanegae ORCID: orcid.org/0000-0002-5719-7559¹,
Akio Onogi¹,
Shingo Goto²,
Terutaka Yoshioka²,
Atsushi Imai⁴,
Hiroko Hamada²,
Takeshi Hayashi⁵,
Satomi Matsumoto⁶,
Yuichi Katayose⁶,
Atsushi Toyoda ORCID: orcid.org/0000-0002-0728-7548^7,8,
Asao Fujiyama⁸,
Yasukazu Nakamura³,
Tokurou Shimizu² &
…
Hiroyoshi Iwata¹

Scientific Reports volume 7, Article number: 4721 (2017) Cite this article

10k Accesses
111 Citations
5 Altmetric
Metrics details

Subjects

Abstract

Novel genomics-based approaches such as genome-wide association studies (GWAS) and genomic selection (GS) are expected to be useful in fruit tree breeding, which requires much time from the cross to the release of a cultivar because of the long generation time. In this study, a citrus parental population (111 varieties) and a breeding population (676 individuals from 35 full-sib families) were genotyped for 1,841 single nucleotide polymorphisms (SNPs) and phenotyped for 17 fruit quality traits. GWAS power and prediction accuracy were increased by combining the parental and breeding populations. A multi-kernel model considering both additive and dominance effects improved prediction accuracy for acidity and juiciness, implying that the effects of both types are important for these traits. Genomic best linear unbiased prediction (GBLUP) with linear ridge kernel regression (RR) was more robust and accurate than GBLUP with non-linear Gaussian kernel regression (GAUSS) in the tails of the phenotypic distribution. The results of this study suggest that both GWAS and GS are effective for genetic improvement of citrus fruit traits. Furthermore, the data collected from breeding populations are beneficial for increasing the detection power of GWAS and the prediction accuracy of GS.

Genome-wide association studies provide insights into the genetic determination of fruit traits of pear

Article Open access 18 February 2021

Genome-wide association study and candidate gene identification for agronomic traits in 182 upward-growing fruits of C. frutescens and C. annuum

Article Open access 26 June 2024

Genome-wide association study identifies QTL for eight fruit traits in cultivated tomato (Solanum lycopersicum L.)

Article Open access 01 September 2021

Introduction

Citrus are among the most produced fruits in the world¹ and contain various functional compounds beneficial for human health such as vitamins, limonoids, and carotenoids². Total citrus fruit production in 2012 was estimated at 131.3 million tons (20% of total fruit production; 8.8 kha), of which 52.6% were sweet oranges (Citrus sinensis (L.) Osbeck), 21.1% mandarins (C. reticulata Blanco), 11.2% lemons (C. limon (L.) Burm. f.) and limes (C. aurantifolia (Christm.) Swingle), and 6.2% grapefruit (C. paradisi Macfad.) and pummelos (C. maxima Merr.)³. Widely distributed commercial cultivars have been selected from various indigenous cultivars as somatic mutants (bud sports or nucellar seedlings)⁴. Although mutant selection is a simple approach to improve a trait, cross-breeding is beneficial for developing novel cultivars with unprecedented trait combinations. Swingle and Webber first initiated a systematic cross-breeding program of citrus at the U.S. Department of Agriculture in Florida, USA in 1893, and they selected some important cultivars by cross-breeding⁵. The available combinations of crosses and the number of individuals in the population limit the efficiency of selection. Large plant size limits the number of fruit trees in orchards, and their long juvenile period lengthens the time required for breeding, thus increasing the total cost of cross-breeding. During implementation of the citrus breeding program at the National Agriculture and Food Research Organization Institute of Fruit Tree and Tea Science (NIFTS) in Japan, various techniques to promote flowering and fruiting of the seedlings to shorten the breeding cycles and grafting and training methods were established^{6, 7}; however, the average breeding period from the cross to the release of a cultivar is still 25 years. The number of seedlings to be grafted at the orchard is the key step limiting breeding efficiency. Therefore, selecting promising seedlings at an early stage before grafting would greatly increase the probability of obtaining promising candidates.

In recent years, next-generation sequencing technologies have decreased the cost of single nucleotide polymorphism (SNP) genotyping and expanded the availability of numerous markers⁸. Genome-wide association studies (GWAS) and genomic selection (GS), both performed by using genome-wide markers, are becoming important and effective tools for plant breeding^{9, 10}. GWAS enable the detection of quantitative trait loci (QTLs) or causal genes from association between genome-wide markers and trait phenotypes, and outperform bi-parental QTL mapping because GWAS do not require the development of segregating populations^{11, 12}. GS enables the selection of superior genotypes based on genomic estimated breeding values (GEBV) estimated from information on genome-wide markers, and is more effective than marker-assisted selection (MAS), especially for traits controlled by a large number of genes^{13, 14}. In general, the power of GWAS and the accuracy of GS are boosted when sample size is large^{11, 15}. However, accumulating large data sets sufficient for these analyses is difficult for fruit trees because of their long juvenile period, large plant size, and difficulties in phenotyping. Poland suggested the idea of “breeding-assisted genomics”, in which the data collected from actual breeding populations is used for functional genomics in plants¹⁶. In fruit tree breeding, phenotypic data accumulated every year in conventional breeding programs could be useful for GWAS and GS. Although a number of GWAS and GS studies have been reported in animals and plants^{11, 13, 14}, the potential for using multiple actual breeding populations in combination with the parental population has not yet been explored.

The objective of this study was to assess the potential of GWAS and GS for the genetic improvement of citrus and to evaluate the potential of using multiple actual breeding populations in addition to the parental population for GWAS and GS.

Results

Linkage disequilibrium and population structure

The levels of linkage disequilibrium (LD) in the parental population (111 varieties; Supplementary Table S1) and combined parental and breeding populations (Supplementary Table S2; 787 genotypes in total) were evaluated by calculating the squared allele-frequency correlation (r ²) between each pair of SNPs on the same chromosome. The r ² values were plotted against the physical distance for the parental population and the combined population; curves fitted for the relationships between r ² and physical map distance indicated that LD extended over 1 Mb (r ² > 0.25) and decayed to relatively low levels (r ² < 0.10) within 20 Mb in both populations (Fig. 1). The mean LD between adjacent SNPs was similar (r ² = 0.45) in both populations, although the decay of LD was slightly faster in the combined population. The patterns of LD decay were similar in all chromosomes (Supplementary Fig. S1).

The genetic structure of the parental population was estimated by using hierarchical clustering, ADMIXTURE clustering, and principal component analysis (PCA). Hierarchical clustering divided the parental population into two major clusters (Fig. 2A). The smaller cluster contained mainly pummelos, which are large-fruit varieties (Fig. 2C; Supplementary Table S1). The larger cluster was further divided into two major sub-clusters (Fig. 2A). The smaller sub-cluster contained mainly mandarins, which are small-fruit varieties (Fig. 2C; Supplementary Table S1). According to ADMIXTURE clustering, the optimal number (K) was estimated as 4, because cross-validation (CV) error rapidly decreased until K = 4 and changed little thereafter, although K = 6 corresponded to the lowest CV error (Supplementary Fig. S2). The deduced optimal K value was similar to those reported by Curk et al.¹⁷. The cluster containing large-fruit varieties but not the sub-cluster containing small-fruit varieties showed clear genetic differentiation (Fig. 2B). Similarly, PCA indicated the presence of two main clusters, one with large and the other one with small fruit size (Supplementary Fig. S3). Whereas part of the parental population belonged to the clusters of large- or small-fruit varieties, most varieties showed no clear separation in hierarchical clustering, ADMIXTURE clustering, or PCA. PCA also suggested low stratification in the combined population (Supplementary Fig. S4).

Genome-wide association study

GWAS was carried out for 17 fruit quality traits (Table 1) in the combined and parental populations (Fig. 3; Supplementary Table S3; Supplementary Figs S5 and S6). Higher –log₁₀(p) values and/or more significant SNPs (false discovery rate <0.05) were detected for 15 traits (except for aroma intensity (Aroma) and bitterness (Bitter)) by using the combined population rather than the parental population. In the combined population, GWAS detected significant SNPs associated with 11 traits: fruit weight (Weight), fruit hardness (FruH), color of pericarp (ColorP), easiness of peeling (Peeling), color of flesh (ColorF), flesh hardness (FleH), juiciness (Juicy), firmness of locule membrane (FirmLM), number of seeds (Seed), Bitter, and acidity (Acid) (Supplementary Table S4). For the highly correlated phenotypic traits (r > 0.75; Supplementary Table S5), several common significant SNPs were detected: seven on chromosomes (Chr.) 2, 4, and 6 for ColorP and ColorF (r = 0.81) and three on Chr. 3 for FruH and Peeling (r = 0.76). Single common significant SNPs were detected for FleH and Juicy (Chr. 4; r = 0.68); FirmLM and Seed (Chr. 4; r = 0.35); and ColorP, ColorF, and Bitter (Chr. 6). A single common significant SNP for Weight, ColorP, Bitter, and Acid (Chr. 4) is consistent with the largest (although not significant) peak SNP for sugar content (Brix).

Table 1 Fruit quality traits evaluated in this study.

Full size table

Two genes, Ciclev10031587 m.g and Ciclev10031003 m.g, were found between the significant SNPs on Chr. 4 for ColorP and ColorF; these genes were located 490 kb and 17 kb apart from the nearest significant SNPs, respectively, within a high-LD region (Supplementary Fig. S7). These genes are annotated as phytoene desaturation 1 (pds1) and carotenoid cleavage dioxygenase 4 (ccd4), respectively. A significant SNP on Chr. 3 for FruH and Peeling resided in the gene Ciclev10018456 m.g, which is annotated as callose synthase. This gene was also located in the region flanking a significant SNP for FleH. The common significant SNP on Chr. 4 for Weight, ColorP, Bitter, and Acid resided in the gene Ciclev10031681 m.g, which is annotated as glutamate dehydrogenase.

Accuracy of prediction models for 17 fruit traits in the parental population

Ten-fold CV showed that prediction accuracy was high for Weight, FruH, ColorP, Peeling, ColorF, and FirmLM (r ≥ 0.7), intermediate for fruit shape (Shape), smoothness of pericarp (SmoothP), Aroma, FleH, Juicy, Seed, Bitter, taste (Taste), Brix, and Acid (0.3 ≤ r < 0.7), and low for appearance (Appear) (r < 0.3) (Fig. 4A). GBLUP (RR and GAUSS), Ridge Regression, and Bayesian regressions were more accurate for most traits than the other methods, whereas the accuracies of Random Forest, Elastic Net, and Lasso were trait-dependent. While the prediction accuracy level of a single method considerably varied among traits, the mean prediction of all 11 methods showed consistent accuracy, i.e., an upper-middle level of accuracy among the methods, for all traits. The non-linear method GBLUP (GAUSS) had the highest prediction accuracy for Acid. Because most of the traits (except Weight, Brix, and Acid) were scored as ordinal categorical traits, several prediction models specific for categorical responses were used. Because the prediction accuracy of these models was lower than that of models for continuous responses (Supplementary Fig. S8), the latter models were used for subsequent analyses.

We built prediction models that included the information from GWAS for the 17 traits. For 15 traits (except Appear and Bitter), GBLUP (RR) that treats all SNPs as random effects was more accurate than multiple linear regression (MLR) that treats the three most significant SNPs (r < 0.6; red points in Supplementary Fig. S5) as fixed effects, or GBLUP (RR) that treats the three most significant SNPs as fixed effects and other SNPs as random effects, even though MLR had high prediction accuracies (r > 0.7) for ColorP, ColorF, and FirmLM (Fig. 4B).

A multi-kernel model considering dominance and additive genetic effects was also evaluated. For Acid and Juicy, this model was more accurate than a model with only additive effects, whereas opposite results were obtained for Appear, Shape, SmoothP, FleH, Seed, and Taste (Fig. 4C). For the other traits, accuracies were similar for both models. The additive genetic effects were the major factor contributing to genetic variation for 14 traits (except Appear, Juicy, and Bitter) (Supplementary Table S6).

Prediction for distribution tails

As described above, a diversity of fruit weight was found in the parental population (Fig. 2; Supplementary Fig. S9A). Because the fruit weight of pummelos, such as ‘Egami buntan’, ‘Banpeiyu’, ‘Mato buntan’, and ‘Hirado buntan’, was very large (>800 g), we hypothesized that a genetic system differs between the varieties with extremely large fruits and other varieties. To validate the hypothesis, we tried to predict fruit weight in the large-fruit group (Fig. 2) by using a model built based on the data of other citrus varieties. The results showed a high accuracy of such prediction (Fig. 5A). Higher accuracy was attained by GBLUP (RR) (r = 0.89) than by GBLUP (GAUSS) (r = 0.74), whereas the accuracy of GBLUP (RR) was slightly lower than that of GBLUP (GAUSS) for all ranges of fruit weight (Fig. 4A). The extremely large fruit weight in pummelos was properly predicted, especially with GBLUP (RR). Similarly, GBLUP (RR) had a slightly higher accuracy for the small-fruit group (Fig. 5B). We also applied rank-ordered 4-fold CV for all 17 traits. GBLUP (RR) attained higher accuracy than did GBLUP (GAUSS) (including the number of outliers) for distribution tails for all the traits (Supplementary Fig. S9).

Prediction for the breeding population

To evaluate the prediction accuracy in the breeding population, three different training populations for modeling were compared. For 13 traits (except Shape, ColorP, Bitter, and Taste; Fig. 6, Supplementary Fig. S10), the accuracy of the model based on the breeding population was on average 1.35 times that of the model based on the parental population, indicating that information from the breeding population improved prediction accuracy. The highest accuracy was attained in the model based on the combined population for 14 traits (except Aroma, Bitter, and Taste). The results were similar when prediction accuracy was evaluated on a single-family basis (Supplementary Fig. S11).

Discussion

The LD pattern is a key factor influencing the power of GWAS and the accuracy of GS, because these two approaches are based on LD between markers and causal polymorphisms^{9, 11, 13, 18}. Generally, the range of LD is smaller in heterozygous outcrossing species than in homozygous selfing species^{19, 20}, and therefore a large number of markers are needed to cover the genome of outcrossing species. In the citrus population used in this study, we observed a wide range of LD (r ² > 0.25 at 1 Mb; Fig. 1), even though most genotypes obtained from conventional cross-breeding are highly heterozygous²¹. Because the citrus population has experienced a population bottleneck, i.e. it had a limited number of founders³, the wide range of LD^{19, 20} was expected to increase. Citrus cultivars are developed through a small number of generations, and are mainly maintained through vegetative propagation. These breeding and management systems make recombination unlikely to occur and may also maintain LD²². The mean r ² between adjacent markers has been used to determine target marker densities for GS. For high-heritability traits, a mean r ² of 0.15 is sufficient, but for low-heritability traits increasing the r ² to 0.2 improved prediction accuracy²³. In the citrus population used in this study, mean r ² (0.45) was higher than the above value, suggesting that marker densities in the population would be sufficient for GS.

Population structure extends the range of LD²⁴ and can cause spurious associations in GWAS¹¹. Hierarchical clustering, ADMIXTURE clustering, and PCA showed that some varieties in the parental population formed clusters related to fruit weight, but most varieties showed no clear separation. The wide range of LD observed in the parental population was probably due to its small size rather than to the population structure. Given the wide range and high level of LD and the weak population structure, GWAS and GS would be effective in citrus populations.

For most traits, the number of significant SNPs and the –log₁₀(p) values of peaks detected in GWAS were greater with the combined (787 genotypes in total) than with the parental population (111 varieties), which would increase the chance of finding good markers for MAS. These results indicate that GWAS using the combined population has higher statistical power for detecting QTLs than GWAS using only the parental population, and that the QTLs detected were shared between the parental and breeding populations. For some traits, the significant SNPs differ between the combined and parental populations, possibly because of the differences between the populations in detection power, allele frequency, or population structure.

In addition to the sample size, the difference in the LD pattern between the combined and parental populations can influence the resolution of GWAS. For example, multi-breed GWAS had a higher resolution than within-breed GWAS in dairy cattle²⁵. This difference was explained by narrower QTL regions because of the short conserved regions of LD across breeds²⁶. In the present study, the LD decay was rapid in the combined population, although the mean LD between adjacent SNPs was similar (r ² = 0.45) in both populations, implying that the LD in the combined population is conserved over shorter distances than that in the parental population. The short conserved regions of LD could improve the precision of QTL detection in the combined population by narrowing down the QTL regions.

GWAS in the combined population detected SNPs significantly associated (false discovery rate <0.05) with 11 traits. The most significant association was detected for Weight on Chr. 8 where one QTL named FW8 for fruit weight has been identified in a bi-parental QTL mapping study that used a mandarin F₁ population derived from ‘Fortune’ × ‘Murcott’²⁷. The most significant SNP detected for Weight resided in the same region as FW8, confirming a single major QTL for fruit weight on Chr. 8.

Identification of seven common significant SNPs for ColorP and ColorF suggests a close genetic linkage between QTLs controlling these traits, resulting in high phenotypic correlation. Common significant SNPs for other traits could be explained similarly. Significant SNPs for ColorP and ColorF on Chr. 4 were located in multiple QTLs for fruit flavedo and juice color identified on the linkage group MUR4.1²⁷. Two genes, Ciclev10031587 m.g and Ciclev10031003 m.g, annotated as pds1 and ccd4 respectively, are colocalized with the QTLs on MUR4.1²⁷. The pds1 gene is involved in the carotenoid biosynthetic pathway in citrus^{28, 29} and Arabidopsis³⁰, whereas ccd4 is involved in carotenoid accumulation in peach^{31, 32}. The pericarp and flesh color of citrus fruits is determined by carotenoid accumulation³.

A significant SNP for FruH and Peeling on Chr. 3 resides in the callose synthase gene (Ciclev10018456 m.g). The polysaccharide callose is a component of plant cell wall³³. Cell wall polysaccharides are associated with cell wall elasticity and viscosity, which influence fruit firmness. Softening of fruit tissues in tomato and avocado is caused by the degradation of cell wall polysaccharides^{34, 35}.

A single significant SNP for Acid detected on Chr. 4 is consistent with those for Weight, ColorP, and Bitter, and with the largest (although not significant) peak SNP for Brix. This SNP resides in the gene Ciclev10031681 m.g, which is annotated as glutamate dehydrogenase. In avocado, the activity and protein content of GDH remain constant during fruit development, but increase ~4-fold during fruit ripening³⁶. Generally, the acidity decreases while the sugar content increases during fruit maturation³. GDH could have pleiotropic effects on fruit quality traits involved in citrus fruit maturation. If so, common significant SNPs could be used as common markers for such traits in MAS programs.

The prediction accuracy of GBLUP (RR) that treats all SNPs as random effects, was highest for all the traits except Appear and Bitter, implying that all the traits except Appear and Bitter are controlled by many genes and/or QTLs. For Shape, SmoothP, Aroma, Taste, and Brix, for which the levels of prediction accuracy were intermediate, even though no significant association was detected. These results suggest that GS is more accurate for these traits than MAS based on significant SNPs. On the other hand, high prediction accuracies (r > 0.7) of MLR for ColorP, ColorF, and FirmLM imply that MAS performed by using a few markers enables the selection at low-cost genotyping for these traits.

A multi-kernel model that considers both additive and dominance effects improved the accuracy for Acid and Juicy, suggesting that the effects of both types are important for these traits. The highest prediction accuracy of GBLUP (GAUSS) for Acid supports this idea. Whereas GBLUP (RR) captures additive genetic effects, GBLUP (GAUSS) captures non-additive genetic effects (e.g., the dominance effect) by modeling non-linear relationships between markers and the phenotype^{37, 38}. For the other traits, prediction accuracy was similar or lower with the model considering additive and dominance effects, suggesting that additive models are sufficient to explain genetic variations in the population. Consistent with this assumption, additive genetic effects were the major factor contributing to genetic variation. Using models that consider both additive and dominance effects may not always be beneficial, as suggested by several previous studies. Adding dominance and epistatic effects to additive genetic effects did not help genomic prediction in rice, suggesting a high correlation between multiple variance components³⁹. In wheat, considering only additive effects resulted in equal or even higher prediction accuracy than considering both additive and dominance effects⁴⁰, even though considering both types of effects increased the prediction accuracy in another simulation study that used a large population⁴¹. These results suggest that models considering dominance effects are quite sensitive to the number of individuals in the population⁴⁰.

The prediction for the tails of fruit weight distribution was accurate for large-fruit varieties. The extremely large fruit weight of pummelos was properly predicted, especially with GBLUP (RR), even though the accuracy of GBLUP (RR) for the whole range of distribution was slightly lower than that of GBLUP (GAUSS). This implies that the genetic architecture of the extremely large fruit weight of pummelos can be explained by the additive accumulation of fruit weight–related genes of “non-large-fruit” varieties, which include a number of mandarins and sweet oranges. Cultivated pummelos were originally selected from a single progenitor species, C. maxima, whilst some of the cultivated mandarins were developed by introgression from C. maxima to the ancestral mandarin species C. reticulata ⁴²; sweet oranges (C. sinensis) originated from pummelo and mandarin⁴³. Thus, mandarins and sweet oranges are related to pummelos, which may have contributed to the high prediction accuracy for the extremely large fruit weight of pummelos. For other traits, RR also outperformed GAUSS in the prediction of the distribution tails, suggesting that RR is much more robust and accurate than GAUSS when used for this purpose. This is an important implication for the use of GS in breeding programs, because the prediction of distribution tails is important for increasing or decreasing genotypic values of target traits.

For the breeding population, the prediction accuracy was lower in the model built using only the parental population than in the model built using the combined population. This result suggests that the high genetic diversity in the parental population is not always useful to predict a small variation in the breeding population. Generally, the prediction accuracy across populations that are of low relevance is lower than that within a population⁴⁴. Lower prediction accuracies across populations than within populations were also observed in apple^{45, 46}, grapevine⁴⁷ and maize⁴⁸. On the other hand, across-population evaluation is reportedly preferable to that within populations when the populations are closely related to each other, marker density is high, or the number of phenotypic records is small^{49, 50}. The advantage of using data from multiple populations should be discussed from the viewpoint of the costs of genotyping and phenotyping, as pointed out by Muranty et al.⁴⁶. In the present study, the prediction accuracy was improved by combining the parental and breeding populations. The shorter conserved region of LD in the combined population than in the parental population could improve not only GWAS resolution but also the prediction accuracy by narrowing down the QTL regions. Our results suggest that multiple populations closely related to each other are useful to predict a small variation in the breeding population.

The results of this study suggest that GWAS and GS have a good potential for genetic improvement of fruit quality traits in citrus. Declining sequencing cost enables genomics-assisted breeding with very large numbers of DNA markers, and it has been anticipated that this approach would improve fruit tree breeding by avoiding constraints imposed by the long juvenile period and large plant size⁵¹. In the present study, the GWAS power was increased and the prediction accuracy was improved by combining the breeding and parental populations. The results support the idea of breeding-assisted genomics, in which the data collected from actual breeding programs will be beneficial for plant functional genomics¹⁶. Because the phenotypic data are accumulated every year, combining marker genotypic data in a breeding program and gathering them will increase the detection power of GWAS and the prediction accuracy of GS. The combination of genomics-based approaches, GWAS and GS, and the data from actual breeding programs will facilitate both functional genomics and fruit tree breeding.

Methods

Plant materials and fruit assessment

A set of 111 citrus varieties (48 hybrid cultivars, 41 breeding/selected strains, 22 indigenous varieties), called the parental population (Supplementary Table S1), and 35 full-sib families with a total of 676 F₁ individuals, called the breeding population (Supplementary Table S2), were used in this study. The breeding population was derived from crosses among 45 varieties, of which 41 were included in the parental population. All genetic materials were maintained at NIFTS (Nagasaki and Shizuoka, Japan).

We evaluated 17 fruit quality traits for the parental and breeding populations by using instrumental, sensory, or visual assessment methods (Table 1). Five fruits were sampled from one tree of each genotype in December of each year, when many genotypes were maturing. The mean values obtained from 2008 to 2014 were estimated by fitting a mixed linear model (MLM); year was treated as a fixed effect and variety as a random effect to remove the year effect. The best linear unbiased prediction (BLUP) estimated by the MLM was used for subsequent analysis as the dependent variables of regression models for GWAS and genomic prediction models. The MLM was implemented in the “lmer” function of the R package lme4 ver. 1.1–7⁵². All traits except Weight, Brix, and Acid had ordinal categorical values (Table 1).

SNP genotyping data

DNA was extracted and its quality was evaluated according to Shimizu et al.⁵³. Next-generation sequencing data of 12 citrus varieties was obtained with Illumina HiSeq 2000 in paired-end mode for the development of a SNP Array for genotyping 768 SNPs⁵³, and a SNP array for genotyping 1,536 SNPs was designed from the sequencing data as described by Shimizu et al.⁵³.

SNP genotyping for 768 or 1,536 arrays was performed as described by Shimizu et al.⁵³. Parentage tests were performed using a function of GUGS (General Utilities for Genotyping Study) software (T. Shimizu, in preparation) with known hybrid trios. Finally, 1,841 verified SNPs were obtained (Supplementary Data S1). For further analysis, each of the 1,841 SNP genotypes was converted to 1 (AA homozygotes), −1 (BB homozygotes), or 0 (AB heterozygotes). The rate of missing SNP genotypes was 0.17. We used BEAGLE ver. 3.3.2⁵⁴ to impute the missing genotypes; the mean imputation accuracy (r ²) was 0.97.

Estimation of linkage disequilibrium

To estimate LD between each pair of SNPs within the same chromosome, we calculated the squared correlation coefficients (r ²) of SNP genotypes and plotted them against physical distance between the corresponding markers in Mb. Local polynomial regression with kernel weight was conducted using the “locpoly” function in the R package KernSmooth ver. 2.23–13⁵⁵ to test the relation between the r ² values and physical map distances. Physical distances between adjacent markers were 0.056–2,243 kb (mean, 156.4 kb).

Population structure analysis

The genetic structure of the parental population was estimated using hierarchical clustering, ADMIXTURE clustering, and PCA. Hierarchical clustering based on Ward’s method⁵⁶ with Euclidean distance was conducted by using the R function “hclust”. Model-based clustering implemented in the software ADMIXTURE ver. 1.3.0⁵⁷ infers population structure by estimating individual admixture proportions from multi-locus SNP data by using a maximum-likelihood method. The number of ancestral populations (K) in the parental population varying from 1 to 10 was assumed when using ADMIXTURE. To choose the optimal K value, 5-fold CV was performed. PCA was conducted using the R function “prcomp”. The optimal number of principal components (PCs) in the parental population was determined by estimating the variances of PC scores. To locate the breeding population in the PCA space of the parental population, the PC scores of the breeding population were estimated based on PCA of the parental population. A heat map was generated to estimate the phenotypic population structure of fruit weight (log-converted Weight) by using R package ggplot2 ver. 1.0.0⁵⁸.

Regression models for GWAS

Association analysis was conducted by using an MLM⁵⁹ implemented in the “GWAS” function of the R package rrBLUP ver. 4.3³⁷. To avoid spurious associations due to population structure, a kinship matrix and four PCs were included in the MLM as fixed effects. The kinship matrix was computed by using the “A.mat” function of the R package rrBLUP ver. 4.3³⁷. The number of PCs was selected based on the variances of PC scores. The variance of PC score rapidly decreased until PC4, and gradually thereafter (Supplementary Fig. S6). Annotations of the genes containing significant SNPs were obtained from the Phytozome version 12.0.1 (https://phytozome.jgi.doe.gov/pz/portal.html/) and CitrusCyc Pathways version 3.0 databases (http://pathways.citrusgenomedb.org/). The LD heatmaps of the regions surrounding the peaks on Chr. 4 for ColorP and ColorF were constructed by using the R package LDheatmap ver. 0.99–1⁶⁰.

Genomic prediction models

We used 12 methods to evaluate prediction accuracy. A MLM-based method, genomic best linear unbiased prediction (GBLUP), was performed by using the “kinship.BLUP” function of the R package rrBLUP ver. 4.3³⁷ with linear ridge kernel regression (RR) and non-linear Gaussian kernel regression (GAUSS). We also used the R package glmnet ver. 1.9–8⁶¹ for three different linear regression methods: Ridge Regression (alpha = 0), Lasso (alpha = 1), and Elastic Net (alpha = 0.5). The latter is a combination of ridge regression and lasso. A non-linear decision tree-based ensemble learning method, Random Forest, was run in the R package randomForest ver. 4.6–10⁶². The Bayesian linear regression models Bayesian Ridge Regression (BRR), Bayesian Lasso (BL), BayesA, BayesB, and BayesC were also used for the modeling. These models were implemented in the R package BGLR ver. 1.0.3⁶³.

Because 14 of the 17 traits analyzed in this study were scored as ordinal categories (Table 1), seven regression models for the categorical data were also tested. The categorical data was obtained by rounding off the mean continuous values. We used the ordinal probit model-based BayesA and BayesB^{64, 65}, and the R package BGLR for BRR, BL, BayesA, BayesB, and BayesC. The BGLR package supports models for categorical traits.

To evaluate the effect of peak SNPs with high –log₁₀(p) values detected in GWAS, the accuracies of three models were compared. The first model was MLR implemented in the R function “lm”^{66, 67}. The top three peak SNPs with high –log₁₀(p) values of each trait were entered in MLR unless the correlation coefficient between the SNPs was ≥0.6 to prevent multicollinearity. The proportions of the variance explained by the top three peak SNPs for each trait were also estimated by the MLR model. The second model was GBLUP (RR) that treats the effects of the top three SNPs as fixed effects and the effects of other SNPs as random effects. The last model was the commonly used version of GBLUP (RR), which treats all SNPs as random effects.

To take into account the high observed heterozygosity (H _o) of the parental population (>0.50)²¹, a multi-kernel model that considers both additive and dominance genetic effects was compared with a single-kernel model that considers only additive effects. Both models were implemented in the R package of BGLR ver. 1.0.3⁶³. The additive relationship matrix (kinship matrix) was computed by the “A.mat” function of the R package rrBLUP, whereas the dominance relationship matrix was calculated as D in equation (6) in Vitezica et al.⁶⁸ using a modified “A.mat” function. The additive (\({\sigma }_{a}^{2}\)) and dominance (\({\sigma }_{d}^{2}\)) genetic variances and residual variance (\({\sigma }_{e}^{2}\)) were estimated by using the multi-kernel model. Narrow-sense heritability (h ²) of each trait was computed as the ratio of \({\sigma }_{a}^{2}\) to the total phenotypic variance \(({\sigma }_{a}^{2}+{\sigma }_{d}^{2}+{\sigma }_{e}^{2})({h}^{2}={\sigma }_{a}^{2}/({\sigma }_{a}^{2}+{\sigma }_{d}^{2}+{\sigma }_{e}^{2}))\).

To verify the models, we evaluated the prediction accuracy by 10-fold CV using the parental population randomly partitioned into each fold. The CV was repeated 5 times, and the same partition patterns were adapted to all prediction models in each CV. The prediction accuracy was defined as Pearson’s correlation coefficients (r) between observed and predicted genotypic values and regarded as r = 0 at r < 0.

To evaluate the prediction accuracy for the tails of the distribution of fruit weight, we employed the linear ridge kernel regression model GBLUP (RR) and the non-linear Gaussian kernel regression model GBLUP (GAUSS). All varieties except the large- and small-weight groups were used as a training set, whereas the large- and small-weight groups were used as a test set to assess the difficulty of the prediction of the distribution tails. The training set was also used for leave-one-out CV as a control. The prediction accuracy for the distribution tails was also evaluated for other traits. Rank-ordered 4-fold CV of the parental population was performed. The prediction accuracy was calculated in each partition. To show the tails of the distribution, phenotypic variations of the traits were visualized as jitter plots superimposed onto boxplots by using the R packages ggplot2 ver. 1.0.1⁵⁸ and Rmisc ver. 1.5⁶⁹.

To evaluate the prediction accuracy in the breeding population, all 12 methods were applied. Three different training populations for modeling were compared: only the parental population, only the breeding population (with one family excluded), and both the parental and breeding populations combined (with one family excluded); the phenotype of the excluded family was predicted to mimic actual breeding in which the phenotypic information of a family targeted by GS is generally not available. The prediction accuracies for each family or combined all families were calculated.

References

Saunt, J. Citrus varieties of the world. 2nd Ed. (Sinclair Intl. Limited, Norwich, England, 2000).
Economos, C. & Clay, W. D. Nutritional and health benefits of citrus fruits. Food, Nutr. Agric. 24, 11–18 (1999).
Google Scholar
Omura, M. & Shimada, T. Citrus breeding, genetics and genomics in Japan. Breed. Sci. 66, 3–17 (2016).
Article PubMed PubMed Central Google Scholar
Hodgson, R. W. Horticultural varieties of citrus. in The citrus industry. Vol. 1. (eds Reuther, W., Webber, H. J. & Batchelor, L. D.) 431–591 (University of California Press, Berkeley, USA., 1967).
Cooper, W. C., Reece, P. C. & Furr, J. R. Citrus breeding in Florida - Past, present and future. Proc. Florida State Hort. Soc. 75, 5–13 (1962).
Google Scholar
Okudai, N. Recent Trend in breeding medium- and late-maturing citrus in Japan. JARQ, Japan Agric. Res. Q. (1982).
Mitani, N., Matsumoto, R., Yoshioka, T. & Kuniga, T. Citrus hybrid seedlings reduce initial time to flower when grafted onto shiikuwasha rootstock. Sci. Hortic. 116, 452–455 (2008).
Article Google Scholar
Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat. Rev. Genet. 12, 499–510 (2011).
Article ADS CAS PubMed Google Scholar
Brachi, B., Morris, G. P. & Borevitz, J. O. Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol. 12, 232 (2011).
Article PubMed PubMed Central Google Scholar
Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
CAS PubMed PubMed Central Google Scholar
Korte, A. & Farlow, A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9, 29 (2013).
Article CAS PubMed PubMed Central Google Scholar
Khan, M. A. & Korban, S. S. Association mapping in forest trees and fruit crops. J. Exp. Bot. 63, 4045–4060 (2012).
Article CAS PubMed Google Scholar
Jannink, J. L., Lorenz, A. J. & Iwata, H. Genomic selection in plant breeding: from theory to practice. Brief. Funct. Genomics 9, 166–177 (2010).
Article CAS PubMed Google Scholar
Lorenz, A. J. et al. Genomic selection in plant breeding: knowledge and prospects. Adv. Agron. 110, 77–123 (2011).
Article Google Scholar
Grattapaglia, D. & Resende, M. D. V. Genomic selection in forest tree breeding. Tree Genet. Genomes 7, 241–255 (2011).
Article Google Scholar
Poland, J. Breeding-assisted genomics. Curr. Opin. Plant Biol. 24, 119–124 (2015).
Article CAS PubMed Google Scholar
Curk, F. et al. Phylogenetic origin of limes and lemons revealed by cytoplasmic and nuclear markers. Ann. Bot. 117, 565–583 (2016).
Article PubMed PubMed Central Google Scholar
Heffner, E. L., Sorrells, M. E. & Jannink, J. L. Genomic selection for crop improvement. Crop Sci. 49, 1–12 (2009).
Article CAS Google Scholar
Flint-Garcia, S. A., Thornsberry, J. M. & Buckler, E. S. Structure of linkage disequilibrium in plants. Annu. Rev. Phys. Chem. 53, 319–348 (2002).
Article Google Scholar
Gupta, P. K., Rustgi, S. & Kulwal, P. L. Linkage disequilibrium and association studies in higher plants: Present status and future prospects. Plant Mol. Biol. 57, 461–485 (2005).
Article CAS PubMed Google Scholar
Shimizu, T. et al. Hybrid origins of citrus varieties inferred from DNA marker analysis of nuclear and organelle genomes. PLoS One 11(11), e0166969 (2016).
Article PubMed PubMed Central Google Scholar
Barnaud, A., Lacombe, T. & Doligez, A. Linkage disequilibrium in cultivated grapevine, Vitis vinifera L. Theor. Appl. Genet. 112, 708–716 (2006).
Article CAS PubMed Google Scholar
Calus, M. P. L. & Veerkamp, R. F. Accuracy of breeding values when using and ignoring the polygenic effect in genomic breeding value estimation with a marker density of one SNP per cM. J. Anim. Breed. Genet 124, 362–368 (2007).
Article CAS PubMed Google Scholar
Pritchard, J. K. & Przeworski, M. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69, 1–14 (2001).
Article CAS PubMed PubMed Central Google Scholar
van den Berg, I., Boichard, D. & Lund, M. S. Comparing power and precision of within-breed and multibreed genome-wide association studies of production traits using whole-genome sequence data for 5 French and Danish dairy cattle breeds. J. Dairy Sci. 99, 8932–8945 (2016).
Article PubMed Google Scholar
de Roos, A. P. W., Hayes, B. J., Spelman, R. J. & Goddard, M. E. Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics 179, 1503–1512 (2008).
Article PubMed PubMed Central Google Scholar
Yu, Y., Chen, C. & Gmitter, F. G. QTL mapping of mandarin (Citrus reticulata) fruit characters using high-throughput SNP markers. Tree Genet. Genomes 12, 77 (2016).
Article Google Scholar
Kita, M. et al. Cloning and expression of CitPDS1, a gene encoding phytoene desaturase in citrus. Biosci. Biotechnol. Biochem. 65, 1424–1428 (2001).
Article CAS PubMed Google Scholar
Kato, M. et al. Accumulation of carotenoids and expression of carotenoid biosynthetic genes during maturation in citrus fruit. Plant Physiol. 134, 824–837 (2004).
Article CAS PubMed PubMed Central Google Scholar
Norris, S. R., Barrette, T. R. & DellaPenna, D. Genetic dissection of carotenoid synthesis in arabidopsis defines plastoquinone as an essential component of phytoene desaturation. Plant Cell 7, 2139–2149 (1995).
Article CAS PubMed PubMed Central Google Scholar
Brandi, F. et al. Study of ‘Redhaven’ peach and its white-fleshed mutant suggests a key role of CCD4 carotenoid dioxygenase in carotenoid and norisoprenoid volatile metabolism. BMC Plant Biol 11, 24 (2011).
Article CAS PubMed PubMed Central Google Scholar
Adami, M. et al. Identifying a carotenoid cleavage dioxygenase (ccd4) gene controlling yellow/white fruit flesh color of peach. Plant Mol. Biol. Report. 31, 1166–1175 (2013).
Article CAS Google Scholar
Piršelová, B. & Matušíková, I. Callose: The plant cell wall polysaccharide with multiple biological functions. Acta Physiol. Plant. 35, 635–644 (2013).
Article Google Scholar
Sakurai, N. & Nevins, D. J. Changes in physical properties and cell wall polysaccharides of tomato (Lycoperskon esculentum) pericarp tissues. Physiol. Plant. 89, 681–686 (1993).
Article CAS Google Scholar
Sakurai, N. & Nevins, D. J. Relationship between fruit softening and wall polysaccharides in avocado (Persea americana Mill) mesocarp tissues. Plant Cell Physiol. 38, 603–610 (1997).
Article CAS Google Scholar
Loulakakis, K. A., Roubelakis-Angelakis, K. A. & Kanellis, A. K. Regulation of glutamate dehydrogenase and glutamine synthetase in avocado fruit during development and ripening. Plant Physiol. 106, 217–222 (1994).
Article CAS PubMed PubMed Central Google Scholar
Endelman, J. B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome J. 4, 250–255 (2011).
Article Google Scholar
Morota, G. & Gianola, D. Kernel-based whole-genome prediction of complex traits: A review. Front. Genet 5, 1–13 (2014).
Google Scholar
Xu, S., Zhu, D. & Zhang, Q. Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc. Natl. Acad. Sci. USA 111, 12456–12461 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhao, Y., Zeng, J., Fernando, R. & Reif, J. C. Genomic prediction of hybrid wheat performance. Crop Sci. 53, 802–810 (2013).
Article Google Scholar
Wellmann, R. & Bennewitz, J. Bayesian models with dominance effects for genomic evaluation of quantitative traits. Genet. Res. 94, 21–37 (2012).
Article Google Scholar
Wu, G. A. et al. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat. Biotechnol. 32, 656–662 (2015).
Article Google Scholar
Xu, Q. et al. The draft genome of sweet orange (Citrus sinensis). Nat. Genet. 45, 59–66 (2013).
Article CAS PubMed Google Scholar
Pszczola, M., Strabel, T., Mulder, H. A. & Calus, M. P. L. Reliability of direct genomic values for animals with different relationships within and to the reference population. J. Dairy Sci. 95, 389–400 (2012).
Article CAS PubMed Google Scholar
Kumar, S. et al. Genomic selection for fruit quality traits in apple (Malus × domestica Borkh.). PLoS One 7(5), e36674 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Muranty, H. et al. Accuracy and responses of genomic selection on key traits in apple breeding. Hortic. Res. 2, 15060 (2015).
Article PubMed PubMed Central Google Scholar
Fodor, A. et al. Genome-wide prediction methods in highly diverse and heterozygous species: Proof-of-concept through simulation in grapevine. PLoS One 9(11), e110436 (2014).
Article ADS PubMed PubMed Central Google Scholar
Zhao, Y. et al. Accuracy of genomic selection in European maize elite breeding populations. Theor. Appl. Genet. 124, 769–776 (2012).
Article PubMed Google Scholar
Ibanez-Escriche, N., Fernando, R. L., Toosi, A. & Dekkers, J. C. Genomic selection of purebreds for crossbred performance. Genet. Sel. Evol. 41, 12 (2009).
Article PubMed PubMed Central Google Scholar
de Roos, A. P. W., Hayes, B. J. & Goddard, M. E. Reliability of genomic predictions across multiple populations. Genetics 183, 1545–1553 (2009).
Article PubMed PubMed Central Google Scholar
Iwata, H., Minamikawa, M. F., Kajiya-Kanegae, H., Ishimori, M. & Hayashi, T. Genomics-assisted breeding in fruit trees. Breed. Sci. 66, 100–15 (2016).
Article PubMed PubMed Central Google Scholar
Bates, D., Maechler, M., Bolker, B. & Walker, S. lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1–7. http://cran.r-project.org/package=lme4 (2014).
Shimizu, T. et al. A genomic approach to selecting robust and versatile SNP sets from next-generation sequencing data for genome-wide association study in citrus cultivars. Acta Hortic. 1135, 23–32 (2016).
Article Google Scholar
Browning, B. L. & Browning, S. R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).
Article CAS PubMed PubMed Central Google Scholar
Wand, M. P. & Jones, M. C. Kernel smoothing. (Chapman & Hall, 1995).
Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963).
Article MathSciNet Google Scholar
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Article CAS PubMed PubMed Central Google Scholar
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer, 2009).
Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
Article CAS PubMed Google Scholar
Shin, J.-H., Blay, S., McNeney, B. & Graham, J. LDheatmap: An R Function for Graphical Display of Pairwise Linkage Disequilibria between Single Nucleotide Polymorphisms. J. Stat. Softw. 16, 1–10 (2006).
Article Google Scholar
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Article PubMed PubMed Central Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article MATH Google Scholar
Perez, P. BGLR: A statistical package for whole genome regression and prediction. Genetics 198, 483–495 (2014).
Article PubMed PubMed Central Google Scholar
Iwata, H., Ebana, K., Fukuoka, S., Jannink, J. L. & Hayashi, T. Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor. Appl. Genet. 118, 865–880 (2009).
Article PubMed Google Scholar
Iwata, H. et al. Potential assessment of genome-wide association study and genomic selection in Japanese pear Pyrus pyrifolia. Breed. Sci. 63, 125–140 (2013).
Article PubMed PubMed Central Google Scholar
Chambers, J. M. Linear models. in Statistical Models in S. (eds Chambers, J. M. & Hastie, T. J.) 95–144 (Wadsworth & Brooks/Cole, 1992).
Wilkinson, G. N. & Rogers, C. E. Symbolic descriptions of factorial models for analysis of variance. J. R. Stat. Soc. Ser. C (Applied Stat.) 22, 392–399 (1973).
Google Scholar
Vitezica, Z. G., Varona, L. & Legarra, A. On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics 195, 1223–1230 (2013).
Article PubMed PubMed Central Google Scholar
Hppe, R. M. Rmisc: Ryan miscellaneous. R package version 1.5. https://cran.r-project.org/package=Rmisc (2013).

Download references

Acknowledgements

This research was supported by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics-based Technology for Agricultural Improvement, NGB-1006, 2009 and 2010) and was partially supported by the Transdisciplinary Research Integration Center Project of the Research Organization of Information and Systems (ROIS), Japan. Data analysis was partially performed using the ROIS NIG Supercomputer System.

Author information

Authors and Affiliations

Laboratory of Biometry and Bioinformatics, Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo, 113-8657, Japan
Mai F. Minamikawa, Hiromi Kajiya-Kanegae, Akio Onogi & Hiroyoshi Iwata
Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization (NARO), 485-6 Okitsu Nakacho, Shimizu, Shizuoka, 424-0292, Japan
Keisuke Nonaka, Shingo Goto, Terutaka Yoshioka, Hiroko Hamada & Tokurou Shimizu
Genome Informatics Laboratory, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima, Shizuoka, 411-8540, Japan
Eli Kaminuma & Yasukazu Nakamura
Institute of Fruit Tree and Tea Science, NARO, 2-1 Fujimoto, Tsukuba, Ibaraki, 305-8605, Japan
Atsushi Imai
Institute of Crop Science, NARO, 2-1-2 Kannondai, Tsukuba, Ibaraki, 305-8518, Japan
Takeshi Hayashi
Institute of Crop Science, NARO, 1-2 Ohwashi, Tsukuba, Ibaraki, 305-8634, Japan
Satomi Matsumoto & Yuichi Katayose
Comparative Genomics Laboratory, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima, Shizuoka, 411-8540, Japan
Atsushi Toyoda
Advanced Genomics Center, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima, Shizuoka, 411-8540, Japan
Atsushi Toyoda & Asao Fujiyama

Authors

Mai F. Minamikawa
View author publications
You can also search for this author in PubMed Google Scholar
Keisuke Nonaka
View author publications
You can also search for this author in PubMed Google Scholar
Eli Kaminuma
View author publications
You can also search for this author in PubMed Google Scholar
Hiromi Kajiya-Kanegae
View author publications
You can also search for this author in PubMed Google Scholar
Akio Onogi
View author publications
You can also search for this author in PubMed Google Scholar
Shingo Goto
View author publications
You can also search for this author in PubMed Google Scholar
Terutaka Yoshioka
View author publications
You can also search for this author in PubMed Google Scholar
Atsushi Imai
View author publications
You can also search for this author in PubMed Google Scholar
Hiroko Hamada
View author publications
You can also search for this author in PubMed Google Scholar
Takeshi Hayashi
View author publications
You can also search for this author in PubMed Google Scholar
Satomi Matsumoto
View author publications
You can also search for this author in PubMed Google Scholar
Yuichi Katayose
View author publications
You can also search for this author in PubMed Google Scholar
Atsushi Toyoda
View author publications
You can also search for this author in PubMed Google Scholar
Asao Fujiyama
View author publications
You can also search for this author in PubMed Google Scholar
Yasukazu Nakamura
View author publications
You can also search for this author in PubMed Google Scholar
Tokurou Shimizu
View author publications
You can also search for this author in PubMed Google Scholar
Hiroyoshi Iwata
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

M.F.M., K.N., T.S., and H.I. conceived and designed the study. K.N., T.Y., A.I., and H.H. performed phenotyping. S.G. and T.S. extracted DNA. T.S. and E.K. designed the SNP genotyping array. E.K., S.G., S.M., Y.K., A.T., A.F., Y.N., and T.S performed SNP genotyping. M.F.M., H.K.K., A.O., T.H., and H.I. performed genomic prediction and genome-wide association study. M.F.M., K.N., and T.S. wrote the original manuscript. K.N., H.K.K., A.O., T.S., and H.I. helped to edit the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Hiroyoshi Iwata.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary information

Dataset 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Minamikawa, M.F., Nonaka, K., Kaminuma, E. et al. Genome-wide association study and genomic prediction in citrus: Potential of genomics-assisted breeding for fruit quality traits. Sci Rep 7, 4721 (2017). https://doi.org/10.1038/s41598-017-05100-x

Download citation

Received: 03 February 2017
Accepted: 24 May 2017
Published: 05 July 2017
DOI: https://doi.org/10.1038/s41598-017-05100-x

Genome-wide association study and genomic prediction in citrus: Potential of genomics-assisted breeding for fruit quality traits

Subjects

Abstract

Similar content being viewed by others

Genome-wide association studies provide insights into the genetic determination of fruit traits of pear

Genome-wide association study and candidate gene identification for agronomic traits in 182 upward-growing fruits of C. frutescens and C. annuum

Genome-wide association study identifies QTL for eight fruit traits in cultivated tomato (Solanum lycopersicum L.)

Introduction

Results

Linkage disequilibrium and population structure

Genome-wide association study

Accuracy of prediction models for 17 fruit traits in the parental population

Prediction for distribution tails

Prediction for the breeding population

Discussion

Methods

Plant materials and fruit assessment

SNP genotyping data

Estimation of linkage disequilibrium

Population structure analysis

Regression models for GWAS

Genomic prediction models

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing Interests

Additional information

Electronic supplementary material

Supplementary information

Dataset 1

Rights and permissions

About this article

Cite this article

Search

Quick links

Subjects

Abstract

Similar content being viewed by others

Genome-wide association studies provide insights into the genetic determination of fruit traits of pear

Genome-wide association study and candidate gene identification for agronomic traits in 182 upward-growing fruits of C. frutescens and C. annuum

Genome-wide association study identifies QTL for eight fruit traits in cultivated tomato (Solanum lycopersicum L.)

Introduction

Results

Linkage disequilibrium and population structure

Genome-wide association study

Accuracy of prediction models for 17 fruit traits in the parental population

Prediction for distribution tails

Prediction for the breeding population

Discussion

Methods

Plant materials and fruit assessment

SNP genotyping data

Estimation of linkage disequilibrium

Population structure analysis

Regression models for GWAS

Genomic prediction models

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing Interests

Additional information

Electronic supplementary material

Supplementary information

Dataset 1

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links