Article

Analysis of Kinship and Population Genetic Structure of 53 Apricot Resources Based on Whole Genome Resequencing

College of Forestry, Inner Mongolia Agricultural University, Hohhot 010019, China
* Author to whom correspondence should be addressed.
Curr. Issues Mol. Biol. 2024, 46(12), 14106-14118; https://doi.org/10.3390/cimb46120844
Submission received: 4 November 2024 / Revised: 8 December 2024 / Accepted: 10 December 2024 / Published: 13 December 2024
(This article belongs to the Section Molecular Plant Sciences)
Figure 1. Statistics on the number of different types of single base substitutions.
Figure 2. Heat map of sample genetic relationships. Pairs from the first sample to the last. The larger the value, the closer to red; that is, the closer the relationship between two individuals.
Figure 3. Comparison of genetic relationship matrices and genetic distances. DT: Sample from Datong City, Shanxi Province; HS: from Hohhot, Inner Mongolia; LZ: samples from Lanzhou City, Gansu Province; TG: from Jinzhong City, Shanxi Province; SX: from Yulin City, Shaanxi Province. A total of 53 common apricot varieties were classified into four taxa (Q1, Q2, Q3, and Q4).
Figure 4. PCA, principal component analysis. Each color represents a group of species. S1 (Red): Group I; S2 (Orange): Group II; S3 (Green): Group III; S4 (Blue): Group IV.
Figure 5. Cross-validation error rate line chart.
Figure 6. Analysis of SNP population structure in 53 common apricot varieties (K = 4). Each column of vertical grids represents the genetic background of a sample; each color block represents an estimated ancestor, and the proportion of the vertical grid occupied by each color block represents the proportion of that ancestor that contributes to the genetic background of that sample. Red for G1; Orange for G2; Green for G3; Blue for G4.
Figure 7. Three-dimensional PCA distribution maps of different groups. (A) The three-dimensional PCA of the samples when the varieties are divided into four groups; (B) the three-dimensional PCA of the samples when the varieties are divided into six groups. Varieties with the same color in the PCA plot were considered to be in the same line.

Abstract

Based on single nucleotide polymorphism (SNP) markers developed by whole genome resequencing (WGRS), the kinship and population genetic structure of 53 common apricot (P. armeniaca) varieties were analyzed to provide a theoretical basis for revealing the phylogenetic relationships and classification of the common apricot. WGRS was performed on the 53 varieties, and high-quality SNP sites were obtained after alignment against the “Yinxiangbai” apricot reference genome. Phylogenetic analysis, G matrix analysis, principal component analysis (PCA), and population structure analysis were performed using Genome-wide Complex Trait Analysis (GCTA), FastTree, Admixture, and other software. The average alignment rate of the sequencing data to the reference genome was 97.66%. After strict screening, 88,332,238 high-quality SNP sites were obtained. Statistics on SNP variation types showed that LNLJX had the largest number of variants (3,951,322) and the lowest transition/transversion ratio (ts/tv = 1.77), indicating that gene exchange events occurred less frequently in this variety. Kinship and genetic distance between samples were estimated from the SNP sites. G values between varieties ranged from 0.01 to 1.41; PLDJX and BK1 were the most closely related (1.41), and YZH and LGWSX were the most distantly related (0.01). Genetic distances between varieties ranged from 0.00367 to 0.264344; the distance between HMX and JM was the smallest, and that between WYX and YX was the largest. The phylogenetic tree, PCA, and genetic structure analysis all divided the 53 common apricot varieties into four groups, and the classification results were consistent. The SNP markers mined using WGRS are useful not only for analyzing variation in common apricots, but also for effectively identifying their kinship and genetic structure, which plays a critical role in the classification and utilization of common apricot germplasm resources.

1. Introduction

Apricot (Armeniaca vulgaris Lam.) belongs to the genus Armeniaca of the family Rosaceae and is a diploid temperate deciduous tree (2n = 16) with a genome size of about 245.07 to 291.59 Mb [1]. There are 10 species of apricot in the world. Among them, the most widely planted, oldest, and most diversified is the common apricot (P. armeniaca), which has a history spanning more than 3000 years. Owing to this long history and frequent germplasm exchange between taxa, it is also one of the least understood fruit tree resources [2,3]. Because of their strong adaptability, their resistance to light, cold, drought, and sand, and their high economic benefit, apricots are widely planted in the “three near-north regions” of China. It is therefore important to be able to distinguish common apricot relatives.
The study of the genetic diversity and genetic structure of a species can not only reveal its level of genetic variation, its spatial and temporal distribution, and its relationship with the surrounding ecosystem, but can also aid analysis of its evolutionary history and evolutionary potential, providing important information for future development and a scientific basis for species protection and for resource development and utilization [4]. Using SSR molecular markers and fluorescent capillary electrophoresis, Liu et al. analyzed the genetic diversity, kinship, and genetic structure of common apricot germplasm resources and clarified the kinship and genetic background of some important apricot germplasm resources [5]. By analyzing RAPD and S-allele PCR amplification in Limeixing apricot and in Prunus salicina and Prunus mume varieties from around the world, Yang et al. found that Limeixing is more closely related to Li [6]. Zhang et al. investigated the genetic diversity and population structure of 67 common apricots of the north China ecological group using SSR markers. At K = 4, the north China ecological group, excluding the kernel apricot, could be divided into three subgroups. Moreover, the north China ecological group had rich genetic diversity, whereas the kernel apricot had a narrow genetic basis but more unique allelic variation and a distinct lineage. Members of the same group did not necessarily share the same geographical origin [7].
With the rapid development of sequencing technology and the reduction in sequencing costs, SNP markers are increasingly applied in plant population genetics research. Based on WGRS, datasets of individual and population genetic variation sites can be analyzed to understand the differences between individuals and groups. This technique has been widely used in trait gene mapping, genetic map construction, studies of origin and evolution, and other aspects of genetic research. Revealing genetic diversity and kinship is essential for conserving germplasm resources, improving breeding efficiency, and enhancing competitiveness. Accurate assessment of relatedness and pedigree records is critical to avoid inbreeding, maintain genetic diversity, improve species adaptability and viability, guide breeding decisions, ensure the speed and quality of new variety development, and ensure the stability and economic efficiency of agricultural output. The small size and simple structure of the genomes of fruit trees such as peach [8], plum [9], and cherry [10] have attracted the attention of researchers. The first high-quality apricot genome was released in 2019, with a size of 221.9 Mb and a contig NG50 of 1.02 Mb; BUSCO analysis showed that up to 98.0% of complete genes could be detected in the assembly, and 30,436 protein-coding genes were predicted [11]. The same study revealed for the first time that the NCED gene in the β-carotenoid metabolic pathway is the key gene regulating color formation in apricot pulp, which provides a strong reference for apricot gene identification and breeding strategy. Zhang et al. selected the “Yinxiangbai” (“silver fragrant white”) apricot as sequencing material and combined third-generation sequencing, error correction with second-generation data, and Hi-C sequencing for assembly, obtaining high-quality sequences for the eight apricot chromosomes with a genome size of 251.19 Mb, a heterozygosity of 0.99%, and 29,230 annotated protein-coding genes. Repeat sequences made up 46.78% of this genome, significantly higher than the 38.28% observed in the “Chuanzhi” apricot [12].
In this study, 53 common apricot varieties were collected from Shaanxi, Shanxi, Inner Mongolia, and Gansu. Using WGRS with SNPs as markers, we comprehensively mined genetic variation information to reveal the different sources and genetic diversity of common apricot germplasm, providing evidence for variety breeding and efficient use in the future.

2. Materials and Methods

2.1. Materials

A total of 53 common apricot varieties were obtained from breeding bases in different regions of China, covering most of the typical Chinese apricot varieties: 13 varieties from Hohhot, Inner Mongolia (HS); 8 from Jinzhong City, Shanxi Province (TG); 16 from Datong City, Shanxi Province (DT); 11 from Yulin City, Shaanxi Province (SX); and 5 from Lanzhou City, Gansu Province (LZ). For each variety, 2 g of fresh, healthy, pest-free young leaves was collected as the test sample and sent to the sequencing company for genome sequencing.

2.2. DNA Extraction

DNA was extracted using the E.Z.N.A. Tissue DNA Kit. Tissue samples were first digested and lysed within 30 min, and 500 μL of RB Buffer pre-mixed with 2-mercaptoethanol was then added. The sample was purified using a gDNA Filter Column and a HiBind® RNA Mini Column: the lysate was centrifuged at 14,000× g for 5 min, the clear lysate was transferred, 0.5 volumes of ethanol were added, and the mixture was centrifuged at 12,000× g for 1 min. The sample transfer was then repeated, followed by washing steps with 400 μL of RWF Wash Buffer and 500 μL of RNA Wash Buffer II, ultimately yielding pure DNA (>3 µg; concentration > 30 ng/µL; OD 260/OD 280 = 1.80 to 2.00).

2.3. Library Preparation and Sequencing

For Illumina paired-end sequencing, at least 3 μg of genomic DNA per sample was required to construct libraries with inserts of around 450 bp. Following Illumina’s protocol, DNA was fragmented by Covaris and blunt-ended with T4 DNA polymerase, and ‘A’ bases were added to the 3′ ends for adapter ligation. After the desired fragments were purified by gel electrophoresis, they were enriched and amplified by PCR with the incorporation of index tags. The libraries were then quality-checked and sequenced on the Illumina NovaSeq 6000 platform (2 × 150 bp paired-end reads) by Shanghai Biozeron Biotechnology Co., Ltd. (Shanghai, China). The filtered valid data were aligned to the reference genome (the published “Yinxiangbai” apricot genome; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_020424065.1/ (accessed on 3 December 2024)) using BWA v0.7.12-r1039. Sequencing depth and coverage were calculated using custom Perl scripts, and SNPs and short InDels were detected from the valid BAM files with the HaplotypeCaller function of GATK v4.1.2.0. Variant Call Format (VCF) files were generated after quality filtering to obtain high-quality SNPs for genetic diversity analysis, and structural variants (SVs) in the samples were detected using BreakDancer v1.1.2.

2.4. Data Analysis

Based on the filtered markers, kinship analysis was performed using GCTA v1.93.2 to calculate genetic distances and to obtain a G matrix (genetic relationship matrix) for each pair of samples. The evolutionary tree was constructed with FastTree v2.1.10 (neighbor-joining method; model: p-distance). Eigenvalues and eigenvectors were calculated to draw the PCA plot. Population structure analysis was performed using Admixture v1.3.0, with K values from 2 to 10. The best K value was determined based on the cross-validation error rate.

3. Results

3.1. DNA and Sequencing Quality Control

DNA was extracted from intact, pest-free leaves of healthy plants of the 53 apricot varieties. The concentration ranged from 16.5 to 60.6 ng/μL, with an average of 27.81 ng/μL. The total amount was between 1.65 and 6.06 μg, with an average of 2.78 μg. Electrophoresis showed no impurities such as protein or pigment, and qualified DNA was used for library construction and sequencing. According to the quality control analysis of the sequence data, the Q20 value ranged from 97.08% to 98.52%, with a mean of 98.04%; Q30 ranged from 91.78% to 95.63%, with a mean of 93.81%; and the GC content was between 37.91% and 41.44%, with a mean of 39.61%. These results indicate that the sequencing quality in this study was high and met the data requirements. With the “Yinxiangbai” apricot genome as the reference, the 53 samples had sequencing depths between 7.31× and 80.01×, with an average of 17.29×. The alignment rate ranged from 95.87% to 98.1%, with a mean of 97.66%, and the average coverage was 90.50%. These results are shown in Table 1, and the similarity of each sample to the reference genome met the criteria for subsequent analysis.
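As a quick reference for the quality metrics above, the sketch below converts Phred scores to base-call error probabilities using the standard definition Q = -10 × log10(P), under which Q20 and Q30 correspond to error rates of 1% and 0.1%. It also estimates how many aligned bases a given mean depth implies, assuming that depth is simply total aligned bases divided by genome size and using the 251.19 Mb “Yinxiangbai” assembly size quoted in the Introduction. The function names are illustrative, and the depth values in this study came from custom Perl scripts that may use a different definition.

```python
import math


def phred_to_error(q: float) -> float:
    """Base-call error probability implied by Phred score q (Q = -10 * log10(P))."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score implied by error probability p, with 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def bases_for_depth(depth: float, genome_size_bp: float) -> float:
    """Aligned bases needed for a mean depth, assuming depth = bases / genome size."""
    return depth * genome_size_bp


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: error probability {phred_to_error(q):.4f}")
    # 251.19 Mb is the assembly size quoted in the Introduction (assumption of this sketch).
    gb = bases_for_depth(17.29, 251.19e6) / 1e9
    print(f"17.29x over 251.19 Mb is about {gb:.2f} Gb of aligned bases")
```

Read this way, a mean Q30 of 93.81% means that roughly 94% of bases have an estimated error probability of at most 0.1%.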

3.2. Variant Analysis

A total of 88,332,238 SNP sites and 13,934,301 InDel sites were obtained by variant detection with GATK and were then annotated. SNP counts ranged from 939,061 to 3,548,841 across samples. Among the SNPs, 54.8% (48,392,511) were located in intergenic regions, 7.3% (6,436,463) in exons, 16.8% (14,853,605) in introns, 10.1% (8,897,323) within 1 kb upstream of a gene, and 9.3% (8,246,616) within 1 kb downstream of a gene (Table 2). Variations in coding regions may cause changes in traits by changing the amino acid sequence. Among the coding SNPs, 3,028,064 were synonymous mutations, meaning that the base substitution does not alter the amino acid, whereas 3,323,008 were non-synonymous mutations, meaning that the base substitution changes the encoded amino acid. InDel counts ranged from 149,830 to 402,481, with deletion variants ranging from 1379 to 4516 and insertion variants from 737 to 2743. Further details regarding InDels can be found in Table 3. The magnitude of selection pressure acting on genes can be assessed by calculating the ratio of non-synonymous to synonymous mutations. A ratio greater than 1 indicates positive selection pressure, a ratio less than 1 indicates negative selection pressure, and a ratio equal to 1 indicates neutral selection [13]. The ratio of non-synonymous (3,323,008) to synonymous (3,028,064) mutations was 1.10, indicating positive selection pressure. SNP variant types were annotated (Figure 1): 65% (85,795,736) of SNPs were transitions (ts), while 35% (45,528,853) were transversions (tv), so the SNP variant types were dominated by transitions. The transition/transversion ratio (ts/tv) was 1.88, indicating that most SNP variants were substitutions between nucleotides of the same type. LNLJX had the highest number of variants (3,951,322) and the lowest ts/tv value (1.77), indicating that fewer gene exchange events occurred in it (Supplementary Files S1 and S2).
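The ratios quoted in this section follow directly from the reported counts. The sketch below reproduces that arithmetic and shows how a single-base substitution is classified: a transition stays within purines (A, G) or within pyrimidines (C, T), and a transversion switches between the two classes. The function and constant names are illustrative and are not taken from the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def substitution_type(ref: str, alt: str) -> str:
    """Return 'ts' for a transition or 'tv' for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "ts" if same_class else "tv"


# Counts as reported in the text above.
TRANSITIONS = 85_795_736
TRANSVERSIONS = 45_528_853
NONSYNONYMOUS = 3_323_008
SYNONYMOUS = 3_028_064

ts_tv = TRANSITIONS / TRANSVERSIONS
ts_share = 100 * TRANSITIONS / (TRANSITIONS + TRANSVERSIONS)
nonsyn_syn = NONSYNONYMOUS / SYNONYMOUS

if __name__ == "__main__":
    print(f"ts/tv = {ts_tv:.2f}")
    print(f"transitions = {ts_share:.0f}% of annotated substitutions")
    print(f"non-synonymous / synonymous = {nonsyn_syn:.2f}")
```

Note that the last value is a ratio of raw mutation counts; it is not normalized by the number of synonymous and non-synonymous sites.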

3.3. G Matrix Analysis

The genomic relatedness captured by the G matrix can be used to account for the influence of unclear population pedigrees or unclear ancestry on population structure analysis [14]. Based on the selected markers, GCTA v1.93.2 was used to obtain a binary G matrix (genetic relationship matrix). Larger G values indicate a closer relationship between varieties. A heatmap was drawn from the G matrix (Figure 2). A total of 1378 relatedness values were obtained from the relationship analysis (Supplementary File S3), ranging from 0.01 to 1.41 with an average of 0.18; PLDJX and BK1 were the most closely related (G = 1.41), and YZH and LGWSX were the most distantly related (G = 0.01).
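To make the G matrix concrete, the following is a minimal pure-Python sketch of VanRaden's first method, the construction cited in the Discussion [22]: genotypes coded as 0/1/2 allele counts are centered by twice the allele frequency, and G = ZZᵀ / (2 Σ p(1 − p)). The toy genotypes and function names are invented for illustration. GCTA scales each marker individually, so its values will not match this sketch exactly. The helper `n_pairs` shows why 53 samples give 1378 pairwise values.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix from rows of 0/1/2 allele counts (one row per sample)."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[k] for row in genotypes) / (2 * n) for k in range(m)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    # Center each genotype by its expected value 2p.
    z = [[row[k] - 2 * freqs[k] for k in range(m)] for row in genotypes]
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / denom for j in range(n)]
        for i in range(n)
    ]


def n_pairs(n_samples: int) -> int:
    """Number of distinct sample pairs."""
    return n_samples * (n_samples - 1) // 2


# Toy data (hypothetical): 4 samples x 4 markers.
TOY_GENOTYPES = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [2, 1, 0, 1],
    [1, 0, 1, 2],
]

if __name__ == "__main__":
    for row in vanraden_g(TOY_GENOTYPES):
        print(" ".join(f"{value:6.2f}" for value in row))
    print("pairs among 53 samples:", n_pairs(53))
```

Because the matrix is scaled by expected heterozygosity and not bounded like a correlation, values above 1 (such as the reported 1.41) can occur for very similar or inbred samples.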

3.4. Cluster Analysis

Genetic distances were calculated using GCTA v1.93.2 (Supplementary File S4). The genetic distances among the 53 common apricot varieties ranged from 0.00367 to 0.264344; the distance between WYX and YX was the largest (0.264344), and that between HMX and JM was the smallest (0.00367). The NJ phylogenetic tree was constructed using FastTree 2.1.10 (Figure 3). Cluster analysis divided the 53 varieties into four groups, named Q1, Q2, Q3, and Q4 and colored red, orange, green, and blue, respectively. Q1 consisted of 12 varieties with genetic distances ranging from 0.00367 to 0.23945: five from HS, two from LZ, three from SX, and two from TG. The genetic distance between HMX and JM was the smallest among all the varieties. Q2 consisted of 18 varieties with genetic distances ranging from 0.005383 to 0.252468: eight from DT, four from HS, one from LZ, four from SX, and one from TG. Among these varieties, BK1 and XBX had the smallest genetic distance. Q3 consisted of 8 varieties with genetic distances ranging from 0.004959 to 0.228316: six from DT and two from TG. RTJX and YTJX had the smallest genetic distance to each other. Q4 consisted of 15 varieties with genetic distances ranging from 0.004558 to 0.236309: two from DT, four from HS, two from LZ, four from SX, and three from TG.
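The group compositions above can be cross-checked against the sampling design in Section 2.1. The sketch below tabulates the reported counts by cluster and by origin and confirms that they sum to the 53 sampled varieties. The dictionary and function names are illustrative, and the counts are copied from the text.

```python
# Varieties per origin within each cluster, as reported in Section 3.4.
CLUSTER_BY_ORIGIN = {
    "Q1": {"HS": 5, "LZ": 2, "SX": 3, "TG": 2},
    "Q2": {"DT": 8, "HS": 4, "LZ": 1, "SX": 4, "TG": 1},
    "Q3": {"DT": 6, "TG": 2},
    "Q4": {"DT": 2, "HS": 4, "LZ": 2, "SX": 4, "TG": 3},
}

# Varieties sampled per origin, as reported in Section 2.1.
SAMPLED = {"HS": 13, "TG": 8, "DT": 16, "SX": 11, "LZ": 5}


def cluster_sizes(table):
    """Total number of varieties in each cluster."""
    return {cluster: sum(counts.values()) for cluster, counts in table.items()}


def origin_totals(table):
    """Total number of varieties from each origin, summed over clusters."""
    totals = {}
    for counts in table.values():
        for origin, count in counts.items():
            totals[origin] = totals.get(origin, 0) + count
    return totals


if __name__ == "__main__":
    print("cluster sizes:", cluster_sizes(CLUSTER_BY_ORIGIN))
    print("origin totals:", origin_totals(CLUSTER_BY_ORIGIN))
    print("matches sampling design:", origin_totals(CLUSTER_BY_ORIGIN) == SAMPLED)
```

The tabulation also makes the geographic mixing visible: every origin except LZ and SX appears in at least three clusters, and only Q3 is restricted to two origins.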

3.5. Kinship Relationship Analysis

Both G matrices and phylogenetic trees can be used to infer population structure and relatedness among varieties [14,15]. Comparing the results of the two methods gives a more reliable picture of kinship.
The genetic relationship analyses based on the G matrix and on genetic distance both showed that some individuals were closely related. The 30 most closely related combinations from the G matrix analysis and from the cluster analysis are listed in Table 4. In total, 26 varieties appeared in both analyses, although their rankings differed slightly.

3.6. Principal Component Analysis

The kinship of the 53 common apricot varieties was analyzed by principal component analysis, and the varieties were grouped accordingly. In the two-dimensional PCA plot (Figure 4), PC1 and PC2 explained 11.95% and 9.25% of the total variance, respectively. The 53 varieties were divided into four groups: S1 (red), S2 (orange), S3 (green), and S4 (blue). S1 lay far from the other three groups, indicating that it differs markedly from S2, S3, and S4. The distance between S1, S2, and S3 is close, indicating that the similarity is high.
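The percentages quoted for the first two principal components are each component's eigenvalue divided by the sum of all eigenvalues. A minimal sketch of that calculation follows; the toy eigenvalues are invented for illustration, and only the two reported percentages come from the text.

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component (eigenvalue / sum of eigenvalues)."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [value / total for value in eigenvalues]


# Hypothetical eigenvalues, used only to show the calculation.
TOY_EIGENVALUES = [4.0, 3.0, 2.0, 1.0]

# Percentages reported in the text for the first two components.
REPORTED_PERCENT = (11.95, 9.25)
cumulative_percent = round(sum(REPORTED_PERCENT), 2)

if __name__ == "__main__":
    ratios = explained_variance_ratio(TOY_EIGENVALUES)
    print("toy ratios:", [f"{r:.0%}" for r in ratios])
    print(f"first two reported components together: {cumulative_percent}%")
```

With the reported values, the two plotted axes together account for about a fifth of the total variance, so groups that overlap in two dimensions may still separate on later components.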

3.7. Population Genetic Structure Analysis

To further understand the evolutionary history and kinship of apricot, the required input files were generated using PLINK v1.9. Population genetic structure analysis of the 106,002,914 SNP loci obtained was performed using Admixture v1.3.0. The number of clusters (K) was assumed to range from 2 to 9, and clustering was carried out for each value. The optimal number of clusters was determined from the cross-validation error (CV error): the K with the smallest CV error corresponds to the optimal number of clusters [16]. As shown in Figure 5, the CV error was smallest at K = 4, indicating that dividing the 53 common apricot samples into four clusters most closely reflected the real structure of the population. As shown in Figure 6, the four clusters were named G1, G2, G3, and G4, with red, orange, green, and blue as their main color segments, respectively. G1 (red) contained 11 varieties with pure genetic backgrounds, namely, FYZH, HBX, HMX, JM, LZDJX, LSX, SXDJX, TPH, YZH, XX, and CH. G2 (orange) contained 16 varieties, of which 6 had pure genetic backgrounds: BK1, DTB, PLDJX, XBX, YX, and ZSLGX. The other varieties in G2 contained several genetic components and were admixed. G3 (green) contained 22 varieties, of which 10 had pure genetic backgrounds: BX, DWDJ-3, HJTX, HTX, LGWSX, LTH, MDJ, YGDJX, WYX, and JMX. The other varieties in G3 contained multiple genetic components and were admixed. G4 (blue) contained four varieties with pure genetic backgrounds, namely, GFX, HAMX, RTJX, and YTJX.
When the optimal number of clusters was determined according to the CV error, the CV error was also found to be small when the 53 common apricot samples were divided into six lineage populations. Therefore, the three-dimensional PCA plot was used as a cross-check. As shown in Figure 7, the grouping was more chaotic with six groups than with four. Thus, the grouping result was optimal when the samples were divided into four lineage populations.
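The selection of K described above can be sketched as a small script. ADMIXTURE, when run with its cross-validation option, reports one line per run of the form `CV error (K=4): ...`; the code below parses such lines and picks the K with the smallest error. The log values are invented for illustration (chosen so that K = 4 is lowest and K = 6 is a close second, mirroring the narrative) and are not the values behind Figure 5. The function names are hypothetical.

```python
import re

# Pattern for the cross-validation line printed at the end of an ADMIXTURE run.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9]*\.?[0-9]+)")


def parse_cv_errors(log_lines):
    """Map each K to its cross-validation error, ignoring unrelated log lines."""
    errors = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the smallest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get)


# Invented values for illustration only.
EXAMPLE_LOG = [
    "CV error (K=2): 0.612",
    "CV error (K=3): 0.584",
    "CV error (K=4): 0.551",
    "CV error (K=5): 0.573",
    "CV error (K=6): 0.556",
    "CV error (K=7): 0.590",
    "CV error (K=8): 0.601",
    "CV error (K=9): 0.623",
]

if __name__ == "__main__":
    cv_errors = parse_cv_errors(EXAMPLE_LOG)
    ranked = sorted(cv_errors, key=cv_errors.get)
    print("best K:", best_k(cv_errors))
    print("two lowest CV errors at K =", ranked[:2])
```

When two values of K have similar errors, as with K = 4 and K = 6 here, an independent check such as the three-dimensional PCA comparison is a reasonable way to choose between them.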

4. Discussion

The cultivation history of apricot in China can be traced back 3000 years. Because of frequent and indiscriminate germplasm exchange among many local varieties, the genetic background of apricot varieties is unclear and their kinship relationships are poorly understood [5]. At the same time, because breeding units hold large numbers of germplasm resource varieties, errors in pedigree or variety name records are inevitable during introduction, propagation, and seed preservation, resulting in different names for the same variety as well as the same name for different varieties [17]. This has complicated the analysis of genetic variation, the determination of kinship, the establishment of core breeding groups, and the exploration of high-quality germplasm resources. Therefore, it is particularly important to collect, organize, and analyze the genetic diversity of common apricot varieties.
Compared with other molecular marker techniques, SNP analysis operates at the level of a single nucleotide. SNPs are widely used in the analysis of population genetic diversity because of their wide distribution, large number, and strong genetic stability. However, SNP detection is costly, prone to false positives, and has a high error rate [18], so filtering standards need to be strictly controlled. In this study, 53 high-quality common apricot germplasm resources were selected, with the “Yinxiangbai” apricot genome as the reference genome. A total of 88,332,238 high-quality SNPs were identified, with an average sequencing depth of 17.29×, a Q20 of 98.04%, a Q30 of 93.81%, and a GC content of 39.61%. The ratio of non-synonymous to synonymous mutations was 1.10, which was greater than that of peach [19]. The ts/tv value of 1.88 indicated that the SNP variants were mostly substitutions between nucleotides of the same type, which is consistent with results for wild tea trees [20] and Paulownia [21]. LNLJX had the highest number of variants and the lowest ts/tv value, indicating fewer gene exchange events.
Our analysis was based on the method of constructing genomic relationships via the G matrix proposed by VanRaden [22]. G matrix analysis yielded G values between 0.01 and 1.41, and PLDJX and BK1 were found to be the most closely related. Cluster analysis divided the 53 apricot varieties into four groups, with genetic distances between 0.00367 and 0.264344 and an average of 0.225. With the exception of the fourth group, which had a relatively similar geographical location, varieties from the five geographical locations were distributed across the other three groups. HMX and JM were found to be the most closely related, followed by CH and JM, and then JMX and YGDJX. Although the order of affinities differed between the two analyses, both indicated that some individuals were closely related.
The results of this study are similar to those obtained by Zhang et al. [7]: common apricots were divided into four groups, and the results of PCA and population genetic structure analysis were highly consistent with our analyses. The phylogenetic tree analysis differed slightly. XDB of Group 1 (Q1) and FY29 and KT of Group 2 (Q2) in the individual evolutionary tree were assigned to Group 3 (G3) in the population genetic structure analysis, which may be due to the complex genetic structure of these three varieties.
From the perspective of geographical distribution, neither the population structure analysis nor the phylogenetic tree grouped varieties of similar geographic origin together, which is consistent with the results of Gao et al. [23]. According to their research, a possible explanation is that genetic exchange between apricot groups is very frequent.

5. Conclusions

In this study, 96,765,900 high-quality SNPs were obtained by whole-genome resequencing analysis of 53 common apricot varieties. These SNPs are useful not only for analyzing variation in common apricots, but also for studying the species characteristics and population evolution of apricots and for locating the gene loci of target traits. This makes it possible to quickly discover genetic variations related to important traits in apricots, which can be applied in molecular breeding and can shorten the breeding cycle. The relationships and genetic structures of the 53 common apricot varieties were also determined. This provides a theoretical basis for the development and identification of allelic varieties with phenotypic effects in the future.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cimb46120844/s1, Supplementary Files S1–S4.

Author Contributions

Q.X.: sampling, data processing, and writing; review and guidance from Y.H. and J.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the fund for germplasm innovation and new variety breeding of main fruit trees in cold and arid regions and by the fund for demonstration and promotion of the special forest fruit ‘Mengxing No.1’ Siberian apricot variety (No. 2021GG0034 and No. 2022YFXZ0024).

Data Availability Statement

All data are available upon reasonable request.

Acknowledgments

Thanks to Yang Xudong, Deputy Director of the Conservation Center of Qingshuihe County Forestry and Grassland Bureau, and Wu Jianxin from the Forestry and Grassland Work Station of Inner Mongolia Autonomous Region for their help in the sampling and experiment.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, J.; Yang, L.; Jiang, F.; Zhang, M.; Sun, H.; Wang, Y. Estimation of Genome Size of Apricots Based on Flow Cytometry. Acta Agric. Boreali-Sin. 2020, 35, 32–38.
  2. Zhang, J.Y.; Zhang, Z. Chinese Fruit Tree Annals-Apricot Rolls; China Forestry Publishing House: Beijing, China, 2003.
  3. Xia, L.; Song, W.; Zhang, W.; Huang, Z.; Chen, L.; Cui, Z.; Chen, Y. Zaohongyan, a superior early-maturing and firmness flesh Apricot cultivar. J. Fruit Sci. 2021, 38, 1207–1210.
  4. Zhao, Q.F.; Huang, L.; Wang, M.; Liu, X.; Zhang, X.; Ma, X. Genetic relationship and population genetic structure analysis of 100 accessions of grape germplasm resources based on SSR markers. J. Fruit Sci. 2021, 38, 1217–1230.
  5. Liu, Z.; Chen, X.; Wang, D.; Jing, C.; Wu, X. Genetic diversity analysis of Prunus armeniaca L. germplasm resources based on SSR markers. Mol. Plant Breed. 2023, 1–22. Available online: https://kns.cnki.net/kcms2/detail/46.1068.S.20230724.1435.007.html (accessed on 3 December 2024).
  6. Yang, H.H.; Chen, X.; Feng, B.; Wu, Y. Assessment of Prunus armeniaca limeixing germplasm by RAPD. J. Fruit Sci. 2007, 24, 303–307.
  7. Zhang, Q.; Liu, D.; Liu, W.; Liu, S.; Zhang, A.; Liu, N.; Zhang, Y. Genetic Diversity and Population Structure of the North China Populations of Apricot (Prunus armeniaca L.). Sci. Agric. Sin. 2013, 46, 89–98.
  8. Verde, I.; Abbott, A.G.; Scalabrin, S.; Jung, S.; Shu, S.; Marroni, F.; Zhebentyayeva, T.; Dettori, M.T.; Grimwood, J.; Cattonaro, F.; et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 2013, 45, 487–494.
  9. Gao, Z.; Ni, X. The Chloroplast Genome of Prunus mume; The Prunus Mume Genome. Compendium of Plant Genomes; Springer: Berlin/Heidelberg, Germany, 2019; pp. 85–91.
  10. Shirasawa, K.; Isuzugawa, K.; Ikenaga, M.; Saito, Y.; Yamamoto, T.; Hirakawa, H.; Isobe, S. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding. DNA Res. 2013, 24, 499–508.
  11. Jiang, F.; Zhang, J.; Wang, S.; Yang, L.; Luo, Y.; Gao, S.; Zhang, M.; Wu, S.; Hu, S.; Sun, H.; et al. The apricot (Prunus armeniaca L.) genome elucidates Rosaceae evolution and beta-carotenoid synthesis. Hortic. Res. 2019, 6, 128.
  12. Zhang, Q.; Zhang, D.; Yu, K.; Ji, J.; Liu, N.; Zhang, Y.; Xu, M.; Zhang, Y.; Ma, X.; Liu, S.; et al. Frequent germplasm exchanges drive the high genetic diversity of Chinese-cultivated common apricot germplasm. Hortic. Res. 2021, 8, 215.
  13. Chen, W.; Niu, L.N.; Li, Z.W.; Xue, T.; Chai, M.; Luo, Y.; He, J. Genetic diversity of Strawberry Mottle virus based on cp genes. Acta Agric. Boreali-Occident. Sin. 2023, 32, 2007–2013.
  14. Du, Y.; Yu, X.; Wang, P.; Li, Q.; Wang, Y.; Zhang, B. Genetic diversity analysis of wild Medicago falcata L. in Xinjiang based on SNP molecular markers. Feed Res. 2024, 47, 68–73.
  15. Yu, D.N.; Wang, T.; Tie, Z.; Qin, F.; La, Y.; Ma, X.; Zhang, D.; Zha, X.; Liang, C. Population structure analysis of qilian white tibetan sheep group based on whole genome resequencing. China Herbiv. Sci. 2023, 43, 12–16.
  16. Li, G. Genetic Diversity of Chinese PCNA Persimmon Based on Whole Genome Resequencing. Master’s Dissertation, Huazhong Agricultural University, Wuhan, China, 2022.
  17. Han, Z.; Han, W.; Xie, R.; Guo, J.; Yi, L.; Hou, J. Analysis of genetic diversity of 148 Potato germplasm based on SNP Markers from whole genome resequencing. Acta Bot. Boreali-Occident. Sin. 2021, 41, 1302–1314.
  18. Li, Q.; Zheng, G. Research and application of SNP molecular marker technology in crop seed detection. China Seed Ind. 2019, 16, 16–17.
  19. Guan, L. Development of SSR and SNP Markers Based on Whole-Genome Variants and Their Potentional Application Peach. Ph.D. Thesis, Huazhong Agricultural University, Wuhan, China, 2020.
  20. Wang, X.; Liu, F.; Li, M.; Zhang, T.; Lai, Q.; Liu, X.; Xiong, Y.; Tang, X.; Li, C.; Wang, Y. Genetic diversity and structure analysis of Gulin wild tea resources based on GBS technology. Southwest China J. Agric. Sci. 2023, 36, 1141–1149.
  21. Zhao, Y.; Feng, Y.; Yang, C.; Wang, B.; Qiao, J.; Yin, S.; Zhou, H.; Li, F. Genetic relationship analysis of Paulownia species using whole genome re-sequencing. J. Cent. South Univ. For. Technol. 2023, 43, 1–10.
  22. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423.
  23. Gao, Y.; Wang, K.; Wang, D.; Zhao, J.; Zhang, C.; Cong, P.; Liu, L.; Li, L.; Piao, J. The genetic diversity and population structure analysis on Malus baccata (L.) borkh from 7 Sources. Sci. Agric. Sin. 2018, 51, 3766–3777.
Figure 1. Statistics on the number of different types of single base substitutions.
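The substitution classes summarized in Figure 1 can be tallied from a list of reference/alternate allele pairs. The sketch below is illustrative only: the input pairs are made-up example data, and the function name `count_substitutions` is hypothetical rather than part of the study's pipeline. It counts each ref>alt type and groups the calls into transitions (A<->G, C<->T) and transversions (all other single-base changes).

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}
# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_substitutions(snps):
    """Return (counts per ref>alt type, counts per transition/transversion class)."""
    by_type = Counter()
    by_class = Counter()
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip InDels, ambiguous bases, and non-variant records
        by_type[f"{ref}>{alt}"] += 1
        kind = "transition" if (ref, alt) in TRANSITIONS else "transversion"
        by_class[kind] += 1
    return by_type, by_class


# Hypothetical example input; the last record is an InDel and is ignored.
example = [("A", "G"), ("C", "T"), ("G", "T"), ("T", "C"), ("A", "C"), ("AT", "A")]
by_type, by_class = count_substitutions(example)
print(dict(by_type))
print(dict(by_class))
```

Dividing the transition count by the transversion count gives the Ti/Tv ratio that is often reported alongside such a chart.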
Figure 2. Heat map of pairwise genetic relationships among samples, ordered from the first sample to the last. Larger values are shown closer to red and indicate a closer relationship between two individuals.
Figure 3. Comparison of genetic relationship matrices and genetic distances. DT: samples from Datong City, Shanxi Province; HS: samples from Hohhot, Inner Mongolia; LZ: samples from Lanzhou City, Gansu Province; TG: samples from Jinzhong City, Shanxi Province; SX: samples from Yulin City, Shaanxi Province. The 53 common apricot varieties were classified into four taxa (Q1, Q2, Q3, and Q4).
Figure 4. Principal component analysis (PCA). Each color represents one group. S1 (red): Group I; S2 (orange): Group II; S3 (green): Group III; S4 (blue): Group IV.
Figure 5. Cross-validation error rate line chart.
Figure 6. Analysis of SNP population structure in 53 common apricot varieties (K = 4). Each vertical bar represents the genetic background of one sample; each color represents an estimated ancestral population, and the share of the bar occupied by a color is the proportion of the sample's genetic background contributed by that ancestor. Red: G1; orange: G2; green: G3; blue: G4.
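A structure plot like Figure 6 is usually drawn from a table of per-sample ancestry proportions, one value per ancestral population. The following sketch shows one way such a table could be turned into group labels. Everything in it is an assumption for illustration: the sample names and proportions are invented, the function `assign_groups` is hypothetical, and the 0.6 cutoff for calling a sample admixed is an arbitrary choice, not a value from the study.

```python
def assign_groups(q_matrix, labels=("G1", "G2", "G3", "G4"), min_share=0.6):
    """Label each sample by its largest ancestry share, or 'admixed' below min_share."""
    assignments = {}
    for sample, shares in q_matrix.items():
        if len(shares) != len(labels):
            raise ValueError(f"{sample}: expected {len(labels)} proportions")
        if abs(sum(shares) - 1.0) > 1e-6:
            raise ValueError(f"{sample}: proportions must sum to 1")
        best = max(range(len(shares)), key=shares.__getitem__)
        assignments[sample] = labels[best] if shares[best] >= min_share else "admixed"
    return assignments


# Hypothetical ancestry proportions for K = 4 (not data from the study).
example_q = {
    "sample_a": [0.91, 0.03, 0.04, 0.02],
    "sample_b": [0.10, 0.75, 0.10, 0.05],
    "sample_c": [0.40, 0.35, 0.15, 0.10],
}
groups = assign_groups(example_q)
print(groups)
```

Lowering `min_share` assigns more samples to a single group, while raising it labels more of them as admixed, so the cutoff should be reported with any grouping derived this way.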
Figure 7. Three-dimensional PCA distribution maps of different groups. (A) The three-dimensional PCA of the samples when the varieties are divided into four groups; (B) the three-dimensional PCA of the samples when the varieties are divided into six groups. Varieties with the same color in the PCA plot were considered to be in the same line.
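Table 1. Mapping ratio statistics.
| No. | Sample Name | Sample Code | Alignment Rate (%) | Coverage (%) | Average Depth |
|---|---|---|---|---|---|
| 1 | Huangjintianxing | HJTX | 97.57 | 90.04 | 9.25 |
| 2 | Hongmeixing | HMX | 97.96 | 89.77 | 15.48 |
| 3 | Lingnong3 | LN3 | 98.04 | 89.46 | 16.93 |
| 4 | Lingnong1 | LN1 | 97.95 | 89.07 | 15.55 |
| 5 | Hongxiangmi | HXM | 97.85 | 92.60 | 13.15 |
| 6 | Huangbanxing | HB | 97.72 | 90.05 | 15.36 |
| 7 | Jinmeixing | JMX | 97.79 | 90.82 | 15.58 |
| 8 | Kaitexing | KT | 98.00 | 88.99 | 12.52 |
| 9 | Sanyuandajiexing | SYDJ | 97.67 | 91.20 | 16.55 |
| 10 | Xiangxing | XX | 97.53 | 89.95 | 15.20 |
| 11 | Xintexing | XT | 97.68 | 89.76 | 15.16 |
| 12 | Liaoninglijiexing | LNLJX | 95.87 | 88.89 | 13.13 |
| 13 | Hamixing | HAMX | 98.10 | 89.46 | 14.84 |
| 14 | Boke1 | BK1 | 97.84 | 89.67 | 14.94 |
| 15 | Muguaxing | MGX | 97.92 | 90.03 | 16.94 |
| 16 | Gongfoxing | GFX | 97.43 | 89.91 | 13.92 |
| 17 | Meixing | MX | 97.64 | 90.19 | 16.22 |
| 18 | Yingtiaojinxing | YTJX | 97.89 | 89.46 | 14.77 |
| 19 | Xiangbaixing | XBX | 97.95 | 90.02 | 14.81 |
| 20 | Yidianhong | YDH | 97.70 | 89.39 | 13.92 |
| 21 | Yanggaodajiexing | YGDJX | 98.07 | 90.69 | 14.47 |
| 22 | Pingliangdajiexing | PLDJX | 97.79 | 89.67 | 14.00 |
| 23 | Zaoshuliguangxing | ZSLGX | 97.96 | 88.79 | 13.47 |
| 24 | Ruantiaojinxing | RTJX | 97.92 | 89.40 | 14.55 |
| 25 | Liguangwanshuxing | LGWSX | 97.97 | 89.94 | 13.14 |
| 26 | Youxing | YX | 97.53 | 89.95 | 15.20 |
| 27 | Baishuixing | BSX | 97.50 | 93.08 | 63.27 |
| 28 | Jiguang | JG | 97.76 | 92.55 | 36.41 |
| 29 | Wanhong | WH | 97.85 | 93.27 | 40.78 |
| 30 | Jinmei | JM | 97.47 | 93.27 | 80.01 |
| 31 | Jidanxing | JDX | 97.40 | 90.61 | 11.82 |
| 32 | Yidalixing | YDLX | 97.75 | 89.49 | 9.96 |
| 33 | Daxingmei | DXM | 97.55 | 90.59 | 12.22 |
| 34 | Luotuohuang | LTH | 97.81 | 89.82 | 9.97 |
| 35 | Taipinghong | TPH | 97.42 | 91.24 | 19.09 |
| 36 | Laoshanhong | LSH | 96.92 | 89.66 | 8.64 |
| 37 | Mengdajiexing | MDJ | 96.10 | 89.93 | 10.13 |
| 38 | Wuyuexian | WYX | 97.79 | 88.27 | 7.31 |
| 39 | Shanxidajiexing | SXDX | 97.43 | 89.29 | 10.14 |
| 40 | Jintaiyang | JTY | 97.61 | 92.70 | 10.66 |
| 41 | Yanzhihong | YZH | 97.63 | 88.30 | 7.52 |
| 42 | Fenyanzhihong | FYZH | 97.77 | 91.86 | 31.76 |
| 43 | Baixing | BX | 97.96 | 96.43 | 12.26 |
| 44 | Manaixing | MNX | 97.92 | 88.72 | 8.58 |
| 45 | Houtouxing | HTX | 97.79 | 94.84 | 12.68 |
| 46 | Chuanhong | CH | 97.83 | 90.05 | 11.86 |
| 47 | Fengyuanxing | FYX | 97.69 | 90.90 | 14.64 |
| 48 | Fengyuan29 | FY29 | 97.76 | 90.58 | 11.28 |
| 49 | Xindianbaohexing | XDB | 97.82 | 90.40 | 13.45 |
| 50 | Lanzhoudajiexing | LDJ-3 | 97.65 | 91.43 | 13.80 |
| 51 | Dongwudajiexing | DWDJ-3 | 97.32 | 90.87 | 12.10 |
| 52 | Dongxiangdajiexing | DTD | 97.76 | 89.10 | 11.89 |
| 53 | Dongxiangbaohexing | DTB | 97.48 | 89.90 | 14.67 |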
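Mapping statistics of this kind are often screened before variant calling. As a minimal sketch, the snippet below takes a handful of rows copied from Table 1 and flags samples whose average depth falls below a cutoff. The 10x cutoff and the function name `flag_low_depth` are assumptions made for illustration; the study does not state that any such filter was applied.

```python
from statistics import mean

# (sample code, alignment rate %, coverage %, average depth); a subset of Table 1 rows.
rows = [
    ("HJTX", 97.57, 90.04, 9.25),
    ("HMX", 97.96, 89.77, 15.48),
    ("LNLJX", 95.87, 88.89, 13.13),
    ("JM", 97.47, 93.27, 80.01),
    ("WYX", 97.79, 88.27, 7.31),
    ("BX", 97.96, 96.43, 12.26),
]


def flag_low_depth(table, min_depth=10.0):
    """Return the codes of samples whose average depth is below min_depth."""
    return [code for code, _align, _cov, depth in table if depth < min_depth]


low_depth = flag_low_depth(rows)                      # hypothetical 10x cutoff
mean_alignment = mean(align for _c, align, _cov, _d in rows)
print("Below cutoff:", low_depth)
print(f"Mean alignment rate of the subset: {mean_alignment:.2f}%")
```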
Table 2. SNP annotation information table.
| Genomic Position | Number of SNP Loci |
|---|---|
| Nonsynonymous | 3,323,008 |
| Stop gain | 68,981 |
| Synonymous | 3,028,064 |
| Stop loss | 16,410 |
| Exonic | 6,436,463 |
| Intronic | 14,853,605 |
| Intergenic | 48,392,511 |
| Upstream | 8,897,323 |
| Downstream | 8,246,616 |
| Upstream/Downstream | 1,480,979 |
| Splicing | 24,741 |
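The counts in Table 2 can be checked for internal consistency: the four exonic classes (nonsynonymous, stop gain, synonymous, stop loss) should add up to the exonic total. The short sketch below performs that check and also derives the nonsynonymous-to-synonymous ratio. The variable names are illustrative, and the ratio is a derived quantity that the table itself does not report.

```python
# Counts copied from Table 2.
snp_counts = {
    "Nonsynonymous": 3_323_008,
    "Stop gain": 68_981,
    "Synonymous": 3_028_064,
    "Stop loss": 16_410,
    "Exonic": 6_436_463,
}

exonic_classes = ("Nonsynonymous", "Stop gain", "Synonymous", "Stop loss")
exonic_sum = sum(snp_counts[name] for name in exonic_classes)
matches_exonic = exonic_sum == snp_counts["Exonic"]

# Ratio of nonsynonymous to synonymous SNP counts (derived, not from the table).
ns_to_s = snp_counts["Nonsynonymous"] / snp_counts["Synonymous"]

print("Sum of exonic classes:", exonic_sum)
print("Matches exonic total:", matches_exonic)
print(f"Nonsynonymous/synonymous ratio: {ns_to_s:.3f}")
```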
Table 3. InDel annotation results table.
| Genomic Position | Number of InDels |
|---|---|
| Frameshift deletion | 170,474 |
| Frameshift insertion | 88,397 |
| UTR5381 | |
| Non-frameshift deletion | 81,518 |
| Non-frameshift insertion | 57,623 |
| Stop gain | 8807 |
| Stop loss | 3344 |
| Downstream | 1,741,285 |
| Exonic | 410,163 |
| Intergenic | 7,513,932 |
| Intronic | 2,954,011 |
| Splicing | 12,788 |
| Upstream | 1,931,934 |
| Upstream/Downstream | 328,046 |
Table 4. Comparison of genetic relationship matrices and genetic distances.
| Sample Combination | Genetic Distance | Genetic Distance Ranking | Genetic Relationship Value | Relationship Ranking |
|---|---|---|---|---|
| PLDJX × BK1 | 1.4122059 | 1 | 0.005700 | 17 |
| PLDJX × XBX | 1.4097709 | 2 | 0.005570 | 16 |
| XBX × BK1 | 1.4068449 | 3 | 0.005380 | 14 |
| LTH × WYX | 1.0724071 | 4 | 0.008011 | 26 |
| YZH × HMX | 1.0648852 | 5 | 0.006841 | 25 |
| JM × YZH | 1.0563859 | 6 | 0.005550 | 15 |
| JM × YZH | 1.0540115 | 7 | 0.006232 | 21 |
| CH × YZH | 1.0524688 | 8 | 0.006614 | 23 |
| HBX × SXDJX | 1.0507799 | 9 | 0.006166 | 19 |
| XX × HBX | 1.0439993 | 10 | 0.004860 | 4 |
| JM × HMX | 1.0262081 | 11 | 0.004150 | 1 |
| RTJX × HAMX | 1.0208997 | 12 | 0.005090 | 12 |
| YTJX × HAMX | 1.0196109 | 13 | 0.005020 | 9 |
| HAMX × CH | 1.0188594 | 14 | 0.004860 | 5 |
| RTJX × YTJX | 1.0188064 | 15 | 0.004960 | 7 |
| HAMX × GFX | 1.0187374 | 16 | 0.005050 | 11 |
| RTJX × GFX | 1.0187374 | 17 | 0.005030 | 10 |
| JM × CH | 1.017809 | 18 | 0.003750 | 2 |
| YTJX × GFX | 1.0164344 | 19 | 0.005000 | 8 |
| LZDJX × YZH | 1.0104407 | 20 | 0.010713 | 30 |
| LGWSX × HJTX | 0.9968687 | 22 | 0.006652 | 24 |
| DWDJ-3 × MDJ | 0.9960544 | 24 | 0.006255 | 22 |
| HJTX × YGDJX | 0.9889232 | 26 | 0.006221 | 20 |
| HJTX × JMX | 0.9878738 | 27 | 0.006029 | 18 |
| YGDJX × LGWSX | 0.9703597 | 29 | 0.005100 | 13 |
| LGWSX × JMX | 0.9690487 | 30 | 0.004900 | 6 |
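Table 4 sets two orderings of the same sample pairs side by side. One way to quantify how closely two such orderings agree is Spearman's rank correlation, sketched below in plain Python. This is offered as an illustration under stated assumptions, not as the comparison used in the study: the function names are hypothetical, and only a subset of the table's rows is included, so the printed coefficient describes that subset alone.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# (pair, genetic distance, genetic relationship value); a subset of Table 4 rows.
pairs = [
    ("PLDJX x BK1", 1.4122059, 0.005700),
    ("PLDJX x XBX", 1.4097709, 0.005570),
    ("XBX x BK1", 1.4068449, 0.005380),
    ("LTH x WYX", 1.0724071, 0.008011),
    ("YZH x HMX", 1.0648852, 0.006841),
    ("XX x HBX", 1.0439993, 0.004860),
    ("JM x HMX", 1.0262081, 0.004150),
    ("LGWSX x JMX", 0.9690487, 0.004900),
]
rho = spearman([d for _, d, _ in pairs], [r for _, _, r in pairs])
print(f"Spearman rho for the subset: {rho:.3f}")
```

A coefficient near +1 or -1 would mean the two measures order the pairs almost identically or almost inversely, while a value near 0 would mean the orderings are largely unrelated.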
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xin, Q.; Qing, J.; He, Y. Analysis of Kinship and Population Genetic Structure of 53 Apricot Resources Based on Whole Genome Resequencing. Curr. Issues Mol. Biol. 2024, 46, 14106-14118. https://doi.org/10.3390/cimb46120844

AMA Style

Xin Q, Qing J, He Y. Analysis of Kinship and Population Genetic Structure of 53 Apricot Resources Based on Whole Genome Resequencing. Current Issues in Molecular Biology. 2024; 46(12):14106-14118. https://doi.org/10.3390/cimb46120844

Chicago/Turabian Style

Xin, Qirui, Jun Qing, and Yanhong He. 2024. "Analysis of Kinship and Population Genetic Structure of 53 Apricot Resources Based on Whole Genome Resequencing" Current Issues in Molecular Biology 46, no. 12: 14106-14118. https://doi.org/10.3390/cimb46120844

APA Style

Xin, Q., Qing, J., & He, Y. (2024). Analysis of Kinship and Population Genetic Structure of 53 Apricot Resources Based on Whole Genome Resequencing. Current Issues in Molecular Biology, 46(12), 14106-14118. https://doi.org/10.3390/cimb46120844
