Article

Multi-Approach Unveils Potential Gene Introgression of Oil Camellias

1 Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou 311400, China
2 Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100000, China
* Author to whom correspondence should be addressed.
Horticulturae 2024, 10(12), 1252; https://doi.org/10.3390/horticulturae10121252
Submission received: 28 October 2024 / Revised: 19 November 2024 / Accepted: 22 November 2024 / Published: 26 November 2024
(This article belongs to the Special Issue Germplasm, Genetics and Breeding of Ornamental Plants)
Figure 1. Topological discordance between the phylogenetic relationships of Sect. Oleifera based on the chloroplast genome dataset (a) and transcriptome orthologs (b), with bipartition information restored from the gene trees shown above the branches. The multispecies coalescent species tree is presented to show the branch lengths. The pie charts at each node represent the estimated proportions of gene trees with different topologies based on the nucleotide alignment; q1, q2, and q3 refer to the quartet support for the main topology (green), the first alternative (purple), and the second alternative (blue), respectively. (c) PCA of the morphological data (leaf, flower, and fruit). (d) The pattern and content of anthocyanin in these plants (µg/100 mg), where the liquid-phase diagram represents four peak patterns.
Figure 2. The origin of oil camellias. (a) Maximum-likelihood phylogenetic tree of oil camellias using whole-genome resequencing data. Numbers above the branches indicate the bootstrap values. (b) Gene flow events in oil camellias estimated with SNaQ and the different datasets, using C. sinensis as the outgroup, and the length of each terminal branch set to 1. (c–e) Different subsets with different outgroups.
Figure 3. Identification of gene flow events between species with different ploidies. (a) Combinations of ABBA-BABA statistics and the corresponding values. Z-scores > 3 indicate statistically significant results. (b) Ploidy of these oil camellias. * indicates that the ploidy has been verified with cytological data according to Ming (2000).
Figure 4. Ancestral area reconstruction for oil camellias based on transcriptome orthologs. (a) Biogeography of oil camellias. (b) Divergence times of oil camellias. The pie charts indicate the relative estimates of possible ancestral areas. A, Central China; B, Paleotropic region; C, Eastern China; D, Southwestern China; E, Northwestern China; and F, Japan. Ma, million years ago.
Figure 5. Predicted parental origin model for polyploid camellias. ♀ Maternal and ♂ paternal. Probable subgenomic composition and origins are inferred from the data presented herein.

Abstract

The complex phylogenetic relationships of polyploid species provide an opportunity for a comprehensive study of gene introgression. Oil camellias are a class of important woody oil plants in the genus Camellia, including octoploid, hexaploid, tetraploid, and diploid plants, but the phylogenetic relationships of these species remain poorly investigated. Here, based on multiple types of evidence, including phylogenetic conflict, gene flow analysis, and representative metabolites, we reconstructed the phylogenetic relationships of oil camellias. Camellia shensiensis and C. grijsii formed a distinct branch. Phylogenetic conflict suggested that hexaploid C. oleifera probably originated from hybridization and clustered with diploid C. kissi and tetraploid C. meiocarpa. Tetraploid C. confusa probably originated from a cross between the ancestor of C. kissi and C. brevistyla, and C. brevistyla was probably the maternal progenitor of hexaploid C. sasanqua. Furthermore, the composition of anthocyanins in tender leaves showed a strong correlation with phylogenetic distinctions. This study supports the feasibility of using characteristic metabolic components to help resolve phylogenetic relationships and lays a foundation for genetic breeding and the utilization of oil camellia resources.

1. Introduction

Interspecific hybridization is an important process during biological evolution and is particularly common in plants, leading to new speciation and creating reticulate structures in the Tree of Life [1,2,3,4], which could be one of the main reasons for the origin of polyploids [5]. Chromosomal polyploidy is widespread in camellias, such as 2×, 4×, 6×, and 8× [6]. Polyploids in some oil camellias are allopolyploids, which arise from crosses between genetically different parents [7], but there is a lack of theoretical research on the species origins of camellias.
Oil camellias generally refer to the multiple oil species and cultivars in the genus Camellia with a high oil content and considerable commercial value [6]. Camellia seed oil is rich in unsaturated fatty acids, mainly oleic and linoleic acids, and is known as “Oriental olive oil” [8]. Oil camellias include Sect. Oleifera and Sect. Paracamellia [9], but Ming [10] supported grouping these species together, so the division between the two groups remains controversial (Figure S2A). Leaf anatomy supported their separation into two groups [11], whereas phylogenetic trees inferred from transcriptome data supported merging them [6,12]. Thus, the relationships among oil camellia species are still uncertain. In addition to C. oleifera, which has the widest distribution (the Yangtze River Basin and the provinces to its south) and the longest cultivation history, C. meiocarpa, C. vietnamensis, and C. grijsii are also cultivated in China and have their own cultivation histories. Furthermore, C. sasanqua is an important ornamental plant [13]. Cross-compatibility among these oil camellias is high, and interspecific hybridization may be frequent, possibly making it the main driving force for ploidy variability in oil camellias and thereby promoting speciation. The potential hybrid regions between different oil camellias could be rich in genetic diversity [14].
Traditional morphological data have limited resolving power, so more phylogenomic research is needed to promote the exploitation and utilization of oil camellia resources. Comprehensive phylogenomic analysis is an effective way to study the origin of species: for instance, a phylotranscriptomic study of some Theaceae species revealed two reticulation events within camellias [15]. A recent study based on low-copy genes showed that C. oleifera C. Abel probably originated from hybridization between closely related diploid and tetraploid (e.g., C. meiocarpa) species [14], but the evidence for its maternal origin is insufficient. The origins of most oil camellia species remain unresolved. Because of complex reticulate evolution and chloroplast capture events, the origins and classification of these species have not been fully resolved.
In this study, we aimed to establish a well-supported approach for resolving the reticulate evolutionary history of these species, using the best single-copy gene set from the transcriptome and the complete chloroplast genome together with cytology, metabolomics, phylogenetic reconstruction, and gene flow inference. Furthermore, we found that the composition of anthocyanins was closely related to the phylogeny of camellias. Gene clusters that could be related to the evolution of horticultural characters in oil camellias were also identified.

2. Methods

2.1. Taxon Sampling

Fourteen individuals, including all potential ancestral species, were sampled for phylogenomic analysis, representing all thirteen species from Sect. Paracamellia in Ming’s classification (Ming, 1998) and C. sinensis as an outgroup (Table S1). Fresh leaves of these samples were used for whole-genome sequencing and chloroplast genome sequencing. In addition, fresh leaves, flowers, and buds of the same individuals were used for total RNA extraction and transcriptome sequencing. All the plants were collected from the Camellia Germplasm Resource Center of the Institute of Subtropical Forestry, Chinese Academy of Forestry (119.96° E, 30.06° N), and identified by experts.

2.2. Extraction and Quantitative and Qualitative Analyses of the Anthocyanins

Tender red leaves (3 g) were ground in liquid N2 and extracted in a 5 mL mixed buffer (methanol–water–formic acid–trifluoroacetic acid = 70:27:2:1, v/v) for 24 h in the shade (Li et al., 2009). Then, the supernatant was filtered through a 0.22 µm membrane (Wang et al., 2001) and stored at −20 °C (three biological replicates and three research replicates).
Chromatographic analysis was performed by high-performance liquid chromatography–diode array detection (HPLC-DAD; Waters-Alliance 2695, Waters Corp., Milford, MA, USA). The experimental settings were described in a previous study (Fan et al., 2023). In brief, the elution gradient was 22% B at 0 min, 28% B at 15 min, and 68% B at 35 min, with detection at 525 nm. The anthocyanin structure was analyzed according to a previous study (Fan et al., 2023) using an ultrahigh-performance liquid chromatography–quadrupole time-of-flight mass spectrometry system (Waters Corp., Manchester, UK). A cyanidin-3-O-β-glucoside (Cy3G) standard (Sigma, St. Louis, MO, USA) was used to calculate the anthocyanin content. The mass spectrometry conditions were a total ion scan range (m/z) of 50–1000, dry gas (N2) at 450 °C and 600 L/h, and a capillary voltage of 4500 V.
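The gradient set points above can be turned into a solvent composition at any time point. The following is a minimal sketch that assumes linear ramps between the three set points given in the text; the ramp shape and the function name `percent_b` are illustrative assumptions, not taken from the cited protocol.

```python
from bisect import bisect_right

# Gradient set points from the text: (time in min, % solvent B).
GRADIENT = [(0.0, 22.0), (15.0, 28.0), (35.0, 68.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Return % B at time t_min, ASSUMING linear ramps between set points."""
    times = [t for t, _ in gradient]
    # Clamp to the first and last set points outside the programmed range.
    if t_min <= times[0]:
        return gradient[0][1]
    if t_min >= times[-1]:
        return gradient[-1][1]
    i = bisect_right(times, t_min)  # index of the first set point after t_min
    (t0, b0), (t1, b1) = gradient[i - 1], gradient[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under this assumption, the composition halfway through the second ramp (25 min) lies midway between 28% and 68% B.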

2.3. Genome Size Estimation and Determination of the Ploidy Level

Flow cytometry was used to assess the genome sizes of these camellias (Greilhuber et al., 2005). Each sample was placed in precooled MGb dissociation solution (45 mM MgCl2·6H2O, 20 mM MOPS, 30 mM sodium citrate, 1% (w/v) PVP40, 0.2% (v/v) Triton X-100, 10 mM Na2EDTA, pH = 7.5). The samples were crushed, dissociated, filtered, and stained, and the nuclei were measured with a BD FACSCalibur Flow Cytometer (BD Sciences, Brea, CA, USA), with maize B73 (2.3 Gb) and tomato (900 Mb) as the internal references [16]. The relative genome size was estimated using equations described previously [17]. A genome size of 2–3 Gb was taken to indicate a diploid, 5–6 Gb a tetraploid, and 7–8 Gb a hexaploid [14].
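The size-to-ploidy rule above can be sketched in a few lines. The genome size formula used here is the commonly used peak-ratio calculation against an internal reference and is an assumption, since the exact equations of [17] are not reproduced in the text; the function names and the "undetermined" label are hypothetical.

```python
def genome_size_gb(sample_peak, reference_peak, reference_size_gb):
    """Estimate genome size from fluorescence peak positions.

    ASSUMPTION: size scales linearly with the ratio of the sample peak to the
    internal-reference peak (e.g., maize B73 at 2.3 Gb).
    """
    if reference_peak <= 0:
        raise ValueError("reference_peak must be positive")
    return sample_peak / reference_peak * reference_size_gb


# Size windows (Gb) stated in the text.
PLOIDY_WINDOWS = [
    (2.0, 3.0, "diploid"),
    (5.0, 6.0, "tetraploid"),
    (7.0, 8.0, "hexaploid"),
]


def call_ploidy(size_gb):
    """Map an estimated genome size to a ploidy label, or 'undetermined'."""
    for low, high, label in PLOIDY_WINDOWS:
        if low <= size_gb <= high:
            return label
    return "undetermined"  # hypothetical label for sizes outside the stated windows
```

Sizes that fall between the stated windows are deliberately left uncalled rather than rounded to the nearest class.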

2.4. Plastome Sequencing, Assembly, and Annotation

Paired-end reads of 150 bp were generated for all samples on an Illumina HiSeq 2500 platform. The raw reads were filtered using Trim Galore [18] with the following settings: --phred33, --length 50, and -e 0.1. We assembled the plastomes from the filtered data using the de novo assembly method together with the reference genome of Camellia sinensis (KF562708.1) [19]. Geneious 9.1.4 was used to map the reads against the reference genome [20] with medium–low sensitivity using five iterations. The final plastome scaffolds were manually adjusted to eliminate errors and ambiguities. We annotated the plastomes of all samples using the geneChecker.py script in mitoMaker [21] and the C. sinensis plastome as a reference. These data were uploaded to the NCBI database (Table S1).

2.5. Transcriptome Assembly and Inference of Orthologs

In brief, the Illumina HiSeq2500 platform yielded 150 bp paired reads in a single lane. The raw data were filtered using trim_galore with the above settings. We performed de novo assembly using Trinity 2.8.5 [22]. Transdecoder 5.7.0 was used to predict coding sequences longer than 100 amino acids with an open reading frame [23]. Non-redundant, representative sequences were retained using CD-HIT 4.6.8 with “-c 0.98” [24]. Finally, 1033 single-copy orthologs (referred to as the 1033nu-14 dataset) of 14 species (13 oil camellias and C. sinensis ‘LongJing43’ as the outgroup) were identified using OrthoFinder 2.5.4 [25].

2.6. Whole-Genome Resequencing Analysis

For whole-genome resequencing, C. oleifera var. “Nanyongensis” was used as the reference genome, and the sequencing depth of the 13 samples ranged from 15× to 20×. After low-quality SNPs were filtered out, the phylogenetic relationships of the 13 samples were reconstructed as an ML tree from the retained SNPs.

2.7. Phylogenetic Relationship Analyses

Single-copy transcriptome orthologs (1033nu-14 dataset) and the chloroplast genome sequences were aligned using MAFFT (7.520) with the “L-INS-i” option [26] and trimmed with trimAl 1.4.1 using the “automated1” option [27].
The phylogenetic tree for the plastome dataset based on the maximum-likelihood (ML) method was constructed using IQ-TREE 2.2.0 [28] with 5000 ultrafast bootstrap replicates [29]. The best model was selected using ModelFinder [30]. Furthermore, MrBayes 3.2.7a was used to construct the Bayesian inference (BI) tree [31]. We used concatenation and coalescent methods to reconstruct the phylogenetic relationships of these species based on the amino acid and nucleotide data for the single-copy ortholog dataset. IQ-TREE was used to construct the phylogenetic relationships based on the concatenated dataset with 5000 ultrafast bootstrap replicates. The ML gene trees of each ortholog were constructed using IQ-TREE with the option “5000 ultrafast bootstrap replicates”, and the best model for each gene tree was selected using the AIC metric in ModelFinder; then, multispecies coalescent (MSC) trees in ASTRAL 5.7.8 were calculated with gene trees as the input [32].

2.8. Phylogenetic Conflict Assessment and Gene Flow Inference

We calculated the gene concordance factors using IQ-TREE (2.2.0) to evaluate the degree of conflict, with the MSC species tree as the mapping tree for the ML gene trees. In addition, ASTRAL 5.6.3, with the options “-q” and “-t”, was used to assess phylogenetic conflicts.
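The two conflict measures above reduce to simple proportions. The sketch below is illustrative only: it assumes that the counts of gene trees (or quartets) supporting each resolution of a branch have already been extracted from the IQ-TREE and ASTRAL output, and the function names are hypothetical.

```python
def gene_concordance_factor(concordant, decisive):
    """gCF: percentage of decisive gene trees that contain the branch."""
    if decisive <= 0:
        raise ValueError("decisive must be positive")
    return 100.0 * concordant / decisive


def quartet_support(main, alt1, alt2):
    """Normalize quartet counts for the three resolutions of a branch.

    Returns (q1, q2, q3): support for the main topology and the first and
    second alternatives, summing to 1.
    """
    total = main + alt1 + alt2
    if total <= 0:
        raise ValueError("at least one quartet is required")
    return main / total, alt1 / total, alt2 / total
```

A branch with q1 close to 1/3 and q2 and q3 of similar size is what would be expected under a hard-to-resolve split, whereas a strong imbalance between q2 and q3 is one pattern that can point to gene flow rather than incomplete lineage sorting alone.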
To construct the evolutionary network and detect whether gene flow occurred in the genomes of these species, the Species Networks applying Quartets (SNaQ) program in PhyloNetworks was used to identify the optimal gene flow events by maximizing the pseudolikelihood of a network computed from four-taxon concordance factors [33]. The 13 species and C. sinensis (as an outgroup) were selected for analysis. To increase credibility, we identified single-copy orthologs for each subset and estimated phylogenetic networks for five different subsets. We ran SNaQ with the maximum number of hybrid nodes (hmax) set from 0 to 10, and each SNaQ analysis was run 20 times. Based on the 1033nu-14 dataset, ABBA-BABA tests were used to identify gene flow within clades using the R package evobiR 2.1 [34]. To test whether a gene flow event was significant, 1000 bootstrap replicates were conducted in each test. We tested possible combinations, with C. sinensis as the outgroup.
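The logic of the ABBA-BABA test can be sketched as follows. This is not the evobiR implementation: it is a minimal stand-alone illustration that assumes each alignment column is given as a tuple of bases for (P1, P2, P3, outgroup), computes Patterson's D = (ABBA − BABA)/(ABBA + BABA), and derives a Z-score from a site bootstrap. All function names are hypothetical.

```python
import random
import statistics


def count_patterns(sites):
    """Count ABBA and BABA site patterns in (P1, P2, P3, outgroup) columns."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        bases = {p1, p2, p3, out}
        # Use only biallelic columns made of unambiguous bases.
        if len(bases) != 2 or not bases <= set("ACGT"):
            continue
        if p1 == out and p2 == p3:
            abba += 1  # derived allele shared by P2 and P3
        elif p2 == out and p1 == p3:
            baba += 1  # derived allele shared by P1 and P3
    return abba, baba


def d_statistic(sites):
    """Patterson's D; returns 0.0 when no informative sites are present."""
    abba, baba = count_patterns(sites)
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)


def bootstrap_z(sites, replicates=1000, seed=0):
    """Z-score of D, using the SD of D over bootstrap resamples of sites."""
    rng = random.Random(seed)
    n = len(sites)
    boots = [
        d_statistic([sites[rng.randrange(n)] for _ in range(n)])
        for _ in range(replicates)
    ]
    sd = statistics.stdev(boots)
    d = d_statistic(sites)
    if sd == 0:
        return 0.0 if d == 0 else float("inf") * (1 if d > 0 else -1)
    return d / sd
```

A positive D with |Z| > 3 would be read as an excess of allele sharing between P2 and P3, and a negative D as an excess between P1 and P3. Resampling single sites ignores linkage, so this sketch is likely to give less conservative Z-scores than a block-based resampling scheme.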

2.9. Divergence Time and Reconstructing the Ancestral State

Divergence time was estimated with the 1033nu-14 dataset and the C. sinensis outgroup using MCMCTree in PAML 4.9j. The fossil calibration from a previous study [35] was used in the analysis, and the MCMCTree settings were the birth–death model, correlated rates, and HKY85 substitution model with alpha = 0.5. We combined the results of two independent MCMC chains, where samples were drawn every 10,000 generations, with the first 20% of iterations discarded as burn-in until an effective sample size of >200 was reached for all parameters.
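The post-processing of the two chains (burn-in removal, pooling, and the effective sample size check) can be illustrated with a short sketch. The effective sample size estimator below is a simple autocorrelation-based one that truncates at the first non-positive lag; it is an assumption for illustration and is not necessarily the estimator used by the software that summarized the MCMCTree output.

```python
def discard_burn_in(samples, fraction=0.2):
    """Drop the first `fraction` of a chain (20% in the text)."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return samples[int(len(samples) * fraction):]


def combine_chains(chains, fraction=0.2):
    """Pool several independent chains after removing burn-in from each."""
    combined = []
    for chain in chains:
        combined.extend(discard_burn_in(chain, fraction))
    return combined


def effective_sample_size(samples):
    """Simple ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    if n < 2:
        return float(n)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:  # stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

In this sketch a parameter would pass the criterion in the text when `effective_sample_size(combine_chains([chain1, chain2]))` exceeds 200, where `chain1` and `chain2` are hypothetical lists of sampled values for that parameter.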
We used the R package ‘BioGeoBEARS’ to estimate the ancestral areas of oil camellias with the topologies based on the transcriptome orthologs [36]. Distribution data for these species were assembled based on the Global Biodiversity Information Facility (GBIF, https://www.gbif.org (accessed on 5 November 2023)) and Flora Republicae Popularis Sinicae (http://www.iplant.cn/frps (accessed on 5 November 2023)). Six geographical regions representing the current distribution were defined: A, Central China; B, Paleotropic region; C, Eastern China; D, Southwestern China; E, Northwestern China; and F, Japan.

3. Results

3.1. Phylogenetic Relationships Based on the Transcriptome and Chloroplast Genome

To identify the phylogenetic relationships among different oil camellias, we sequenced and assembled nine complete plastomes in this study. Phylogenetic analyses of all thirteen plastomes based on the ML and BI methods revealed two well-supported clades within oil camellias (Figure 1a). C. shensiensis and C. shensiensis ‘Zhenzhucha’ formed a sister branch. Two well-supported lineages were identified within clade 2. C. fluviatilis, C. grijsii, and others with long leaves formed a lineage. Another lineage was formed around plants with oval leaves, including C. kissi, C. oleifera, C. brevistyla, C. sasanqua, C. confusa, and C. meiocarpa. Moreover, C. lanceoleosa diverged first and was well-supported as an ancient ancestor [35].
We obtained well-supported topologies from the coalescence sequence dataset of the nucleotide and amino acid data, based on the 1033 single-copy transcriptome orthologs. These two types of data yielded the same topology (Figure S1A,C). The ASTRAL MSC-based trees, based on the concatenated nucleotide and amino acid datasets, yielded different topologies compared to the coalescence trees (Figure S1B,D). These discordances concerned the phylogenetic positions of C. oleifera and C. fluviatilis. However, two nearly identical topologies were obtained in the nucleotide and amino acid sequence-based analyses.
Interestingly, cytonuclear phylogenetic incongruences were detected within Sect. Oleifera and Sect. Paracamellia: transcriptome analysis showed a closer relationship between C. confusa and C. kissi within the oval leaf clade, but the plastome data placed C. kissi and C. oleifera together (Figure 1a,b). Similarly, the plastome supported a closer relationship between C. shensiensis and C. fluviatilis, but the transcriptome revealed a closer relationship between C. shensiensis and C. grijsii (Figure 1a,b). In addition, the branching order for C. oleifera, C. sasanqua, and C. confusa displayed a high level of discordance.

3.2. Morphological and Metabolite Data Support the Phylogenetic Relationships

The leaf morphological data partially support the conclusion of an evolutionary relationship. The oil camellias were divided into four groups (long leaves, oval leaves, big fruit, and small fruit), and C. gauchowensis with larger leaves and larger fruit was more closely related to C. vietnamensis. C. grijsii was an independent group because of its unique larger leaves and smaller fruits (Supplementary Table S3).
The taxa, number of peaks, and sum of the peak areas based on the HPLC-MS data of the 13 species are shown in Figure 1d and Table S2. The types, numbers, and peak areas of anthocyanins differed significantly among species. The peak patterns of these species fell into four groups. C. shensiensis, C. grijsii, and C. shensiensis ‘Zhenzhucha’ shared one peak pattern (containing three kinds of anthocyanins; ④ in Figure 1d), and these taxa were so similar that they were difficult to distinguish. In contrast, their peak patterns were completely different from those of other species, confirming the evolutionary relationship based on the plastome and transcriptome orthologs. Interestingly, the proportion of delphinidin in C. fluviatilis was the highest, and the peak pattern was unique (③ in Figure 1d). Other species displayed a high proportion of Dp3GEpC, but there was a large difference in the proportion of Cy3GEpC. The clustering results of the metabolites supported evolutionary relationships based on molecular evidence.

3.3. Evaluation of the Topological Conflicts and Origin of Oil Camellias

The transcriptome ortholog tree topologies were highly variable (Figure 1b), particularly among C. confusa, C. oleifera, C. brevistyla, and C. kissi. The final normalized quartet score of the gene tree was 0.555, and the gene concordance factors strongly supported the potential conflicts with the species tree (Figure S2).
Although the origin of cultivated C. oleifera inferred from some gene biomarkers has attracted attention [6], the progenitors of oil camellias remain uncertain. We resequenced those samples that were assumed to be the most likely ancestral species of oil camellias, with an average coverage depth of approximately 15× for each accession. After filtering out the low-quality sites, the total SNP number ranged from 30,092,698 to 62,096,658 in these samples. The well-supported phylogenetic tree illustrated that C. oleifera, C. kissi, C. confusa, C. brevistyla, and C. meiocarpa might be more closely related to one another than to other species (Figure 2a). Interestingly, in contrast to the phylogenetic relationship reconstructed from the transcriptome, C. oleifera and C. kissi clustered into a group. We speculate that interspecific hybridization within oil camellias might have contributed to their evolution.
The SNaQ analysis performed on oil camellias revealed that the optimal evolutionary network (possibly resulting from introgression and hybridization) was obtained with an hmax value of 3. Some well-supported introgressions were detected between C. oleifera and C. meiocarpa, C. grijsii, and C. shensiensis ‘Zhenzhucha’. Furthermore, introgression occurred between C. grijsii and another clade (Figure 2b). We performed SNaQ searches on the subsets and obtained five better-supported results (Figure 2c–e and Figure S3): (1) gene flow from C. shensiensis ‘Zhenzhucha’ to C. grijsii (Figure 2b), (2) gene flow from C. lanceoleosa to C. fluviatilis (Figure 2e), (3) gene flow from C. kissi to C. confusa (Figure S2), (4) gene flow from C. fluviatilis to the common ancestor of C. shensiensis, C. grijsii, and C. shensiensis ‘Zhenzhucha’ (Figure 2c), and (5) gene flow from C. brevistyla to C. sasanqua (Figure 2d).
We evaluated the ploidy of all samples by flow cytometry, and most of our results agreed with previous studies based on the chromosome preparation method [37] (Figure 3b). The results showed that most species were diploid, whereas C. sasanqua, C. grijsii, and C. oleifera were hexaploid (Figure 3b). In light of the flow cytometry results, the PhyloNetworks analysis produced some puzzling results, for example, gene flow from C. confusa to C. oleifera (Figure S3A). To further confirm the framework of reticulate evolution within oil camellias, we performed the ABBA-BABA test, which was developed to detect introgression [38]. The ABBA-BABA tests supported introgression events between C. oleifera and C. meiocarpa with significant Z scores (>3) (Figure 3a), and introgressions between C. fluviatilis and C. lanceoleosa, C. confusa, and C. kissi were well supported. However, introgressions between C. oleifera and C. confusa were not supported (Figure 3a).

3.4. Divergence Times and Biogeographical Reconstruction

We estimated the divergence times using the strategies reported earlier [39] (Figure S4). Oil camellias were estimated to have split from Sect. Thea in the south and east regions of Asia during the latter part of the Miocene based on the transcriptome ortholog datasets (Node 1 in Figure 4a,b). Sect. Oleifera started to diversify during the early Pliocene in the Paleotropic and Central China regions (Node 2 in Figure 4a,b). The species in Southeast Asia did not diversify until 1.4 Ma ago, during the early Pleistocene (Node 4 in Figure 4a,b). The divergence time of the most southern Chinese clade was estimated to be 4.7–1.6 Ma in the middle Pliocene (Node 6 in Figure 4a,b), and the divergence time of C. sasanqua was estimated to be the middle Pliocene, and it remained in the Japanese region despite dividing in the mainland.

4. Discussion

Phylogenomics is used to determine the phylogenetic relationships and hybridization events among species [40], but few comprehensive methods are available for analyzing the complex evolutionary history of camellias. We integrated morphological, cytological, and metabolite data with plastome, transcriptome, and whole-genome resequencing data to reconstruct the reticulate evolutionary history of species from Sect. Oleifera and Sect. Paracamellia, uncovering multiple forms of evidence for the parental origin of oil camellias.

4.1. Reconstructing the Phylogenetic Relationships of Sect. Oleifera and Sect. Paracamellia

The advantages of using whole-genome, single-copy ortholog, and complete chloroplast genome information to solve taxonomic problems have gradually emerged with high-throughput sequencing technology [12,41]. The classification of Sect. Oleifera and Sect. Paracamellia [9] based on morphological characters has been controversial, and the phylogenetic relationships of these oil crops remain insufficiently resolved. In this study, concatenation and coalescent methods based on “perfect single copy” transcriptome orthologs (1033nu-14 dataset) were integrated to reconstruct the phylogenetic relationships among oil camellia plants. Species from Sect. Oleifera and Sect. Paracamellia showed few signs of separation, except for C. shensiensis, C. grijsii, and C. shensiensis ‘Zhenzhucha’, which formed an independent clade (Figure 1b). Thus, our nuclear gene classification results support merging the two groups, consistent with Ming’s classification. A previous study based on a low-copy gene alignment dataset also supported merging the two groups, and C. shensiensis also formed an independent clade [6]. However, the positions of some species differed: for instance, C. oleifera was included inside the clade in our study, and, in a previous study [6], C. lanceoleosa and C. fluviatilis formed a sister group, unlike in our study. Interestingly, reconstructing relationships based on the plastome revealed that C. grijsii and C. fluviatilis were closer, and C. shensiensis, C. lanceoleosa, and C. shensiensis ‘Zhenzhucha’ formed an independent clade. These conflicting results are not uncommon in molecular phylogeny and are probably caused by taxon sampling or analysis [42]. These results indicate potential gene flow or chloroplast capture events between C. grijsii, C. shensiensis, and other species.

4.2. The Composition of Anthocyanin Is Related to Phylogenetic Relationships Between Camellia Species

We tried to infer evolutionary status by analyzing the anthocyanin patterns in tender leaves. Only C. grijsii and C. shensiensis did not contain delphinidin, and Cy3G was their main component (Figure 1d). A previous study showed that Sect. Camellia and Sect. Paracamellia [37] are relatively closely related and appeared relatively late in evolution [12]. Consistent with our previous research on anthocyanin categories [43], their peak patterns are similar to those of camellias. However, considering potential gene flow events, C. grijsii and C. shensiensis may be transitional species between Sect. Camellia and Sect. Paracamellia [10]. Previous studies have also supported a close relationship between C. shensiensis and C. japonica [6]. A cluster analysis of metabolites has also been proposed as a way to resolve chrysanthemum evolution [44]. Our results also provide evidence for the feasibility of using marker metabolites in systematic taxonomy, as their discriminating power was higher than that of morphological data (leaves, fruits, etc.).

4.3. Potential Gene Introgression of Oil Camellias

The species tree obtained by the concatenated method was highly inconsistent and had many topological conflicts with the evolutionary tree based on plastome reconstruction. It is common for chloroplast and nuclear gene trees to clash, and it is probably difficult to find consistency [45]. These results suggest a complex reticular evolutionary process among oil camellias. We reconstructed the reticulate evolution of these species by combining gene flow, phylogenetic, and ploidy analyses.
A previous study based on plastomes showed different results, in that neither C. meiocarpa nor C. kissi was closely related to C. oleifera. However, the hybridization test showed that C. meiocarpa and C. kissi are probably the ancestors of C. oleifera [6]. As such, the evidence is insufficient. The plastome is maternally inherited in most angiosperms. Our chloroplast tree showed that C. kissi might be the potential female parent of C. oleifera, and C. brevistyla could be the potential female parent of C. confusa and C. meiocarpa. Therefore, our results provide further evidence of the potential female parent of C. oleifera. In the past, Sealy (1958) thought that C. confusa was a variety of C. oleifera [46]. However, the nuclear tree supported the notion that C. meiocarpa and C. brevistyla could be the ancestors of C. oleifera. The topological conflicts between the gene trees suggest different evolutionary processes that could be used to identify the ancestors of hybrid species [47]. The PhyloNetworks analysis and ABBA-BABA test revealed C. meiocarpa as a potential male parent. Considering the ploidy results, we speculate that hexaploid C. oleifera probably originated from the hybridization of diploid C. kissi and tetraploid C. meiocarpa (Figure 5) and that C. meiocarpa probably originated from diploid C. brevistyla. Notably, C. brevistyla was reported as a tetraploid species in a previous study [6], possibly because their samples were collected from a germplasm collection (Jinhua International Camellia Species Garden) and not from the wild.
Furthermore, C. confusa is regarded as a variety of C. kissi with large leaves in the classification of Ming (1999) [37]. According to its ploidy level, C. confusa is tetraploid, so it could be a different species than C. kissi. However, our chloroplast genome and nuclear gene tree supported C. confusa’s probable origin from a cross between C. kissi and C. brevistyla; the hybridization and gene flow analysis also supported the above conclusion. Interestingly, the network and phylogenetic analyses revealed that C. brevistyla could be the potential ancestor of C. sasanqua. However, the cytological studies showed that C. sasanqua is hexaploid, so it probably has unknown parents from other sections, requiring further study.

5. Conclusions

In summary, by reconstructing the reticulate evolutionary framework of oil camellias, we obtained some clues about their origin. For instance, tetraploid C. meiocarpa and diploid C. kissi are the probable ancestors of hexaploid C. oleifera. C. kissi might be the progenitor of tetraploid C. confusa, and C. meiocarpa and C. sasanqua are probably closely related to C. brevistyla. Furthermore, the composition of anthocyanins is highly related to the phylogeny of oil camellias. Our work offers insight into the evolution of oil camellias and identifies vital genetic resources, facilitating genome-breeding programs in oil camellias.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/horticulturae10121252/s1: Figure S1: Concatenated phylogenetic trees based on nucleotides and amino acids; Figure S2: Summary of topological information based on previous studies and this study; Figure S3: Phylonetwork analysis based on different subsets; Figure S4: Chronogram of oil camellia inferred from the MCMC Tree in the PAML package; Table S1: Species and sources of the plastome and transcriptome data; Table S2: Detailed information for sequencing species; Table S3: Detailed leaf and fruit morphological data; and Table S4: Raw anthocyanin content data.

Author Contributions

M.F. conducted the data analysis and wrote the manuscript; X.L. and Z.S. (Zhenyuan Sun) conceived and designed the experiments; and Y.Z. and Z.S. (Zhixin Song) performed the experiments. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Zhejiang Science and Technology Major Program on Agricultural New Variety Breeding (2021C02071-2).

Data Availability Statement

The datasets generated and/or analyzed during this study are included in this article, its Supplementary Materials, or the NCBI repository under accession number PRJNA991271 (https://www.ncbi.nlm.nih.gov/sra/PRJNA991271).

Acknowledgments

The authors are grateful to the editors and referees for their valuable comments which improved our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rieseberg, L.H.; Wendel, J.F. Introgression and its consequences in plants. In Hybrid Zones and the Evolutionary Process; Harrison, R.G., Ed.; Oxford University Press: New York, NY, USA, 1993; pp. 70–100. [Google Scholar]
  2. Soltis, P.S.; Soltis, D.E. The role of hybridization in plant speciation. Annu. Rev. Plant Biol. 2009, 60, 561–588. [Google Scholar] [CrossRef]
  3. Abbott, R.; Albach, D.; Ansell, S.; Arntzen, J.W.; Baird, S.J.; Bierne, N.; Zinner, D. Hybridization and speciation. J. Evol. Biol. 2013, 26, 229–246. [Google Scholar] [CrossRef]
  4. Payseur, B.A.; Rieseberg, L.H. A genomic perspective on hybridization and speciation. Mol. Ecol. 2016, 25, 2337–2360. [Google Scholar] [CrossRef]
  5. Cui, W.H.; Du, X.Y.; Zhong, M.C.; Fang, W.; Suo, Z.Q.; Wang, D.; Hu, J.Y. Complex and reticulate origin of edible roses (Rosa, Rosaceae) in China. Hortic. Res. 2016, 9, uhab051. [Google Scholar] [CrossRef]
  6. Qin, S.Y.; Chen, K.; Zhang, W.J.; Xiang, X.G.; Zuo, Z.Y.; Guo, C.; Rong, J. Phylogenomic insights into the reticulate evolution of Camellia sect. Paracamellia Sealy (Theaceae). J. Syst. Evol. 2023, 62, 38–54. [Google Scholar] [CrossRef]
  7. Xiao, D.T.; Gu, Z.J.; Xiao, L.F. A study of meiosis of 9 species in genus Camellia. Acta Bot. Yunnanica 1993, 2, 167–172. [Google Scholar]
  8. He, L.; Zhou, G.Y.; Zhang, H.Y.; Liu, J.A. Research progress on the health function of tea oil. J. Med. Plants Res. 2011, 5, 485–489. [Google Scholar]
  9. Chang, H.T. Theaceae (1) Theoideae 1. Camellia. In Flora Reipublicae Popularis Sinicae, 49; Science Press: Beijing, China, 1998; pp. 3–195. [Google Scholar]
  10. Ming, T.L. A systematic synopsis of the genus Camellia. Acta Bot. Yunnanica 1999, 21, 149–159. [Google Scholar]
  11. Lin, X.Y.; Peng, Q.F.; Lü, H.F.; Du, Y.Q.; Tang, B.Y. Leaf anatomy of Camellia sect. Oleifera and sect. Paracamellia (Theaceae) with reference to their taxonomic significance. J. Syst. Evol. 2008, 46, 183–193. [Google Scholar]
  12. Wu, Q.; Tong, W.; Zhao, H.; Ge, R.; Li, R.; Huang, J.; Li, F.; Wang, Y.; Mallano, A.I.; Deng, W.; et al. Comparative transcriptomic analysis unveils the deep phylogeny and secondary metabolite evolution of 116 Camellia plants. Plant J. 2022, 111, 406–421. [Google Scholar] [CrossRef]
  13. Fan, M.; Li, X.; Zhang, Y.; Yang, M.; Wu, S.; Yin, H.; Li, J. Novel insight into anthocyanin metabolism and molecular characterization of its key regulators in Camellia sasanqua. Plant Mol. Biol. 2023, 111, 249–262. [Google Scholar] [CrossRef]
  14. Qin, S.Y.; Rong, J.; Zhang, W.J.; Chen, J.K. Cultivation history of Camellia oleifera and genetic resources in the Yangtze River Basin. Biodivers. Sci. 2018, 26, 384–395. [Google Scholar] [CrossRef]
  15. Zhang, Q.; Zhao, L.; Folk, R.A.; Zhao, J.L.; Zamora, N.A.; Yang, S.X.; Yu, X.Q. Phylotranscriptomics of Theaceae: Generic-level relationships, reticulation and whole-genome duplication. Ann. Bot. 2022, 129, 457–471. [Google Scholar] [CrossRef]
  16. Schnable, P.S.; Ware, D.; Fulton, R.S.; Stein, J.C.; Wei, F.; Pasternak, S.; Presting, G.G. The B73 maize genome: Complexity, diversity, and dynamics. Science 2009, 326, 1112–1115. [Google Scholar] [CrossRef]
  17. Dolezel, J.; Greilhuber, J.; Suda, J. Estimation of nuclear DNA content in plants using flow cytometry. Nat. Protoc. 2007, 2, 2233–2244. [Google Scholar] [CrossRef]
  18. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef]
  19. Jansen, R.K.; Kaittanis, C.; Saski, C.; Lee, S.B.; Tomkins, J.; Alverson, A.J.; Daniell, H. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: Effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol. Biol. 2006, 6, 32. [Google Scholar] [CrossRef]
  20. Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647–1649. [Google Scholar] [CrossRef]
  21. Wyman, S.K.; Jansen, R.K.; Boore, J.L. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 2004, 20, 3252–3255. [Google Scholar] [CrossRef]
  22. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q.D.; et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011, 29, 644–652. [Google Scholar] [CrossRef]
  23. Haas, B.J.; Papanicolaou, A.; Yassour, M.; Grabherr, M.; Blood, P.D.; Bowden, J.; Couger, M.B.; Eccles, D.; Li, B.; Lieber, M.; et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013, 8, 1494–1512. [Google Scholar] [CrossRef]
  24. Li, W.Z.; Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22, 1658–1659. [Google Scholar] [CrossRef]
  25. Emms, D.M.; Kelly, S. OrthoFinder: Solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015, 16, 157. [Google Scholar] [CrossRef]
  26. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780. [Google Scholar] [CrossRef]
  27. Capella-Gutiérrez, S.; Silla-Martínez, J.M.; Gabaldón, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009, 25, 1972–1973. [Google Scholar] [CrossRef]
  28. Minh, B.Q.; Schmidt, H.A.; Chernomor, O.; Schrempf, D.; Woodhams, M.D.; von Haeseler, A.; Lanfear, R. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020, 37, 1530–1534. [Google Scholar] [CrossRef]
  29. Hoang, D.T.; Chernomor, O.; Von Haeseler, A.; Minh, B.Q.; Vinh, L.S. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 2018, 35, 518–522. [Google Scholar] [CrossRef]
  30. Kalyaanamoorthy, S.; Minh, B.Q.; Wong, T.K.F.; Von Haeseler, A.; Jermiin, L.S. ModelFinder: Fast model selection for accurate phylogenetic estimates. Nat. Methods 2017, 14, 587–589. [Google Scholar] [CrossRef]
  31. Ronquist, F.; Teslenko, M.; Van Der Mark, P.; Ayres, D.L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M.A.; Huelsenbeck, J.P. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 2012, 61, 539–542. [Google Scholar] [CrossRef]
  32. Mirarab, S.; Reaz, R.; Bayzid, M.S.; Zimmermann, T.; Swenson, M.S.; Warnow, T. ASTRAL: Genome-scale coalescent-based species tree estimation. Bioinformatics 2014, 30, 541–548. [Google Scholar] [CrossRef]
  33. Solís-Lemus, C.; Bastide, P.; Ané, C. PhyloNetworks: A Package for Phylogenetic Networks. Mol. Biol. Evol. 2017, 34, 3292–3298. [Google Scholar] [CrossRef]
  34. Michelle, M.J.; Maximos, C.; Nathan, A.; Richard, H.A.; Jeffery, P.D.; Heath, B. EvobiR: Tools for comparative analyses and teaching evolutionary biology. Zenodo 2023. [Google Scholar] [CrossRef]
  35. Gong, W.; Xiao, S.; Wang, L.; Liao, Z.; Chang, Y.; Mo, W.; Hu, G.; Li, W.; Zhao, G.; Zhu, H.; et al. Chromosome-level genome of Camellia lanceoleosa provides a valuable resource for understanding genome evolution and self-incompatibility. Plant J. 2022, 110, 881–898. [Google Scholar] [CrossRef]
  36. Dupin, J.; Matzke, N.J.; Sarkinen, T.; Knapp, S.; Olmstead, R.; Bohs, L.; Smith, S. Bayesian estimation of the global biogeographic history of the Solanaceae. J. Biogeogr. 2016, 44, 887–899. [Google Scholar] [CrossRef]
  37. Ming, T.L. Monograph of the Genus Camellia; Yunnan Science and Technology Press: Kunming, China, 2000. [Google Scholar]
  38. The Heliconius Genome Consortium. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 2012, 487, 94–98. [Google Scholar] [CrossRef]
  39. Dong, S.S.; Wang, Y.L.; Xia, N.H.; Liu, Y.; Liu, M.; Lian, L.; Li, N.; Li, L.F.; Lang, X.A.; Gong, Y.Q.; et al. Plastid and nuclear phylogenomic incongruences and biogeographic implications of Magnolia s.l. (Magnoliaceae). J. Syst. Evol. 2021, 60, 1–15. [Google Scholar] [CrossRef]
  40. Yu, J.R.; Niu, Y.T.; You, Y.C.; Cox, C.J.; Barrett, R.L.; Trias-Blasi, A.; Guo, J.; Wen, J.; Lu, L.M.; Chen, Z.D. Integrated phylogenomic analyses unveil reticulate evolution in Parthenocissus (Vitaceae), highlighting speciation dynamics in the Himalayan-Hengduan Mountains. New Phytol. 2023, 238, 888–903. [Google Scholar] [CrossRef]
  41. Yang, K.; Fan, M.L.; Sun, Y.K.; Liu, Q.H.; Gao, H.D. The complete chloroplast genome of the subtropical species Camellia japonica ‘Huaheling’. Mitochondrial DNA Part B 2021, 6, 2385–2386. [Google Scholar] [CrossRef]
  42. Lin, H.Y.; Hao, Y.J.; Li, J.H.; Fu, C.X.; Soltis, P.S.; Soltis, D.E.; Zhao, Y.P. Phylogenomic conflict resulting from ancient introgression following species diversification in Stewartia s.l. (Theaceae). Mol. Phylogenet. Evol. 2019, 135, 2385–2386. [Google Scholar] [CrossRef]
  43. Fan, M.; Zhang, Y.; Yang, M.; Wu, S.; Yin, H.; Li, J.; Li, X. Transcriptomic and Chemical Analyses Reveal the Hub Regulators of Flower Color Variation from Camellia japonica Bud Sport. Horticulturae 2022, 8, 129. [Google Scholar] [CrossRef]
  44. Chen, X.; Wang, H.; Jiang, J.; Jiang, Y.; Zhang, W.; Chen, F. Biogeographic and metabolic studies support a glacial radiation hypothesis during Chrysanthemum evolution. Hortic. Res. 2022, 9, uhac153. [Google Scholar] [CrossRef] [PubMed]
  45. Romero-Soler, K.J.; Ramírez-Morillo, I.M.; Ruiz-Sanchez, E.; Hornung-Leoni, C.T.; Carnevali, G.; Raigoza, N. Phylogenetic relationships within the Mexican genus Bakerantha (Hechtioideae, Bromeliaceae) based on plastid and nuclear DNA: Implications for taxonomy. J. Syst. Evol. 2022, 60, 55–72. [Google Scholar] [CrossRef]
  46. Sealy, J.R. A Revision of the Genus Camellia; The Royal Horticultural Society: London, UK, 1958. [Google Scholar]
  47. Zan, T.; He, Y.T.; Zhang, M.; Yonezawa, T.; Ma, H.; Zhao, Q.M.; Kuo, W.Y.; Zhang, W.J.; Huang, C.H. Phylogenomic analyses of Camellia support reticulate evolution among major clades. Mol. Phylogenet. Evol. 2023, 182, 107744. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Topological discordance between the phylogenetic relationships of Sect. Oleifera based on the chloroplast genome dataset (a) and transcriptome orthologs (b) with bipartition information restored from the gene trees shown above the branches. The multispecies coalescent species tree is presented to show the branch lengths. The pie charts at each node represent the estimated proportions of gene trees with different topologies based on the nucleotide alignment; q1, q2, and q3 refer to the quartet support for the main topology (green), the first alternative (purple), and the second alternative (blue), respectively. (c) PCA of the morphological data (leaf, flower, and fruit). (d) The pattern and content of anthocyanin in these plants (µg/100 mg), where the liquid-phase diagram represents four peak patterns.
Figure 2. The origin of oil camellias. (a) Maximum-likelihood phylogenetic tree of oil camellias using whole-genome resequencing data. Numbers above the branches indicate the bootstrap values. (b) Gene flow events in oil camellias estimated with SNaQ and the different datasets, using C. sinensis as the outgroup, and the length of each terminal branch set to 1. (c–e) Different subsets with different outgroups.
Figure 3. Identification of gene flow events between species with different ploidies. (a) Combinations of ABBA-BABA statistics and the corresponding values. Z-scores > 3 indicate statistically significant results. (b) Ploidy of these oil camellias. * indicates that the ploidy has been verified with cytological data according to Ming (2000).
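As an illustration of the Z-score threshold mentioned in the Figure 3 caption, the following minimal Python sketch computes Patterson's D, defined as (ABBA − BABA)/(ABBA + BABA), and a Z-score from a simple delete-one block jackknife. It is only a sketch under stated assumptions: the function names and the per-block counts are hypothetical, the equal-weight jackknife is a simplification, and it is not necessarily the procedure or software used in this study.

```python
import math

def d_statistic(abba, baba):
    """Patterson's D from per-block ABBA and BABA site counts."""
    total_abba = sum(abba)
    total_baba = sum(baba)
    return (total_abba - total_baba) / (total_abba + total_baba)

def jackknife_z(abba, baba):
    """Delete-one block jackknife; returns (D, standard error, Z).

    Assumes the blocks are not all identical, so the standard error is non-zero.
    """
    n = len(abba)
    d_full = d_statistic(abba, baba)
    # Recompute D with each genomic block left out in turn.
    leave_one_out = [
        d_statistic(abba[:i] + abba[i + 1:], baba[:i] + baba[i + 1:])
        for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    return d_full, se, d_full / se

# Hypothetical per-block counts for illustration only, not data from the study.
abba_counts = [30, 28, 35, 31, 29, 33]
baba_counts = [20, 22, 18, 21, 19, 20]

d, se, z = jackknife_z(abba_counts, baba_counts)
print(f"D = {d:.3f}, SE = {se:.3f}, Z = {z:.2f}, significant = {abs(z) > 3}")
```

Under the null hypothesis of no gene flow, ABBA and BABA patterns are expected to be equally frequent, so D is close to zero; an absolute Z-score above 3 is the threshold the caption uses for significance.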
Figure 4. Ancestral area reconstruction for oil camellias based on transcriptome orthologs. (a) Biogeography of oil camellias. (b) Divergence times of oil camellias. The pie charts indicate the relative estimates of possible ancestral areas. A, Central China; B, Paleotropic region; C, Eastern China; D, Southwestern China; E, Northwest China; and F, Japan. Ma, million years ago.
Figure 5. Predicted parental origin model for polyploid camellias. ♀ Maternal and ♂ paternal. Probable subgenomic composition and origins are inferred from the data presented herein.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fan, M.; Song, Z.; Zhang, Y.; Li, X.; Sun, Z. Multi-Approach Unveils Potential Gene Introgression of Oil Camellias. Horticulturae 2024, 10, 1252. https://doi.org/10.3390/horticulturae10121252
