CN108319984A - Construction method and prediction method for a prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level - Google Patents
- Publication number
- CN108319984A (application number CN201810120969.0A)
- Authority
- CN
- China
- Prior art keywords
- leaf
- blade
- sample
- dna methylation
- matrix
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- Biotechnology (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Probability & Statistics with Applications (AREA)
- Bioethics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention provides a construction method and a prediction method for a model of woody plant leaf traits and photosynthetic characteristics based on DNA methylation level, and belongs to the field of bioassay technology. Important feature variables that reflect differences between geographic locations are selected with a random forest, yielding 7 leaf feature variables; the optimal number of clusters is determined, and the leaf samples of each cluster are obtained with an improved FCM clustering algorithm. The important digested-fragment combinations in each cluster of leaf samples are identified from the correlations between variables and the fragment-combination importance given by gradient boosted trees. With the DNA methylation levels of these fragment combinations as regression variables, an LS-SVM regression prediction model is built with a Gaussian radial basis function. Inputting the DNA methylation levels of the important fragment combinations into the model predicts the leaf shape factor, leaf area and net photosynthetic rate accurately. The method is used to predict the phenotypic features and photosynthetic characteristics of woody plants and to screen superior woody plant individuals.
Description
Technical field
The invention belongs to the field of bioinformatics, and in particular relates to a construction method and a prediction method for a prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level.
Background art
DNA methylation occurs most often at the fifth carbon atom of cytosine in dinucleotides and is an important epigenetic modification. It represses the expression of transposable elements, pseudogenes, repetitive sequences and individual genes, and plays a key role in many biological processes such as gene expression, embryonic development, cell proliferation, differentiation and chromosome stability. In plants, DNA methylation occurs at CG, CHG and CHH sites (H denotes any base other than guanine), and in higher plant cells methylated cytosines can account for up to 50% of all cytosines. DNA methylation controls plant growth and development, in particular through the regulation of gene expression and DNA replication. Studying DNA methylation in plants therefore helps us understand how DNA methylation regulates plant growth and development, and is of practical significance.
Previous research on DNA methylation focused mainly on qualitative analysis and lacked quantitative study, and research on plant DNA methylation has concentrated on herbaceous plants, with comparatively little work on woody plants. For example, in 2015 Wanneng Yang et al. used linear regression to study how genome-wide DNA methylation affects the leaf traits of rice (leaf width, leaf length, leaf area and so on). Also in 2015, Dong Ci et al. reported a genome-wide screen of DNA methylation sites in a natural population of Populus simonii using methylation-sensitive amplification polymorphism (MSAP), which yielded polymorphic methylation sites; they also used principal component analysis (PCA) and the STRUCTURE software to resolve the epigenetic population variation of the natural Populus simonii population. They found that genes at DNA methylation sites may play an important role in leaf development and in the regulation of photosynthesis, and have some influence on plant traits, including leaf morphology and photosynthesis. That study, however, remained at the level of qualitative analysis and lacked a quantitative analysis of DNA methylation. Moreover, current research on DNA methylation in forest trees relies on genome-wide DNA methylation scanning, which is costly and produces very large volumes of data, leading to poor accuracy of results.
Summary of the invention
In view of this, the object of the present invention is to provide a construction method and a prediction method for a prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level, with high prediction accuracy.
To achieve the above object, the present invention provides the following technical solutions:
The present invention provides a construction method for a prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level, comprising the following steps:
1) collecting leaves of the same woody plant species from its natural distribution across the country to obtain representative leaf samples;
2) measuring the photosynthetic characteristics and phenotypic features of the representative leaf samples to obtain leaf photosynthetic data and leaf morphological data;
The phenotypic features include leaf area, leaf length, leaf width, leaf perimeter, length-to-width ratio and leaf shape factor;
The photosynthetic characteristics include net photosynthetic rate, stomatal conductance, CO2 concentration and water use efficiency;
3) determining the DNA methylation state of the digested fragments in the representative leaf samples to obtain the DNA methylation level of the digested fragments, and calculating the genome-wide average DNA methylation level of each representative leaf sample;
4) taking the leaf photosynthetic data, the leaf morphological data and the genome-wide average DNA methylation level of the leaves as candidate variables, and screening with a random forest model for the important feature variables that differ between geographic locations, to obtain the leaf morphological data and the genome-wide average DNA methylation level;
5) taking the leaf morphological data and the genome-wide average DNA methylation level as feature variables, and determining the optimal number of clusters of the representative leaf samples using the within-group sum of squared errors and the 26 cluster evaluation indices in the NbClust software package;
6) inputting the optimal number of clusters of the representative leaf samples into an improved fuzzy C-means clustering algorithm, and calculating the membership matrix of the samples in each cluster;
The improved fuzzy C-means clustering algorithm is as follows:
a. Given the optimal number of clusters c and the number of samples n, set the iteration stopping threshold ε = 10⁻⁵ and the maximum number of iterations to 300, randomly initialise the membership matrix U, and set the iteration counter t = 0;
Let X = {x1, x2, ..., xn} be a finite set in which each element has m feature variables; X is expressed as an n × m matrix as follows:
where m is the number of feature variables and n is the number of representative leaf samples;
The n samples of matrix X are divided into c groups (2 ≤ c ≤ n); the fuzzy clustering matrix U of the c groups is:
In the matrix U, μij denotes the degree of membership of sample xj in class i, and
The c cluster centres are:
The minimum sum of squared errors is chosen as the clustering criterion; the objective function of the cluster analysis is shown in formula (1):
Adding the constraints gives formula (2):
Solving gives formula (3):
b. Update the fuzzy clustering matrix and the cluster centre matrix according to formula (3);
c. If ‖P^(t) − P^(t−1)‖ < ε, stop and output the fuzzy clustering matrix U and the cluster centre matrix P; otherwise set t = t + 1 and return to step b until the matrices U and P are output;
7) based on the membership matrix of each cluster obtained in step 6), calculating, within each cluster, the correlation between the genome-wide average DNA methylation level and each leaf feature variable, together with the importance of digested-fragment combinations given by gradient boosted trees, to obtain the important digested-fragment combinations of each cluster;
The leaf feature variables include leaf area, net photosynthetic rate and leaf shape factor;
8) taking the DNA methylation levels of the important digested-fragment combinations obtained in step 7) as regression variables, and establishing an LS-SVM regression prediction model with a Gaussian radial basis function, to obtain the leaf morphological feature and photosynthetic characteristic model shown in formula (9);
There is no restriction on the order of steps 2) and 3).
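The iteration in step 6) can be illustrated with a minimal pure-Python sketch. It implements the standard fuzzy C-means updates, not the improved variant of the invention, whose formulas (1) to (3) are not reproduced in this text. The function name `fcm`, the fuzzifier value 2, the use of the largest centre shift as the stopping test and the toy data are all assumptions made for illustration.

```python
import random

def fcm(points, c, fuzzifier=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy C-means. Returns (U, P), where U[i][j] is the
    membership of sample j in cluster i and P[i] is the centre of cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each sample's column sums to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    P = None
    for _ in range(max_iter):
        # Cluster centres: membership-weighted means of the samples.
        new_P = []
        for i in range(c):
            w = [U[i][j] ** fuzzifier for j in range(n)]
            total = sum(w)
            new_P.append([sum(w[j] * points[j][k] for j in range(n)) / total
                          for k in range(dim)])
        # Memberships from squared distances to the new centres.
        for j in range(n):
            d2 = [max(sum((points[j][k] - new_P[i][k]) ** 2 for k in range(dim)), 1e-12)
                  for i in range(c)]
            inv = [dist ** (-1.0 / (fuzzifier - 1.0)) for dist in d2]
            total = sum(inv)
            for i in range(c):
                U[i][j] = inv[i] / total
        # Stop once no centre coordinate moves by eps or more.
        if P is not None:
            shift = max(abs(new_P[i][k] - P[i][k]) for i in range(c) for k in range(dim))
            if shift < eps:
                P = new_P
                break
        P = new_P
    return U, P

# Toy data: two well-separated groups of three samples each.
points = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
U, P = fcm(points, c=2)
labels = [max(range(2), key=lambda i: U[i][j]) for j in range(len(points))]
print(labels)
```

Each column of `U` sums to 1, which is what gives the memberships the probability-like reading that a hard clustering such as K-means does not provide.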
Preferably, the random forest screening in step 4) selects as important feature variables those variables whose mean of Mean Decrease Accuracy and Mean Decrease Gini is 15 or more.
Preferably, the method for establishing the LS-SVM regression prediction model with a Gaussian radial basis function comprises the following steps:
In SVM, the training sample set is assumed to be:
In the training sample set T, xi is the input variable of the i-th training sample, yi is the output variable of the i-th training sample, R denotes the real numbers and n the number of input samples; the regression function is
In formula (4), w and b are the regression parameters, φ(·) is the feature map, and x is the input variable of the training sample set;
In LS-SVM the solution is converted into the problem:
In formula (5), γ is the regularisation parameter, ξ is the slack variable, and I(n×1) = (1, 1, ..., 1)′ is the n × 1 vector of ones;
The Lagrange function is constructed as follows:
where α is the Lagrange multiplier; next, the saddle point of L(w, b, ξ, α), i.e. the optimum, is found;
Eliminating w and ξ from formula (7) gives:
In formula (8), Ω = (xi x′j), i, j = 1, 2, ..., n, and E is the n × n identity matrix.
α and b are obtained by solving formula (8); the estimation function of the least squares support vector machine is then:
where k(xi, x′j) is the kernel function; the Gaussian radial basis kernel function is chosen
The input variables are standardised before regression, and after parameter optimisation γ = 10 and σ = 1.
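As an illustration of the procedure above, the following pure-Python sketch fits an LS-SVM regressor by solving one linear system for b and α. Because formulas (4) to (9) are not reproduced in this text, the sketch follows the textbook LS-SVM formulation and assumes the kernel form exp(−‖x − x′‖² / (2σ²)); the function names and the toy data are hypothetical, and the inputs are assumed to be standardised already.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian radial basis kernel (assumed form: exp(-||a-b||^2 / (2 sigma^2)))."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1'], [1, Omega + E/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    z = solve(A, [0.0] + list(y))
    return z[0], z[1:]  # bias b, Lagrange multipliers alpha

def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    """Estimation function: sum_i alpha_i k(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X_train))

# Toy one-dimensional data (hypothetical): y = x^2.
X = [[-1.5], [-0.5], [0.0], [0.5], [1.5]]
y = [x[0] ** 2 for x in X]
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, [1.0]))
```

The default arguments use γ = 10 and σ = 1, the values stated above. Unlike a standard SVM, no quadratic program is needed: the equality constraints turn training into a single linear solve.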
Preferably, the method for determining the DNA methylation state of the digested fragments in the representative leaf samples in step 3) comprises the following steps:
31) digesting the genomic DNA of the representative leaf samples with the EcoRI/HpaII and EcoRI/MspI restriction enzyme combinations to obtain digested fragments;
32) subjecting the digested fragments to pre-amplification followed by selective amplification, typing the resulting selective amplification products, and obtaining the DNA methylation state of the digested fragments from the typing results.
Preferably, the typing consists of electrophoresing the selective amplification products and scoring the resulting bands in a binary character matrix, with "0" for an absent band and "1" for a present band; CNG (1,0) represents the hemimethylated state, CG (0,1) the fully methylated state, (1,1) the unmethylated state, and (0,0) an unknown methylation state; the genome-wide methylation level is calculated by formula (10):
Genome-wide DNA methylation level = (hemimethylated sites + fully methylated sites + sites of unknown methylation state) / (hemimethylated sites + fully methylated sites + sites of unknown methylation state + unmethylated sites)   formula (10).
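Formula (10) can be computed directly from the binary band scores. The sketch below is a minimal illustration that assumes each site is recorded as a pair (H, M), with H the EcoRI/HpaII lane and M the EcoRI/MspI lane; the names `STATE` and `methylation_level` and the sample data are hypothetical.

```python
from collections import Counter

# Band pattern (H, M) -> methylation state, following the scoring rule in the text.
STATE = {
    (1, 0): "hemi",     # band only in the EcoRI/HpaII lane
    (0, 1): "full",     # band only in the EcoRI/MspI lane
    (1, 1): "none",     # band in both lanes: unmethylated
    (0, 0): "unknown",  # band in neither lane
}

def methylation_level(patterns):
    """Formula (10): (hemi + full + unknown) / all scored sites."""
    counts = Counter(STATE[p] for p in patterns)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scored sites")
    return (counts["hemi"] + counts["full"] + counts["unknown"]) / total

# Hypothetical sample with five scored sites, two of them unmethylated.
sample = [(1, 0), (0, 1), (1, 1), (1, 1), (0, 0)]
print(methylation_level(sample))  # 0.6
```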
Preferably, the number of representative leaf samples is 200 or more.
Preferably, the leaf shape factor is calculated by formula (11):
The present invention also provides a prediction method using the prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level: the DNA methylation levels of the important digested-fragment combinations are input into the leaf morphological feature and photosynthetic characteristic model built by the above method to predict the leaf shape factor, leaf area and leaf net photosynthetic rate;
The DNA methylation level of an important digested-fragment combination is the DNA methylation level calculated, according to the genome-wide methylation level formula, for the important digested-fragment combination obtained in the construction method.
Preferably, marginal effect plots are drawn of the degree of influence of the DNA methylation level of the fragment combinations on the leaf shape factor, leaf area and net photosynthetic rate.
The marginal effect plots are drawn by taking the DNA methylation levels of the important digested-fragment combinations as input variables and then calling the marginal effect plotting function provided in gbm to plot the marginal effect of the more important digested fragments on the leaf shape factor, leaf area and leaf net photosynthetic rate of each leaf.
In the construction method provided by the invention for a prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level, photosynthetic and phenotypic feature selection is performed on the representative leaf samples with a random forest before clustering, giving the feature variables that strongly affect the geographic distribution differences of the woody plant. When the representative leaf samples are clustered, traditional clustering methods (K-means clustering, PAM clustering and so on) are not used; instead an improved fuzzy C-means clustering algorithm is used, whose output has a probabilistic meaning and retains the uncertainty of differences between individual woody plants. In addition, before the prediction model is built, the more important digested-fragment combinations are screened first, which reduces the complexity of the prediction model and of practical operation and makes prediction more accurate. The invention also establishes for the first time a research strategy for the epigenetic population genetics of forest trees, resolves the epigenetic structure of forest tree populations, and proposes that DNA methylation may affect the phenotypic features and photosynthetic characteristics of natural Populus simonii populations.
Further, the method provided by the invention uses the DNA methylation levels of the screened important digested-fragment combinations to draw marginal effect plots for phenotypic features and photosynthetic characteristics, enabling quantitative study of the effect of DNA methylation on phenotypic features and photosynthetic characteristics.
Description of the drawings
Fig. 1 shows the screening of Populus simonii feature variables with the random forest model in Example 1;
Fig. 2 shows the determination of the optimal number of clusters of Populus simonii samples based on SSE and 26 cluster evaluation indices in Example 1;
Fig. 3 shows the individuals of each class of samples determined with the improved FCM algorithm in Example 1;
Fig. 4 shows the clustering of correlations between the feature variables in Example 1;
Fig. 5 shows the importance of digested fragments for the three variables in the first cluster of Populus simonii samples in Example 1;
Fig. 6 shows the importance of digested fragments for the three variables in the first cluster of Populus simonii samples in Example 1;
Fig. 7 shows the marginal effect of the DNA methylation level of the important fragment combinations on leaf area in Example 2; Fig. 7-1 is the first cluster of Populus simonii samples and Fig. 7-2 is the second cluster;
Fig. 8 shows the marginal effect of the DNA methylation level of the important fragment combinations on net photosynthetic rate in Example 2; Fig. 8-1 is the first cluster of Populus simonii samples and Fig. 8-2 is the second cluster;
Fig. 9 shows the marginal effect of the DNA methylation level of the important fragment combinations on the leaf shape factor in Example 2; Fig. 9-1 is the first cluster of Populus simonii samples and Fig. 9-2 is the second cluster;
Fig. 10 shows the leaf morphology of the two poplar subgroups in Example 2.
Detailed description of the embodiments
The present invention provides a construction method for a prediction model of woody plant leaf morphological features and photosynthetic characteristics based on DNA methylation level, comprising the following steps:
1) collecting leaves of one woody plant species from its natural distribution across the country to form representative leaf samples;
2) measuring the photosynthetic characteristics and phenotypic features of the representative leaf samples to obtain leaf photosynthetic data and leaf morphological data;
The phenotypic features include leaf area, leaf length, leaf width, leaf perimeter, length-to-width ratio and leaf shape factor;
The photosynthetic characteristics include net photosynthetic rate, stomatal conductance, CO2 concentration and water use efficiency;
3) determining the DNA methylation state of the digested fragments in the representative leaf samples to obtain the DNA methylation level of the digested fragments, and calculating the genome-wide average DNA methylation level of each representative leaf sample;
4) taking the leaf photosynthetic data, the leaf morphological data and the genome-wide average DNA methylation level of the leaves as candidate variables, and screening with a random forest model for the important feature variables that differ between geographic locations, to obtain the leaf morphological features and the genome-wide average DNA methylation level;
5) taking the leaf morphological data and the genome-wide average DNA methylation level as feature variables, and determining the optimal number of clusters of the representative leaf samples using the within-group sum of squared errors and the 26 cluster evaluation indices in the NbClust software package;
6) inputting the optimal number of clusters of the representative leaf samples into an improved fuzzy C-means clustering algorithm, and calculating the membership matrix of the samples in each cluster;
The improved fuzzy C-means clustering algorithm is as follows:
a. Given the optimal number of clusters c and the number of samples n, set the iteration stopping threshold ε = 10⁻⁵ and the maximum number of iterations to 300, randomly initialise the membership matrix U, and set the iteration counter t = 0;
Let X = {x1, x2, ..., xn} be a finite set in which each element has m feature variables; X is expressed as an n × m matrix as follows:
where m is the number of feature variables and n is the number of representative leaf samples;
The n samples of matrix X are divided into c classes (2 ≤ c ≤ n); the fuzzy clustering matrix U is:
where μij denotes the degree of membership of sample xj in class i, and
The c cluster centres are:
The minimum sum of squared errors is chosen as the clustering criterion; the objective function of the cluster analysis is:
Adding the constraints gives:
Solving gives:
b. Update the fuzzy clustering matrix and the cluster centre matrix according to formula (3);
c. If ‖P^(t) − P^(t−1)‖ < ε, stop and output the fuzzy clustering matrix U and the cluster centre matrix P; otherwise set t = t + 1 and return to step b until the fuzzy clustering matrix U and the cluster centre matrix P are obtained;
7) based on the membership matrix of each cluster, calculating, within each cluster, the correlation between the genome-wide average DNA methylation level and each leaf feature variable, together with the importance of digested-fragment combinations given by gradient boosted trees, to obtain the important digested-fragment combinations of each cluster;
The leaf feature variables include leaf area, net photosynthetic rate and leaf shape factor;
8) taking the DNA methylation levels of the important digested-fragment combinations as regression variables, establishing an LS-SVM regression prediction model with a Gaussian radial basis function, and inputting the DNA methylation levels of the important digested-fragment combinations into the LS-SVM regression prediction model to predict the leaf shape factor, leaf area and leaf net photosynthetic rate.
The present invention collects leaves of the same woody plant species from its natural distribution across the country to obtain representative leaf samples.
In the present invention, the leaves are preferably collected in Shaanxi, Qinghai, Hebei, Henan, Ningxia, Shanxi, Beijing and Inner Mongolia. The species of woody plant is not particularly limited; the method provided by the invention applies to all woody plants. The woody plant is preferably of the genus Populus, most preferably Populus simonii. The number of leaf samples of the woody plant is preferably 1000 or more. The representative leaf samples are selected from the collected leaves of the woody plant; the selection criterion is that they cover the entire geographic distribution of the natural Populus simonii population. The number of representative leaf samples is 200 to 500.
After the representative leaf samples are obtained, the invention measures their photosynthetic characteristics and phenotypic features to obtain leaf photosynthetic data and leaf morphological data; the leaf morphological features include leaf area, leaf length, leaf width, leaf perimeter, length-to-width ratio and leaf shape factor; the photosynthetic characteristics include net photosynthetic rate, stomatal conductance, CO2 concentration and water use efficiency.
In the present invention, the five phenotypic traits leaf area, leaf length, leaf width, leaf perimeter and length-to-width ratio are measured with a laser leaf area meter. The laser leaf area meter is not particularly limited; any leaf area meter known in the art may be used. In the examples of the invention, the leaf area meter is a portable laser leaf area meter (CI-202). In the present invention, the leaf shape factor is given by formula (11):
In the present invention, the photosynthetic characteristics are measured with a portable gas exchange system (Li-6400xt, LiCor). To obtain the maximum instantaneous net photosynthetic rate, the photosynthetic photon flux density (PPFD) is set to 1600 and the CO2 concentration is set to 400. Net photosynthetic rate, stomatal conductance, intercellular CO2 concentration and water use efficiency (WUE) are all recorded at that net photosynthetic rate.
The present invention determines the DNA methylation state of the digested fragments in the representative leaf samples to obtain the DNA methylation level of the digested fragments, and calculates the genome-wide average DNA methylation level of each representative leaf sample.
In the present invention, the method for determining the DNA methylation state of the digested fragments in the representative leaf samples preferably comprises the following steps:
31) digesting the genomic DNA of the representative leaf samples separately with the EcoRI/HpaII and EcoRI/MspI restriction enzyme combinations to obtain digested fragments;
32) subjecting the digested fragments to pre-amplification followed by selective amplification, typing the resulting selective amplification products, and obtaining the DNA methylation state of the digested fragments from the typing results.
The present invention preferably extracts the genomic DNA of the representative leaf samples.
In the present invention, the method of extracting the genomic DNA of the representative leaf samples is not particularly limited; any genomic DNA extraction method known in the art may be used. In the examples of the invention, the genomic DNA of the representative leaf samples is extracted with an extraction kit, the DNA Plant Mini Kit (base root China, Shanghai).
After the genomic DNA of the representative leaf samples is obtained, the invention digests it separately with the EcoRI/HpaII and EcoRI/MspI restriction enzyme combinations to obtain digested fragments.
In the present invention, the source of the EcoRI/HpaII and EcoRI/MspI restriction enzymes is not particularly limited; enzymes from any source known in the art may be used. The digestion method is not particularly limited; any digestion method for the EcoRI/HpaII and EcoRI/MspI restriction enzymes known in the art may be used.
In the present invention, the digested fragments undergo pre-amplification and selective amplification to give selective amplification PCR products. The pre-amplification and selective amplification are not particularly limited; the method published by Dong Ci et al. in "Variation in genomic methylation in natural populations of Populus simonii is associated with leaf shape and photosynthetic traits" (Journal of Experimental Botany, Vol. 67, No. 3, pp. 723-737, 2016) is used.
After the selective amplification products are obtained, the invention types them and obtains the DNA methylation state of the digested fragments from the typing results.
In the present invention, bands of selective amplification products from EcoRI/HpaII restriction digestion are denoted H, and bands of selective amplification products from EcoRI/MspI restriction digestion are denoted M. After electrophoresis of the selective amplification PCR products: if both H and M show a band, neither cleavage site is methylated, i.e. the sequence is CCGG; if H shows a band and M does not, the site is hemimethylated, also called outer methylation, i.e. the sequence is 5′-mCCGG; if H shows no band and M does, the site is fully methylated, also called inner methylation, i.e. the sequence is 5′-CmCGG. HpaII cannot cut when the cytosine is fully methylated on both strands; it cuts only when the outer cytosine is methylated on a single strand. MspI can cut internally methylated sites, whether fully methylated or hemimethylated.
In the present invention, the typing preferably consists of electrophoresing the selective amplification products and scoring the resulting bands in a binary character matrix, with "0" for an absent band and "1" for a present band; CNG (1,0) represents the hemimethylated state, CG (0,1) the fully methylated state, (1,1) the unmethylated state, and (0,0) an unknown methylation state. The genome-wide methylation level is calculated by formula (10):
Genome-wide DNA methylation level = (hemimethylated sites + fully methylated sites + sites of unknown methylation state) / (hemimethylated sites + fully methylated sites + sites of unknown methylation state + unmethylated sites)   formula (10).
After the photosynthetic characteristic data of the leaves, the leaf morphological feature data and the genome-wide average DNA methylation level of the leaves are obtained, the present invention takes these data as candidate variables and uses a random forest model to screen the important feature variables that differ between geographic locations, obtaining the leaf morphological features and the genome-wide average DNA methylation level.
In the present invention, the random forest model screens the important feature variables that differ between geographic locations by taking as important feature variables those variables for which the mean of Mean Decrease Accuracy and Mean Decrease Gini is 15 or more. The present application uses Populus simonii as the sample; the important feature variables obtained are the leaf morphological feature data and the genome-wide average DNA methylation level of the leaves.
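The screening rule can be sketched as follows. This is only an illustration of the threshold of 15 on the mean of the two importance measures; the importance scores below are hypothetical values, not results from the patent, and the function name is an assumption.

```python
THRESHOLD = 15.0  # cut-off stated in the text

def select_important(importance, threshold=THRESHOLD):
    """Keep variables whose mean of (Mean Decrease Accuracy, Mean Decrease Gini)
    is at least `threshold`, ordered from most to least important."""
    selected = {}
    for name, (mda, mdg) in importance.items():
        mean_score = (mda + mdg) / 2.0
        if mean_score >= threshold:
            selected[name] = mean_score
    return sorted(selected, key=selected.get, reverse=True)

# Hypothetical (Mean Decrease Accuracy, Mean Decrease Gini) pairs, illustrative only.
importance = {
    "area": (22.0, 18.0),
    "Dmavg": (16.0, 15.0),
    "stomatal_conductance": (6.0, 9.0),
}
important = select_important(importance)
```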
After the leaf morphological features and the genome-wide average DNA methylation level are obtained, the present invention takes the leaf morphological feature data and the genome-wide average DNA methylation level as feature variables and determines the optimum number of cluster groups of the representative leaf samples using the within-group sum of squared errors together with 26 cluster evaluation indices in the NbClust software package.
In the present invention, when the within-group sum of squared errors (SSE) is used for the assessment, the number of clusters with the smallest slope is selected; when the 26 cluster evaluation indices are used, the number of clusters supported by the largest number of criteria (Number Criteria) is selected as the optimum number of cluster groups.
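The index-based choice amounts to a vote among the evaluation indices. The sketch below shows such a vote in Python; the votes are hypothetical, the function name is an assumption, and breaking ties in favour of the smaller cluster number is a choice made for this illustration rather than something stated in the source.

```python
from collections import Counter

def best_cluster_number(votes):
    """Return the cluster number proposed by the most indices.
    Ties are broken in favour of the smaller number (an assumption)."""
    counts = Counter(votes)
    return min(counts, key=lambda k: (-counts[k], k))

# Hypothetical proposals from 26 indices (illustrative only, not the patent's data).
votes = [2] * 11 + [3] * 7 + [4] * 5 + [5] * 3
best_k = best_cluster_number(votes)
```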
The present invention inputs the optimum number of clusters of the representative leaf samples into an improved fuzzy C-means clustering algorithm and calculates the membership matrix of each group of clustered samples;
The improved fuzzy C-means clustering algorithm is as follows:
a. Given the optimum number of cluster groups $c$ and the number of sample data $n$, set the iteration stopping threshold to $\varepsilon = 10^{-5}$ and the maximum number of iterations to 300, randomly initialize the membership matrix $U$, and set the iteration counter $t = 0$;
Let the finite set $X = \{x_1, x_2, \ldots, x_n\}$, where each element of $X$ has $m$ feature variables, so that $X$ is expressed as the $n \times m$ matrix:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
where $m$ is the number of feature variables and $n$ is the number of representative leaf samples;
The $n$ samples of matrix $X$ are divided into $c$ classes ($2 \le c \le n$), and the fuzzy clustering matrix $U$ is:
$$U = \begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cn} \end{pmatrix}$$
where $\mu_{ij}$ denotes the membership of sample $x_j$ in class $i$, and $0 \le \mu_{ij} \le 1$;
The $c$ cluster centres form the cluster centre matrix $P$;
The minimum sum of squared errors is selected as the clustering criterion, and the objective function of the cluster analysis is formula (1);
Adding the constraints gives formula (2);
Solving gives formula (3);
b. Calculate and update the fuzzy clustering matrix and the cluster centre matrix according to formula (3);
c. If $\lVert P^{(t)} - P^{(t-1)} \rVert < \varepsilon$, stop the calculation and output the fuzzy clustering matrix $U$ and the cluster centre matrix $P$; otherwise set $t = t + 1$ and return to step b until the fuzzy clustering matrix $U$ and the cluster centre matrix $P$ are obtained.
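Steps a to c can be sketched in Python as below. This is the textbook fuzzy C-means algorithm, not the improved variant of the present invention, whose formulas (1) to (3) are not reproduced in this text. The fuzzifier value of 2, the maximum absolute change of the centres as stopping norm, the function name and the example data are all assumptions made for the illustration.

```python
import math
import random

def fuzzy_c_means(X, c, fuzzifier=2.0, eps=1e-5, max_iter=300, seed=0):
    """Textbook fuzzy C-means. Returns (U, P): c x n memberships, c x m centres."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Step a: random membership matrix U (c x n), each column normalised to sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    power = 2.0 / (fuzzifier - 1.0)
    P_old = None
    for _ in range(max_iter):
        # Step b (centres): membership-weighted mean of the samples.
        P = []
        for i in range(c):
            w = [U[i][j] ** fuzzifier for j in range(n)]
            total = sum(w)
            P.append([sum(w[j] * X[j][k] for j in range(n)) / total for k in range(m)])
        # Step b (memberships): inverse-distance rule of standard fuzzy C-means.
        for j in range(n):
            d = [math.dist(X[j], P[i]) for i in range(c)]
            zeros = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zeros:  # sample coincides with one or more centres
                    U[i][j] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    U[i][j] = 1.0 / sum((d[i] / d[k]) ** power for k in range(c))
        # Step c: stop when no centre coordinate moved by eps or more.
        if P_old is not None:
            shift = max(abs(P[i][k] - P_old[i][k]) for i in range(c) for k in range(m))
            if shift < eps:
                break
        P_old = P
    return U, P

# Hypothetical two-feature samples forming two well-separated groups.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
U, P = fuzzy_c_means(X, c=2)
labels = [max(range(2), key=lambda i: U[i][j]) for j in range(len(X))]
```

Each column of `U` gives the degrees of membership of one sample in the `c` classes; `labels` assigns every sample to the class with the highest membership.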
After the membership matrix of each group of clustered samples is obtained, the present invention, based on that membership matrix, calculates for each group of clustered samples the correlation between the genome-wide average DNA methylation level and each leaf feature variable, and obtains the importance of the enzyme combinations from a gradient boosted tree.
In the present invention, the importance of the enzyme combinations is determined as follows:
First, the correlation between the variables is analysed as follows:
To analyse the correlation and the combined effect of the variables, the correlation coefficient is converted into a distance metric using $d = 1 - |r|$, where $d$ is the metric distance and $r$ is the correlation coefficient, given by:
$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)\,D(Y)}}$$
where $\mathrm{Cov}(X, Y)$ is the covariance of $X$ and $Y$, and $D(X)$ and $D(Y)$ are the variances of $X$ and $Y$ respectively. Multiscale bootstrap resampling yields a p-value for each hierarchical cluster of the data, which is used to assess the uncertainty of the hierarchical clustering. In this way, hierarchical clustering was performed on 51 variables (leaf area, leaf length, leaf width, leaf perimeter, length-to-width ratio, leaf shape factor, net photosynthetic rate, stomatal conductance, CO2 concentration, water use efficiency, and the DNA methylation levels of 41 enzyme combinations), the correlation between the variables was obtained, and the more strongly correlated variables were marked with red boxes.
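The conversion of the correlation coefficient into a distance can be sketched in a few lines of Python. The function names and the two example vectors are hypothetical; the formulas are those given above, with population covariance and variance.

```python
import math

def pearson_r(x, y):
    """r = Cov(X, Y) / sqrt(D(X) * D(Y)), using population moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """d = 1 - |r|: strongly correlated variables are close, whatever the sign."""
    return 1.0 - abs(pearson_r(x, y))

# Hypothetical values of two variables over four samples (illustrative only).
x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(x, y)
d = correlation_distance(x, y)
```

Because the absolute value of $r$ is used, a perfectly negative correlation gives the same zero distance as a perfectly positive one, so such variables fall into the same cluster.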
Second, the importance of the enzyme combinations is obtained from a gradient boosted tree, as follows:
The gradient boosted tree is trained with the gbm package in the R language.
The parameters are set as follows:
distribution = 'gaussian',
n.trees = 10000,
shrinkage = 0.01,
interaction.depth = 5,
bag.fraction = 0.5,
cv.folds = 10.
From the trained model, the gbm package calculates the importance of each input variable to the response variable; the higher the score, the greater the influence on the response variable. The important restriction fragment combinations of each group of clustered samples are thus obtained; restriction fragment combinations with an importance score of 2 or more are selected.
In the present invention, after the gradient boosted tree has been trained and the enzyme combinations with a large influence have been obtained, the marginal effect plots are drawn using the marginal-effect plotting function of the gbm package in the R language.
The parameters are set as follows in the plotting process:
distribution = 'gaussian',
n.trees = 10000,
shrinkage = 0.01,
interaction.depth = 5,
bag.fraction = 0.5,
cv.folds = 10.
From the resulting marginal effect plots, the marginal effects of the DNA methylation level on leaf area, net photosynthetic rate and the leaf shape factor can be analysed in each class of Populus simonii, and thus the influence of DNA methylation on the leaf traits and photosynthetic characteristics of Populus simonii.
Finally, after the important restriction fragment combinations are obtained, the present invention takes the DNA methylation levels of the important restriction fragment combinations as regression variables and establishes an LS-SVM regression prediction model using a Gaussian radial basis function, obtaining the leaf morphological feature and photosynthetic characteristic model shown in formula (9).
In the present invention, in the SVM, the sample training set is assumed to be $T = \{(x_i, y_i)\}_{i=1}^{n}$ with $y_i \in R$, and the regression function is given by formula (4),
where $w$ and $b$ are the regression parameters and $\varphi$ is the feature mapping.
In the LS-SVM, the solution is converted into the problem of formula (5),
where $\gamma$ is the regularization parameter, $\xi$ is the slack variable, and $I_{n \times 1} = (1, 1, \ldots, 1)'$;
The Lagrange function $L(w, b, \xi, \alpha)$ is then constructed,
where $\alpha$ is the Lagrange multiplier; next, the saddle point of $L(w, b, \xi, \alpha)$, i.e. the optimum point, is sought;
Eliminating $w$ and $\xi$ from formula (7) gives formula (8).
In formula (8), $\Omega = (x_i x'_j)$, $i, j = 1, 2, \ldots, n$, and $E$ is the $n \times n$ identity matrix.
$\alpha$ and $b$ are obtained by solving formula (8), and the estimation function of the least squares support vector machine is then formula (9),
where $k(x_i, x'_j)$ is the kernel function; the Gaussian radial basis function (RBF) kernel is chosen.
The input variables are standardized before the regression, and parameter optimization gives $\gamma = 10$, $\sigma = 1$.
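The solution of the linear system and the resulting estimation function can be sketched as follows. Since formulas (4) to (9) are not reproduced in this text, the sketch assumes the standard LS-SVM formulation: the block system [[0, 1'], [1, Ω + E/γ]] [b; α] = [0; y] with Ω built from the kernel, the estimate f(x) = Σ α_i k(x, x_i) + b, and the kernel k(x, z) = exp(-||x - z||² / (2σ²)). The kernel normalisation, the function names and the example data are assumptions, not the patent's own definitions.

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel; the 2*sigma^2 normalisation is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Assemble and solve the (n+1) x (n+1) LS-SVM system; returns (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # adds E / gamma on the diagonal
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(x, X, b, alpha, sigma=1.0):
    """Estimation function: f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b

# Hypothetical standardized inputs and targets (illustrative only).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.8, 0.9, 0.1]
b, alpha = lssvm_fit(X, y, gamma=10.0, sigma=1.0)
fitted = [lssvm_predict(x, X, b, alpha) for x in X]
```

With a finite γ the fitted values do not interpolate the targets exactly; the residual at each training point equals α_i/γ, which is the regularization at work.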
The present invention provides a prediction method for the woody plant leaf morphological feature and photosynthetic characteristic prediction model based on DNA methylation level: the DNA methylation levels of the important restriction fragment combinations are input into the leaf morphological feature and photosynthetic characteristic model built by the above method to predict the leaf shape factor, leaf area and leaf net photosynthetic rate;
The DNA methylation level of an important restriction fragment combination is the DNA methylation level calculated, according to the genome-wide methylation level formula, for the important restriction fragment combination obtained in the construction method.
In the present invention, marginal effect plots are drawn for the degree of influence of the DNA methylation levels of the enzyme combinations on the leaf shape factor, leaf area and net photosynthetic rate.
The marginal effect plots are drawn by taking the DNA methylation levels of the important restriction fragment combinations as input variables and then calling the marginal-effect plotting function provided in gbm to draw, for each leaf, the marginal effect plots of the more important restriction fragments on the leaf shape factor, leaf area and leaf net photosynthetic rate. From the marginal effect plots, the influence of the DNA methylation levels of the enzyme combinations on the leaf shape factor, leaf area and net photosynthetic rate can be obtained. The restriction fragments with a larger influence can be found directly and the gene loci on them studied further, which reduces time and economic cost.
The construction method and prediction method of the model for predicting the leaf shape and photosynthetic characteristics of woody plants based on DNA methylation level provided by the present invention are described in detail below with reference to embodiments, but the embodiments should not be construed as limiting the scope of protection of the present invention.
Embodiment 1
The following uses 235 Populus simonii individuals collected across the country.
Acquisition of experimental data:
The genomic DNA methylation levels of the 235 Populus simonii individuals were calculated from the restriction fragment methylation data disclosed in "Variation in genomic methylation in natural populations of Populus simonii is associated with leaf shape and photosynthetic traits", published by Dong Ci et al. (Journal of Experimental Botany, Vol. 67, No. 3, pp. 723-737, 2016). The results are shown in Table 1 (CC: Chicheng County; ZJK: Zhangjiakou; FX: Fu County, Shaanxi; LY: Linyou County; LX: Langao County; LC: Luochuan County; GQ: Gaoling County; HZ: Danma, Huzhu County; XH: Xinghai County; W: Dulan County; MY: Menyuan County; SX: Song County; YC: Yichuan County; JL: Zhongning County; NM: Baotou City; NW: Ningwu County; TRT: Taoranting Park, Beijing). Leaf morphology was measured at the same time with a portable laser leaf area meter (CI-202). The variables are abbreviated as follows: leaf width (width), perimeter (perim), leaf area (area), leaf shape factor (fact), leaf length (length), length-to-width ratio (ratio) and average DNA methylation level (Dmavg). The photosynthetic characteristics of the leaves were measured with a portable gas exchange system (LI-6400XT, LI-COR). To obtain the maximum instantaneous net photosynthetic rate, the photosynthetic photon flux density (PPFD) was set to 1600 and the CO2 concentration to 400. Net photosynthetic rate, stomatal conductance, intercellular CO2 concentration and water use efficiency (WUE) were recorded at that maximum instantaneous net photosynthetic rate. The photosynthetic characteristic data are shown in Table 2 (place abbreviations as above).
1. The Populus simonii samples are classified using the improved FCM algorithm.
1.1 The important feature variables that reflect differences in geographic location are selected on the basis of a random forest.
Fig. 1 shows the important feature variables that reflect differences in geographic location, selected on the basis of the random forest. As Fig. 1 shows, the differences between Populus simonii samples depend mainly on seven variables: leaf width (width), perimeter (perim), leaf area (area), leaf shape factor (fact), leaf length (length), length-to-width ratio (ratio) and average DNA methylation level (Dmavg).
1.2 The optimum number of clusters of the Populus simonii samples is determined. Fig. 2 shows the determination of the optimum number of clusters. As can be seen from Fig. 2, when the within-group sum of squared errors (SSE) is used for the assessment, the number of clusters with the smallest slope is 2; when the 26 cluster evaluation indices are used, the number of clusters supported by the largest number of criteria (Number Criteria) is 2. Therefore, combining the SSE and the 26 indices, the recommended optimum number of clusters is 2.
1.3 The individuals in each class of samples are determined using the improved FCM algorithm. As can be seen from Fig. 3, the first class of samples contains 139 Populus simonii individuals and the second class contains 97 Populus simonii individuals.
2. Selection of the regression variables of the prediction model
2.1 Study of the correlation between variables
As can be seen from Fig. 4, the DNA methylation levels of four groups of enzyme combinations are very strongly correlated: H44E4 and H47E4; H80E1 and H80E14; H80E9 and H86E9; and a fourth group comprising 15 enzyme combinations (H65E7, H63E9, H86E7, H60E3, H60E1, H46E4, H31E1, H60E15, H65E8, H47E6, H63E11, H65E6, H34E3, H46E10, H46E11). These results show that DNA methylation is region-specific in the poplar genome, and that strongly correlated DNA methylation levels of enzyme combinations may indicate that the DNA methylation modifications in these regions are similar.
2.2 The regression variables are selected on the basis of the enzyme combination importance obtained by a machine learning algorithm (gradient boosted tree). Because this machine learning method has a certain error, the important restriction fragments selected can differ slightly from one training run to the next. To reduce this error, the restriction fragments that appear most often over multiple training runs are taken and ranked by importance.
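The aggregation over repeated training runs can be sketched as below. The source does not state how many runs were used or how the scores were combined, so the minimum number of appearances and the use of the mean importance for ranking are assumptions; the scores are hypothetical and only the fragment names are taken from the text.

```python
from collections import defaultdict

def stable_fragments(runs, min_runs):
    """Keep fragments selected in at least `min_runs` runs and rank them by
    their mean importance over the runs in which they appear (an assumption)."""
    scores = defaultdict(list)
    for run in runs:
        for name, score in run.items():
            scores[name].append(score)
    kept = {name: sum(s) / len(s) for name, s in scores.items() if len(s) >= min_runs}
    return sorted(kept, key=kept.get, reverse=True)

# Hypothetical importance scores from three training runs (illustrative only).
runs = [
    {"H31E1": 6.0, "H65E12": 4.0, "H80E14": 2.5},
    {"H31E1": 5.0, "H65E12": 4.5, "H46E12": 2.1},
    {"H31E1": 7.0, "H80E14": 3.5, "H65E12": 3.5},
]
ranked = stable_fragments(runs, min_runs=2)
```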
As shown in Fig. 5, in the first class of Populus simonii samples, the DNA methylation levels of enzyme combinations H31E1, H65E12, H80E14, H65E5, H80E2, H46E12, H80E12, H65E4 and H80E1 have a large influence on the leaf shape factor; the DNA methylation levels of enzyme combinations H31E3, H46E11, H60E3, H63E10, H44E4, H80E2, H65E5, H63E11 and H60E2 have a large influence on leaf area; and the DNA methylation levels of enzyme combinations H60E3, H31E3, H63E12, H46E4, H65E7, H86E16, H82E5, H80E1 and H63E11 have a large influence on net photosynthetic rate.
As shown in Fig. 6, in the second class of Populus simonii samples, the DNA methylation levels of enzyme combinations H60E15, H60E1, H63E10, H65E6, H34E5, H65E7, H80E13, H82E5 and H60E2 have a large influence on the leaf shape factor; the DNA methylation levels of enzyme combinations H80E13, H60E2, H82E5, H60E15, H63E11, H86E7, H80E9, H86E16, H65E6 and H44E4 have a large influence on leaf area; and the DNA methylation levels of enzyme combinations H65E8, H60E1, H80E2, H86E7, H46E11, H44E15, H63E12, H82E5 and H46E4 have a large influence on net photosynthetic rate.
3. With the screened enzyme combinations as regression variables, the values of the phenotypic features and photosynthetic characteristics of Populus simonii (leaf shape factor, leaf area, net photosynthetic rate) are predicted with the LS-SVM, and the phenotypic feature and photosynthetic characteristic data of Populus simonii are measured at the same time. The results are shown in Tables 3 to 8.
In the first class of Populus simonii samples, the prediction accuracies for the leaf shape factor, leaf area and net photosynthetic rate are 96.26%, 94.42% and 96.88% respectively. In the second class of Populus simonii samples, the prediction accuracies for the leaf shape factor, leaf area and net photosynthetic rate reach 81.27%, 92.1% and 95.8% respectively.
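The source does not state how the prediction accuracy was computed. One possible definition, given here purely as an assumption, is one minus the mean relative error between measured and predicted values. The sketch applies it to the first three rows of Table 6 only, so its result is not expected to reproduce the percentages quoted above, which refer to the full sample sets.

```python
def relative_accuracy(measured, predicted):
    """1 - mean(|predicted - measured| / |measured|); an assumed definition,
    not necessarily the one used in the patent."""
    errors = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 1.0 - sum(errors) / len(errors)

# First three rows of Table 6 (leaf shape factor, first cluster): CC10, CC11, CC12.
measured = [0.63, 0.61, 0.60]
predicted = [0.6242989, 0.6045739, 0.5941265]
accuracy = relative_accuracy(measured, predicted)
```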
Embodiment 2
Relationship between DNA methylation level and the leaf morphological features and photosynthetic characteristics of Populus simonii.
To explore the influence of DNA methylation level on the phenotypic traits (leaf shape factor, leaf area and net photosynthetic rate), the values of the marginal effects were analysed.
In the first class of Populus simonii samples, the DNA methylation levels of enzyme combinations H31E3, H60E3, H63E10, H44E4, H80E2, H65E5, H63E11 and H60E2 have an obvious marginal effect on leaf area (see Fig. 7-1). Leaf area decreases as the DNA methylation levels of enzyme combinations H60E3, H44E4 and H60E2 rise, and increases as the DNA methylation levels of H31E3, H63E10, H80E2, H65E5 and H63E11 rise. In the second class of Populus simonii samples, the DNA methylation levels of enzyme combinations H80E13, H60E2, H82E5, H60E15, H63E11, H86E7, H80E9, H86E16, H65E6 and H44E4 have a very obvious marginal effect on leaf area (see Fig. 7-2); moreover, except for enzyme combinations H80E13, H60E15, H63E11 and H86E7, a very high DNA methylation level of the other enzyme combinations makes the leaf area smaller.
For net photosynthetic rate (as shown in Fig. 8), in the first class of Populus simonii samples, the marginal effects of the DNA methylation levels of enzyme combinations H60E3, H31E3, H63E12, H46E4, H65E7, H86E16, H82E5, H80E1 and H63E11 are obvious. Net photosynthetic rate decreases as the DNA methylation levels of H31E3, H63E12 and H65E7 rise, and increases as the DNA methylation levels of H60E3, H46E4, H86E16, H82E5, H80E1 and H63E11 rise. In the second class of samples, the marginal effects of the DNA methylation levels of enzyme combinations H65E8, H80E2 and H86E7 are obvious; net photosynthetic rate decreases as the DNA methylation levels of enzyme combinations H65E8 and H86E7 rise, and increases as the DNA methylation level of enzyme combination H80E2 rises.
As can be seen in Fig. 9, in the first class of Populus simonii samples, the DNA methylation levels of enzyme combinations H31E1, H65E12, H80E14, H65E5, H80E2, H46E12, H80E12, H65E4 and H80E1 have a very obvious marginal effect on the leaf shape factor: the higher the DNA methylation levels of enzyme combinations H31E1, H65E12, H80E14, H65E5, H46E12 and H65E4, the smaller the leaf shape factor, whereas the leaf shape factor increases as the DNA methylation levels of enzyme combinations H80E2, H80E12 and H80E1 rise. In the second class of Populus simonii samples, the higher the DNA methylation levels of enzyme combinations H82E5 and H60E2, the larger the leaf shape factor, whereas the leaf shape factor decreases as the DNA methylation levels of H60E15, H60E1, H63E10, H65E6, H34E5, H65E7 and H80E13 rise.
Analysis of the two classes of Populus simonii samples shows that a rise in DNA methylation level may reduce the leaf shape factor. Species of the genus Populus show phenotypic diversity, and DNA methylation contributes considerably to the plasticity of Populus. The leaf shape factor can therefore serve as an important reference for phenotypic diversity. The closer the value of the leaf shape factor is to 1, the closer the leaf shape is to a circle. Populus simonii leaves mainly take the following two forms (see Fig. 10). Calculation of their leaf shape factors shows that for the first leaf form the leaf shape factor is larger, about 0.698-0.7853. The leaf shape factor of Populus simonii with the second leaf form is generally below 0.5, and the larger its leaf shape factor, the smaller the leaf curvature. Long-term experiments and research have shown that the second type of Populus simonii has stronger stress resistance. It is therefore inferred that DNA methylation may be the reason for the enhanced stress resistance of this type of Populus simonii.
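The statement that a leaf shape factor close to 1 indicates a near-circular leaf can be checked with a short sketch. Formula (11) of the claims is not reproduced in this text; the sketch assumes the commonly used definition 4π × area / perimeter², which equals 1 for a circle. The function name and the two test shapes are illustrative only.

```python
import math

def leaf_shape_factor(area, perimeter):
    """Assumed definition: 4 * pi * area / perimeter**2 (1.0 for a circle)."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2

# A circle of radius 1 and a square of side 2 as reference shapes.
circle = leaf_shape_factor(math.pi * 1.0 ** 2, 2.0 * math.pi * 1.0)
square = leaf_shape_factor(2.0 * 2.0, 4.0 * 2.0)
```

Under this assumed definition a square gives π/4, which lies near the upper end of the range reported above for the first leaf form, and more elongated outlines give smaller values.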
The present invention provides a theoretical basis for the growth and development of Populus simonii under the action of DNA methylation.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.
Table 4 Measured and predicted values of the net photosynthetic rate of the second group of clustered leaf samples
Table 5 Measured and predicted values of the leaf shape factor of the second group of clustered leaf samples
Sample | Measured value of Pn | Predicted value | Sample | Measured value of Pn | Predicted value |
---|---|---|---|---|---|
CC27 | 0.09 | 0.10210262 | FX63 | 0.65 | 0.61308217 |
CC40 | 0.15 | 0.15696717 | FX7 | 0.1 | 0.11401913 |
FX104 | 0.05 | 0.0656351 | FX76 | 0.6 | 0.56567484 |
Table 6 Measured and predicted values of the leaf shape factor of the first group of clustered leaf samples
Sample | Measured value | Predicted value | Sample | Measured value | Predicted value |
---|---|---|---|---|---|
CC10 | 0.63 | 0.6242989 | FX61 | 0.59 | 0.5846074 |
CC11 | 0.61 | 0.6045739 | FX64 | 0.7 | 0.6869076 |
CC12 | 0.6 | 0.5941265 | FX65 | 0.64 | 0.636514 |
Table 7 Measured and predicted values of the leaf area of the first group of clustered leaf samples
Sample | Measured value | Predicted value | Sample | Measured value | Predicted value |
---|---|---|---|---|---|
CC10 | 26.98 | 27.45045 | GQ54 | 10.78 | 12.71664 |
CC11 | 17.85 | 19.23766 | GQ6 | 11.85 | 13.69531 |
CC12 | 15.07 | 16.25292 | HZ19 | 18.23 | 19.44564 |
Table 8 Measured and predicted values of the net photosynthetic rate of the first group of clustered leaf samples
Sample | Measured value | Predicted value | Sample | Measured value | Predicted value |
---|---|---|---|---|---|
CC10 | 18.1 | 17.869152 | FX64 | 15.33 | 15.324247 |
CC11 | 17.07 | 16.932437 | FX65 | 10.58 | 10.968846 |
CC12 | 17.03 | 17.026104 | FX66 | 12.21 | 12.541189 |
Claims (9)
1. A method for constructing a prediction model of the leaf morphological features and photosynthetic characteristics of a woody plant based on DNA methylation level, comprising the following steps:
1) collecting leaves of a woody plant of a single species naturally distributed across the country to obtain representative leaf samples;
2) measuring the phenotypic features and photosynthetic characteristics of the representative leaf samples to obtain the photosynthetic characteristic data and the leaf morphological feature data of the leaves;
the leaf morphological features include leaf area, leaf length, leaf width, leaf perimeter, length-to-width ratio and leaf shape factor;
the photosynthetic characteristics include net photosynthetic rate, stomatal conductance, CO2 concentration and water use efficiency;
3) determining the DNA methylation state of the restriction fragments in the representative leaf samples to obtain the DNA methylation levels of the restriction fragments, and calculating the genome-wide average DNA methylation level of each representative leaf sample;
4) taking the photosynthetic characteristic data of the leaves, the leaf morphological feature data and the genome-wide average DNA methylation level of the leaves as candidate variables, and screening with a random forest model the important feature variables that differ between geographic locations, to obtain the leaf morphological feature data and the genome-wide average DNA methylation level;
5) taking the leaf morphological feature data and the genome-wide average DNA methylation level as feature variables, and determining the optimum number of cluster groups of the representative leaf samples using the within-group sum of squared errors and 26 cluster evaluation indices in the NbClust software package;
6) inputting the optimum number of cluster groups of the representative leaf samples into an improved fuzzy C-means clustering algorithm, and calculating the membership matrix of each group of clustered samples;
the improved fuzzy C-means clustering algorithm is as follows:
a. given the optimum number of cluster groups $c$ and the number of sample data $n$, setting the iteration stopping threshold to $\varepsilon = 10^{-5}$ and the maximum number of iterations to 300, randomly initializing the membership matrix $U$, and setting the iteration counter $t = 0$;
letting the finite set $X = \{x_1, x_2, \ldots, x_n\}$, where each element of $X$ has $m$ feature variables, so that $X$ is expressed as the $n \times m$ matrix:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
where $m$ is the number of feature variables and $n$ is the number of representative leaf samples;
dividing the $n$ samples of matrix $X$ into $c$ groups ($2 \le c \le n$), the fuzzy clustering matrix $U$ of the $c$ groups being:
$$U = \begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cn} \end{pmatrix}$$
in the matrix $U$, $\mu_{ij}$ denotes the membership of sample $x_j$ in class $i$, and $0 \le \mu_{ij} \le 1$; the $c$ cluster centres form the cluster centre matrix $P$;
selecting the minimum sum of squared errors as the clustering criterion, the objective function of the cluster analysis being as shown in formula (1);
adding the constraints to give formula (2);
solving to give formula (3);
b. calculating and updating the fuzzy clustering matrix and the cluster centre matrix according to formula (3);
c. if $\lVert P^{(t)} - P^{(t-1)} \rVert < \varepsilon$, stopping the calculation and outputting the fuzzy clustering matrix $U$ and the cluster centre matrix $P$; otherwise setting $t = t + 1$ and returning to step b until the matrices $U$ and $P$ are output;
7) based on the membership matrix of each group of clustered samples obtained in step 6), calculating, for each group of clustered samples, the correlation between the genome-wide average DNA methylation level and each leaf feature variable and the enzyme combination importance obtained from a gradient boosted tree, to obtain the important restriction fragment combinations of each group of clustered samples;
the leaf feature variables include leaf area, net photosynthetic rate and leaf shape factor;
8) taking the DNA methylation levels of the important restriction fragment combinations obtained in step 7) as regression variables, and establishing an LS-SVM regression prediction model using a Gaussian radial basis function, to obtain the leaf morphological feature and photosynthetic characteristic model shown in formula (9);
there is no restriction on the order of steps 2) and 3).
2. The construction method according to claim 1, characterized in that, in the screening with the random forest model in step 4), the feature variables for which the mean of Mean Decrease Accuracy and Mean Decrease Gini is 15 or more are selected as the important feature variables.
3. The method for predicting the leaf morphological features and photosynthetic characteristics of a woody plant according to claim 1, characterized in that establishing the LS-SVM regression prediction model using the Gaussian radial basis function comprises the following steps:
in the SVM, the sample training set is assumed to be $T = \{(x_i, y_i)\}_{i=1}^{n}$ with $y_i \in R$; in the sample training set $T$, $x_i$ is the input variable of the $i$-th sample in the training set, $y_i$ is the output variable of the $i$-th sample in the training set, $R$ denotes the real number field, and $n$ is the number of input samples; the regression function is given by formula (4);
in formula (4), $w$ and $b$ are the regression parameters, $\varphi$ is the feature mapping, and $x$ is the input variable of the training sample set;
in the LS-SVM, the solution is converted into the problem of formula (5);
in formula (5), $\gamma$ is the regularization parameter, $\xi$ is the slack variable, and $I_{n \times 1} = (1, 1, \ldots, 1)'$ is the $n \times 1$ vector of ones;
the Lagrange function $L(w, b, \xi, \alpha)$ is then constructed,
where $\alpha$ is the Lagrange multiplier; next, the saddle point of $L(w, b, \xi, \alpha)$, i.e. the optimum point, is sought;
eliminating $w$ and $\xi$ from formula (7) gives formula (8);
in formula (8), $\Omega = (x_i x'_j)$, $i, j = 1, 2, \ldots, n$, and $E$ is the $n \times n$ identity matrix;
$\alpha$ and $b$ are obtained by solving formula (8), and the estimation function of the least squares support vector machine is then formula (9),
where $k(x_i, x'_j)$ is the kernel function, and the Gaussian radial basis kernel function is chosen;
the input variables are standardized before the regression, and parameter optimization gives $\gamma = 10$, $\sigma = 1$.
4. The construction method according to claim 1, characterized in that the method for determining the DNA methylation state of the restriction fragments in the representative leaf samples in step 3) comprises the following steps:
31) digesting the genomic DNA of the representative leaf samples with the restriction enzymes EcoRI/HpaII and EcoRI/MspI to obtain restriction fragments;
32) subjecting the restriction fragments to pre-amplification and selective amplification in turn, typing the resulting selective amplification products, and obtaining the DNA methylation state of the restriction fragments from the typing result.
5. The construction method according to claim 5, characterized in that the typing is carried out by subjecting the selective amplification products to electrophoresis and scoring the resulting bands in a binary character matrix, in which "0" indicates the absence of a band and "1" indicates its presence; CNG (1,0) represents the hemimethylated state, CG (0,1) represents the fully methylated state, (1,1) represents the unmethylated state, and (0,0) represents an unknown methylation state; the genome-wide methylation level is calculated as shown in formula (10):
$$\text{genome-wide DNA methylation level} = \frac{\text{hemimethylated sites} + \text{fully methylated sites} + \text{sites of unknown methylation state}}{\text{hemimethylated sites} + \text{fully methylated sites} + \text{sites of unknown methylation state} + \text{unmethylated sites}} \qquad (10)$$
6. The construction method according to claim 1, characterized in that the number of representative leaf samples is 200 or more.
7. The construction method according to claim 1, characterized in that the leaf shape factor is calculated as shown in formula (11).
8. a kind of prediction side of the leaf morphology feature and photosynthesis characteristics prediction model of the xylophyta based on DNA methylation level
Method, which is characterized in that with the DNA methylation water of the important endonuclease bamhi combination in claim 1~7 any one the method
It is predicted in the leaf morphology feature and photosynthesis characteristics model of flat input claim 1~7 any one the method structure leaf
The shape factor, leaf area and leaf Net Photosynthetic Rate;
The DNA methylation level of the important endonuclease bamhi combination is construction method described in claim 1~7 any one
In obtained important endonuclease bamhi combine the DNA methylation water being calculated according to the methylation level calculation formula of full-length genome
It is flat.
9. The prediction method according to claim 8, characterized in that marginal effect plots are drawn of the degree of influence of the DNA methylation level of the restriction fragment combination on the leaf shape factor, leaf area and net photosynthetic rate;
the marginal effect plots are drawn by taking the DNA methylation levels of the important restriction fragment combination as input variables and then calling the marginal effect plotting function provided in gbm to draw, for the more important restriction fragments, marginal effect plots for the leaf shape factor, leaf area and leaf net photosynthetic rate of each leaf.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810120969.0A CN108319984B (en) | 2018-02-06 | 2018-02-06 | The construction method and prediction technique of xylophyta leaf morphology feature and photosynthesis characteristics prediction model based on DNA methylation level |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108319984A (en) | 2018-07-24 |
CN108319984B CN108319984B (en) | 2019-07-02 |
Family
ID=62903048
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810120969.0A Expired - Fee Related CN108319984B (en) | 2018-02-06 | 2018-02-06 | The construction method and prediction technique of xylophyta leaf morphology feature and photosynthesis characteristics prediction model based on DNA methylation level |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108319984B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102224247A (en) * | 2008-09-24 | 2011-10-19 | 巴斯夫植物科学有限公司 | Plants having enhanced yield-related traits and a method for making the same |
CN103233072A (en) * | 2013-05-06 | 2013-08-07 | 中国海洋大学 | High-flux mythelation detection technology for DNA (deoxyribonucleic acid) of complete genome |
CN104899474A (en) * | 2015-06-09 | 2015-09-09 | 大连三生科技发展有限公司 | Method and system for rectifying MB-seq methylation level based on ridge regression |
CN107025384A (en) * | 2015-10-15 | 2017-08-08 | 赵乐平 | A kind of construction method of complex data forecast model |
CN107114235A (en) * | 2017-04-10 | 2017-09-01 | 中国林业科学研究院林业研究所 | A kind of method that utilization DNA methylation inhibitor builds plant population |
CN107301330A (en) * | 2017-06-02 | 2017-10-27 | 西安电子科技大学 | A kind of method of utilization full-length genome data mining methylation patterns |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114885163B (en) * | 2018-09-02 | 2024-04-23 | Lg电子株式会社 | Method for encoding and decoding image signal and computer readable recording medium |
CN114885163A (en) * | 2018-09-02 | 2022-08-09 | Lg电子株式会社 | Method for encoding and decoding image signal and computer readable recording medium |
CN109554502A (en) * | 2019-01-03 | 2019-04-02 | 北京林业大学 | A kind of detection DNA methylation site is to the method and its application technology of quantitative character additivity and disconnected partial allel |
CN111027612A (en) * | 2019-12-04 | 2020-04-17 | 国网天津市电力公司电力科学研究院 | Energy metering data feature reduction method and device based on weighted entropy FCM |
CN111027612B (en) * | 2019-12-04 | 2024-01-30 | 国网天津市电力公司电力科学研究院 | Energy metering data feature reduction method and device based on weighted entropy FCM |
CN111915062B (en) * | 2020-07-08 | 2023-06-20 | 西北农林科技大学 | Greenhouse crop water demand regulation and control method with water utilization rate and photosynthesis rate being coordinated |
CN111915062A (en) * | 2020-07-08 | 2020-11-10 | 西北农林科技大学 | Greenhouse crop water demand regulation and control method with water utilization rate and photosynthetic rate coordinated |
WO2022023208A1 (en) * | 2020-07-30 | 2022-02-03 | Evonik Operations Gmbh | Dna-methylation-based quality control of the origin of organisms |
CN112950571A (en) * | 2021-02-25 | 2021-06-11 | 中国科学院苏州生物医学工程技术研究所 | Method, device and equipment for establishing positive and negative classification model and computer storage medium |
CN112950571B (en) * | 2021-02-25 | 2024-02-13 | 中国科学院苏州生物医学工程技术研究所 | Method, device, equipment and computer storage medium for establishing yin-yang classification model |
CN114814099B (en) * | 2022-04-25 | 2023-09-12 | 南京农业大学 | Photosynthesis prediction method based on grape leaf shape |
CN114814099A (en) * | 2022-04-25 | 2022-07-29 | 南京农业大学 | Photosynthesis prediction method based on grape leaf shape |
CN116153437A (en) * | 2023-04-19 | 2023-05-23 | 乐百氏(广东)饮用水有限公司 | Water quality safety evaluation and water quality prediction method and system for drinking water source |
Also Published As
Publication number | Publication date |
---|---|
CN108319984B (en) | 2019-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108319984A (en) | The construction method and prediction technique of xylophyta leaf morphology feature and photosynthesis characteristics prediction model based on DNA methylation level | |
Vincent et al. | Host associations and beta diversity of fungal endophyte communities in New Guinea rainforest trees | |
Lepais et al. | Species relative abundance and direction of introgression in oaks | |
Liorzou et al. | Nineteenth century French rose (Rosa sp.) germplasm shows a shift over time from a European to an Asian genetic background | |
Şakiroğlu et al. | Inferring population structure and genetic diversity of broad range of wild diploid alfalfa (Medicago sativa L.) accessions using SSR markers | |
CN106446600A (en) | CRISPR/Cas9-based sgRNA design method | |
Fayaz et al. | Genetic diversity and molecular characterization of Iranian durum wheat landraces (Triticum turgidum durum (Desf.) Husn.) using DArT markers | |
CN106755441B (en) | Method for performing forest multi-character polymerization breeding based on multi-character genome selection | |
CN109345089A (en) | Enterprise development state evaluating method and system based on big data | |
CN107278877A (en) | A kind of full-length genome selection and use method of corn seed-producing rate | |
CN109545278A (en) | A kind of method of plant identification lncRNA and interaction of genes | |
CN111243676B (en) | High-throughput sequencing data-based wilt disease onset prediction model and application | |
Duk et al. | The genetic landscape of fiber flax | |
Hong et al. | Genetic diversity and distinctness based on morphological and SSR markers in peanut | |
CN110564884B (en) | Method for excavating salix matsudana salt-tolerant pivot gene | |
Yardibi et al. | The trend of breeding value research in animal science: bibliometric analysis | |
CN107918725A (en) | A kind of DNA methylation Forecasting Methodology based on machine learning selection optimal characteristics | |
CN118216422B (en) | Phenotype assisted lemon breeding method based on deep learning | |
CN113584175A (en) | Group of molecular markers for evaluating renal papillary cell carcinoma progression risk and screening method and application thereof | |
CN110853711B (en) | Whole genome selection model for predicting fructose content of tobacco and application thereof | |
Wang et al. | Genetic diversity analysis and potential suitable habitat of Chuanminshen violaceum for climate change | |
CN105907860B (en) | It is a kind of to utilize | Δ (SNP-index) | carry out the QTL-seq method and its application of character positioning | |
CN110853710B (en) | Whole genome selection model for predicting starch content of tobacco and application thereof | |
Mao et al. | Species identification in the Rhododendron vernicosum–R. decorum species complex (Ericaceae) | |
Mugnai et al. | Camellia japonica L. genotypes identified by an artificial neural network based on phyllometric and fractal parameters |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-07-02; termination date: 2021-02-06 |