CN106874704A - Method for identifying key regulators in a gene co-regulation network based on a linear model - Google Patents

Method for identifying key regulators in a gene co-regulation network based on a linear model

Info

Publication number
CN106874704A
CN106874704A CN201710004254.4A CN201710004254A CN106874704A CN 106874704 A CN106874704 A CN 106874704A CN 201710004254 A CN201710004254 A CN 201710004254A CN 106874704 A CN106874704 A CN 106874704A
Authority
CN
China
Prior art keywords
gene
regulator
expression
linear model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710004254.4A
Other languages
Chinese (zh)
Other versions
CN106874704B (en
Inventor
王伟胜
曾亚菲
骆嘉伟
刘智明
蔡洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN201710004254.4A
Publication of CN106874704A
Application granted
Publication of CN106874704B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for identifying key regulators in a gene co-regulation network based on a linear model. Using gene expression profile data and gene regulation relationship data, the method identifies key regulators in the gene co-regulation network by constructing a linear model that predicts the expression of known disease genes. The invention is simple to implement: key regulators in the gene co-regulation network can be identified relatively accurately from gene expression profile data and gene regulation relationships alone. Experiments confirm that the identified regulators are biologically meaningful, so the method has theoretical significance and practical value for research on disease mechanisms.

Description

Method for identifying key regulators in gene co-regulation network based on linear model
Technical Field
The invention belongs to the field of computational biology, and relates to a method for identifying key regulators in a gene co-regulation network based on a linear model.
Background
In the post-genome era, understanding the functions of genes, non-coding RNAs, proteins and other related biomolecules, and thereby revealing the mechanisms by which biological processes are realized, has become one of the most important research goals in computational systems biology and bioinformatics. Within this field, the study of gene regulation is a particularly important subject, because understanding how gene expression is regulated is central to understanding the mechanisms of biological processes and disease development. In eukaryotes there are two important classes of regulatory factors: transcription factors (TFs) and microRNAs (miRNAs), which regulate the expression level of a target gene at the transcriptional and post-transcriptional levels, respectively. Transcription factors are a class of proteins with specific functions that turn on the transcription of a gene by binding to its promoter region. miRNAs are a class of gene regulatory elements discovered in recent years: endogenous non-coding RNAs with regulatory functions found in eukaryotes, about 20-25 nucleotides in length. Transcription factors and miRNAs play important roles in the regulation of gene expression, which runs through a wide variety of biological activities and disease processes. Building on this, studies have found that transcription factors and miRNAs interact extensively and regulate cooperatively, together forming a complex co-regulation network. The co-regulation network comprises regulation of miRNAs by transcription factors, regulation of target genes by transcription factors, and regulation of transcription factors and target genes by miRNAs. These regulatory functions reflect each stage of the life process and of the functions carried out by cellular molecules, so the co-regulation network contains richer biological information than a single network.
Therefore, effective identification of key regulators on the co-regulatory network is important for clinical treatment of diseases and drug design, which may provide a new approach for treatment of human diseases.
With the rapid development of high-throughput technology, large amounts of genomic, transcriptomic, proteomic and other omics data have been generated, providing new opportunities for research on biomolecular function. Earlier algorithms for identifying key nodes focused mainly on identifying key proteins in protein interaction networks. Research on transcriptional regulatory networks is more difficult than research on protein interaction networks. First, reliable transcriptional regulatory network data are still difficult to obtain. Second, because of the functional characteristics of existing transcriptional regulatory networks, their topological characteristics differ greatly from those of protein interaction networks, and the directionality of regulation makes the topology more complex. Thus, identifying key regulators in a regulatory network is more complex than identifying key proteins. In recent years, research on regulatory networks has been increasing, and many computational methods for identifying key regulators in regulatory networks have been proposed, mainly based on: information flow models (RWR), ranking algorithms (PageRank), classifier construction (SVM), regularized least-squares classification, Bayesian networks, regression-based models, and the like. However, the existing methods all have shortcomings to some degree, such as an inability to process large data sets, excessive time complexity, or accuracy that needs improvement. In 2015, Alexandra et al. proposed the MIPRIP method, which uses a linear model to identify key regulators in a regulatory network; experimental results show that the linear-model-based method can effectively identify regulators of biological importance.
However, that method only considers the relationship between transcription factors and genes and does not consider the interaction and cooperative regulation among regulators in the co-regulation network; its identification accuracy also leaves room for improvement.
Therefore, there is a need to design a method for identifying key regulators in a gene co-regulation network based on a linear model.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for identifying key regulators in a gene co-regulation network based on a linear model. The method can identify biologically significant key regulators in the gene co-regulation network more accurately, using only gene expression profile data and gene regulation relationships.
The technical solution of the invention is as follows:
a method for identifying key regulators in a gene co-regulation network based on a linear model comprises the following steps:
step 1) constructing a gene co-regulation network:
inputting gene expression profile data, gene regulation relationships and protein-protein interaction (PPI) data, filtering out action relationship pairs that involve nodes without expression profile data, and establishing a gene co-regulation network (GCN), wherein the GCN comprises three types of nodes: regulator miRNA (microRNA), regulator TF and gene, and action edges exist between the nodes: miRNA-gene, TF-gene and gene-gene;
if any two points in the gene co-regulation network GCN have an action relation, the edge weight is 1, otherwise, the edge weight is 0;
step 2) respectively calculating activity values of a regulator miRNA, a regulator TF and adjacent genes of known disease genes;
activity values, i.e., the influence values of miRNA, TF, and adjacent genes on known disease genes;
step 3) in the constructed gene co-regulation network GCN, constructing a linear model by using gene expression profile data and activity values of the regulator and the adjacent genes obtained in the step 2), predicting the expression of the known disease genes, and obtaining the predicted expression value of the known disease genes;
step 4) converting the linear model constructed in step 3) into an optimization problem by minimizing the difference between the predicted expression value and the real expression value of the known disease genes, solving the optimization problem by mixed integer linear programming, and finally identifying the key regulators in the gene co-regulation network.
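Step 1 above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_gcn` and `edge_weight`, the tuple layout of the edge lists, the treatment of every pair as directed, and the toy identifiers are all assumptions introduced here for illustration.

```python
def build_gcn(expressed, mirna_gene, tf_gene, gene_gene):
    """Keep only action pairs whose two endpoints both have expression data.

    expressed : set of node names present in the expression profile
    mirna_gene, tf_gene, gene_gene : iterables of (source, target) pairs
    Returns a dict mapping edge type -> set of retained (source, target) pairs.
    """
    raw = {"miRNA-gene": mirna_gene, "TF-gene": tf_gene, "gene-gene": gene_gene}
    return {
        kind: {(u, v) for u, v in pairs if u in expressed and v in expressed}
        for kind, pairs in raw.items()
    }


def edge_weight(gcn, kind, u, v):
    """Edge weight is 1 if an action relation exists, otherwise 0."""
    return 1 if (u, v) in gcn[kind] else 0


# Toy data (hypothetical identifiers, for illustration only).
expressed = {"mirA", "tfB", "g1", "g2"}
gcn = build_gcn(
    expressed,
    mirna_gene=[("mirA", "g1"), ("mirX", "g2")],  # mirX has no expression data
    tf_gene=[("tfB", "g1")],
    gene_gene=[("g1", "g2")],
)
```

In this toy run the pair involving `mirX` is dropped because `mirX` has no expression profile, which mirrors the filtering described in step 1.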
Further, the linear model expression constructed for predicting the expression of known disease genes is as follows:
$$g'_{i,s} = \beta_0 + \sum_{m=1}^{M} \beta_m \, es_{m,i} \, act_{m,s} + \sum_{t=1}^{T} \beta_t \, ts_{t,i} \, act_{t,s} + \sum_{g=1}^{G} \beta_g \, gs_{g,i} \, act_{g,s}$$
wherein i represents a known disease gene, and m, t and g represent a regulator miRNA, a regulator TF and a neighboring gene of the known disease gene i, respectively;
$g'_{i,s}$ represents the predicted expression value of the known disease gene i in sample s; $\beta_0$ is the additional weight (additive offset) of the linear model; M, T and G represent the miRNA set, the TF set and the gene set; $\beta_m$, $\beta_t$ and $\beta_g$ represent the optimization parameters of m, t and g, respectively, and are computed directly by the optimizer when the optimization problem is solved in step 4);
$es_{m,i}$, $ts_{t,i}$ and $gs_{g,i}$ represent the weights of the action edges between m, t, g and i, respectively, and take the value 0 or 1;
$act_{m,s}$, $act_{t,s}$ and $act_{g,s}$ represent the activity values of m, t and g in sample s, respectively;
the sample s refers to data of a certain observed individual with a known disease.
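As a worked illustration of the linear model, the predicted value is an offset plus a sum of (coefficient × edge weight × activity) terms, one per miRNA, TF or neighboring gene. The sketch below uses a hypothetical helper `predict_expression` and made-up numbers; neither comes from the patent.

```python
def predict_expression(beta0, terms):
    """Evaluate g'_{i,s} = beta0 + sum(beta * edge_weight * activity).

    terms: iterable of (beta, edge_weight, activity) triples, one per
    miRNA, TF, or neighboring gene of disease gene i in sample s.
    """
    return beta0 + sum(b * w * a for b, w, a in terms)


# Hypothetical values for one disease gene in one sample.
terms = [
    (0.5, 1, 2.0),    # a miRNA with an edge to gene i
    (1.0, 0, 3.0),    # a TF with no edge to gene i: contributes nothing
    (-0.25, 1, 4.0),  # a neighboring gene
]
pred = predict_expression(1.0, terms)  # 1.0 + 1.0 + 0.0 - 1.0
```

Because the edge weights are 0 or 1, they simply switch terms on or off; only regulators and neighbors actually connected to gene i can influence its prediction.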
Further, the linear model is converted into an optimization problem by minimizing the difference between the predicted expression value and the true expression value of the gene, expressed as:
wherein $g_{i,s}$ and $g'_{i,s}$ represent the real expression value and the predicted expression value of disease gene i in sample s, respectively, and O and S represent the set of known disease genes and the total sample set of the disease, respectively;
the optimization problem is solved with the Gurobi optimizer; the number of times each regulator is selected by the optimizer while solving the optimization problem is recorded, all regulators are ranked by their selection counts, and the top 50 ranked regulators are taken as the final candidate regulators.
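The counting and ranking step can be sketched in Python as follows. The function name `rank_regulators` and the input layout (one list of selected regulators per solved model) are assumptions for illustration; how selections are read out of the solver is not shown.

```python
from collections import Counter


def rank_regulators(selected_per_model, top_n=50):
    """Rank regulators by how many solved models selected them.

    selected_per_model: iterable of collections, each holding the regulators
    chosen (non-zero coefficient) in one solved model.
    Returns the top_n (regulator, count) pairs, highest count first.
    """
    counts = Counter()
    for regs in selected_per_model:
        counts.update(set(regs))  # count a regulator once per model
    return counts.most_common(top_n)


# Hypothetical selections from four solved models.
selections = [["a", "b"], ["a"], ["a", "c"], ["b"]]
top2 = rank_regulators(selections, top_n=2)
```

With `top_n=50` this corresponds to keeping the 50 most frequently selected regulators as candidates.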
After the Gurobi optimizer is installed, the Gurobi function can be called directly to solve the optimization problem simply by loading the Gurobi package in the R language. The Gurobi function has three input parameters: the optimization model, timeLimit and OutputFlag. timeLimit generally takes the value 600, OutputFlag takes the default value 0, and the optimization model is obtained by converting the constructed linear model into an optimization problem that minimizes the difference between the predicted expression value and the real expression value of the known disease genes. To obtain a series of models of different sizes, linear models are constructed by constraining the number of regulators of each gene: for each known disease gene, the number of regulators is set to 1 through k in turn, and a linear model is constructed for each setting.
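One standard way to write such a size-constrained problem as a mixed integer linear program is sketched below for a single known disease gene $i \in O$ and a regulator limit $n \in \{1, \dots, k\}$. This is an illustrative formulation, not necessarily the one used in the patent: the auxiliary error variables $e_{i,s}$, the binary selection indicators $z_r$, the coefficient bound $B$ and the use of the absolute error are all assumptions introduced here.

```latex
\begin{aligned}
\min_{\beta,\, z,\, e} \quad & \sum_{s \in S} e_{i,s} \\
\text{s.t.} \quad
  & -e_{i,s} \le g_{i,s} - g'_{i,s} \le e_{i,s}, && s \in S, \\
  & -B\, z_r \le \beta_r \le B\, z_r,            && r \in M \cup T \cup G, \\
  & \sum_{r \in M \cup T \cup G} z_r \le n,      && z_r \in \{0, 1\}.
\end{aligned}
```

Under this reading, a regulator counts as "selected" in a solved model when its indicator $z_r$ equals 1, which is what the selection counts used for ranking would tally.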
Further, the activity values of the regulator miRNA, the regulator TF and the adjacent gene are calculated by the following two methods, respectively:
1) calculating the activity values of the regulator miRNA and the regulator TF:
first, the reference expression values of all target genes of the regulator r are calculated:
wherein r represents a regulator, namely a regulator miRNA or a regulator TF; the reference expression value of target gene $g_t$ of regulator r is the average of the expression values of gene $g_t$ over all samples in which the expression level of regulator r tends to 0; $e(r) \to 0$ indicates that the expression level of the regulator r tends to 0;
the reference expression value of a target gene is the expression value of the target gene when no regulation is exerted on it;
secondly, the difference between the reference expression value of the target gene and its real expression value under the influence of the regulator, namely the expression level change value of the target gene, is calculated:
wherein $y_{g_t,s}$ represents the true expression value of target gene $g_t$ of regulator r in sample s, and the change value represents the change in the expression level of target gene $g_t$ of regulator r;
thirdly, a simple linear model is constructed from the expression level change values of the target genes, and the activity value $act_{r,s}$ of the regulator is solved:
wherein G' represents the target gene set of regulator r, and the two sums in the model are, respectively, the sum of the expression level change values and the sum of the reference expression values over the target gene set of regulator r;
2) calculating the activity value of the adjacent genes, which is obtained as the cumulative effect of the expression influence of an adjacent gene on all the genes it acts on, namely:
wherein N represents the total number of genes in sample s, $gs_{g,i}$ represents the action weight of gene g with respect to gene i in sample s, and $g_{i,s}$ represents the expression value of gene i in sample s, the sample s being the data of a certain observed individual with a known disease.
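The two activity calculations can be sketched in Python under explicit assumptions. For regulators, the sketch assumes that "tends to 0" means an absolute expression at or below a small threshold `eps`, that the change value is the real value minus the reference value, and that the activity is the ratio of the summed changes to the summed reference values; the text names these quantities but the exact model and sign convention are assumptions here. For adjacent genes, the sketch assumes the cumulative effect is the sum over genes i of $gs_{g,i} \cdot g_{i,s}$. All function names and numbers are hypothetical.

```python
def reference_expression(target_expr, regulator_expr, eps=1e-6):
    """Mean target expression over samples where the regulator is ~0 (assumed threshold eps)."""
    baseline = [y for y, e in zip(target_expr, regulator_expr) if abs(e) <= eps]
    if not baseline:
        raise ValueError("no sample in which the regulator expression is near 0")
    return sum(baseline) / len(baseline)


def regulator_activity(targets_expr, regulator_expr, s, eps=1e-6):
    """Assumed form: act_{r,s} = sum of change values / sum of reference values."""
    refs = [reference_expression(t, regulator_expr, eps) for t in targets_expr]
    changes = [t[s] - ref for t, ref in zip(targets_expr, refs)]  # assumed sign
    return sum(changes) / sum(refs)


def neighbor_activity(weights_g, expr_s):
    """Assumed cumulative effect: sum over genes i of gs_{g,i} * g_{i,s}."""
    return sum(w * x for w, x in zip(weights_g, expr_s))


# Toy data: three samples, one regulator with two target genes.
regulator_expr = [0.0, 2.0, 0.0]
targets_expr = [[4.0, 2.0, 6.0], [1.0, 0.5, 1.0]]
act_r = regulator_activity(targets_expr, regulator_expr, s=1)  # (-3.0 - 0.5) / (5.0 + 1.0)
act_g = neighbor_activity([1, 0, 1], [2.0, 5.0, 3.0])          # 2.0 + 3.0
```

In the toy data the regulator is silent in samples 0 and 2, so those samples define the reference values (5.0 and 1.0); in sample 1 both targets fall below their reference, giving a negative activity.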
Further, the activity values of the regulators and the adjacent genes obtained in step 2) are normalized before being used to construct the linear model in step 3).
Advantageous effects
The invention provides a method (co-BOTLM) for identifying key regulators in a gene co-regulation network based on a linear model, which utilizes gene expression profile data and gene regulation relation to predict the expression of known disease genes by constructing the linear model to complete the identification of the key regulators in the gene co-regulation network.
Compared with the existing method for identifying the key regulators based on the linear model, the co-BOTLM method has the following advantages:
1) the method is applied to a co-regulation network, which contains richer biological information than a single network, so the identified regulators have greater biological significance;
2) protein interaction data (PPI information) are added, taking into account that the expression of a gene may be affected by its neighboring genes;
3) a new method is introduced to calculate the activity values of the regulators and the adjacent genes, which effectively improves the accuracy of cancer gene expression prediction. The method is simple to implement, and the key regulators in the gene co-regulation network can be accurately identified from gene expression profile data and gene regulation relationships alone.
Experiments show that co-BOTLM can effectively identify key regulators in a gene co-regulation network and that the identified key regulators have important biological significance. Its accuracy is also improved compared with other methods. The experimental results are compared and analyzed in detail in the examples.
Drawings
FIG. 1 is a flow chart of the co-BOTLM of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the following figures and specific examples:
example 1:
identification model of key regulators in gene co-regulation network based on linear model
The invention defines the key regulators in a gene co-regulation network as the regulators that strongly influence disease gene expression in the co-regulation network, identified by constructing a linear model from gene expression profile data and gene regulation relationships to predict the expression of known disease genes.
To describe the key regulator identification model clearly, the model is defined as follows:
the key regulator identification model for a linear-model-based gene co-regulation network aims to identify the regulators that strongly affect the expression of disease genes in the co-regulation network. The key nodes in the gene co-regulation network are identified by constructing a linear model from gene expression profile data and gene regulation relationships to predict the expression of known disease genes.
The whole process of the method for identifying key regulators in the gene co-regulation network based on the linear model is shown in FIG. 1. First, gene expression profile data, gene regulation relationships and PPI data are input. The co-BOTLM method can then be divided into 4 sub-processes:
1) constructing a gene co-regulation network;
2) considering that the expression of the gene may be affected by the regulator and the adjacent gene, activity values of miRNA, TF and the adjacent gene (i.e., influence values of miRNA, TF and the adjacent gene on the known disease gene) are calculated for the known disease gene, respectively;
3) constructing a linear model by using the expression profile data of the genes in the obtained gene co-regulation network, and predicting the expression of the known disease genes;
4) converting the linear model into an optimization problem by minimizing the difference between the predicted expression value and the real expression value of the genes, solving the optimization problem by mixed integer linear programming (MILP), and finally identifying the key regulators in the gene co-regulation network, which ends the identification process.
The optimization problem is solved with the Gurobi optimizer; the number of times each regulator is selected by the optimizer while solving the optimization problem is recorded, all regulators are ranked by their selection counts, and the top 50 ranked regulators are taken as the final candidate regulators.
After the Gurobi optimizer is installed, the Gurobi function can be called directly to solve the optimization problem simply by loading the Gurobi package in the R language. The Gurobi function has three input parameters: the optimization model, timeLimit and OutputFlag. timeLimit generally takes the value 600, OutputFlag takes the default value 0, and the optimization model is obtained by converting the constructed linear model into an optimization problem that minimizes the difference between the predicted expression value and the real expression value of the known disease genes. To obtain a series of models of different sizes, linear models are constructed by constraining the number of regulators of each gene: for each known disease gene, the number of regulators is set to 1 through k in turn, and a linear model is constructed for each setting. In this example, k is 5 (after many experiments, the experimental effect was best when k is 5).
Validity verification method of key regulator identification method in gene co-regulation network based on linear model
To verify the effectiveness of the co-BOTLM method, it was applied to an ovarian cancer data set. The experimental data included ovarian cancer sample data, gene regulatory relationships, PPI data and known ovarian cancer-associated disease genes. The ovarian cancer sample data were downloaded from the TCGA database, giving 385 samples in total; after filtering out genes whose expression values were too small in absolute value or showed no obvious differential expression across samples, an ovarian cancer expression profile data set containing 559 miRNAs and 12456 genes was obtained. The action relationship data comprise miRNA-gene, TF-gene and PPI data, downloaded from the MicroCosm website, the ENCODE database and the BioGRID database, respectively. By mapping the ovarian cancer expression profile data set and the action relationships onto each other, a miRNA-TF gene co-regulation network was finally constructed. The network contains three types of nodes (12381 genes, 559 miRNAs and 75 TFs) and three types of action relationships between them (59660 gene-gene, 241722 miRNA-gene and 9877 TF-gene). For the known ovarian cancer-related disease genes, 379 genes were downloaded from the DDOC database; after filtering out disease genes without expression profile data or regulatory relationships, 123 genes remained.
In this example, a three-fold cross-validation experiment is performed and the prediction accuracy of the co-BOTLM method is compared with that of the MIPRIP method proposed by Alexandra et al. The Pearson correlation coefficient (PCC) is used to measure the similarity between the disease gene expression data predicted by the co-BOTLM method and the real expression data: the higher the PCC value, the higher the similarity and the more accurate the linear model constructed by the co-BOTLM method. The PCC values in the examples are calculated using the cor function in the R language. Characteristic and functional enrichment analyses are also carried out on the regulators identified by the co-BOTLM method.
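The evaluation metric can be illustrated with a small Python sketch of the Pearson correlation coefficient together with a simple three-way split of sample indices. The patent computes PCC with R's cor function; the Python helpers below, and in particular the interleaved fold assignment in `three_fold_indices`, are assumptions for illustration, since the text does not say how samples are assigned to folds.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def three_fold_indices(n_samples, n_folds=3):
    """Split indices 0..n_samples-1 into interleaved test folds (assumed scheme)."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


folds = three_fold_indices(7)
r_up = pearson([1, 2, 3], [2, 4, 6])    # perfectly correlated
r_down = pearson([1, 2, 3], [3, 2, 1])  # perfectly anti-correlated
```

Each fold in turn would serve as the held-out set whose predicted and real expression values are compared with `pearson`.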
1. Analyzing experimental results and verifying algorithm effectiveness
Table 1: Top 20 ranked regulators in the miRNA-TF gene co-regulation network
No.  Identified key regulator  Number of target genes  Number of optimizer selections
1    hsa-mir-106a*             377                     50
2    hsa-mir-586               508                     43
3    hsa-mir-423-5p            496                     38
4    hsa-mir-515-3p            512                     34
5    hsa-mir-181a-2*           496                     34
6    hsa-mir-768-3p            530                     32
7    hsa-mir-663               480                     32
8    hsa-mir-539               382                     31
9    hsa-mir-206               477                     30
10   hsa-mir-509-3p            552                     30
11   hsa-mir-362-3p            512                     25
12   hsa-mir-378*              519                     24
13   hsa-mir-520c-3p           566                     24
14   hsa-mir-33a               523                     24
15   hsa-mir-29a*              495                     23
16   hsa-mir-193a-3p           496                     23
17   hsa-mir-601               484                     23
18   FOXA2                     169                     23
19   hsa-mir-26b               466                     22
20   hsa-mir-30b               541                     22
In this example, after the three-fold cross-validation experiment, the average PCC value obtained is 0.535, which shows that the gene expression values predicted by the linear model of the invention have a relatively high similarity to the real expression values; the linear model constructed by the co-BOTLM method is therefore accurate and can effectively identify key regulators in the network. After the experiment, all regulators are ranked by the number of times the optimizer selected them, and the top 50 regulators are taken as candidate key regulators in this example. Table 1 above lists the top 20 regulators. Every regulator other than FOXA2 regulates at least 300 genes, and many of them have been confirmed to be associated with ovarian cancer. FOXA2 has fewer target genes because little experimental TF data is available. This indicates that the identified regulators act on the ovarian cancer gene co-regulation network and may be related to the expression of a large number of genes, including known ovarian cancer disease genes, and thus play a critical role in the co-regulation network.
2. Experimental comparison of the co-BOTLM method with the MIPRIP method to verify the accuracy of the algorithm
Table 2: PCC values of the MIPRIP method
Fold  k=1        k=2        k=3        k=4        k=5
1     0.3329907  0.4312150  0.4436449  0.4731776  0.4893458
2     0.3195237  0.4221495  0.4500000  0.4687850  0.4851402
3     0.3214019  0.4341121  0.4571028  0.4768224  0.4916822
Note: rows 1-3 are the three folds of the cross-validation experiment; columns k=1 to k=5 give the number k of regulators used to construct the linear model.
Table 3: PCC values of the co-BOTLM method
Fold  k=1        k=2        k=3        k=4        k=5
1     0.5018750  0.5709821  0.5940179  0.6112500  0.6227679
2     0.4858036  0.5575893  0.5869643  0.6025893  0.6164286
3     0.4956250  0.5518750  0.5691964  0.5918750  0.6059821
The MIPRIP method and the co-BOTLM method of the present invention both use linear models to identify key regulators of specific diseases, but they differ in three respects: 1) the MIPRIP method is applied to a regulatory network, whereas the co-BOTLM method is applied to a co-regulation network; because transcription factors and miRNAs interact extensively and regulate cooperatively, the co-regulation network contains richer biological information than a single network; 2) among the factors affecting the expression of disease genes, the co-BOTLM method considers the possible effect of adjacent genes in addition to transcription factors and miRNAs; 3) the two methods calculate the activity values of transcription factors and miRNAs differently. Because the MIPRIP method is applied to a regulatory network and does not consider the co-regulation relationships in the network, transcription factors are treated as ordinary genes in this example when the comparative experiment is performed. Tables 2 and 3 show the PCC values obtained with the MIPRIP method and the co-BOTLM method, respectively. The co-BOTLM method clearly obtains higher PCC values: its average PCC value is 0.571, whereas that of the MIPRIP method is 0.433. The gene expression values predicted by the co-BOTLM method are therefore more similar to the real expression values, which indirectly shows that the co-BOTLM method is more accurate and that the key regulators it identifies are more reliable.
3. Functional enrichment analysis of the experimental results to verify their validity
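The reported averages can be reproduced directly from the values in Tables 2 and 3. The short Python check below copies the table entries; the helper names `mean_pcc` and `mean_by_k` are introduced here only for illustration.

```python
# PCC values copied from Table 2 (MIPRIP) and Table 3 (co-BOTLM):
# rows are the three folds, columns are k = 1..5.
MIPRIP = [
    [0.3329907, 0.4312150, 0.4436449, 0.4731776, 0.4893458],
    [0.3195237, 0.4221495, 0.4500000, 0.4687850, 0.4851402],
    [0.3214019, 0.4341121, 0.4571028, 0.4768224, 0.4916822],
]
CO_BOTLM = [
    [0.5018750, 0.5709821, 0.5940179, 0.6112500, 0.6227679],
    [0.4858036, 0.5575893, 0.5869643, 0.6025893, 0.6164286],
    [0.4956250, 0.5518750, 0.5691964, 0.5918750, 0.6059821],
]


def mean_pcc(table):
    """Average over all folds and all values of k."""
    values = [v for row in table for v in row]
    return sum(values) / len(values)


def mean_by_k(table):
    """Average over the three folds for each k (column-wise mean)."""
    return [sum(col) / len(col) for col in zip(*table)]


print(round(mean_pcc(MIPRIP), 3))    # 0.433
print(round(mean_pcc(CO_BOTLM), 3))  # 0.571
```

The column-wise means from `mean_by_k` also show that, in these tables, the PCC rises with k for both methods and that co-BOTLM is higher than MIPRIP at every k.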
Table 4: GO enrichment analysis of the top 10 regulators
No.: regulator ranking; enriched GO terms: the 3 top-ranked GO terms by P-value (smaller is better); number of GO terms: number of GO terms with P-value < 0.05; P-value: a value < 0.05 indicates high enrichment.
Table 5: KEGG pathway enrichment analysis of the top 10 regulators
No.: regulator ranking; enriched KEGG pathways: the 3 top-ranked KEGG pathways by P-value (smaller is better); number of KEGG pathways: number of KEGG pathways with P-value < 0.05; P-value: a value < 0.05 indicates high enrichment.
In order to verify that the key regulators identified by the co-BOTLM method in the invention are biologically significant, in this example, GOstats in the R language is used to perform GO enrichment analysis and KEGG pathway enrichment analysis on the identified key regulators respectively. Table 4 and table 5 show the GO and KEGG pathway enrichment analysis results for the top 10 regulators, respectively.
Table 4 shows that most of the top 10 regulators identified by the co-BOTLM method are enriched in more than 300 GO terms. The more frequently enriched GO terms include cellular components, cellular processes, cell death, negative regulation of dendritic cell differentiation, and the like, which indicates that the identified regulators participate in a large number of cell-related life processes. hsa-mir-515-3p and hsa-mir-768-3p are each enriched in fewer than 100 GO terms, probably because the target genes of these two miRNAs match fewer entries in the GOstats library. Jiang et al. demonstrated in 2016 that hsa-mir-768-3p has a potential prognostic function in ovarian cancer because of its down-regulation linked to MEK/ERK-mediated enhancement in protein synthesis in melanoma cells. Similarly, Table 5 shows that most of the top 10 regulators are enriched in at least 5 KEGG pathways, among which the more frequently enriched include conservation, pathways in cancer, signaling pathways, the ErbB signaling pathway, and the like. This shows that the identified regulators are involved in a large number of cancer and signaling pathways and are closely related to cancer. In conclusion, the regulators identified in the experiment are involved in a large number of biological processes, especially those associated with cellular activity and cancer, and are therefore of great biological interest.

Claims (5)

1. A method for identifying key regulators in a gene co-regulation network based on a linear model is characterized by comprising the following steps:
step 1) constructing a gene co-regulation network:
inputting gene expression profile data, gene regulation relationships and protein interaction data, filtering out action relationship pairs that involve nodes without expression profile data, and establishing a gene co-regulation network GCN, wherein the gene co-regulation network GCN comprises three types of nodes: regulator miRNA, regulator TF and gene, and action edges exist between the nodes: miRNA-gene, TF-gene and gene-gene;
if any two points in the gene co-regulation network GCN have an action relation, the edge weight is 1, otherwise, the edge weight is 0;
step 2) respectively calculating activity values of a regulator miRNA, a regulator TF and adjacent genes of known disease genes;
step 3) in the constructed gene co-regulation network GCN, constructing a linear model by using gene expression profile data and activity values of the regulator and the adjacent genes obtained in the step 2), predicting the expression of the known disease genes, and obtaining the predicted expression value of the known disease genes;
and 4) converting the linear model constructed in the step 3) into an optimization problem according to the minimization of the difference between the predicted expression value and the real expression value of the known disease gene, solving the optimization problem based on the mixed integer linear programming idea, and finally identifying a key regulator in the gene co-regulation network.
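Step 1) of the claim can be illustrated with a minimal Python sketch. The function names, the dictionary layout and the sample identifiers (`miR-a`, `TF1`, `G1`, and so on) are hypothetical and are not taken from the patent; the sketch only shows the filtering rule and the 0/1 edge weights, and it treats gene-gene pairs as directed in the order given, which is an assumption.

```python
def build_gcn(expression, mirna_gene, tf_gene, gene_gene):
    """Build a co-regulation network as a {(source, target): 1} mapping.

    Interaction pairs are kept only when both nodes have expression
    profile data; every kept pair gets edge weight 1.
    """
    measured = set(expression)  # nodes that have expression profile data
    edges = {}
    for pairs in (mirna_gene, tf_gene, gene_gene):
        for source, target in pairs:
            if source in measured and target in measured:
                edges[(source, target)] = 1
    return edges


def edge_weight(edges, source, target):
    """Return 1 if the interaction exists in the network, otherwise 0."""
    return edges.get((source, target), 0)


# Hypothetical toy data: miR-b has no expression profile, so its pair is dropped.
expression = {"miR-a": [1.0, 0.0], "TF1": [2.0, 3.0], "G1": [5.0, 4.0], "G2": [1.0, 1.5]}
mirna_gene = [("miR-a", "G1"), ("miR-b", "G1")]
tf_gene = [("TF1", "G1")]
gene_gene = [("G2", "G1")]

gcn = build_gcn(expression, mirna_gene, tf_gene, gene_gene)
```

Storing only the existing edges and defaulting to 0 on lookup keeps the representation sparse, which matters because most node pairs in such a network have no interaction.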
2. The method for identifying key regulators in a linear model-based gene co-regulation network according to claim 1, wherein the linear model expression constructed for predicting the expression of known disease genes is as follows:
$$g'_{i,s} = \beta_0 + \sum_{m=1}^{M} \beta_m \cdot es_{m,i} \cdot act_{m,s} + \sum_{t=1}^{T} \beta_t \cdot ts_{t,i} \cdot act_{t,s} + \sum_{g=1}^{G} \beta_g \cdot gs_{g,i} \cdot act_{g,s}$$
wherein i represents a known disease gene, and m, t and g represent a regulator miRNA, a regulator TF and an adjacent gene of the known disease gene i, respectively;
$g'_{i,s}$ represents the predicted expression value of the known disease gene i in sample s; $\beta_0$ is the additional weight of the linear model; M, T and G represent the miRNA set, the TF set and the gene set, respectively; $\beta_m$, $\beta_t$ and $\beta_g$ represent the optimization parameters of m, t and g, respectively, and are computed directly by the optimizer when the optimization problem is solved in step 4);
$es_{m,i}$, $ts_{t,i}$ and $gs_{g,i}$ represent the edge weights between m, t, g and i, respectively, each taking the value 0 or 1;
$act_{m,s}$, $act_{t,s}$ and $act_{g,s}$ represent the activity values of m, t and g in sample s, respectively;
the sample s refers to data of a certain observed individual with a known disease.
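The prediction defined in this claim can be sketched in a few lines of Python. Because the miRNA, TF and adjacent-gene sums have the same form (coefficient times edge weight times activity), the sketch folds them into a single loop; the function name, the dictionary layout and all numeric values are hypothetical, chosen only for illustration.

```python
def predict_expression(gene, sample, beta0, beta, edge, act):
    """Predicted expression of a disease gene in one sample.

    beta: {node: coefficient} for miRNAs, TFs and adjacent genes
    edge: {(node, gene): 1} for existing interactions (missing means 0)
    act:  {(node, sample): activity value}
    """
    total = beta0
    for node, coefficient in beta.items():
        weight = edge.get((node, gene), 0)  # 0 or 1 edge weight
        total += coefficient * weight * act[(node, sample)]
    return total


# Hypothetical toy values: G2 has no edge to G1, so it contributes nothing.
beta = {"miR-a": -0.5, "TF1": 2.0, "G2": 1.0}
edge = {("miR-a", "G1"): 1, ("TF1", "G1"): 1}
act = {("miR-a", "s1"): 2.0, ("TF1", "s1"): 1.5, ("G2", "s1"): 4.0}

predicted = predict_expression("G1", "s1", 1.0, beta, edge, act)
# 1.0 + (-0.5 * 1 * 2.0) + (2.0 * 1 * 1.5) + (1.0 * 0 * 4.0) = 3.0
```

The edge weight acts as a mask: a node without an interaction with gene i drops out of the sum regardless of its coefficient or activity.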
3. The method for identifying key regulators in a linear model-based gene co-regulation network according to claim 2, wherein the linear model is transformed into an optimization problem according to minimization of the difference between the predicted expression value and the true expression value of the gene, which is expressed as:
wherein $g_{i,s}$ and $g'_{i,s}$ represent the real expression value and the predicted expression value of disease gene i in sample s, respectively, and O and S represent the set of known disease genes and the set of all samples of the disease, respectively;
solving the optimization problem with the Gurobi optimizer, recording the number of times each regulator is selected by the optimizer while the optimization problem is solved, ranking all regulators by selection count, and taking the top 50 ranked regulators as the final candidate regulators.
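The ranking step at the end of this claim can be sketched as follows. The sketch does not call Gurobi; it assumes the optimizer has already produced one coefficient dictionary per solve, and it assumes that a regulator counts as "selected" when its coefficient is non-zero. Both assumptions, along with the function name, the tolerance and the toy values, are illustrative and are not stated in the patent.

```python
from collections import Counter


def rank_regulators(solutions, top_k=50, tol=1e-9):
    """Rank regulators by how many solves gave them a non-zero coefficient.

    solutions: list of {regulator: coefficient}, one dictionary per solve.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for coefficients in solutions:
        for regulator, value in coefficients.items():
            if abs(value) > tol:  # assumed meaning of "selected"
                counts[regulator] += 1
    ranked = sorted(counts, key=lambda r: (-counts[r], r))
    return ranked[:top_k], counts


# Hypothetical coefficients from three solves.
solutions = [
    {"miR-a": -0.4, "TF1": 0.0, "TF2": 1.2},
    {"miR-a": 0.7, "TF1": 0.3, "TF2": 0.0},
    {"miR-a": -0.1, "TF2": 0.9},
]

top_two, selection_counts = rank_regulators(solutions, top_k=2)
```

With real data, `top_k` would be left at 50 to match the claim.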
4. The method for identifying key regulators in a linear model-based gene co-regulation network according to any one of claims 1-3, wherein the activity values of the regulator miRNA, the regulator TF and the adjacent genes are calculated by the following two methods respectively:
1) calculating the activity values of the regulator miRNA and the regulator TF:
first, the reference expression values of all target genes of the regulator r are calculated:
$$y^{b}_{r,g_t} = E\left(y_{r,g_t} \mid e(r) \to 0\right)$$
wherein r represents a regulator, namely a regulator miRNA or a regulator TF; $y^{b}_{r,g_t}$ represents the reference expression value of target gene $g_t$ of regulator r, that is, the mean of the expression values of gene $g_t$ over all samples in which the expression level of regulator r tends to 0; $e(r) \to 0$ indicates that the expression level of regulator r tends to 0;
secondly, calculating the difference between the reference expression value of the target gene and its real expression value under the influence of the regulator, namely the expression level change value of the target gene:
wherein the two quantities involved are the real expression value of target gene $g_t$ of regulator r in sample s and the expression level change value of target gene $g_t$ of regulator r;
thirdly, constructing a simple linear model from the expression level change values of the target genes, and solving for the activity value $act_{r,s}$ of the regulator:
wherein G' represents the target gene set of regulator r, and the two sums involved are, respectively, the sum of the expression level change values and the sum of the reference expression values over the target gene set of regulator r;
2) calculating the activity values of the adjacent genes, which are solved from the cumulative effect of the expression influence of an adjacent gene on all the genes it acts on, namely:
wherein N represents the total number of genes in sample s, $gs_{g,i}$ represents the interaction weight of gene g on gene i in sample s, and $g_{i,s}$ represents the expression value of gene i in sample s, the sample s being the data of one observed individual with the known disease.
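The first two sub-steps of the regulator activity calculation (reference expression value and expression level change value) can be sketched in Python as below. The threshold `eps` used to decide that a regulator's expression "tends to 0" and the sign convention of the change value (real minus reference) are assumptions; the final activity formula is not reproduced in the text above and is therefore not implemented here. All names and values are hypothetical.

```python
def reference_expression(target_expr, regulator_expr, eps=1e-6):
    """Mean target-gene expression over samples where the regulator is ~0.

    target_expr and regulator_expr are lists aligned by sample.
    Returns None when no sample has near-zero regulator expression.
    """
    baseline = [y for y, e in zip(target_expr, regulator_expr) if abs(e) <= eps]
    if not baseline:
        return None
    return sum(baseline) / len(baseline)


def expression_changes(target_expr, reference):
    """Per-sample change value, taken here as real minus reference (assumed sign)."""
    return [y - reference for y in target_expr]


# Hypothetical data: the regulator is silent in the first two samples.
regulator_expr = [0.0, 0.0, 3.0, 5.0]
target_expr = [8.0, 10.0, 6.0, 4.0]

reference = reference_expression(target_expr, regulator_expr)  # (8 + 10) / 2
changes = expression_changes(target_expr, reference)
```

In this toy example the target gene drops below its reference level in the samples where the regulator is expressed, which is the pattern one would expect for a repressing regulator such as a miRNA.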
5. The method for identifying key regulators in the linear model-based gene co-regulation network according to claim 4, wherein the activity values of the regulators and adjacent genes obtained in the step 2) are normalized and then used for constructing the linear model in the step 3).
CN201710004254.4A 2017-01-04 2017-01-04 Method for identifying key regulators in a gene co-regulation network based on a linear model Active CN106874704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710004254.4A CN106874704B (en) 2017-01-04 2017-01-04 A kind of gene based on linear model is total to the sub- recognition methods of key regulatory in regulated and control network

Publications (2)

Publication Number Publication Date
CN106874704A true CN106874704A (en) 2017-06-20
CN106874704B CN106874704B (en) 2019-02-19

Family

ID=59164588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710004254.4A Active CN106874704B (en) 2017-01-04 2017-01-04 Method for identifying key regulators in a gene co-regulation network based on a linear model

Country Status (1)

Country Link
CN (1) CN106874704B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030104463A1 (en) * 2001-12-03 2003-06-05 Siemens Aktiengesellschaft Identification of pharmaceutical targets
CN101719194A (en) * 2009-12-03 2010-06-02 上海大学 Artificial gene regulatory network simulation method
CN101719195A (en) * 2009-12-03 2010-06-02 上海大学 Inference method of stepwise regression gene regulatory network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
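YING LIN et al.: "Transcription factor and miRNA", 《SCIENTIFIC REPORTS》 *
Xu Yan et al.: "Integrative analysis of gene expression and copy number variation to identify cancer driver genes and regulator miRNAs", 《现代生物医学进展》 (Progress in Modern Biomedicine) *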

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391962A (en) * 2017-09-05 2017-11-24 武汉古奥基因科技有限公司 The method of gene or site to disease regulation relationship is analysed based on multigroup credit
CN107679367A (en) * 2017-09-20 2018-02-09 湖南大学 A kind of common regulated and control network functional module recognition methods and system based on the network node degree of association
CN107679367B (en) * 2017-09-20 2020-02-21 湖南大学 Method and system for identifying co-regulation network function module based on network node association degree
CN109308934A (en) * 2018-08-20 2019-02-05 唐山照澜海洋科技有限公司 A kind of gene regulatory network construction method based on integration characteristic importance and chicken group's algorithm
CN111304200A (en) * 2020-02-11 2020-06-19 山东大学 CeRNA (cellular ribonucleic acid) regulation and control network for regulating and controlling osteointegration around rat implant with hyperlipidemia and application of network
CN111304200B (en) * 2020-02-11 2022-04-15 山东大学 CeRNA (cellular ribonucleic acid) regulation and control network for regulating and controlling osteointegration around rat implant with hyperlipidemia and application of network
CN111613268A (en) * 2020-05-27 2020-09-01 中山大学 Method for determining gene expression regulation mechanism based on single cell transcriptome data
CN111613268B (en) * 2020-05-27 2023-02-24 中山大学 Method for determining gene expression regulation mechanism based on single cell transcriptome data
CN111833964A (en) * 2020-06-24 2020-10-27 华中农业大学 Method for mining superior locus of Bayesian network optimized by integer linear programming
CN112102876A (en) * 2020-09-27 2020-12-18 西安交通大学 Method for automatically modeling gene circuit and transcription regulation and control relation
CN115798600A (en) * 2023-02-03 2023-03-14 北京灵迅医药科技有限公司 Genome data analysis method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN106874704B (en) 2019-02-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant