Definition of the Subject
Data integration and model building have become essential activities in biological research as technological advancements continue to empower the measurement of biological data of increasing diversity and scale. High-throughput technologies provide a wealth of global data sets (e.g., genomics, transcriptomics, proteomics, metabolomics), and the challenge becomes how to integrate these data to maximize the amount of useful biological information that can be extracted. Integrating biological data is important and challenging because of the nature of biology. Biological systems have evolved over the course of billions of years, and in that time biological mechanisms have become very diverse, with molecular machines of intricate detail. Thus, while there are certainly great general scientific principles to be distilled, such as the foundational theory of evolution, much of biology is found in the details of these evolved systems. This emphasis on the details of systems...
Glossary
- Constraint-based analysis: A modeling framework based on excluding infeasible network states via environmental, physicochemical, and regulatory constraints, which improves predictions of achievable cellular states and behavior.
- Data space: Multidimensional space containing all possible states of a system; this space can be reduced using defined constraints.
- Interaction network: A graph in which the nodes represent biomolecules (e.g., genes) and the edges represent defined interactions between the nodes, whether direct physical interactions (e.g., protein–protein binding, protein–DNA binding) or functional relationships (e.g., synthetic lethality).
- Biochemical reaction network: A collection of metabolic, signaling, or regulatory chemical reactions described in stoichiometric detail.
- Statistical inference network: A network model inferred statistically from large-scale biological data sets and designed to be quantitatively predictive for novel perturbations and/or environmental conditions.
- Genome: The complete DNA nucleotide sequence in all chromosomes of an organism.
- Transcriptome: The complete set of RNA transcripts produced from an organism's genome under a particular set of conditions.
- Proteome: The complete set of proteins expressed from the genome.
- Metabolome: The complete set of small molecules that are the intermediates and products of an organism's metabolism.
- Boolean network: A set of \( N \) discrete-valued variables \( \{\sigma_1, \sigma_2, \dots, \sigma_N\} \), where \( \sigma_n \in \{0, 1\} \). Each node \( \sigma_n \) is assigned a set of \( k_n \) input nodes \( \sigma_{n_1}, \sigma_{n_2}, \dots, \sigma_{n_{k_n}} \), which controls its value through the update rule \( \sigma_n(t+1) = f_n(\sigma_{n_1}(t), \dots, \sigma_{n_{k_n}}(t)) \). In a Boolean network, each function \( f_n \) may be chosen from the ensemble of all possible Boolean functions of its \( k_n \) inputs.
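The Boolean network definition above can be made concrete with a small simulation. The sketch below assumes synchronous updating (every node reads the state at time t, as in the update rule) and uses a hypothetical three-node negative-feedback loop (A from NOT C, B from A, C from B); the wiring and rules are illustrative and not taken from any published model. Each node n is given its list of k_n inputs and a function f_n, and the code follows each trajectory until a state repeats, which identifies the attractor the network settles into.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical 3-node network (A, B, C). INPUTS[n] lists the k_n regulators of
# node n by index; FUNCS[n] is the Boolean function f_n applied to their values.
INPUTS: List[List[int]] = [[2], [0], [1]]  # A <- C, B <- A, C <- B
FUNCS: List[Callable[..., int]] = [
    lambda c: 1 - c,  # A(t+1) = NOT C(t)
    lambda a: a,      # B(t+1) = A(t)
    lambda b: b,      # C(t+1) = B(t)
]


def step(state: State) -> State:
    """Synchronous update: every node is computed from the state at time t."""
    return tuple(f(*(state[i] for i in inp)) for f, inp in zip(FUNCS, INPUTS))


def find_attractor(state: State) -> List[State]:
    """Iterate until a state repeats and return the cycle that was entered."""
    first_seen: Dict[State, int] = {}
    trajectory: List[State] = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[first_seen[state]:]


def all_attractors(n_nodes: int = 3) -> List[Tuple[State, ...]]:
    """Run from all 2**N initial states and collect the distinct attractors."""
    found = set()
    for s0 in product((0, 1), repeat=n_nodes):
        cycle = find_attractor(s0)
        k = cycle.index(min(cycle))  # rotate so each cycle has one canonical form
        found.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(found, key=len)


for attractor in all_attractors():
    print(len(attractor), attractor)
```

For this toy wiring, the eight possible states fall into two attractors, a 2-cycle and a 6-cycle, and no state is a fixed point. Exhaustive enumeration of all 2^N initial states is only practical for small N; it is used here because it makes the attractor structure fully visible.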
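The constraint-based analysis, data space, and biochemical reaction network entries can be illustrated together with a deliberately tiny stoichiometric network. The sketch below is a brute-force illustration under stated assumptions: a hypothetical four-reaction network (uptake of metabolite A, conversion of A to B, secretion of A, and consumption of B by a stand-in "biomass" reaction), integer fluxes only, and made-up bounds of 0 to 5 for every reaction. It enumerates the discretized data space and keeps only the flux vectors v that satisfy the steady-state mass balance S v = 0, which is the sense in which constraints exclude infeasible states. Genome-scale analyses instead solve linear programs over continuous fluxes; exhaustive enumeration is used here only for transparency.

```python
from itertools import product
from typing import Iterator, List, Tuple

# Hypothetical toy network: columns of S are reactions, rows are metabolites.
REACTIONS = ["uptake_A", "A_to_B", "secrete_A", "biomass_from_B"]
S: List[List[int]] = [
    # uptake  A->B  secrete  biomass
    [1, -1, -1, 0],   # metabolite A
    [0, 1, 0, -1],    # metabolite B
]
# Assumed flux bounds (lower, upper) per reaction, in arbitrary units.
BOUNDS: List[Tuple[int, int]] = [(0, 5), (0, 5), (0, 5), (0, 5)]
BIOMASS = REACTIONS.index("biomass_from_B")


def mass_balanced(v: Tuple[int, ...]) -> bool:
    """Steady-state constraint S v = 0: no metabolite accumulates or depletes."""
    return all(sum(s * flux for s, flux in zip(row, v)) == 0 for row in S)


def data_space() -> Iterator[Tuple[int, ...]]:
    """All integer flux vectors inside the bounds (the discretized data space)."""
    return product(*(range(lo, hi + 1) for lo, hi in BOUNDS))


def feasible_states() -> List[Tuple[int, ...]]:
    """Keep only the states that also satisfy mass balance."""
    return [v for v in data_space() if mass_balanced(v)]


grid_size = sum(1 for _ in data_space())
feasible = feasible_states()
optimum = max(feasible, key=lambda v: v[BIOMASS])  # maximize the biomass flux
print(grid_size, len(feasible))
print(dict(zip(REACTIONS, optimum)))
```

Of the 1296 grid points, only 21 satisfy mass balance, and among those the state that maximizes the biomass flux routes all uptake through B (uptake 5, A to B 5, secretion 0, biomass 5). Adding further constraints, such as a regulatory rule that forbids a reaction under some condition, would shrink the feasible set again in the same way.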
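An interaction network, as defined in the glossary above, is naturally stored as an adjacency map whose edges carry the type of interaction, so that physical and functional relationships can be queried separately. The sketch below uses hypothetical gene names and edges, and treats every interaction as undirected for simplicity, even though some (e.g., protein–DNA binding) are directional in practice.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical edge list: (node, node, interaction type). Names are placeholders.
EDGES: List[Tuple[str, str, str]] = [
    ("geneA", "geneB", "protein-protein"),
    ("geneA", "geneC", "protein-DNA"),
    ("geneB", "geneD", "synthetic-lethal"),
    ("geneC", "geneD", "protein-protein"),
]

Network = Dict[str, Dict[str, Set[str]]]


def build_network(edges: List[Tuple[str, str, str]]) -> Network:
    """Undirected adjacency map: node -> {neighbor: set of interaction types}."""
    adj: Network = defaultdict(lambda: defaultdict(set))
    for u, v, kind in edges:
        adj[u][v].add(kind)
        adj[v][u].add(kind)
    return adj


def neighbors(adj: Network, node: str, kinds: Optional[Set[str]] = None) -> List[str]:
    """Neighbors of `node`, optionally restricted to the given interaction types."""
    return sorted(n for n, types in adj.get(node, {}).items()
                  if kinds is None or types & kinds)


net = build_network(EDGES)
PHYSICAL = {"protein-protein", "protein-DNA"}  # direct physical interactions
print(neighbors(net, "geneB"))            # all interaction partners
print(neighbors(net, "geneB", PHYSICAL))  # physical partners only
```

Keeping the interaction type on each edge, rather than building separate graphs, lets the same structure answer both "which genes interact physically?" and "which genes are functionally related?" without duplicating nodes.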
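As a minimal illustration of the statistical inference network entry above, the sketch below picks, for one target gene, the candidate regulator whose expression best explains the target under ordinary least squares, and then uses the fitted model to predict the target's level under an unseen perturbation of that regulator. This is a deliberately simplified stand-in (one regulator per target, a linear model, and made-up expression values), not the procedure of any specific published algorithm; practical methods typically use many regulators, regularization, and far larger data sets.

```python
from typing import Dict, List, Tuple

# Toy expression profiles over five hypothetical conditions (made-up values).
REGULATORS: Dict[str, List[float]] = {
    "reg1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "reg2": [3.0, 1.0, 4.0, 1.0, 5.0],
}
TARGET: List[float] = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ intercept + slope * x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(x: List[float], y: List[float], intercept: float, slope: float) -> float:
    """Sum of squared residuals of the fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def infer_regulator(regulators: Dict[str, List[float]],
                    target: List[float]) -> Tuple[str, float, float]:
    """Return (name, intercept, slope) of the best-fitting single regulator."""
    best = None
    for name, profile in regulators.items():
        a, b = fit_line(profile, target)
        err = sse(profile, target, a, b)
        if best is None or err < best[0]:
            best = (err, name, a, b)
    _, name, a, b = best
    return name, a, b


name, a, b = infer_regulator(REGULATORS, TARGET)
predicted = a + b * 6.0  # hypothetical perturbation: regulator level set to 6.0
print(name, a, b, predicted)
```

With these toy values, reg1 is selected (intercept 1.0, slope 2.0) and the predicted target level under the new perturbation is 13.0. The prediction step is what distinguishes this kind of model from a purely descriptive map of associations.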
Copyright information
© 2009 Springer-Verlag
Cite this entry
Eddy, J.A., Price, N.D. (2009). Biological Data Integration and Model Building. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30440-3_34
DOI: https://doi.org/10.1007/978-0-387-30440-3_34
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-75888-6
Online ISBN: 978-0-387-30440-3
eBook Packages: Physics and Astronomy; Reference Module Physical and Materials Science; Reference Module Chemistry, Materials and Physics