Biological Data Integration and Model Building

  • Reference work entry
Encyclopedia of Complexity and Systems Science

Definition of the Subject

Data integration and model building have become essential activities in biological research as technological advances continue to enable the measurement of biological data of increasing diversity and scale. High‐throughput technologies provide a wealth of global data sets (e. g. genomics, transcriptomics, proteomics, metabolomics), and the challenge becomes how to integrate these data to maximize the amount of useful biological information that can be extracted. Integrating biological data is both important and challenging because of the nature of biology itself. Biological systems have evolved over billions of years, and in that time biological mechanisms have become highly diverse, with molecular machines of intricate detail. Thus, while there are certainly great general scientific principles to be distilled, such as the foundational theory of evolution, much of biology is found in the details of these evolved systems. This emphasis on the details of systems...

Glossary

Constraint-based analysis:

A modeling framework based on excluding infeasible network states via environmental, physicochemical, and regulatory constraints to improve predictions of achievable cellular states and behavior.
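To make "excluding infeasible network states" concrete, here is a minimal Python sketch that assumes a hypothetical three-reaction toy network. A flux vector is kept only if it satisfies the steady-state mass balance \(S v = 0\) and per-reaction lower/upper bounds. The matrix `S`, the bounds, and the function name `is_feasible` are illustrative assumptions, not taken from any published model. In practice, methods such as flux balance analysis search this feasible set with linear programming (for example, maximizing a growth objective) rather than testing candidate vectors one at a time.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy network (illustrative assumption, not a published model):
#   R1: -> A (uptake),  R2: A -> B,  R3: B -> (secretion)
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S: List[List[float]] = [
    [1, -1, 0],  # A: produced by R1, consumed by R2
    [0, 1, -1],  # B: produced by R2, consumed by R3
]

# (lower, upper) flux bounds per reaction. Nonnegative lower bounds encode
# irreversibility; the uptake cap of 5 stands in for an environmental limit.
BOUNDS: List[Tuple[float, float]] = [(0.0, 5.0), (0.0, 10.0), (0.0, 10.0)]


def is_feasible(v: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if flux vector v satisfies S v = 0 and every bound."""
    if len(v) != len(BOUNDS):
        raise ValueError("flux vector must have one entry per reaction")
    # Steady-state mass balance: no metabolite accumulates or is depleted.
    for row in S:
        if abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) > tol:
            return False
    # Capacity and direction constraints on each reaction.
    return all(lo - tol <= v_j <= hi + tol for v_j, (lo, hi) in zip(v, BOUNDS))


print(is_feasible([4.0, 4.0, 4.0]))  # True: balanced and within bounds
print(is_feasible([4.0, 3.0, 3.0]))  # False: A accumulates, so S v != 0
print(is_feasible([8.0, 8.0, 8.0]))  # False: balanced but exceeds the uptake cap
```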

Data space:

Multidimensional space containing all possible states of a system; this space can be reduced using defined constraints.
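As a rough illustration of how constraints shrink a data space, the sketch below assumes a hypothetical system of four on/off genes, whose unconstrained space has \(2^4 = 16\) states, and applies two made-up constraints: genes 0 and 1 are never on together, and gene 3 can only be on when gene 2 is on. The gene indices and rules are assumptions chosen only to show the counting.

```python
from itertools import product

N_GENES = 4  # hypothetical system size

# Every on/off assignment of N_GENES binary genes: the full data space.
all_states = list(product((0, 1), repeat=N_GENES))

# Hypothetical constraints, each a predicate that a feasible state must satisfy.
constraints = [
    lambda s: not (s[0] and s[1]),  # genes 0 and 1 are mutually exclusive
    lambda s: s[2] or not s[3],     # gene 3 can only be on if gene 2 is on
]

# Keep only the states that satisfy every constraint.
feasible = [s for s in all_states if all(c(s) for c in constraints)]

print(len(all_states), len(feasible))  # 16 9
```

Each constraint removes one of the four joint settings of the pair of genes it touches, so the feasible region is \(3 \times 3 = 9\) of the 16 states; with many genes and constraints the same filtering idea is what makes large state spaces tractable.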

Interaction network:

A graph where the nodes represent biomolecules (e. g. genes) and the edges represent defined interactions between the nodes, whether they be direct physical interactions (e. g. protein–protein binding, protein–DNA binding) or functional relationships (e. g. synthetic lethality).
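One simple way to hold such a graph in code is an adjacency map keyed by node, with each edge tagged by its interaction type. The sketch below assumes undirected edges (a simplification; protein–DNA binding, for example, is naturally directed), and the gene names and edges are hypothetical placeholders rather than curated interaction data.

```python
from collections import defaultdict

# Edge list: (node, node, interaction type). All entries are hypothetical.
EDGES = [
    ("geneA", "geneB", "protein-protein"),
    ("geneB", "geneC", "protein-DNA"),
    ("geneA", "geneD", "synthetic-lethal"),
]


def build_network(edges):
    """Build an undirected adjacency map: node -> {neighbor: interaction type}."""
    graph = defaultdict(dict)
    for u, v, kind in edges:
        graph[u][v] = kind
        graph[v][u] = kind  # undirected: store both directions
    return graph


def neighbors_by_type(graph, node, kind):
    """Return the sorted neighbors of `node` linked by edges of type `kind`."""
    return sorted(n for n, k in graph[node].items() if k == kind)


net = build_network(EDGES)
print(sorted(net["geneA"]))                                 # ['geneB', 'geneD']
print(neighbors_by_type(net, "geneA", "synthetic-lethal"))  # ['geneD']
```

Keeping the interaction type on each edge lets physical and functional relationships live in one graph while still being queried separately.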

Biochemical reaction network:

Collection of metabolic, signaling, or regulatory chemical reactions described in stoichiometric detail.
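In stoichiometric form, such a network is commonly encoded as a matrix \(S\) whose entry \(S_{ij}\) is the coefficient of metabolite \(i\) in reaction \(j\) (negative for substrates, positive for products). The sketch below builds \(S\) from a small hypothetical reaction list; the reaction names and species are illustrative assumptions.

```python
# Hypothetical reactions: name -> {metabolite: stoichiometric coefficient}.
# Negative coefficients are consumed (substrates), positive are produced.
REACTIONS = {
    "R1": {"A": -1, "B": -2, "C": 1},  # A + 2 B -> C
    "R2": {"C": -1, "D": 1, "B": 1},   # C -> D + B
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction names, S) with S[i][j] = coeff of met i in rxn j."""
    rxn_names = list(reactions)
    mets = sorted({m for coeffs in reactions.values() for m in coeffs})
    # Metabolites absent from a reaction get a coefficient of 0.
    S = [[reactions[r].get(m, 0) for r in rxn_names] for m in mets]
    return mets, rxn_names, S


mets, rxns, S = stoichiometric_matrix(REACTIONS)
for m, row in zip(mets, S):
    print(m, row)
# A [-1, 0]
# B [-2, 1]
# C [1, -1]
# D [0, 1]
```

Given a flux vector \(v\), the product \(S v\) gives the net production rate of each metabolite, which is the quantity that steady-state constraint-based methods set to zero.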

Statistical inference network:

A network model inferred statistically from large-scale biological data sets and designed to be quantitatively predictive for novel perturbations and/or environmental conditions.
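The simplest member of this family is a correlation ("relevance") network: two genes are connected when their expression profiles across conditions are strongly correlated. The sketch below implements only that simple technique, not the mutual-information, Bayesian, or regression-based approaches developed in the literature; the expression values and the 0.9 threshold are made-up assumptions.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles: gene -> values across 5 conditions.
EXPR = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with g1
    "g3": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as g1 rises
    "g4": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relevance_network(expr, threshold=0.9):
    """Return edges (g, h, r) for gene pairs whose |r| meets the threshold."""
    edges = []
    for g, h in combinations(sorted(expr), 2):
        r = pearson(expr[g], expr[h])
        if abs(r) >= threshold:
            edges.append((g, h, round(r, 3)))
    return edges


for edge in relevance_network(EXPR):
    print(edge)
```

Correlation alone cannot distinguish direct from indirect relationships or indicate the direction of regulation, which motivates the more sophisticated inference methods used to build predictive network models.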

Genome:

The complete DNA nucleotide sequence in all chromosomes of an organism.

Transcriptome:

The complete set of RNA transcripts produced from an organism's genome under a particular set of conditions.

Proteome:

The complete set of expressed proteins produced by the genome.

Metabolome:

The complete set of small molecules which are the intermediates and products of an organism's metabolism.

Boolean network:

A set of \(N\) discrete‐valued variables \(\sigma_1, \sigma_2, \dots, \sigma_N\), where \(\sigma_n \in \{0, 1\}\). Each node \(\sigma_n\) is assigned a set of \(k_n\) input nodes, \(\sigma_{n_1}, \sigma_{n_2}, \dots, \sigma_{n_{k_n}}\), which control its value through the update rule \(\sigma_n(t+1) = f_n(\sigma_{n_1}(t), \dots, \sigma_{n_{k_n}}(t))\). In a Boolean network, each function \(f_n\) can be chosen from the ensemble of all possible Boolean functions of its \(k_n\) inputs.

Bibliography

  1. Albert R, Othmer HG (2003) The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol 223(1):1–18

  2. Alm E, Arkin AP (2003) Biological networks. Curr Opin Struct Biol 13(2):193–202

  3. Almaas E, Kovacs B et al (2004) Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature 427(6977):839–843

  4. Basso K, Margolin AA et al (2005) Reverse engineering of regulatory networks in human B cells. Nat Genet 37(4):382–390

  5. Beard DA, Babson E et al (2004) Thermodynamic constraints for biochemical networks. J Theor Biol 228(3):327–333

  6. Beard DA, Liang SD et al (2002) Energy balance for analysis of complex metabolic networks. Biophys J 83(1):79–86

  7. Bonneau R, Reiss DJ et al (2006) The Inferelator: an algorithm for learning parsimonious regulatory networks from systems‐biology data sets de novo. Genome Biol 7(5):R36

  8. Burgard AP, Pharkya P et al (2003) Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 84(6):647–657

  9. Christopher R, Dhiman A et al (2004) Data-driven computer simulation of human cancer cell. Ann NY Acad Sci 1020:132–153

  10. Cohen JE (2004) Mathematics is biology's next microscope, only better; biology is mathematics' next physics, only better. PLoS Biol 2(12):e439

  11. Covert MW, Knight EM et al (2004) Integrating high‐throughput and computational data elucidates bacterial networks. Nature 429(6987):92–96

  12. Covert MW, Leung TH et al (2005) Achieving stability of lipopolysaccharide‐induced NF-kappaB activation. Science 309(5742):1854–1857

  13. Deshpande N, Addess KJ et al (2005) The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 33(Database issue):D233–D237

  14. Duarte NC, Becker SA et al (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci USA 104(6):1777–1782

  15. Duarte NC, Herrgard MJ et al (2004) Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res 14(7):1298–1309

  16. Edwards JS, Ibarra RU et al (2001) In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol 19(2):125–130

  17. Edwards JS, Palsson BO (2000) Robustness analysis of the Escherichia coli metabolic network. Biotechnol Prog 16(6):927–939

  18. Faith JJ, Hayete B et al (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5(1):e8

  19. Famili I, Forster J et al (2003) Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network. Proc Natl Acad Sci USA 100(23):13134–13139

  20. Faure A, Naldi A et al (2006) Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics 22(14):e124–e131

  21. Forster J, Famili I et al (2003) Large-scale evaluation of in silico gene deletions in Saccharomyces cerevisiae. OMICS 7(2):193–202

  22. Francke C, Siezen RJ et al (2005) Reconstructing the metabolic network of a bacterium from its genome. Trends Microbiol 13(11):550–558

  23. Friedman N (2004) Inferring cellular networks using probabilistic graphical models. Science 303(5659):799–805

  24. Gianchandani EP, Papin JA et al (2006) Matrix formalism to describe functional states of transcriptional regulatory systems. PLoS Comput Biol 2(8):e101

  25. Han JD, Bertin N et al (2004) Evidence for dynamically organized modularity in the yeast protein‐protein interaction network. Nature 430(6995):88–93

  26. Hashimoto RF, Kim S et al (2004) Growing genetic regulatory networks from seed genes. Bioinformatics 20(8):1241–1247

  27. Heinemann M, Kummel A et al (2005) In silico genome-scale reconstruction and validation of the Staphylococcus aureus metabolic network. Biotechnol Bioeng 92(7):850–864

  28. Hendriks BS, Wiley HS et al (2003) HER2-mediated effects on EGFR endosomal sorting: analysis of biophysical mechanisms. Biophys J 85(4):2732–2745

  29. Herrgard MJ, Palsson BO (2005) Untangling the web of functional and physical interactions in yeast. J Biol 4(2):5

  30. Hoffmann A, Levchenko A et al (2002) The IkappaB-NF-kappaB signaling module: temporal control and selective gene activation. Science 298(5596):1241–1245

  31. Hood L, Heath JR et al (2004) Systems biology and new technologies enable predictive and preventative medicine. Science 306(5696):640–643

  32. Hua Q, Joyce AR et al (2006) Metabolic analysis of adaptive evolution for in silico‐designed lactate‐producing strains. Biotechnol Bioeng 95(5):992–1002

  33. Hwang D, Rust AG et al (2005) A data integration methodology for systems biology. Proc Natl Acad Sci USA 102(48):17296–17301

  34. Hwang D, Smith JJ et al (2005) A data integration methodology for systems biology: experimental verification. Proc Natl Acad Sci USA 102(48):17302–17307

  35. Ibarra RU, Edwards JS et al (2002) Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420(6912):186–189

  36. Ideker T (2004) A systems approach to discovering signaling and regulatory pathways–or, how to digest large interaction networks into relevant pieces. Adv Exp Med Biol 547:21–30

  37. Ideker T, Galitski T et al (2001) A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2:343–372

  38. Ideker T, Ozier O et al (2002) Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18(Suppl 1):S233–S240

  39. Jamshidi N, Edwards JS et al (2001) Dynamic simulation of the human red blood cell metabolic network. Bioinformatics 17(3):286–287

  40. Kauffman SA (1993) The origins of order: self-organization and selection in evolution. Oxford University Press, New York

  41. Kelley BP, Yuan B et al (2004) PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res 32(Web Server issue):W83–W88

  42. Kim SY, Imoto S et al (2003) Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief Bioinform 4(3):228–235

  43. Kirschner MW (2005) The meaning of systems biology. Cell 121(4):503–504

  44. Kitano H (2002) Computational systems biology. Nature 420(6912):206–210

  45. Kurzweil R (2005) The singularity is near: when humans transcend biology. Penguin, London

  46. Lahdesmaki H, Hautaniemi S et al (2006) Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks. Signal Processing 86(4):814–834

  47. Lahdesmaki H, Shmulevich I et al (2003) On Learning Gene Regulatory Networks Under the Boolean Network Model. Machine Learning 52(1–2):147–167

  48. Levy S, Sutton G et al (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biol 5(10):e254

  49. Li F, Long T et al (2004) The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci USA 101(14):4781–4786

  50. Li H, Zhan M (2006) Systematic intervention of transcription for identifying network response to disease and cellular phenotypes. Bioinformatics 22(1):96–102

  51. Mahadevan R, Schilling CH (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5(4):264–276

  52. Margolin AA, Wang K et al (2006) Reverse engineering cellular networks. Nat Protoc 1(2):662–671

  53. Mulquiney PJ, Kuchel PW (2003) Modelling metabolism with Mathematica, detailed examples including erythrocyte metabolism. CRC Press, Boca Raton

  54. Pal R, Datta A et al (2005) Intervention in context‐sensitive probabilistic Boolean networks. Bioinformatics 21(7):1211–1218

  55. Palsson B (2004) Two‐dimensional annotation of genomes. Nat Biotechnol 22(10):1218–1219

  56. Papin JA, Hunter T et al (2005) Reconstruction of cellular signalling networks and analysis of their properties. Nat Rev Mol Cell Biol 6(2):99–111

  57. Papin JA, Palsson BO (2004) The JAK-STAT signaling network in the human B-cell: an extreme signaling pathway analysis. Biophys J 87(1):37–46

  58. Papin JA, Palsson BO (2004) Topological analysis of mass-balanced signaling networks: a framework to obtain network properties including crosstalk. J Theor Biol 227(2):283–297

  59. Papin JA, Price ND et al (2002) The genome-scale metabolic extreme pathway structure in Haemophilus influenzae shows significant network redundancy. J Theor Biol 215(1):67–82

  60. Pharkya P, Burgard AP et al (2003) Exploring the overproduction of amino acids using the bilevel optimization framework OptKnock. Biotechnol Bioeng 84(7):887–899

  61. Pharkya P, Burgard AP et al (2004) OptStrain: a computational framework for redesign of microbial production systems. Genome Res 14(11):2367–2376

  62. Pournara I, Wernisch L (2004) Reconstruction of gene networks using Bayesian learning and manipulation experiments. Bioinformatics 20(17):2934–2942

  63. Price ND, Papin JA et al (2002) Determination of redundancy and systems properties of the metabolic network of Helicobacter pylori using genome-scale extreme pathway analysis. Genome Res 12(5):760–769

  64. Price ND, Reed JL et al (2004) Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol 2(11):886–897

  65. Price ND, Schellenberger J et al (2004) Uniform sampling of steady-state flux spaces: means to design experiments and to interpret enzymopathies. Biophys J 87(4):2172–2186

  66. Reed JL, Palsson BO (2003) Thirteen years of building constraint-based in silico models of Escherichia coli. J Bacteriol 185(9):2692–2699

  67. Reed JL, Palsson BO (2004) Genome-scale in silico models of E. coli have multiple equivalent phenotypic states: assessment of correlated reaction subsets that comprise network states. Genome Res 14(9):1797–1805

  68. Reed JL, Vo TD et al (2003) An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol 4(9):R54

  69. Reiss DJ, Baliga NS et al (2006) Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 7:280

  70. Rual JF, Venkatesan K et al (2005) Towards a proteome-scale map of the human protein‐protein interaction network. Nature 437(7062):1173–1178

  71. Sachs K, Perez O et al (2005) Causal protein‐signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529

  72. Sauer U (2004) High‐throughput phenomics: experimental methods for mapping fluxomes. Curr Opin Biotechnol 15(1):58–63

  73. Shannon P, Markiel A et al (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504

  74. Shmulevich I, Dougherty ER et al (2002) Probabilistic Boolean Networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 18(2):261–274

  75. Shmulevich I, Dougherty ER et al (2002) From Boolean to probabilistic Boolean networks as models of genetic regulatory networks. Proceedings of the IEEE 90(11):1778–1792

  76. Shmulevich I, Dougherty ER et al (2002) Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics 18(10):1319–1331

  77. Smith HO, Tomb JF et al (1995) Frequency and distribution of DNA uptake signal sequences in the Haemophilus influenzae Rd genome. Science 269(5223):538–540

  78. Stelzl U, Worm U et al (2005) A human protein‐protein interaction network: a resource for annotating the proteome. Cell 122(6):957–968

  79. Thakar J, Pilione M et al (2007) Modelling Systems-Level Regulation of Host Immune Responses. PLoS Comput Biol 3(6):e109

  80. Thiele I, Price ND et al (2005) Candidate metabolic network states in human mitochondria. Impact of diabetes, ischemia, and diet. J Biol Chem 280(12):11683–11695

  81. Thiele I, Vo TD et al (2005) Expanded metabolic reconstruction of Helicobacter pylori (iIT341 GSM/GPR): an in silico genome-scale characterization of single- and double‐deletion mutants. J Bacteriol 187(16):5818–5830

  82. Tong AH, Lesage G et al (2004) Global mapping of the yeast genetic interaction network. Science 303(5659):808–813

  83. von Dassow G, Meir E et al (2000) The segment polarity network is a robust developmental module. Nature 406(6792):188–192

  84. Werner SL, Barken D et al (2005) Stimulus specificity of gene expression programs determined by temporal control of IKK activity. Science 309(5742):1857–1861

  85. Westbrook J, Feng Z et al (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res 30(1):245–248

  86. Westerhoff HV, Palsson BO (2004) The evolution of molecular biology into systems biology. Nat Biotechnol 22(10):1249–1252

  87. Zou M, Conzen SD (2005) A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 21(1):71–79

Copyright information

© 2009 Springer-Verlag

About this entry

Cite this entry

Eddy, J.A., Price, N.D. (2009). Biological Data Integration and Model Building. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30440-3_34
