from Part IV - Big data over biological networks
Published online by Cambridge University Press: 18 December 2015
Network discovery is of primary interest in many scientific domains. It is especially challenging in the biological domain because: (1) such networks are not directly observable in experiments; (2) such networks are dynamic, i.e. different parts of the network are activated at different times and under different conditions; and (3) the increasingly available biological data are often big (volume), heterogeneous (variety), and error prone (veracity). There is an urgent need for new methods, algorithms, and tools to discover networks from big biological data. In this chapter, we make two alternative assumptions, each of which leads to an approach to network discovery from big biological data. (1) The true network topology is a distribution over candidate topologies. The challenge is that the number of possible topologies is exponential, which makes the distribution computationally intractable to characterize. Our strategy, gene set Gibbs sampling (GSGS), is to draw sample topologies and use them to infer the true topology, an approximate learning method that falls within the framework of stochastic algorithms. (2) The true network topology is deterministic. The challenge is the large search space, for which we design an artificial intelligence algorithm, gene set simulated annealing (GSSA), to explore the space of network structures efficiently and intelligently. We use both simulated and real-world data to demonstrate the performance of our approaches against selected competing approaches.
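To make the second idea concrete, the following is a minimal sketch of generic simulated annealing over network structures. It is not the GSSA algorithm of this chapter, whose details are not given here. The function name `simulated_annealing`, the geometric cooling schedule, the single-edge toggle move, and the toy score (negative distance to a known target edge set) are all illustrative assumptions.

```python
import math
import random


def simulated_annealing(nodes, score, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Generic simulated annealing over directed edge sets (illustrative only).

    `score` maps a frozenset of (source, target) edges to a number;
    higher is better. All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    candidates = [(a, b) for a in nodes for b in nodes if a != b]
    current = frozenset()
    current_score = score(current)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a neighbouring structure by toggling one randomly chosen edge.
        proposal = current ^ {rng.choice(candidates)}
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Toy problem: the score is the negative number of edges that differ
# from a known target. A real score would be computed from data.
target = frozenset({("A", "B"), ("B", "C"), ("C", "D")})


def toy_score(edges):
    return -len(edges ^ target)


best, best_score = simulated_annealing(["A", "B", "C", "D"], toy_score)
```

Accepting some score-decreasing moves at high temperature is what lets the search escape local optima, which matters once the score is derived from noisy data rather than from a known target as in this toy case.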
Introduction
The past decade has witnessed a tremendous explosion in the amount of data generated through high-throughput molecular profiling technologies such as microarrays and next-generation sequencing. Big molecular profiling datasets are enabling a high-resolution view of biological systems and allowing scientists to interrogate the biomolecular activities of tens of thousands of genes simultaneously. However, challenges remain in analyzing big molecular profiling data and gaining meaningful insights into the biomolecular interaction and regulation mechanisms. These mechanisms are often understood through the inference of biological networks using computational systems biology approaches. A wide range of methods have been proposed in the literature for inferring the structure of different types of biological networks, such as gene regulatory networks, protein–protein interaction networks, and signaling networks in the form of Bayesian networks [1, 2], probabilistic Boolean networks (PBNs) [3, 4], mutual information networks [5–7], graphical Gaussian models [8–11], and other approaches [12–16].