Abstract
A detailed algorithmic explanation is required for how a network of chemical reactions can generate the sophisticated behavior displayed by living cells. Though several previous works have shown that reaction networks are computationally universal and can in principle implement any algorithm, there is scope for constructions that map well onto biological reality, make efficient use of the computational potential of the native dynamics of reaction networks, and make contact with statistical mechanics. We describe a new reaction network scheme for solving a large class of statistical problems. Specifically we show how reaction networks can implement information projection, and consequently a generalized Expectation-Maximization algorithm, to solve maximum likelihood estimation problems in partially-observed exponential families on categorical data. Our scheme can be thought of as an algorithmic interpretation of E. T. Jaynes’s vision of statistical mechanics as statistical inference.
Notes
- 1.
This situation can arise because only a linear projection is observable. It can also happen because we require a rich family of probability distributions on the space where the observation s lives, but do not want to give up the nice properties of exponential families. We can achieve both by imagining that our observation s arises as the projection of a data vector x living in a higher-dimensional space, and then employing an exponential family of probability distributions on this higher-dimensional space. A small illustration follows.
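For instance (an illustrative choice of dimensions, not taken from the paper), a three-outcome hidden space observed through a two-outcome projection:
\(s = \mathcal{S}x, \qquad \mathcal{S}=\left(\begin{array}{ccc}1&1&0\\ 0&0&1\end{array}\right), \qquad x\in \mathbb{R}^3_{\ge 0},\)
so s records only \(x_1+x_2\) and \(x_3\); an exponential family on x then induces a rich, generally non-exponential family on s.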
References
Amari, S.: Information Geometry and Its Applications. AMS, vol. 194. Springer, Tokyo (2016). https://doi.org/10.1007/978-4-431-55978-8
Andersen, E.B.: Sufficiency and exponential families for discrete sample spaces. J. Am. Stat. Assoc. 65(331), 1248–1255 (1970)
Anderson, D.F., Craciun, G., Kurtz, T.G.: Product-form stationary distributions for deficiency zero chemical reaction networks. Bull. Math. Biol. 72(8), 1947–1970 (2010)
Angeli, D., De Leenheer, P., Sontag, E.: A Petri net approach to persistence analysis in chemical reaction networks. In: Queinnec, I., Tarbouriech, S., Garcia, G., Niculescu, S.-I. (eds.) Biology and Control Theory: Current Challenges. LNCIS, vol. 357, pp. 181–216. Springer, Berlin/Heidelberg (2007). https://doi.org/10.1007/978-3-540-71988-5_9
Angeli, D., De Leenheer, P., Sontag, E.D.: A Petri net approach to the study of persistence in chemical reaction networks. Math. Biosci. 210(2), 598–618 (2007)
Baez, J., Stay, M.: Algorithmic thermodynamics. Math. Struct. Comput. Sci. 22(5), 771–787 (2012)
Birch, M.W.: Maximum likelihood in three-way contingency tables. J. R. Stat. Soc. Ser. B 25, 220–233 (1963)
Buisman, H.J., ten Eikelder, H.M.M., Hilbers, P.A.J., Liekens, A.M.L.: Computing algebraic functions with biochemical reaction networks. Artif. Life 15(1), 5–19 (2009)
Cardelli, L., Kwiatkowska, M., Whitby, M.: Chemical reaction network designs for asynchronous logic circuits. Nat. Comput. 17(1), 109–130 (2018)
Cencov, N.N.: Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs, vol. 53. American Mathematical Society, Providence (2000)
Chen, H.-L., Doty, D., Soloveichik, D.: Deterministic function computation with chemical reaction networks. Nat. Comput. 13(4), 517–534 (2014)
Csiszár, I., Matus, F.: Information projections revisited. IEEE Trans. Inf. Theory 49(6), 1474–1490 (2003)
Csiszár, I., Shields, P.C.: Information theory and statistics: a tutorial. Found. Trends Commun. Inf. Theory 1(4), 417–528 (2004)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)
Desvillettes, L., Fellner, K., Tang, B.Q.: Trend to equilibrium for reaction-diffusion systems arising from complex balanced chemical reaction networks. SIAM J. Math. Anal. 49(4), 2666–2709 (2017)
Feinberg, M.: On chemical kinetics of a certain class. Arch. Ration. Mech. Anal. 46, 1–41 (1972)
Feinberg, M.: Lectures on chemical reaction networks (1979). http://www.che.eng.ohio-state.edu/~FEINBERG/LecturesOnReactionNetworks/
Gopalkrishnan, M.: Catalysis in reaction networks. Bull. Math. Biol. 73(12), 2962–2982 (2011)
Gopalkrishnan, M.: A scheme for molecular computation of maximum likelihood estimators for log-linear models. In: Rondelez, Y., Woods, D. (eds.) DNA 2016. LNCS, vol. 9818, pp. 3–18. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43994-5_1
Gopalkrishnan, M., Miller, E., Shiu, A.: A geometric approach to the global attractor conjecture. SIAM J. Appl. Dyn. Syst. 13(2), 758–797 (2014)
Hjelmfelt, A., Weinberger, E.D., Ross, J.: Chemical implementation of neural networks and Turing machines. Proc. Natl. Acad. Sci. 88(24), 10983–10987 (1991)
Horn, F.J.M.: Necessary and sufficient conditions for complex balancing in chemical kinetics. Arch. Ration. Mech. Anal. 49, 172–186 (1972)
Horn, F.J.M.: The dynamics of open reaction systems. In: Mathematical Aspects of Chemical and Biochemical Problems and Quantum Chemistry. Proceedings of Symposia in Applied Mathematics, vol. VIII, New York (1974)
Ikeda, S., Tanaka, T., Amari, S.: Information geometry of turbo and low-density parity-check codes. IEEE Trans. Inf. Theory 50(6), 1097–1114 (2004)
Ikeda, S., Tanaka, T., Amari, S.: Stochastic reasoning, free energy, and information geometry. Neural Comput. 16(9), 1779–1810 (2004)
Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620 (1957)
MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)
McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, vol. 382. Wiley, Hoboken (2007)
Miller, E.: Theory and applications of lattice point methods for binomial ideals. In: Fløystad, G., Johnsen, T., Knutsen, A. (eds.) Combinatorial Aspects of Commutative Algebra and Algebraic Geometry, pp. 99–154. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19492-4_8
Napp, N.E., Adams, R.P.: Message passing inference with chemical reaction networks. In: Advances in Neural Information Processing Systems, pp. 2247–2255 (2013)
Oishi, K., Klavins, E.: Biomolecular implementation of linear I/O systems. Syst. Biol. IET 5(4), 252–260 (2011)
Pachter, L., Sturmfels, B.: Algebraic Statistics for Computational Biology, vol. 13. Cambridge University Press, Cambridge (2005)
Poole, W., et al.: Chemical Boltzmann machines. In: Brijder, R., Qian, L. (eds.) DNA 2017. LNCS, vol. 10467, pp. 210–231. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66799-7_14
Qian, L., Winfree, E.: Scaling up digital circuit computation with DNA strand displacement cascades. Science 332(6034), 1196–1201 (2011)
Qian, L., Winfree, E., Bruck, J.: Neural network computation with DNA strand displacement cascades. Nature 475(7356), 368–372 (2011)
Sarpeshkar, R.: Analog synthetic biology. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 372(2012), 20130110 (2014)
Soloveichik, D., Cook, M., Winfree, E., Bruck, J.: Computation with finite stochastic chemical reaction networks. Nat. Comput. 7(4), 615–633 (2008)
Tribus, M., McIrvine, E.C.: Energy and information. Sci. Am. 225(3), 179–190 (1971)
Van Kampen, N.G.: Stochastic Processes in Physics and Chemistry, vol. 1. Elsevier, New York (1992)
Virinchi, M.V., Behera, A., Gopalkrishnan, M.: A stochastic molecular scheme for an artificial cell to infer its environment from partial observations. In: Brijder, R., Qian, L. (eds.) DNA 2017. LNCS, vol. 10467, pp. 82–97. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66799-7_6
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
Wiener, N.: Cybernetics or Control and Communication in the Animal and the Machine, vol. 25. MIT Press, Cambridge (1961)
Zechner, C., Seelig, G., Rullan, M., Khammash, M.: Molecular circuits for dynamic noise filtering. Proc. Natl. Acad. Sci. 113(17), 4729–4734 (2016)
Zellner, A.: Optimal information processing and Bayes’s theorem. Am. Stat. 42(4), 278–280 (1988)
Appendix A
Proof
(Proof of Theorem 2). (1) Fix \(\alpha \in \mathbb{R}^n_{>0}\). We first prove uniqueness: suppose for contradiction that there are at least two points of intersection \(\alpha^*_1,\alpha^*_2\) of the polytope \((\alpha + V^\perp)\cap \mathbb{R}^n_{\ge 0}\) with the hypersurface \(e^V\). Since \(\alpha^*_1,\alpha^*_2\in e^V\), we have \(\log \alpha_1^* - \log \alpha_2^*\in V\). Since \(\alpha - \alpha_2^*\in V^\perp\), we have \((\alpha -\alpha_2^*)\cdot (\log \alpha_2^* - \log \alpha_1^*)=0\). Then by the Pythagorean theorem, \(D(\alpha \Vert \alpha^*_1) = D(\alpha \Vert \alpha^*_2) + D(\alpha^*_2\Vert \alpha^*_1)\), which implies \(D(\alpha \Vert \alpha^*_1)\ge D(\alpha \Vert \alpha^*_2)\). By the symmetric argument with the roles of \(\alpha^*_1\) and \(\alpha^*_2\) exchanged, \(D(\alpha \Vert \alpha^*_2)\ge D(\alpha \Vert \alpha^*_1)\), and we conclude \(D(\alpha \Vert \alpha^*_2)= D(\alpha \Vert \alpha^*_1)\). In particular, \(D(\alpha^*_2\Vert \alpha^*_1)=0\), which implies \(\alpha^*_1=\alpha^*_2\) by Note 1.
To prove that there exists at least one point of intersection, and to show (2), fix \(\beta \in e^V\). We will show that the E-Projection \(\alpha^*\) of \(\beta\) to \((\alpha +V^\perp)\cap \mathbb{R}^n_{\ge 0}\) belongs to \(e^V\). This point \(\alpha^*\) exists since \(f(x):=D(x\Vert \beta)\) is continuous, and hence attains its minimum over the compact set \((\alpha +V^\perp)\cap \mathbb{R}^n_{\ge 0}\). Further, because \(\alpha^*\) is a minimizer, the directional derivative toward any point of the polytope vanishes: \(\frac{d}{d\lambda} f((1-\lambda)\alpha^* + \lambda \alpha)\big|_{\lambda =0^+}=0\). That is, \((\alpha - \alpha^*)\cdot \log \frac{\alpha^*}{\beta}=0\). Since \(\alpha\) here could have been replaced by any other point of \((\alpha +V^\perp)\cap \mathbb{R}^n_{>0}\), the vector \(\log \frac{\alpha^*}{\beta}\) is orthogonal to all of \(V^\perp\), so \(\log \alpha^* - \log \beta \in V\); since \(\log \beta \in V\), this gives \(\alpha^*\in e^V\).
(3) now follows because \(\alpha^*\in e^V\) implies \(D(\alpha \Vert \alpha^*) + D(\alpha^*\Vert \beta) = D(\alpha \Vert \beta)\) for all \(\beta \in e^V\). Hence \(D(\alpha \Vert \beta)\ge D(\alpha \Vert \alpha^*)\) for every \(\beta \in e^V\), i.e., \(\alpha^*\) is the M-Projection of \(\alpha\) to \(e^V\).
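These projection identities are easy to probe numerically. Below is a minimal sketch (our illustration, not from the paper: the matrix \(A\) whose rows span \(V\), the points \(\alpha,\beta\), and the use of scipy's SLSQP minimizer are all assumptions) that computes the E-Projection of part (2) and checks the Pythagorean identity of part (3):

```python
# Numerical sanity check of Theorem 2 (illustrative sketch, not from the
# paper): compute the E-Projection of beta onto (alpha + V^perp) and verify
# D(alpha||a*) + D(a*||beta) = D(alpha||beta).
# V is the row space of A, so x - alpha in V^perp iff A(x - alpha) = 0.
import numpy as np
from scipy.optimize import minimize

def D(p, q):
    """Generalized KL divergence for positive (not necessarily normalized) vectors."""
    return float(np.sum(p * np.log(p / q) - p + q))

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])          # rows span V
eta = np.array([0.3, -0.2])
beta = np.exp(A.T @ eta)                 # a point of e^V
alpha = np.array([0.5, 0.2, 0.3])        # an arbitrary positive point

res = minimize(lambda x: D(x, beta), alpha,
               constraints=[{'type': 'eq', 'fun': lambda x: A @ (x - alpha)}],
               bounds=[(1e-9, None)] * 3, method='SLSQP')
a_star = res.x

# a* should lie on e^V: log(a*) lies in the row space of A (residual ~ 0) ...
print(np.linalg.lstsq(A.T, np.log(a_star), rcond=None)[1])
# ... and the Pythagorean identity of part (3) should hold:
print(D(alpha, a_star) + D(a_star, beta), D(alpha, beta))
```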
Proof
(Proof of Theorem 6).
- (1) From the chain rule, \(\frac{d}{dt}D(x(t) \Vert y_A(\theta(t))) = ( \nabla_x D\cdot \dot{x} + \nabla_\theta D\cdot \dot{\theta })|_{(x(t),\theta(t))}\). From Theorem 4, the first term is nonpositive, with equality iff \(x(t)\) is the E-Projection of \(y_A(\theta(t))\) onto \(H_{x_0}\). From Theorem 5, the second term is nonpositive, with equality iff \(y_A(\theta(t))\) is the M-Projection of \(x(t)\) onto \(y_A(\mathbb{R}^m)\). Hence \(dD(x(t) \Vert y_A(\theta(t)))/dt \le 0\), with equality iff both conditions hold simultaneously.
- (2) Since \(D(x(t) \Vert y_A(\theta(t)))\) is bounded below and nonincreasing, eventually \(dD(x(t) \Vert y_A(\theta(t)))/dt = 0\). At that point, by the argument above, both the E-Projection and M-Projection subnetworks are stationary, so that \(\dot{x}=0\) and \(\dot{\theta}=0\). Hence the limit \((\hat{x},\hat{\theta})=\lim_{t\rightarrow \infty }(x(t), \theta(t))\) exists.
- (3) This follows since \(\nabla_\theta D(x\Vert y_A(\theta))|_{(\hat{x},\hat{\theta})} = \dot{\theta}(t)/\theta(t)|_{(\hat{x},\hat{\theta})}=0\) when \(\hat{\theta}\in \mathbb{R}^\varTheta_{>0}\).
- (4) We have \(\nabla_x D(x\Vert y_A(\theta))|_{(\hat{x},\hat{\theta})} = \log \left( \frac{\hat{x}}{y_A(\hat{\theta})}\right)\). By (1), the point \(\hat{x}\) is the E-Projection of \(y_A(\hat{\theta})\) to \(H_{x_0}\). Hence by Theorem 2, the point \(\hat{x}\) is the Birch point of \(x_0\) relative to the affine space \(\log y_A(\mathbb{R}^m)\), so that \((x - \hat{x})\cdot \log \left( \frac{\hat{x}}{y_A(\hat{\theta})}\right) =0\) for all \(x\in H_{x_0}\). Hence the gradient \(\nabla_x D(x\Vert y_A(\theta))|_{(\hat{x},\hat{\theta})}\) is perpendicular to \(H_{x_0}\).
Example 8
Consider \(A=\left( \begin{array}{ccc}2&1&0\\ 0&1&2\end{array}\right)\) and \(\mathcal{S}=\left( \begin{array}{rrr}1&0&1\\ 1&1&1\end{array}\right)\). The vector \(\left( \begin{array}{r}1 \\ 0 \\ -1 \end{array}\right)\) spans \(\ker \mathcal{S}\). The corresponding reaction network has species \(X_1, X_2, X_3, \theta_1, \theta_2\).
Here the concentration of \(X_2\) remains invariant with time. Let c be the initial concentration of \(X_2\). If \(c < 1/3\) then the system admits two stable equilibria and one unstable equilibrium. The points \((y_1,c,y_2,\sqrt{y_1},\sqrt{y_2})\) and \((y_2,c,y_1,\sqrt{y_2},\sqrt{y_1})\) are the stable equilibria where \(y_1=\frac{1-c}{2}+\frac{\sqrt{(1-3c)(1+c)}}{2}\) and \(y_2=\frac{1-c}{2}-\frac{\sqrt{(1-3c)(1+c)}}{2}\), and \(\left( \frac{1-c}{2},c,\frac{1-c}{2},\sqrt{\frac{1}{3}},\sqrt{\frac{1}{3}}\right) \) is the unstable equilibrium. On the other hand, if \(c\ge 1/3\) then there is only one equilibrium point at \(\left( \frac{1-c}{2},c,\frac{1-c}{2},\sqrt{\frac{1}{3}},\sqrt{\frac{1}{3}}\right) \), and this point is stable.
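The claimed bifurcation at \(c=1/3\) is easy to verify numerically (our sketch; it also checks the invariants \(y_1+y_2=1-c\) and \(y_1y_2=c^2\) implied by the formulas above):

```python
# Sanity check of Example 8 (illustrative sketch): for c < 1/3 the two stable
# equilibria are real and distinct; for c >= 1/3 they collapse to the single
# equilibrium ((1-c)/2, c, (1-c)/2, sqrt(1/3), sqrt(1/3)).
import numpy as np

def equilibria(c):
    disc = (1 - 3*c) * (1 + c)
    if disc < 0:                      # c > 1/3: no real pair
        return None
    y1 = (1 - c)/2 + np.sqrt(disc)/2
    y2 = (1 - c)/2 - np.sqrt(disc)/2
    return y1, y2

for c in (0.1, 0.2, 1/3, 0.5):
    pair = equilibria(c)
    if pair is None or np.isclose(pair[0], pair[1]):
        print(f"c={c:.3f}: single equilibrium, x1=x3={(1-c)/2:.4f}")
    else:
        y1, y2 = pair
        # invariants: y1+y2 = 1-c and y1*y2 = c^2
        print(f"c={c:.3f}: y1={y1:.4f}, y2={y2:.4f}, "
              f"y1+y2={y1+y2:.4f}, y1*y2={y1*y2:.4f}")
```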
Example 9
Consider \(A=\left( \begin{array}{ccc}2&1&0\\ 0&1&2\end{array}\right)\) and \(\mathcal{S}=\left( \begin{array}{rrr}1&-1&0\\ 1&1&1\end{array}\right)\). The vector \(\left( \begin{array}{r}1 \\ 1 \\ -2 \end{array}\right)\) spans \(\ker \mathcal{S}\). The corresponding reaction network has species \(X_1, X_2, X_3, \theta_1, \theta_2\).
Here the set \(\{X_1,X_2,\theta_1\}\) is a critical siphon. If we start at the initial concentrations \(x_1=0.05, x_2=0.05, x_3=0.9, \theta_1=0.1, \theta_2=1.0\), then the system converges to \(x_1=0, x_2=0, x_3=1, \theta_1=0, \theta_2=1\); hence this system is not persistent. This limit provides one explanation of the data: all the outcomes were of type \(X_3\). If instead we start at \(\theta_1=0.5, \theta_2=1.0\) with the same x concentrations, then the system converges to \(x_1=x_2=x_3=1/3, \theta_1=\theta_2=1/\sqrt{3}\). This provides a different explanation of the same data: all three outcomes occur equally frequently.
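Both limit points can be checked for consistency with the conservation laws \(\mathcal{S}x = \mathcal{S}x_0\) and with the model map \(y_A\) (a minimal sketch, using the matrices above):

```python
# Consistency check of Example 9's two limit points (illustrative sketch).
import numpy as np

A = np.array([[2, 1, 0],
              [0, 1, 2]])
S = np.array([[1, -1, 0],
              [1,  1, 1]])
x0 = np.array([0.05, 0.05, 0.9])

for x in (np.array([0.0, 0.0, 1.0]),      # boundary limit: siphon drained
          np.array([1/3, 1/3, 1/3])):     # interior limit
    print("S x == S x0:", np.allclose(S @ x, S @ x0))

theta = np.array([1/np.sqrt(3), 1/np.sqrt(3)])
yA = np.prod(theta[:, None] ** A, axis=0)  # y_A(theta) = (t1^2, t1 t2, t2^2)
print("y_A(theta):", yA)                   # equals (1/3, 1/3, 1/3)
```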
Example 10
Boltzmann machines are a popular model in machine learning. Formally, a Boltzmann machine is a graph \(G=(V,E)\), each of whose nodes takes the value 0 or 1. One associates to every configuration \(s \in \{0,1\}^V\) of the Boltzmann machine an energy \(E(s)=-\sum_i b_i s_i-\sum_{(i,j)\in E}w_{ij}s_is_j\). The probability of the Boltzmann machine being in configuration s is given by the exponential family \(P(s;b,w)\propto \exp(-E(s))\). Boltzmann machines can be used to do inference conditioned on partial observations, and the maximum likelihood values of the parameters \(b_i,w_{ij}\) can be learned by stochastic gradient descent.
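For concreteness, the distribution \(P(s;b,w)\) of a small machine can be enumerated directly (a minimal sketch; the bias and weight values below are arbitrary placeholders, not from the paper):

```python
# Enumerate P(s; b, w) proportional to exp(-E(s)) for a 3-node Boltzmann
# machine (illustrative sketch; bias/weight values are placeholders).
import itertools
import numpy as np

b = {1: 0.2, 2: -0.1, 3: 0.4}             # biases b_i
w = {(1, 2): 0.5, (2, 3): -0.3}           # weights w_ij on the edges E

def energy(s):
    return (-sum(b[i] * s[i] for i in b)
            - sum(w[e] * s[e[0]] * s[e[1]] for e in w))

configs = [dict(zip((1, 2, 3), bits))
           for bits in itertools.product((0, 1), repeat=3)]
unnorm = np.array([np.exp(-energy(s)) for s in configs])
P = unnorm / unnorm.sum()                 # the Boltzmann distribution
for s, p in zip(configs, P):
    print(tuple(s.values()), round(float(p), 4))
```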
Our EM scheme can be used to implement the learning rule of arbitrary Boltzmann machines in chemistry. We illustrate the construction on the 3-node Boltzmann machine with \(V=\{x_1,x_2,x_3\}\), edges joining \(x_1\) to \(x_2\) and \(x_2\) to \(x_3\), biases \(b_1,b_2,b_3\), and weights \(w_{12}\) and \(w_{23}\). We will work with parameters \(\theta_i=\exp (b_i)\) and \(\theta_{ij}=\exp (w_{ij})\). The design matrix \(A=(a_{ij})_{5\times 8}\) has one column per configuration \(s\in \{0,1\}^3\), recording the exponents \((s_1,s_2,s_3,s_1s_2,s_2s_3)\) of \((\theta_1,\theta_2,\theta_3,\theta_{12},\theta_{23})\).
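Ordering the configurations \(000,001,010,011,100,101,110,111\) across the columns, this gives
\(A=\left( \begin{array}{cccccccc} 0&0&0&0&1&1&1&1\\ 0&0&1&1&0&0&1&1\\ 0&1&0&1&0&1&0&1\\ 0&0&0&0&0&0&1&1\\ 0&0&0&1&0&0&0&1 \end{array}\right),\)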
and the corresponding exponential model \(y_A:\mathbb{R}^5\rightarrow \mathbb{R}^8_{>0}\) sends \(\theta = (\theta_1,\theta_2,\theta_3,\theta_{12},\theta_{23}) \longmapsto (\theta^{a_{\cdot 1}},\theta^{a_{\cdot 2}},\dots ,\theta^{a_{\cdot 8}})\). If the node \(x_2\) is hidden, then the observation matrix \(\mathcal{S}\) marginalizes over \(x_2\).
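With the visible outcomes \((x_1,x_3)\) ordered \(00,01,10,11\) down the rows and the same configuration ordering across the columns, it reads
\(\mathcal{S}=\left( \begin{array}{cccccccc} 1&0&1&0&0&0&0&0\\ 0&1&0&1&0&0&0&0\\ 0&0&0&0&1&0&1&0\\ 0&0&0&0&0&1&0&1 \end{array}\right).\)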
Our EM scheme yields a reaction network over the configuration species \(X_{000},\dots ,X_{111}\) and the parameter species \(\theta_1,\theta_2,\theta_3,\theta_{12},\theta_{23}\).
Suppose we observe the marginal distribution (0.24, 0.04, 0.17, 0.55) on the visible nodes \(x_1,x_3\). To solve for the maximum likelihood \(\hat{\theta}\), we initialize the system with \(X_{000}=0.24, X_{001}=0.04, X_{010}=0, X_{011}=0, X_{100}=0.17, X_{101}=0.55, X_{110}=0, X_{111}=0\) and all \(\theta\)'s set to 1. The system then reaches steady state at \(\hat{\theta}_1=0.5176, \hat{\theta}_2=0.0018, \hat{\theta}_3=0.3881, \hat{\theta}_{12}=0.8246, \hat{\theta}_{23}=0.7969\), with \(\hat{X}_{000}=0.2391, \hat{X}_{001}=0.0389, \hat{X}_{010}=0.0009, \hat{X}_{011}=0.0011, \hat{X}_{100}=0.1695, \hat{X}_{101}=0.5487, \hat{X}_{110}=0.0005, \hat{X}_{111}=0.0013\).
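For comparison, a conventional floating-point generalized-EM baseline for the same problem is sketched below (our illustration: the step size, iteration count, and normalized parameterization are our choices, and its fixed point need not match the reported network equilibrium digit for digit):

```python
# Generalized EM for the partially observed exponential family above
# (illustrative floating-point baseline; the paper's reaction network
# reaches its fixed point by mass-action dynamics instead).
import itertools
import numpy as np

configs = list(itertools.product((0, 1), repeat=3))        # (x1, x2, x3)
feats = np.array([[s[0], s[1], s[2], s[0]*s[1], s[1]*s[2]]
                  for s in configs], dtype=float)           # columns of A
q = {(0, 0): 0.24, (0, 1): 0.04, (1, 0): 0.17, (1, 1): 0.55}

eta = np.zeros(5)                                           # eta = log theta
for _ in range(2000):
    p = np.exp(feats @ eta); p /= p.sum()                   # model distribution
    # E-step: split each observed marginal across the hidden node x2
    # in proportion to the current model (an E-Projection).
    x = np.zeros(8)
    for vis, mass in q.items():
        idx = [i for i, s in enumerate(configs) if (s[0], s[2]) == vis]
        x[idx] = mass * p[idx] / p[idx].sum()
    # Partial M-step: moment matching by one gradient-ascent step on the
    # expected complete-data log-likelihood (toward the M-Projection).
    eta += 0.5 * (feats.T @ x - feats.T @ p)
print("theta-hat:", np.exp(eta))
print("x-hat:", np.round(x, 4))
```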