
Connecting weighted automata, tensor networks and recurrent neural networks through spectral learning

Published: 30 March 2022

Abstract

In this paper, we present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks, which encompass a set of optimization techniques for high-order tensors used in quantum physics and numerical analysis. We first present an intrinsic relation between WFA and the tensor train decomposition, a particular form of tensor network. This relation allows us to exhibit a novel low-rank structure of the Hankel matrix of a function computed by a WFA and to design an efficient spectral learning algorithm that leverages this structure to scale up to very large Hankel matrices. We then unravel a fundamental connection between WFA and second-order recurrent neural networks (2-RNN): in the case of sequences of discrete symbols, WFA and 2-RNN with linear activation functions are expressively equivalent. Leveraging this equivalence result combined with the classical spectral learning algorithm for weighted automata, we introduce the first provable learning algorithm for linear 2-RNN defined over sequences of continuous input vectors. This algorithm relies on estimating low-rank sub-blocks of the Hankel tensor, from which the parameters of a linear 2-RNN can be provably recovered. The performance of the proposed learning algorithm is assessed in a simulation study on both synthetic and real-world data.
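To make the role of the low-rank Hankel matrix concrete, the following sketch outlines the classical spectral learning procedure for WFA that the abstract refers to: a truncated SVD of a finite Hankel sub-block gives a rank factorization from which the automaton's initial vector, transition operators and final vector can be recovered. This is a minimal illustration, not the paper's implementation; the function name spectral_wfa and the arguments H, H_sigma, h_P, h_S are placeholder names, and the snippet assumes the Hankel blocks have already been estimated from data.

```python
import numpy as np

def spectral_wfa(H, H_sigma, h_P, h_S, rank):
    """Recover WFA parameters (alpha, {A_sigma}, omega) from Hankel blocks.

    H        : |P| x |S| matrix with H[p, s] = f(p s)
    H_sigma  : dict mapping each symbol sigma to the |P| x |S| block
               with entries f(p sigma s)
    h_P      : length-|P| vector with h_P[p] = f(p)  (empty suffix)
    h_S      : length-|S| vector with h_S[s] = f(s)  (empty prefix)
    rank     : target number of states (rank of the truncated SVD)
    """
    # Rank-r truncated SVD gives a rank factorization H ~ P_mat @ S_mat.
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    P_mat = U[:, :rank] * d[:rank]   # forward factor,  |P| x r
    S_mat = Vt[:rank, :]             # backward factor, r x |S|
    P_pinv = np.linalg.pinv(P_mat)
    S_pinv = np.linalg.pinv(S_mat)

    # Transition operator for each symbol: A_sigma = P^+ H_sigma S^+.
    A = {sig: P_pinv @ H_sig @ S_pinv for sig, H_sig in H_sigma.items()}
    alpha = h_S @ S_pinv   # initial weights (Hankel row of the empty prefix)
    omega = P_pinv @ h_P   # final weights (Hankel column of the empty suffix)
    return alpha, A, omega

def evaluate(alpha, A, omega, word):
    """Compute f(x1 ... xn) = alpha^T A_{x1} ... A_{xn} omega."""
    state = alpha
    for sym in word:
        state = state @ A[sym]
    return state @ omega
```

With exact Hankel values of a rank-r function the recovered automaton computes f exactly; with empirical estimates the recovery error depends on the estimation noise. The paper extends this scheme to continuous inputs by estimating low-rank sub-blocks of the Hankel tensor, from which the parameters of a linear 2-RNN are recovered.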


Information & Contributors

Information

Published In

Machine Learning, Volume 113, Issue 5
May 2024
1040 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 30 March 2022
Accepted: 13 December 2021
Revision received: 13 September 2021
Received: 09 May 2020

Author Tags

  1. Weighted automata
  2. Spectral learning
  3. Recurrent neural networks
  4. Tensor networks
  5. Tensor train decomposition

Qualifiers

  • Research-article
