
Connecting weighted automata, tensor networks and recurrent neural networks through spectral learning

Published: 30 March 2022

Abstract

In this paper, we present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks, which encompass a set of optimization techniques for high-order tensors used in quantum physics and numerical analysis. We first present an intrinsic relation between WFA and the tensor train decomposition, a particular form of tensor network. This relation allows us to exhibit a novel low-rank structure of the Hankel matrix of a function computed by a WFA and to design an efficient spectral learning algorithm that leverages this structure to scale up to very large Hankel matrices. We then unravel a fundamental connection between WFA and second-order recurrent neural networks (2-RNN): in the case of sequences of discrete symbols, WFA and 2-RNN with linear activation functions are expressively equivalent. Leveraging this equivalence result combined with the classical spectral learning algorithm for weighted automata, we introduce the first provable learning algorithm for linear 2-RNN defined over sequences of continuous input vectors. This algorithm relies on estimating low-rank sub-blocks of the Hankel tensor, from which the parameters of a linear 2-RNN can be provably recovered. The performance of the proposed learning algorithm is assessed in a simulation study on both synthetic and real-world data.
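To make the role of the low-rank Hankel matrix concrete, the following sketch outlines the classical spectral learning procedure for WFA that the abstract refers to: a truncated SVD of a finite Hankel sub-block gives a rank factorization from which the automaton's initial vector, transition operators and final vector can be recovered. This is a minimal illustration, not the paper's implementation; the function name spectral_wfa and the arguments H, H_sigma, h_P, h_S are placeholder names, and the snippet assumes the Hankel blocks have already been estimated from data.

```python
import numpy as np

def spectral_wfa(H, H_sigma, h_P, h_S, rank):
    """Recover WFA parameters (alpha, {A_sigma}, omega) from Hankel blocks.

    H        : |P| x |S| matrix with H[p, s] = f(p s)
    H_sigma  : dict mapping each symbol sigma to the |P| x |S| block
               with entries f(p sigma s)
    h_P      : length-|P| vector with h_P[p] = f(p)  (empty suffix)
    h_S      : length-|S| vector with h_S[s] = f(s)  (empty prefix)
    rank     : target number of states (rank of the truncated SVD)
    """
    # Rank-r truncated SVD gives a rank factorization H ~ P_mat @ S_mat.
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    P_mat = U[:, :rank] * d[:rank]   # forward factor,  |P| x r
    S_mat = Vt[:rank, :]             # backward factor, r x |S|
    P_pinv = np.linalg.pinv(P_mat)
    S_pinv = np.linalg.pinv(S_mat)

    # Transition operator for each symbol: A_sigma = P^+ H_sigma S^+.
    A = {sig: P_pinv @ H_sig @ S_pinv for sig, H_sig in H_sigma.items()}
    alpha = h_S @ S_pinv   # initial weights (Hankel row of the empty prefix)
    omega = P_pinv @ h_P   # final weights (Hankel column of the empty suffix)
    return alpha, A, omega

def evaluate(alpha, A, omega, word):
    """Compute f(x1 ... xn) = alpha^T A_{x1} ... A_{xn} omega."""
    state = alpha
    for sym in word:
        state = state @ A[sym]
    return state @ omega
```

With exact Hankel values of a rank-r function the recovered automaton computes f exactly; with empirical estimates the recovery error depends on the estimation noise. The paper extends this scheme to continuous inputs by estimating low-rank sub-blocks of the Hankel tensor, from which the parameters of a linear 2-RNN are recovered.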


Information & Contributors

Information

Published In

Machine Learning, Volume 113, Issue 5
May 2024
1040 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 30 March 2022
Accepted: 13 December 2021
Revision received: 13 September 2021
Received: 09 May 2020

Author Tags

  1. Weighted automata
  2. Spectral learning
  3. Recurrent neural networks
  4. Tensor networks
  5. Tensor train decomposition

Qualifiers

  • Research-article
