Abstract
Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks ship their own implementation of training and sampling procedures for recurrent neural networks, while in fact there are multiple other schemes to choose from and additional parameters to tune. The existing literature very often overlooks or ignores this choice. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to involve a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
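To make the last point concrete, the following minimal sketch (not taken from the paper) contrasts the two ways of initializing a character-level LSTM on consecutive subsequences during truncated backpropagation through time: resetting the hidden state to zeros for every subsequence versus transferring the detached hidden state from the previous one. It assumes PyTorch; all sizes, batches and names (e.g. train_on_subsequences) are illustrative assumptions, not the authors' code.

# Illustrative sketch: stateless vs. stateful (hidden-state-transfer) training
# of a character-level LSTM with truncated backpropagation through time.
# All hyperparameters and names below are assumptions for illustration only.
import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len, batch_size = 128, 256, 50, 32

embed = nn.Embedding(vocab_size, hidden_size)
rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab_size)
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters()))
criterion = nn.CrossEntropyLoss()

def train_on_subsequences(batches, stateful=True):
    """batches: iterable of (inputs, targets) LongTensors of shape (batch, seq_len)."""
    state = None  # None makes nn.LSTM start from an all-zero state
    for inputs, targets in batches:
        if not stateful:
            state = None  # stateless scheme: fresh zero state for every subsequence
        elif state is not None:
            # stateful scheme: keep the values, but cut the gradient graph
            state = tuple(s.detach() for s in state)
        optimizer.zero_grad()
        output, state = rnn(embed(inputs), state)
        loss = criterion(head(output).reshape(-1, vocab_size), targets.reshape(-1))
        loss.backward()
        optimizer.step()

# Toy usage with random character indices standing in for real data.
toy_batches = [(torch.randint(0, vocab_size, (batch_size, seq_len)),
                torch.randint(0, vocab_size, (batch_size, seq_len)))
               for _ in range(3)]
train_on_subsequences(toy_batches, stateful=True)

Calling the function with stateful=False reproduces the zero-initialization scheme, while stateful=True corresponds to the hidden-state-transfer scheme whose training stability the abstract flags as dataset-dependent.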
Notes
1. The datasets are available for download at https://github.com/cedricdeboom/character-level-rnn-datasets.
2. github.com/torvalds/linux/tree/master/kernel.
Funding
The hardware used to perform the experiments in this paper was funded by Nvidia.
Ethics declarations
Conflict of interest
Cedric De Boom is funded by a Ph.D. grant of the Research Foundation—Flanders (FWO). The other authors declare that they have no conflict of interest.
About this article
Cite this article
De Boom, C., Demeester, T. & Dhoedt, B. Character-level recurrent neural networks in practice: comparing training and sampling schemes. Neural Comput & Applic 31, 4001–4017 (2019). https://doi.org/10.1007/s00521-017-3322-z