Abstract
In this paper we present a new asynchronous algorithm for learning additively regularized topic models and discuss the main architectural details of our implementation. The key property of the new algorithm is that it behaves in a fully deterministic fashion, which is typically hard to achieve in a non-blocking parallel implementation. The algorithm had been recently implemented in the BigARTM library (http://bigartm.org). Our new algorithm is compatible with all features previously introduced in BigARTM library, including multimodality, regularizers and scores calculation. While the existing BigARTM implementation compares favorably with alternative packages such as Vowpal Wabbit or Gensim, the new algorithm brings further improvements in CPU utilization, memory usage, and spends even less time to achieve the same perplexity.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
Daud, A., Li, J., Zhou, L., Muhammad, F.: Knowledge discovery through directed probabilistic topic models: a survey. Front. Comput. Sci. Chin 4(2), 280–301 (2010)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Yuan, J., Gao, F., Ho, Q., Dai, W., Wei, J., Zheng, X., Xing, E.P., Liu, T.Y., Ma, W.Y.: LightLDA: big topic models on modest computer clusters. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1351–1361 (2015)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)
Hoffman, M.D., Blei, D.M., Bach, F.R.: Online learning for latent Dirichlet allocation. In: NIPS, pp. 856–864. Curran Associates Inc. (2010)
Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed algorithms for topic models. J. Mach. Learn. Res. 10, 1801–1828 (2009)
Wang, Y., Bai, H., Stanton, M., Chen, W.-Y., Chang, E.Y.: PLDA: parallel latent Dirichlet allocation for large-scale applications. In: Goldberg, A.V., Zhou, Y. (eds.) AAIM 2009. LNCS, vol. 5564, pp. 301–314. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02158-9_26
Liu, Z., Zhang, Y., Chang, E.Y., Sun, M.: PLDA+: parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Trans. Intell. Syst. Technol. 2(3), 26:1–26:18 (2011)
Seide, F., Fu, H., Droppo, J., Li, G., Yu, D.: On parallelizability of stochastic gradient descent for speech DNNs. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 235–239. IEEE (2014)
Dean, J., Corrado, G.S., Monga, R., Chen, K., Devin, M., Le, Q.V., Mao, M.Z., Ranzato, M.-A., Senior, A., Tucker, P., Yang, K., Ng, A.Y.: Large scale distributed deep networks. In: NIPS, pp. 1223–1231 (2012)
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, Valletta, Malta, May 2010
Langford, J., Li, L., Strehl, A.: Vowpal wabbit open source project. Technical report. Yahoo! (2007)
McCallum, A.K.: A Machine Learning for Language Toolkit (2002). http://mallet.cs.umass.edu
Smola, A., Narayanamurthy, S.: An architecture for parallel topic models. Proc. VLDB Endow. 3(1–2), 703–710 (2010)
Vorontsov, K.V.: Additive regularization for topic models of text collections. Dokl. Math. 89(3), 301–304 (2014)
Vorontsov, K.V., Potapenko, A.A.: Additive regularization of topic models. Mach. Learn. Spec. Issue Data Anal. Intell. Optim. 101(1), 303–323 (2015)
Vorontsov, K., Frei, O., Apishev, M., Romov, P., Dudarenko, M.: BigARTM: open source library for regularized multimodal topic modeling of large collections. In: Khachay, M.Y., Konstantinova, N., Panchenko, A., Ignatov, D.I., Labunets, V.G. (eds.) AIST 2015. CCIS, vol. 542, pp. 370–381. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26123-2_36
Vorontsov, K., Frei, O., Apishev, M., Romov, P., Suvorova, M., Yanina, A.: Non-Bayesian additive regularization for multimodal topic modeling of large collections. In: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications, pp. 29–37. ACM, New York (2015)
Acknowledgements
The work was supported by Russian Science Foundation (grant 15-18-00091). Also we would like to thank Prof. K. V. Vorontsov for constant attention to our research and detailed feedback to this paper.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Frei, O., Apishev, M. (2017). Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_13
Download citation
DOI: https://doi.org/10.1007/978-3-319-52920-2_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52919-6
Online ISBN: 978-3-319-52920-2
eBook Packages: Computer ScienceComputer Science (R0)