Multi-objective Topic Modeling for Exploratory Search in Tech News

Anastasia Ianina¹²,
Lev Golitsyn¹³ &
Konstantin Vorontsov¹²

Part of the book series: Communications in Computer and Information Science (CCIS, volume 789)

Included in the following conference series:

Conference on Artificial Intelligence and Natural Language

An erratum to this publication is available online at https://doi.org/10.1007/978-3-319-71746-3_24

Abstract

Exploratory search is an information retrieval paradigm in which the user’s intention is to learn the subject domain better. To do this, the user repeats “query–browse–refine” interactions with the search engine many times. We consider typical exploratory search tasks formulated as long text queries. People usually solve such a task in about half an hour and find dozens of documents by using conventional search facilities iteratively. The goal of this paper is to reduce this time-consuming multi-step process to a single step without impairing the quality of the search. Probabilistic topic modeling is a text mining technique suited to retrieving documents that are semantically relevant to a long text query. We use additive regularization of topic models (ARTM) to build a model that meets multiple objectives. The model should have sparse, diverse and interpretable topics. It should also incorporate metadata and multimodal data such as n-grams, authors, tags and categories. Balancing the regularization criteria is an important issue for ARTM. We tackle this problem with a coordinate-wise optimization technique that chooses the regularization trajectory automatically. We use the parallel online implementation of ARTM from the open-source library BigARTM. Our evaluation technique is based on crowdsourcing and includes two tasks for assessors: manual exploratory search and explicit relevance feedback. Experiments on two popular tech news media show that our topic-based exploratory search outperforms assessors as well as simple baselines, achieving precision and recall of about 85–92%.
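
To make the one-step, topic-based retrieval and its evaluation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a trained topic model has already mapped the query and every document to a vector of topic probabilities, and that documents are ranked by cosine similarity to the query. The abstract does not state the similarity measure, so cosine similarity is an assumption, and the function names, document ids and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_topics, doc_topics, k):
    """Return the ids of the k documents closest to the query in topic space."""
    ranked = sorted(
        doc_topics,
        key=lambda doc_id: cosine(query_topics, doc_topics[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against assessor judgments."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy topic vectors (hypothetical): three topics, four documents.
doc_topics = {
    "d1": [0.8, 0.1, 0.1],
    "d2": [0.7, 0.2, 0.1],
    "d3": [0.1, 0.8, 0.1],
    "d4": [0.0, 0.1, 0.9],
}
query_topics = [0.9, 0.1, 0.0]  # topic vector of a long text query

top = retrieve(query_topics, doc_topics, k=2)
precision, recall = precision_recall(top, relevant={"d1", "d2", "d3"})
print(top, precision, recall)
```

In this toy run the two documents dominated by the first topic are returned, so precision is 1.0 and recall is 2/3 against the three documents marked relevant. In the setting the abstract describes, the relevant set would come from assessors' relevance feedback rather than from a hand-written set.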

The original version of this chapter has been revised: The Acknowledgements section has been corrected. The erratum to this chapter is available at https://doi.org/10.1007/978-3-319-71746-3_24

Acknowledgements

The work was supported by the Ministry of Education and Science of the Russian Federation (project RFMEFI57915X0117).

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Moscow, Russia
Anastasia Ianina & Konstantin Vorontsov
Integrated Systems, Moscow, Russia
Lev Golitsyn

Corresponding author

Correspondence to Anastasia Ianina.

Editor information

Editors and Affiliations

ITMO University, St. Petersburg, Russia
Andrey Filchenkov
University of Helsinki, Helsinki, Finland
Lidia Pivovarova
Mendel University, Brno, Czech Republic
Jan Žižka

About this paper

Cite this paper

Ianina, A., Golitsyn, L., Vorontsov, K. (2018). Multi-objective Topic Modeling for Exploratory Search in Tech News. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_16

DOI: https://doi.org/10.1007/978-3-319-71746-3_16
Published: 28 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3
eBook Packages: Computer Science, Computer Science (R0)