survey

Mathematical Information Retrieval: A Review

Authors:

Sivaji Bandyopadhyay

ACM Computing Surveys, Volume 57, Issue 3

Article No.: 61, Pages 1 - 34

https://doi.org/10.1145/3699953

Published: 11 November 2024

Abstract

Mathematical formulas are commonly used to express theories and fundamental concepts in the Science, Technology, Engineering, and Mathematics (STEM) domain. The growing volume of research in the STEM domain produces large numbers of scientific documents that contain both textual and mathematical terms. In scientific documents, the meaning of a mathematical formula is expressed through its context and through a symbolic structure that adheres to strong domain-specific conventions. The retrieval of textual information is well researched, and numerous text-based search engines exist. However, textual information retrieval systems are inadequate for searching scientific information that contains mathematical formulas, which range from simple symbols to complicated mathematical structures. The retrieval of mathematical information is in its infancy and requires new technologies and tools to support the retrieval of scientific information and the management of digital libraries. This article provides a comprehensive study of mathematical information retrieval and highlights its challenges and future opportunities.



Published In

ACM Computing Surveys Volume 57, Issue 3

March 2025

984 pages

EISSN:1557-7341

DOI:10.1145/3697147

Editors:
David Atienza, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Michela Milano, University of Bologna, Italy

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2024

Online AM: 09 October 2024

Accepted: 04 October 2024

Revised: 03 July 2024

Received: 04 March 2023

Published in CSUR Volume 57, Issue 3
