Abstract
This paper explores the use of Large Language Models (LLMs) for automated annotation and Part-of-Math (POM) tagging of equations. Traditional methods for math term annotation and POM tagging rely heavily on manually crafted rules and limited datasets, which often leads to scalability problems and poor adaptability to new domains. LLMs, with their broad knowledge and advanced natural language understanding, present a promising alternative. Our methodology involves crafting prompts that elicit answers readable as key-value pairs, where the keys are math terms and the values are the corresponding annotations. We also investigate how LLM performance changes when the prompt includes different levels of context, such as the sentence or paragraph containing the input equation. Performance is evaluated by the consistency between the ground truth and the LLM output, which is assessed in a separate LLM session with a different prompt. Our results show that, when different levels of context are provided, the consistency rate of binary classification increased from 14.8% to 24.5%, and the favorable outcomes rate of multi-class classification increased from 47.1% to 77.5%. We conclude by discussing the implications of our findings for the future of mathematical knowledge management. We propose that LLMs could play a key role in automating the annotation and tagging of mathematical content, thereby enhancing the accessibility and utility of mathematical knowledge in digital libraries and beyond.
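The workflow described in the abstract (a prompt with an optional level of context, a reply read as key-value pairs, and a consistency rate over judged items) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the prompt wording, the JSON reply format, and the names `build_prompt`, `parse_annotations`, and `consistency_rate` are hypothetical and are not taken from the paper. The sketch makes no LLM call, and the consistency judgments, which the paper obtains from a separate LLM session, are represented simply as a list of booleans.

```python
import json

# Hypothetical context levels; the abstract mentions sentence and paragraph context.
CONTEXT_LEVELS = ("none", "sentence", "paragraph")


def build_prompt(equation, sentence="", paragraph="", level="none"):
    """Assemble an illustrative annotation prompt at the given context level."""
    if level not in CONTEXT_LEVELS:
        raise ValueError(f"unknown context level: {level}")
    parts = [
        "Annotate each math term in the equation below.",
        "Answer with a JSON object whose keys are the math terms "
        "and whose values are their annotations.",
        f"Equation: {equation}",
    ]
    if level == "sentence":
        parts.append(f"Context: {sentence}")
    elif level == "paragraph":
        parts.append(f"Context: {paragraph}")
    return "\n".join(parts)


def parse_annotations(reply):
    """Read a model reply as term -> annotation pairs.

    Returns an empty dict when no JSON object can be extracted.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return {str(term): str(label) for term, label in data.items()}


def consistency_rate(judgments):
    """Fraction of items judged consistent with the ground truth."""
    return sum(judgments) / len(judgments) if judgments else 0.0
```

With these helpers, `build_prompt("E = mc^2", sentence="Here E is energy.", level="sentence")` yields a prompt that carries the surrounding sentence, and `parse_annotations` tolerates extra text around the JSON object in the reply. Asking for JSON is only one convenient way to obtain key-value pairs; any format that parses reliably would serve the same purpose.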
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shan, R., Youssef, A. (2024). Using Large Language Models to Automate Annotation and Part-of-Math Tagging of Math Equations. In: Kohlhase, A., Kovács, L. (eds) Intelligent Computer Mathematics. CICM 2024. Lecture Notes in Computer Science, vol 14960. Springer, Cham. https://doi.org/10.1007/978-3-031-66997-2_1
DOI: https://doi.org/10.1007/978-3-031-66997-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-66996-5
Online ISBN: 978-3-031-66997-2
eBook Packages: Computer Science (R0)