Evaluating and Improving the Extraction of Mathematical Identifier Definitions

Moritz Schubotz, Leonard Krämer, Norman Meuschke, Felix Hamborg & Bela Gipp

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10456)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

Mathematical formulae in academic texts contribute significantly to the overall semantic content of such texts, especially in the fields of Science, Technology, Engineering and Mathematics. Knowing the definitions of the identifiers in mathematical formulae is essential for understanding the semantics of the formulae. Similar to the sense-making process of human readers, mathematical information retrieval systems can analyze the text surrounding formulae to extract the definitions of the identifiers occurring in them. Several approaches for extracting the definitions of mathematical identifiers from documents have been proposed in recent years. So far, these approaches have been evaluated using different collections and gold standard datasets, which has prevented comparative performance assessments. To facilitate future research on the task of identifier definition extraction, we make three contributions. First, we provide an automated evaluation framework, which uses the dataset and gold standard of the NTCIR-11 Math Retrieval Wikipedia task. Second, we compare existing identifier extraction approaches using the developed evaluation framework. Third, we present a new identifier extraction approach that uses machine learning to combine the well-performing features of previous approaches. The new approach increases the precision of extracting identifier definitions from 17.85% to 48.60%, and increases the recall from 22.58% to 28.06%. The evaluation framework, the dataset and our source code are openly available at: https://ident.formulasearchengine.com.
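The abstract reports precision and recall of extracted identifier definitions measured against a gold standard. As a rough illustration of how such scores can be computed, the following is a minimal sketch, not the paper's evaluation framework: it assumes that extracted and gold definitions are represented as hypothetical (formula_id, identifier, definition) triples, that definitions match only if they are equal after a simple case-folding normalization, and it adds the F1 score as the harmonic mean of precision and recall.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation (an assumption, not the paper's data format):
# one (formula_id, identifier, definition) triple per definition.
Definition = Tuple[str, str, str]


def normalize(d: Definition) -> Definition:
    """Assumed normalization: trim and lower-case the definition text only."""
    formula_id, identifier, definition = d
    return (formula_id, identifier, definition.strip().lower())


def precision_recall_f1(extracted: Iterable[Definition],
                        gold: Iterable[Definition]) -> Tuple[float, float, float]:
    """Score extracted definitions against a gold standard by exact match."""
    ext: Set[Definition] = {normalize(d) for d in extracted}
    ref: Set[Definition] = {normalize(d) for d in gold}
    true_positives = len(ext & ref)  # definitions both extracted and in the gold standard
    precision = true_positives / len(ext) if ext else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


if __name__ == "__main__":
    gold = [
        ("eq1", "E", "energy"),
        ("eq1", "m", "mass"),
        ("eq1", "c", "speed of light"),
    ]
    extracted = [
        ("eq1", "E", "Energy"),     # matches after case-folding
        ("eq1", "m", "mass"),       # exact match
        ("eq1", "c", "constant"),   # wrong definition
        ("eq1", "x", "variable"),   # identifier not in the gold standard
    ]
    p, r, f = precision_recall_f1(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching is deliberately strict; a real evaluation framework may well accept partial or manually assessed matches, so scores from this sketch are not comparable to the figures reported above.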


Similar content being viewed by others

An Evaluation of NLP Methods to Extract Mathematical Token Descriptors

Evaluation and Domain Adaptation of Similarity Models for Short Mathematical Texts

SsciBERT: a pre-trained language model for social science texts (Article, 17 December 2022)


Author information

Authors and Affiliations

University of Konstanz, Konstanz, Germany
Moritz Schubotz, Leonard Krämer, Norman Meuschke, Felix Hamborg & Bela Gipp


Corresponding author

Correspondence to Moritz Schubotz.

Editor information

Editors and Affiliations

Dublin City University, Dublin, Ireland
Gareth J.F. Jones
Trinity College Dublin, Dublin, Ireland
Séamus Lawless
National University of Distance Education, Madrid, Spain
Julio Gonzalo
Dublin City University, Dublin, Ireland
Liadh Kelly
Université Grenoble Alpes, Grenoble, France
Lorraine Goeuriot
University of Hildesheim, Hildesheim, Germany
Thomas Mandl
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro


About this paper

Cite this paper

Schubotz, M., Krämer, L., Meuschke, N., Hamborg, F., Gipp, B. (2017). Evaluating and Improving the Extraction of Mathematical Identifier Definitions. In: Jones, G., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_7


DOI: https://doi.org/10.1007/978-3-319-65813-1_7
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65812-4
Online ISBN: 978-3-319-65813-1
eBook Packages: Computer Science, Computer Science (R0)
