Evaluating and Improving the Extraction of Mathematical Identifier Definitions

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2017)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10456))

Abstract

Mathematical formulae in academic texts significantly contribute to the overall semantic content of such texts, especially in the fields of Science, Technology, Engineering and Mathematics. Knowing the definitions of the identifiers in mathematical formulae is essential to understand the semantics of the formulae. Similar to the sense-making process of human readers, mathematical information retrieval systems can analyze the text that surrounds formulae to extract the definitions of identifiers occurring in the formulae. Several approaches for extracting the definitions of mathematical identifiers from documents have been proposed in recent years. So far, these approaches have been evaluated using different collections and gold standard datasets, which prevented comparative performance assessments. To facilitate future research on the task of identifier definition extraction, we make three contributions. First, we provide an automated evaluation framework, which uses the dataset and gold standard of the NTCIR-11 Math Retrieval Wikipedia task. Second, we compare existing identifier extraction approaches using the developed evaluation framework. Third, we present a new identifier extraction approach that uses machine learning to combine the well-performing features of previous approaches. The new approach increases the precision of extracting identifier definitions from 17.85% to 48.60%, and increases the recall from 22.58% to 28.06%. The evaluation framework, the dataset and our source code are openly available at: https://ident.formulasearchengine.com.
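The precision and recall figures reported above can be illustrated with a minimal sketch. It assumes that an extraction counts as correct only when the (identifier, definition) pair exactly matches a gold standard entry; the matching rule used by the actual evaluation framework may differ. The function name `precision_recall` and the sample pairs are hypothetical and are not taken from the NTCIR-11 data.

```python
from typing import Set, Tuple

# An (identifier, definition) pair, e.g. ("m", "mass").
Pair = Tuple[str, str]


def precision_recall(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Score extracted pairs against a gold standard.

    Assumption: exact matching of both identifier and definition.
    """
    true_positives = len(extracted & gold)
    # Precision: share of extracted pairs that appear in the gold standard.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold standard pairs that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sample data for illustration only.
gold = {("E", "energy"), ("m", "mass"), ("c", "speed of light"), ("v", "velocity")}
extracted = {("E", "energy"), ("m", "mass"), ("c", "constant")}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case two of the three extracted pairs are correct, and two of the four gold pairs are found, so precision and recall differ even though the number of true positives is the same in both.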

Author information

Corresponding author

Correspondence to Moritz Schubotz.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schubotz, M., Krämer, L., Meuschke, N., Hamborg, F., Gipp, B. (2017). Evaluating and Improving the Extraction of Mathematical Identifier Definitions. In: Jones, G., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_7

  • DOI: https://doi.org/10.1007/978-3-319-65813-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65812-4

  • Online ISBN: 978-3-319-65813-1

  • eBook Packages: Computer Science (R0)
