JEFL: Joint Embedding of Formal Proof Libraries

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 12941))

Included in the following conference series:

International Symposium on Frontiers of Combining Systems

293 Accesses

Abstract

The heterogeneous nature of the logical foundations used in different interactive proof assistant libraries has rendered discovery of similar mathematical concepts among them difficult. In this paper, we compare a previously proposed algorithm for matching concepts across libraries with our unsupervised embedding approach that can help us retrieve similar concepts. Our approach is based on the fasttext implementation of Word2Vec, on top of which a tree traversal module is added to adapt its algorithm to the representation format of our data export pipeline. We compare the explainability, customizability, and online-servability of the approaches and argue that the neural embedding approach has more potential to be integrated into an interactive proof assistant.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 51.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 64.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

The Role of the Mizar Mathematical Library for Interactive Proof Development in Mizar

Article Open access 25 November 2017

Prototyping Controlled Mathematical Languages in Jupyter Notebooks

Classification of Alignments Between Concepts of Formal Mathematical Systems

Notes

1.
As to writing of this paper, one more loss function (the one-vs-all, or the ova loss) has been added to the latest version of fasttext, making in total eight variations.
2.
The first author finds this note https://github.com/renpengcheng-github/nlp/tree/master/3.word2vec (in Chinese) particularly helpful in understanding Word2Vec.
3.
We use the term stochastic gradient descent here for convention though we are in fact doing stochastic gradient ascent.
4.
c62abb89396a94520f009f9095874953735e0d75.
5.
This gives a total of 35597 s-expressions for Word2Vec training.

References

Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). https://aclanthology.org/Q17-1010
Bortin, M., Lüth, C.: Structured formal development with quotient types in Isabelle/HOL. In: Autexier, S., et al. (eds.) CICM 2010. LNCS (LNAI), vol. 6167, pp. 34–48. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14128-7_5
Chapter Google Scholar
Carlisle, D., Davenport, J., Dewar, M., Hur, N., Naylor, W.: Conversion between MathMl and OpenMath. Technical Report 24.969, The OpenMath Society (2001)
Google Scholar
Gauthier, T., Kaliszyk, C.: Matching concepts across HOL libraries. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS (LNAI), vol. 8543, pp. 267–281. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08434-3_20
Chapter Google Scholar
Gauthier, T., Kaliszyk, C.: Sharing HOL4 and HOL light proof knowledge. In: Davis, M., Fehnker, A., McIver, A., Voronkov, A. (eds.) LPAR 2015. LNCS, vol. 9450, pp. 372–386. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48899-7_26
Chapter MATH Google Scholar
Gauthier, T., Kaliszyk, C.: Aligning concepts across proof assistant libraries. J. Symbol. Comput. 90, 89–123 (2019). https://doi.org/10.1016/j.jsc.2018.04.005
Article MathSciNet MATH Google Scholar
Gauthier, T., Kaliszyk, C., Urban, J.: Initial experiments with statistical conjecturing over large formal corpora. In: Kohlhase, A. (ed.) Work in Progress at the Conference on Intelligent Computer Mathematics 2016 (CICM-WiP 2016), CEUR, vol. 1785, pp. 219–228. CEUR-WS.org (2016)
Google Scholar
Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formalized Reasoning 3(2), 153–245 (2010). https://doi.org/10.6092/issn.1972-5787/1980
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks (2016)
Google Scholar
Hales, T.C.: Introduction to the Flyspeck project. In: Coquand, T., Lombardi, H., Roy, M.F. (eds.) Mathematics, Algorithms, Proofs, No. 05021 in Dagstuhl Seminar Proceedings, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, Dagstuhl, Germany, pp. 1–11 (2006). http://drops.dagstuhl.de/opus/volltexte/2006/432
Harrison, J.: HOL light: an overview. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 60–66. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_4
Chapter Google Scholar
Heras, J., Komendantskaya, E.: Proof pattern search in Coq/SSReflect. CoRR abs/1402.0081 (2014). http://arxiv.org/abs/1402.0081
Huet, G.P., Herbelin, H.: 30 years of research and development around Coq. In: Jagannathan, S., Sewell, P. (eds.) ACM Symposium on Principles of Programming Languages, POPL 2014, pp. 249–250. ACM (2014). https://doi.org/10.1145/2535838.2537848
Hurd, J.: The OpenTheory standard theory library. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 177–191. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_14
Chapter Google Scholar
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Valencia, Spain, pp. 427–431. Association for Computational Linguistics, April 2017. https://www.aclweb.org/anthology/E17-2068
Kaliszyk, C., Krauss, A.: Scalable LCF-style proof translation. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 51–66. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39634-2_7
Chapter Google Scholar
Kaliszyk, C., Urban, J.: Lemma mining over HOL Light. In: LPAR, pp. 503–517 (2013)
Google Scholar
Kaliszyk, C., Urban, J.: Learning-assisted automated reasoning with Flyspeck. J. Autom. Reasoning 53(2), 173–213 (2014). https://doi.org/10.1007/s10817-014-9303-3
Kaliszyk, C., Urban, J.: HOL(y)Hammer: online ATP service for HOL Light. Math. Comput. Sci. 9(1), 5–22 (2014). https://doi.org/10.1007/s11786-014-0182-0
Article MATH Google Scholar
Keller, C., Werner, B.: Importing HOL light into Coq. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 307–322. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_22
Chapter Google Scholar
Klein, G.: Proof engineering considered essential. In: Jones, C., Pihlajasaari, P., Sun, J. (eds.) FM 2014. LNCS, vol. 8442, pp. 16–21. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06410-9_2
Chapter Google Scholar
McCarthy, J.: Recursive functions symbolic expressions and their computation by machine, Part I. Commun. ACM 3(4), 184–195 (1960). https://doi.org/10/fvx5pv. http://dl.acm.org/citation.cfm?id=367177.367199. zSCC: NoCitationData[s0]. ISBN 0001-0782
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Bengio, Y., LeCun, Y. (eds.) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, 2–4 May 2013, Workshop Track Proceedings (2013). http://arxiv.org/abs/1301.3781
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems. vol. 26. Curran Associates, Inc. (2013). https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf
Müller, D., Gauthier, T., Kaliszyk, C., Kohlhase, M., Rabe, F.: Classification of alignments between concepts of formal mathematical systems. In: Geuvers, H., England, M., Hasan, O., Rabe, F., Teschke, O. (eds.) CICM 2017. LNCS (LNAI), vol. 10383, pp. 83–98. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62075-6_7
Chapter Google Scholar
Obua, S., Skalberg, S.: Importing HOL into Isabelle/HOL. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS (LNAI), vol. 4130, pp. 298–302. Springer, Heidelberg (2006). https://doi.org/10.1007/11814771_27
Chapter Google Scholar
Paulson, L.C.: Isabelle: the next seven hundred theorem provers. In: Lusk, E., Overbeek, R. (eds.) CADE 1988. LNCS, vol. 310, pp. 772–773. Springer, Heidelberg (1988). https://doi.org/10.1007/BFb0012891
Chapter Google Scholar
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
Google Scholar
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2014. https://doi.org/10.1145/2623330.2623732
Rabe, F.: The MMT API: a generic MKM system. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds.) CICM 2013. LNCS (LNAI), vol. 7961, pp. 339–343. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39320-4_25
Chapter Google Scholar
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta, pp. 45–50, May 2010. http://is.muni.cz/publication/884893/en
So, C.M., Watt, S.M.: On the conversion between content MathMl and OpenMath. In: Proceedings of the Conference on the Communicating Mathematics in the Digital Era (CMDE 2006), pp. 169–182 (2006)
Google Scholar
Sutcliffe, G.: The TPTP world - infrastructure for automated reasoning. In: LPAR (Dakar), pp. 1–12 (2010)
Google Scholar
Urban, J.: MoMM - fast interreduction and retrieval in large libraries of formalized mathematics. Int. J. Artif. Intell. Tools 15(1), 109–130 (2006). http://ktiml.mff.cuni.cz/~urban/MoMM/momm.ps
Urban, J.: MPTP 0.2: design, implementation, and initial experiments. J. Autom. Reasoning 37(1–2), 21–43 (2006). https://doi.org/10.1007/s10817-006-9032-3

Download references

Acknowledgements

We are largely indebted to Thibault Gauthier for his work on alignments and the various data exports that we re-use. We thank Josef Urban for the Mizar export and his invitation to Prague to discuss research. We also thank Tomáš Mikolov for valuable insights for the current work. This work was supported by the ERC grant no. 714034 SMART and by the University of Innsbruck PhD scholarship.

Author information

Authors and Affiliations

University of Innsbruck, Innsbruck, Austria
Qingxiang Wang & Cezary Kaliszyk
University of Warsaw, Warsaw, Poland
Cezary Kaliszyk

Authors

Qingxiang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Cezary Kaliszyk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cezary Kaliszyk .

Editor information

Editors and Affiliations

University of Liverpool, Liverpool, UK
Boris Konev
The University of Manchester, Manchester, UK
Giles Reger

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, Q., Kaliszyk, C. (2021). JEFL: Joint Embedding of Formal Proof Libraries. In: Konev, B., Reger, G. (eds) Frontiers of Combining Systems. FroCoS 2021. Lecture Notes in Computer Science(), vol 12941. Springer, Cham. https://doi.org/10.1007/978-3-030-86205-3_9

Download citation

DOI: https://doi.org/10.1007/978-3-030-86205-3_9
Published: 01 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86204-6
Online ISBN: 978-3-030-86205-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics