Abstract
The heterogeneous nature of the logical foundations used in different interactive proof assistant libraries has rendered discovery of similar mathematical concepts among them difficult. In this paper, we compare a previously proposed algorithm for matching concepts across libraries with our unsupervised embedding approach that can help us retrieve similar concepts. Our approach is based on the fasttext implementation of Word2Vec, on top of which a tree traversal module is added to adapt its algorithm to the representation format of our data export pipeline. We compare the explainability, customizability, and online-servability of the approaches and argue that the neural embedding approach has more potential to be integrated into an interactive proof assistant.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
As to writing of this paper, one more loss function (the one-vs-all, or the ova loss) has been added to the latest version of fasttext, making in total eight variations.
- 2.
The first author finds this note https://github.com/renpengcheng-github/nlp/tree/master/3.word2vec (in Chinese) particularly helpful in understanding Word2Vec.
- 3.
We use the term stochastic gradient descent here for convention though we are in fact doing stochastic gradient ascent.
- 4.
c62abb89396a94520f009f9095874953735e0d75.
- 5.
This gives a total of 35597 s-expressions for Word2Vec training.
References
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). https://aclanthology.org/Q17-1010
Bortin, M., Lüth, C.: Structured formal development with quotient types in Isabelle/HOL. In: Autexier, S., et al. (eds.) CICM 2010. LNCS (LNAI), vol. 6167, pp. 34–48. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14128-7_5
Carlisle, D., Davenport, J., Dewar, M., Hur, N., Naylor, W.: Conversion between MathMl and OpenMath. Technical Report 24.969, The OpenMath Society (2001)
Gauthier, T., Kaliszyk, C.: Matching concepts across HOL libraries. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS (LNAI), vol. 8543, pp. 267–281. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08434-3_20
Gauthier, T., Kaliszyk, C.: Sharing HOL4 and HOL light proof knowledge. In: Davis, M., Fehnker, A., McIver, A., Voronkov, A. (eds.) LPAR 2015. LNCS, vol. 9450, pp. 372–386. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48899-7_26
Gauthier, T., Kaliszyk, C.: Aligning concepts across proof assistant libraries. J. Symbol. Comput. 90, 89–123 (2019). https://doi.org/10.1016/j.jsc.2018.04.005
Gauthier, T., Kaliszyk, C., Urban, J.: Initial experiments with statistical conjecturing over large formal corpora. In: Kohlhase, A. (ed.) Work in Progress at the Conference on Intelligent Computer Mathematics 2016 (CICM-WiP 2016), CEUR, vol. 1785, pp. 219–228. CEUR-WS.org (2016)
Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formalized Reasoning 3(2), 153–245 (2010). https://doi.org/10.6092/issn.1972-5787/1980
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks (2016)
Hales, T.C.: Introduction to the Flyspeck project. In: Coquand, T., Lombardi, H., Roy, M.F. (eds.) Mathematics, Algorithms, Proofs, No. 05021 in Dagstuhl Seminar Proceedings, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, Dagstuhl, Germany, pp. 1–11 (2006). http://drops.dagstuhl.de/opus/volltexte/2006/432
Harrison, J.: HOL light: an overview. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 60–66. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_4
Heras, J., Komendantskaya, E.: Proof pattern search in Coq/SSReflect. CoRR abs/1402.0081 (2014). http://arxiv.org/abs/1402.0081
Huet, G.P., Herbelin, H.: 30 years of research and development around Coq. In: Jagannathan, S., Sewell, P. (eds.) ACM Symposium on Principles of Programming Languages, POPL 2014, pp. 249–250. ACM (2014). https://doi.org/10.1145/2535838.2537848
Hurd, J.: The OpenTheory standard theory library. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 177–191. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_14
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Valencia, Spain, pp. 427–431. Association for Computational Linguistics, April 2017. https://www.aclweb.org/anthology/E17-2068
Kaliszyk, C., Krauss, A.: Scalable LCF-style proof translation. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 51–66. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39634-2_7
Kaliszyk, C., Urban, J.: Lemma mining over HOL Light. In: LPAR, pp. 503–517 (2013)
Kaliszyk, C., Urban, J.: Learning-assisted automated reasoning with Flyspeck. J. Autom. Reasoning 53(2), 173–213 (2014). https://doi.org/10.1007/s10817-014-9303-3
Kaliszyk, C., Urban, J.: HOL(y)Hammer: online ATP service for HOL Light. Math. Comput. Sci. 9(1), 5–22 (2014). https://doi.org/10.1007/s11786-014-0182-0
Keller, C., Werner, B.: Importing HOL light into Coq. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 307–322. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_22
Klein, G.: Proof engineering considered essential. In: Jones, C., Pihlajasaari, P., Sun, J. (eds.) FM 2014. LNCS, vol. 8442, pp. 16–21. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06410-9_2
McCarthy, J.: Recursive functions symbolic expressions and their computation by machine, Part I. Commun. ACM 3(4), 184–195 (1960). https://doi.org/10/fvx5pv. http://dl.acm.org/citation.cfm?id=367177.367199. zSCC: NoCitationData[s0]. ISBN 0001-0782
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Bengio, Y., LeCun, Y. (eds.) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, 2–4 May 2013, Workshop Track Proceedings (2013). http://arxiv.org/abs/1301.3781
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems. vol. 26. Curran Associates, Inc. (2013). https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf
Müller, D., Gauthier, T., Kaliszyk, C., Kohlhase, M., Rabe, F.: Classification of alignments between concepts of formal mathematical systems. In: Geuvers, H., England, M., Hasan, O., Rabe, F., Teschke, O. (eds.) CICM 2017. LNCS (LNAI), vol. 10383, pp. 83–98. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62075-6_7
Obua, S., Skalberg, S.: Importing HOL into Isabelle/HOL. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS (LNAI), vol. 4130, pp. 298–302. Springer, Heidelberg (2006). https://doi.org/10.1007/11814771_27
Paulson, L.C.: Isabelle: the next seven hundred theorem provers. In: Lusk, E., Overbeek, R. (eds.) CADE 1988. LNCS, vol. 310, pp. 772–773. Springer, Heidelberg (1988). https://doi.org/10.1007/BFb0012891
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2014. https://doi.org/10.1145/2623330.2623732
Rabe, F.: The MMT API: a generic MKM system. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds.) CICM 2013. LNCS (LNAI), vol. 7961, pp. 339–343. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39320-4_25
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta, pp. 45–50, May 2010. http://is.muni.cz/publication/884893/en
So, C.M., Watt, S.M.: On the conversion between content MathMl and OpenMath. In: Proceedings of the Conference on the Communicating Mathematics in the Digital Era (CMDE 2006), pp. 169–182 (2006)
Sutcliffe, G.: The TPTP world - infrastructure for automated reasoning. In: LPAR (Dakar), pp. 1–12 (2010)
Urban, J.: MoMM - fast interreduction and retrieval in large libraries of formalized mathematics. Int. J. Artif. Intell. Tools 15(1), 109–130 (2006). http://ktiml.mff.cuni.cz/~urban/MoMM/momm.ps
Urban, J.: MPTP 0.2: design, implementation, and initial experiments. J. Autom. Reasoning 37(1–2), 21–43 (2006). https://doi.org/10.1007/s10817-006-9032-3
Acknowledgements
We are largely indebted to Thibault Gauthier for his work on alignments and the various data exports that we re-use. We thank Josef Urban for the Mizar export and his invitation to Prague to discuss research. We also thank Tomáš Mikolov for valuable insights for the current work. This work was supported by the ERC grant no. 714034 SMART and by the University of Innsbruck PhD scholarship.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q., Kaliszyk, C. (2021). JEFL: Joint Embedding of Formal Proof Libraries. In: Konev, B., Reger, G. (eds) Frontiers of Combining Systems. FroCoS 2021. Lecture Notes in Computer Science(), vol 12941. Springer, Cham. https://doi.org/10.1007/978-3-030-86205-3_9
Download citation
DOI: https://doi.org/10.1007/978-3-030-86205-3_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86204-6
Online ISBN: 978-3-030-86205-3
eBook Packages: Computer ScienceComputer Science (R0)