Abstract
We discuss the progress in our project which aims to automate formalization by combining natural language processing with deep semantic understanding of mathematical expressions. We introduce the overall motivation and ideas behind this project, and then propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that our learning method allows efficient use of large amount of contextual information, which in turn significantly boosts the precision of the statistical parsing and also makes it more efficient. This leads to a large improvement of our first results in parsing theorems from the Flyspeck corpus.
C. Kaliszyk—Supported by the ERC Starting grant no. 714034 SMART.
J. Urban and J. Vyskočil—Supported by the ERC Consolidator grant no. 649043 AI4REASON. This work was supported by the European Regional Development Fund under the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
In general, a grammar could pick only some subtree depths instead of their contiguous intervals, but we do not use such grammars now.
- 2.
About 1% of the longest Flyspeck formulas were removed from the evaluation to keep the parsing times manageable.
- 3.
If the context-free version parsed only a few terms, but with the best rank, its average rank would be 1, but the method would still be much worse in terms of the overall number of correctly parsed terms.
References
Bancerek, G., Rudnicki, P.: A compendium of continuous lattices in MIZAR. J. Autom. Reason. 29(3–4), 189–224 (2002)
Blanchette, J.C., Kaliszyk, C., Paulson, L.C., Urban, J.: Hammering towards QED. J. Formaliz. Reason. 9(1), 101–148 (2016)
Collins, M.: Three generative, lexicalised models for statistical parsing. In: Cohen, P.R., Wahlster, W. (eds.) Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 16–23. Morgan Kaufmann Publishers/ACL (1997)
The Coq Proof Assistant. http://coq.inria.fr
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. JASIS 41(6), 391–407 (1990)
Dudani, S.A.: The distance-weighted K-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 6(4), 325–327 (1976)
Garillot, F., Gonthier, G., Mahboubi, A., Rideau, L.: Packaging mathematical structures. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 327–342. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03359-9_23
Gauthier, T., Kaliszyk, C.: Matching concepts across HOL libraries. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS, vol. 8543, pp. 267–281. Springer, Cham (2014). doi:10.1007/978-3-319-08434-3_20
Gonthier, G., et al.: A machine-checked proof of the odd order theorem. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 163–179. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39634-2_14
Gonthier, G., Tassi, E.: A language of patterns for subterm selection. In: Beringer, L., Felty, A. (eds.) ITP 2012. LNCS, vol. 7406, pp. 361–376. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32347-8_25
Grabowski, A., Korniłowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formaliz. Reason. 3(2), 153–245 (2010)
Greenbaum, S.: Input transformations and resolution implementation techniques for theorem-proving in first-order logic. Ph.D. thesis, University of Illinois at Urbana-Champaign (1986)
Haftmann, F., Wenzel, M.: Constructive type classes in isabelle. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 160–174. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74464-1_11
Hales, T.: Dense Sphere Packings a Blueprint for Formal Proofs, London Mathematical Society Lecture Note Series, vol. 400. Cambridge University Press, Cambridge (2012)
Hales, T.C., Adams, M., Bauer, G., Dang, D.T., Harrison, J., Hoang, T.L., Kaliszyk, C., Magron, V., McLaughlin, S., Nguyen, T.T., Nguyen, T.Q., Nipkow, T., Obua, S., Pleso, J., Rute, J., Solovyev, A., Ta, A.H.T., Tran, T.N., Trieu, D.T., Urban, J., Vu, K.K., Zumkeller, R.: A formal proof of the Kepler conjecture. CoRR, abs/1501.02155, 2015
Harrison, J.: HOL Light: a tutorial introduction. In: Srivas, M., Camilleri, A. (eds.) FMCAD 1996. LNCS, vol. 1166, pp. 265–269. Springer, Heidelberg (1996). doi:10.1007/BFb0031814
Harrison, J., Urban, J., Wiedijk, F.: History of interactive theorem proving. In: Siekmann, J.H. (ed.) Computational Logic. Handbook of the History of Logic, vol. 9. Elsevier, Amsterdam (2014)
Kaliszyk, C., Urban, J.: Learning-assisted automated reasoning with Flyspeck. J. Autom. Reason. 53(2), 173–213 (2014)
Kaliszyk, C., Urban, J., Vyskocil, J.: System description: statistical parsing of informalized Mizar formulas. http://grid01.ciirc.cvut.cz/mptp/synasc17sd.pdf
Kaliszyk, C., Urban, J., Vyskočil, J.: Efficient semantic features for automated reasoning over large theories. In: Yang, Q., Wooldridge, M. (eds.) IJCAI 2015, pp. 3084–3090. AAAI Press, Menlo Park (2015)
Kaliszyk, C., Urban, J., Vyskočil, J.: Learning to parse on aligned corpora (rough diamond). In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 227–233. Springer, Cham (2015). doi:10.1007/978-3-319-22102-1_15
Kaliszyk, C., Urban, J., Vyskočil, J., Geuvers, H.: Developing corpus-based translation methods between informal and formal mathematics: project description. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds.) CICM 2014. LNCS, vol. 8543, pp. 435–439. Springer, Cham (2014). doi:10.1007/978-3-319-08434-3_34
Klein, G., Andronick, J., Elphinstone, K., Heiser, G., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: formal verification of an operating-system kernel. Commun. ACM 53(6), 107–115 (2010)
Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39799-8_1
Lange, M., Leiß, H.: To CNF or not to CNF? an efficient yet presentable version of the CYK algorithm. Inform. Didact. 8, 1–21 (2009). https://www.informaticadidactica.de/uploads/Artikel/LangeLeiss2009/LangeLeiss2009.pdf
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
Robinson, J.A., Voronkov, A. (eds.): Handbook of Automated Reasoning (in 2 Volumes). Elsevier and MIT Press, Cambridge (2001)
Rudnicki, P., Schwarzweller, C., Trybulec, A.: Commutative algebra in the Mizar system. J. Symb. Comput. 32(1/2), 143–169 (2001)
Tankink, C., Kaliszyk, C., Urban, J., Geuvers, H.: Formal mathematics on display: a wiki for Flyspeck. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds.) CICM 2013. LNCS, vol. 7961, pp. 152–167. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39320-4_10
Urban, J., Vyskočil, J.: Theorem proving in large formal mathematics as an emerging AI field. In: Bonacina, M.P., Stickel, M.E. (eds.) Automated Reasoning and Mathematics. LNCS, vol. 7788, pp. 240–257. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36675-8_13
Wenzel, M., Paulson, L.C., Nipkow, T.: The Isabelle framework. In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 33–38. Springer, Heidelberg (2008). doi:10.1007/978-3-540-71067-7_7
Younger, D.H.: Recognition and parsing of context-free languages in time \(n^{3}\). Inf. Control 10(2), 189–208 (1967)
Zinn, C.: Understanding informal mathematical discourse. Ph.D. thesis, University of Erlangen-Nuremberg (2004)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kaliszyk, C., Urban, J., Vyskočil, J. (2017). Automating Formalization by Statistical and Semantic Parsing of Mathematics. In: Ayala-Rincón, M., Muñoz, C.A. (eds) Interactive Theorem Proving. ITP 2017. Lecture Notes in Computer Science(), vol 10499. Springer, Cham. https://doi.org/10.1007/978-3-319-66107-0_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-66107-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66106-3
Online ISBN: 978-3-319-66107-0
eBook Packages: Computer ScienceComputer Science (R0)