[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation

Published: 08 October 2024 Publication History

Abstract

With its combination of Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping—a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with some surprising results: while 2.2× speedups can be obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In the place of semi-naive evaluation, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog’s SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé’s code generator achieve mean 5.2× and 7.6× speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. All in all, using compilation and eager evaluation (as appropriate), Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that traditional semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.

References

[1]
Supun Abeysinghe, Anxhelo Xhebraj, and Tiark Rompf. 2024. Flan: An Expressive and Efficient Datalog Compiler for Program Analysis. Proceedings of the ACM on Programming Languages, 8, POPL (2024), 2577–2609. https://doi.org/10.1145/3632928
[2]
Alex Aiken, Suhabe Bugrara, Isil Dillig, Thomas Dillig, Brian Hackett, and Peter Hawkins. 2007. An Overview of the Saturn Project. In Proceedings of the 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering. 43–48. https://doi.org/10.1145/1251535.1251543
[3]
Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In 2013 Formal Methods in Computer-Aided Design. 1–8. https://ieeexplore.ieee.org/document/6679385/
[4]
Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M Hellerstein, and Russell Sears. 2010. Boom Analytics: Exploring Data-Centric, Declarative Programming for the Cloud. In Proceedings of the 5th European Conference on Computer Systems. 223–236. https://doi.org/10.1145/1755913.1755937
[5]
Peter Alvaro, William R Marczak, Neil Conway, Joseph M Hellerstein, David Maier, and Russell Sears. 2010. Dedalus: Datalog in Time and Space. In Datalog Reloaded - 1st International Workshop. 43–48. https://doi.org/10.1007/978-3-642-24206-9_16
[6]
Tony Antoniadis, Konstantinos Triantafyllou, and Yannis Smaragdakis. 2017. Porting Doop to Soufflé: A Tale of Inter-Engine Portability for Datalog-Based Analyses. In International Workshop on State Of the Art in Program Analysis. 25–30. https://doi.org/10.1145/3088515.3088522
[7]
Krzysztof R Apt, Howard A Blair, and Adrian Walker. 1988. Towards a Theory of Declarative Knowledge. In Foundations of Deductive Databases and Logic Programming. Elsevier, 89–148. https://doi.org/10.1016/B978-0-934613-40-8.50006-3
[8]
Michael Arntzenius and Neel Krishnaswami. 2020. Seminäive Evaluation for a Higher-Order Functional Language. Proceedings of the ACM on Programming Languages, 4, POPL (2020), 22:1–22:28. https://doi.org/10.1145/3371090
[9]
Nimar S Arora, Robert D Blumofe, and C Greg Plaxton. 1998. Thread Scheduling for Multiprogrammed Multiprocessors. In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures. 119–129.
[10]
Francois Bancilhon. 1986. Naive Evaluation of Recursively Defined Relations. In On Knowledge Base Management Systems. Springer, 165–178. https://doi.org/10.1007/978-1-4612-4980-1_17
[11]
Francois Bancilhon, David Maier, Yehoshua Sagiv, and Jeffrey D Ullman. 1985. Magic Sets and Other Strange Ways to Implement Logic Programs. In Proceedings of the 5th ACM SIGACT-SIGMOD Symposium on Principles of Database Systems. 1–15. https://doi.org/10.1145/6012.15399
[12]
Clark Barrett, Pascal Fontaine, and Cesare Tinelli. 2016. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org
[13]
Catriel Beeri and Raghu Ramakrishnan. 1991. On the Power of Magic. The Journal of Logic Programming, 10, 3-4 (1991), 255–299. https://doi.org/10.1016/0743-1066(91)90038-Q
[14]
Aaron Bembenek, Michael Ballantyne, Michael Greenberg, and Nada Amin. 2020. Datalog-Based Systems Can Use Incremental SMT Solving. In Proceedings of the 36th International Conference Logic Programming (Technical Communications).
[15]
Aaron Bembenek, Michael Greenberg, and Stephen Chong. 2020. Formulog: Datalog for SMT-Based Static Analysis. Proceedings of the ACM on Programming Languages, 4, OOPSLA (2020), 141:1–141:31. https://doi.org/10.1145/3428209
[16]
Aaron Bembenek, Michael Greenberg, and Stephen Chong. 2024. Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version). arxiv:2408.14017.
[17]
Aaron Bembenek, Michael Greenberg, and Stephen Chong. 2024. Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (OOPSLA’24 Artifact). https://doi.org/10.5281/zenodo.13372573
[18]
Gavin M. Bierman, Andrew D. Gordon, Cătălin Hriţcu, and David Langworthy. 2012. Semantic Subtyping with an SMT Solver. Journal of Functional Programming, 22, 1 (2012), 31–105. https://doi.org/10.1145/1863543.1863560
[19]
Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. 169–190. https://doi.org/10.1145/1167473.1167488
[20]
Robert D Blumofe and Charles E Leiserson. 1999. Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46, 5 (1999), 720–748.
[21]
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. 243–262. https://doi.org/10.1145/1639949.1640108
[22]
Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. 209–224.
[23]
Stefano Ceri, Georg Gottlob, and Letizia Tanca. 1989. What You Always Wanted to Know About Datalog (and Never Dared to Ask). IEEE Transactions on Knowledge and Data Engineering, 1, 1 (1989), 146–166.
[24]
Keith L Clark. 1977. Negation as Failure. In Logic and Data Bases. 293–322. https://doi.org/10.1007/978-1-4684-3384-5_11
[25]
Edmund Clarke, Daniel Kroening, and Flavio Lerda. 2004. A Tool for Checking ANSI-C Programs. In Proceedings of the 10th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. 168–176. https://doi.org/10.1007/978-3-540-24730-2_15
[26]
Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science & Engineering, 5, 1 (1998), 46–55.
[27]
Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. 337–340. https://doi.org/10.1007/978-3-540-78800-3_24
[28]
Leonardo de Moura and Nikolaj Bjørner. 2011. Satisfiability modulo Theories: Introduction and Applications. Commun. ACM, 54, 9 (2011), 69–77. https://doi.org/10.1145/1995376.1995394
[29]
Daniel J Dougherty, Kathi Fisler, and Shriram Krishnamurthi. 2006. Specifying and reasoning about dynamic access-control policies. In Proceedings of the 3rd International Joint Conference on Automated Reasoning. 632–646. https://doi.org/10.1007/11814771_51
[30]
Niklas Eén and Niklas Sörensson. 2003. Temporal Induction by Incremental SAT Solving. Electronic Notes in Theoretical Computer Science, 89, 4 (2003), 543–560. https://doi.org/10.1016/s1571-0661(05)82542-3
[31]
Zhiwei Fan, Jianqiao Zhu, Zuyu Zhang, Aws Albarghouthi, Paraschos Koutris, and Jignesh M. Patel. 2019. Scaling-up in-Memory Datalog Processing: Observations and Techniques. Proceedings of the VLDB Endowment, 12, 6 (2019), feb, 695–708. issn:2150-8097 https://doi.org/10.14778/3311880.3311886
[32]
Yu Feng, Xinyu Wang, Isil Dillig, and Thomas Dillig. 2015. Bottom-up Context-Sensitive Pointer Analysis for Java. In Proceedings of the 13th Asian Symposium on Programming Languages and Systems. 465–484. https://doi.org/10.1007/978-3-319-26529-2_25
[33]
Antonio Flores-Montoya and Eric M. Schulte. 2020. Datalog Disassembly. In Proceedings of the 29th USENIX Security Symposium. 1075–1092.
[34]
1978. Logic and Data Bases, Hervé Gallaire and Jack Minker (Eds.). Plenum Press.
[35]
Jean Gallier. 1986. Logic for Computer Science: Foundations of Automated Theorem Proving. Wiley.
[36]
Michael Gelfond and Vladimir Lifschitz. 1988. The Stable Model Semantics for Logic Programming. In Proceedings of the 5th International Conference and Symposium on Logic Programming.
[37]
Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. 2019. Gigahorse: Thorough, Declarative Decompilation of Smart Contracts. In Proceedings of the 41st International Conference on Software Engineering. 1176–1186. https://doi.org/10.1109/ICSE.2019.00120
[38]
Neville Grech, Michael Kong, Anton Jurisevic, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. 2018. Madmax: Surviving Out-of-Gas Conditions in Ethereum Smart Contracts. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018), 116:1–116:27. https://doi.org/10.1145/3276486
[39]
Todd J. Green, Shan Shan Huang, Boon Thau Loo, and Wenchao Zhou. 2013. Datalog and Recursive Query Processing. Foundations & Trends in Databases, https://doi.org/10.1561/1900000017
[40]
Maurice Herlihy and Jeannette M. Wing. 1990. Linearizability: A Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages and Systems, 12, 3 (1990), 463–492. https://doi.org/10.1145/78969.78972
[41]
Jiani Huang, Ziyang Li, Binghong Chen, Karan Samel, Mayur Naik, Le Song, and Xujie Si. 2021. Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning. In Advances in Neural Information Processing Systems. 25134–25145.
[42]
Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. In Computer Aided Verification. https://doi.org/10.1007/978-3-319-41540-6_23
[43]
Herbert Jordan, Pavle Subotić, David Zhao, and Bernhard Scholz. 2019. Brie: A specialized trie for concurrent Datalog. In Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores. 31–40.
[44]
Herbert Jordan, Pavle Subotic, David Zhao, and Bernhard Scholz. 2019. A Specialized B-tree for Concurrent Datalog Evaluation. In Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 327–339. https://doi.org/10.1145/3293883.3295719
[45]
Herbert Jordan, Pavle Subotic, David Zhao, and Bernhard Scholz. 2022. Specializing parallel data structures for Datalog. Concurrency and Computation: Practice and Experience, 34, 2 (2022), https://doi.org/10.1002/CPE.5643
[46]
Bas Ketsman and Paraschos Koutris. 2022. Modern Datalog Engines. Foundations and Trends in Databases, 12, 1 (2022), 1–68. https://doi.org/10.1561/1900000073
[47]
Ninghui Li and John C Mitchell. 2003. Datalog with Constraints: A Foundation for Trust Management Languages. In Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages. 58–73. https://doi.org/10.1007/3-540-36388-2_6
[48]
Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, and Mayur Naik. 2024. Relational Programming with Foundation Models. In AAAI Conference on Artificial Intelligence. 209–214.
[49]
Ziyang Li, Jiani Huang, and Mayur Naik. 2023. Scallop: A Language for Neurosymbolic Programming. Proceedings of the ACM on Programming Languages, 7, PLDI (2023), 1463–1487. https://doi.org/10.1145/3591280
[50]
Boon Thau Loo, Tyson Condie, Minos Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica. 2006. Declarative Networking: Language, Execution and Optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 97–108. https://doi.org/10.1145/1142473.1142485
[51]
Magnus Madsen, Jonathan Lindegaard Starup, and Ondřej Lhoták. 2022. Flix: A Meta Programming Language for Datalog. In Proceedings of the 4th International Workshop on the Resurgence of Datalog in Academia and Industry.
[52]
Magnus Madsen, Ming-Ho Yee, and Ondřej Lhoták. 2016. From Datalog to Flix: a Declarative Language for Fixed Points on Lattices. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. 194–208. https://doi.org/10.1145/2908080.2908096
[53]
Eric Mohr, David A Kranz, and Robert H Halstead Jr. 1990. Lazy task creation: A technique for increasing the granularity of parallel programs. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming. 185–197.
[54]
Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. 1990. The Magic of Duplicates and Aggregates. In Proceedings of the 16th International Conference on Very Large Data Bases. 264–277.
[55]
André Pacak and Sebastian Erdweg. 2022. Functional programming with Datalog. In 36th European Conference on Object-Oriented Programming. 7:1–7:28.
[56]
Teodor C Przymusinski. 1988. On the Declarative Semantics of Deductive Databases and Logic Programs. In Foundations of Deductive Databases and Logic Programming. Elsevier, 193–216. https://doi.org/10.1016/b978-0-934613-40-8.50009-9
[57]
Tiark Rompf and Martin Odersky. 2010. Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. In International Conference on Generative Programming and Component Engineering. 127–136. https://doi.org/10.1145/1868294.1868314
[58]
Leonid Ryzhyk and Mihai Budiu. 2019. Differential Datalog. In Proceedings of the 3rd International Workshop on the Resurgence of Datalog in Academia and Industry. 56–67.
[59]
Arash Sahebolamri, Langston Barrett, Scott Moore, and Kristopher K. Micinski. 2023. Bring Your Own Data Structures to Datalog. Proceedings of the ACM on Programming Languages, 7, OOPSLA2 (2023), 1198–1223. https://doi.org/10.1145/3622840
[60]
Arash Sahebolamri, Thomas Gilray, and Kristopher Micinski. 2022. Seamless deductive inference via macros. In Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction. 77–88.
[61]
Bernhard Scholz, Herbert Jordan, Pavle Subotić, and Till Westmann. 2016. On Fast Large-Scale Program Analysis in Datalog. In Proceedings of the 25th International Conference on Compiler Construction. 196–206. https://doi.org/10.1145/2892208.2892226
[62]
Jiwon Seo, Stephen Guo, and Monica S. Lam. 2013. SociaLite: Datalog Extensions for Efficient Social Network Analysis. In 29th IEEE International Conference on Data Engineering. 278–289. https://doi.org/10.1109/ICDE.2013.6544832
[63]
Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, and Carlo Zaniolo. 2016. Big Data Analytics with Datalog Queries on Spark. In Proceedings of the 2016 International Conference on Management of Data. 1135–1149. https://doi.org/10.1145/2882903.2915229
[64]
Yannis Smaragdakis, Neville Grech, Sifis Lagouvardos, Konstantinos Triantafyllou, and Ilias Tsatiris. 2021. Symbolic value-flow static analysis: deep, precise, complete modeling of Ethereum smart contracts. Proceedings of the ACM on Programming Languages, 5, OOPSLA (2021), 1–30.
[65]
Pavle Subotić, Herbert Jordan, Lijun Chang, Alan Fekete, and Bernhard Scholz. 2018. Automatic Index Selection for Large-Scale Datalog Computation. Proceedings of the VLDB Endowment, 12, 2 (2018), 141–153.
[66]
Tamás Szabó, Gábor Bergmann, Sebastian Erdweg, and Markus Voelter. 2018. Incrementalizing Lattice-Based Program Analyses in Datalog. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018), 139:1–139:29. https://doi.org/10.1145/3276509
[67]
Tamás Szabó, Sebastian Erdweg, and Gábor Bergmann. 2021. Incremental whole-program analysis in Datalog with lattices. In Proceedings of the ACM SIGPLAN International Conference on Programming Language Design and Implementation. 1–15.
[68]
Petar Tsankov, Andrei Dan, Dana Drachsler-Cohen, Arthur Gervais, Florian Buenzli, and Martin Vechev. 2018. Securify: Practical Security Analysis of Smart Contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 67–82. https://doi.org/10.1145/3243734.3243780
[69]
Allen Van Gelder. 1989. Negation as Failure Using Tight Derivations for General Logic Programs. The Journal of Logic Programming, 6, 1-2 (1989), 109–133. https://doi.org/10.1016/0743-1066(89)90032-0
[70]
John Whaley, Dzintars Avots, Michael Carbin, and Monica S. Lam. 2005. Using Datalog with Binary Decision Diagrams for Program Analysis. In Proceedings of the Third Asian Symposium on Programming Languages and Systems. 97–118.
[71]
John Whaley and Monica S. Lam. 2004. Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation. 131–144. https://doi.org/10.1145/996841.996859
[72]
Christoph M. Wintersteiger, Youssef Hamadi, and Leonardo de Moura. 2009. A Concurrent Portfolio Approach to SMT Solving. In Computer Aided Verification. 715–720. isbn:978-3-642-02658-4
[73]
Eric Zhang. 2020. Crepe. https://github.com/ekzhang/crepe Accessed: 2023-08-18

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 8, Issue OOPSLA2
October 2024
2691 pages
EISSN:2475-1421
DOI:10.1145/3554319
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 October 2024
Published in PACMPL Volume 8, Issue OOPSLA2

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Datalog
  2. Formulog
  3. SMT solving
  4. compilation
  5. parallel evaluation

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 163
    Total Downloads
  • Downloads (Last 12 months)163
  • Downloads (Last 6 weeks)74
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media