Accelerating Program Analyses in Datalog by Merging Library Facts

Yifan Chen¹¹,
Chenyang Yang¹¹,
Xin Zhang¹¹,
Yingfei Xiong¹¹,
Hao Tang¹²,
Xiaoyin Wang¹³ &
…
Lu Zhang¹¹

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 12913))

Included in the following conference series:

International Static Analysis Symposium

999 Accesses

Abstract

Static program analysis uses sensitivity to balance between precision and scalability. However, finer sensitivity does not necessarily lead to more precise results but may reduce scalability. Recently, a number of approaches have been proposed to finely tune the sensitivity of different program parts. However, these approaches are usually designed for specific program analyses, and their abstraction adjustments are coarse-grained as they directly drop sensitivity elements.

In this paper, we propose a new technique, 4DM, to tune abstractions for program analyses in Datalog. 4DM merges values in a domain, allowing fine-grained sensitivity tuning. 4DM uses a data-driven algorithm for automatically learning a merging strategy for a library from a training set of programs. Unlike existing approaches that rely on the properties of a certain analysis, our learning algorithm works for a wide range of Datalog analyses. We have evaluated our approach on a points-to analysis and a liveness analysis, on the DaCapo benchmark suite. Our evaluation results suggest that our technique achieves a significant speedup and negligible precision loss, reaching a good balance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 55.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 69.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Learning a Static Analyzer from Data

Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses

Article 21 November 2017

Rethinking Static Analysis by Combining Discrete and Continuous Reasoning

Notes

1.
Bell number can be recursively calculated as $B(n+1)=\varSigma _{k=0}^nC_n^kB(n)$.

References

Barbuti, R., Giacobazzi, R., Levi, G.: A general framework for semantics-based bottom-up abstract interpretation of logic programs. ACM Trans. Program. Lang. Syst. (TOPLAS) 15(1), 133–181 (1993)
Article Google Scholar
Blackburn, S.M., et al.: The DaCapo benchmarks: Java benchmarking development and analysis. In: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA 2006, pp. 169–190. ACM, New York (2006)
Google Scholar
Bravenboer, M., Smaragdakis, Y.: Strictly declarative specification of sophisticated points-to analyses. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, pp. 243–262 (2009)
Google Scholar
Cortesi, A., Filé, G.: Abstract interpretation of logic programs: an abstract domain for groundness, sharing, freeness and compoundness analysis. ACM SIGPLAN Not. 26(9), 52–61 (1991). PEPM 1991
Article Google Scholar
Cousot, P., Cousot, R.: Abstract interpretation and application to logic programs. J. Log. Program. 13(2), 103–179 (1992)
Article MathSciNet Google Scholar
Debray, S.K.: Global optimization of logic programs (analysis, transformation, compilation) (1987)
Google Scholar
Delzanno, G., Giacobazzi, R., Ranzato, F.: Static analysis, abstract interpretation and verification in (constraint logic) programming. In: Dovier, A., Pontelli, E. (eds.) A 25-Year Perspective on Logic Programming. LNCS, vol. 6125, pp. 136–158. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14309-0_7
Chapter MATH Google Scholar
Fähndrich, M., Foster, J.S., Su, Z., Aiken, A.: Partial online cycle elimination in inclusion constraint graphs. In: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, 17–19 June 1998, pp. 85–96 (1998)
Google Scholar
Hardekopf, B., Lin, C.: The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In: Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, 10–13 June 2007, pp. 290–299 (2007)
Google Scholar
Jeon, M., Jeong, S., Oh, H.: Precise and scalable points-to analysis via data-driven context tunneling. PACMPL 2(OOPSLA), 140:1–140:29 (2018)
Google Scholar
Jeong, S., Jeon, M., Cha, S.D., Oh, H.: Data-driven context-sensitivity for points-to analysis. PACMPL 1(OOPSLA), 100:1–100:28 (2017)
Google Scholar
Kulkarni, S., Mangal, R., Zhang, X., Naik, M.: Accelerating program analyses by cross-program training. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, Part of SPLASH 2016, Amsterdam, The Netherlands, 30 October–4 November 2016, pp. 359–377 (2016)
Google Scholar
Lhoták, O., Hendren, L.J.: Context-sensitive points-to analysis: is it worth it? In: Compiler Construction, Proceedings of the 15th International Conference, CC 2006, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2006, Vienna, Austria, 30–31 March 2006, pp. 47–64 (2006)
Google Scholar
Li, Y., Tan, T., Møller, A., Smaragdakis, Y.: Precision-guided context sensitivity for pointer analysis. PACMPL 2(OOPSLA), 141:1–141:29 (2018)
Google Scholar
Li, Y., Tan, T., Møller, A., Smaragdakis, Y.: Scalability-first pointer analysis with self-tuning context-sensitivity. In: Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, 04–09 November 2018, pp. 129–140 (2018)
Google Scholar
Liang, P., Tripp, O., Naik, M.: Learning minimal abstractions. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, 26–28 January 2011, pp. 31–42 (2011)
Google Scholar
Nielson, F., Nielson, H.R., Hankin, C.: Principles of Program Analysis. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-662-03811-6
Book MATH Google Scholar
pjBench: parallel Java benchmarks (2014)
Google Scholar
Shivers, O.: Control-flow analysis of higher-order languages (1991)
Google Scholar
Smaragdakis, Y., Kastrinis, G., Balatsouras, G.: Introspective analysis: context-sensitivity, across the board. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, pp. 485–495. ACM, New York (2014)
Google Scholar
Tan, T., Li, Y., Xue, J.: Efficient and precise points-to analysis: modeling the heap by merging equivalent automata. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 278–291. ACM, New York (2017)
Google Scholar
Tang, H., Wang, D., Xiong, Y., Zhang, L., Wang, X., Zhang, L.: Conditional Dyck-CFL reachability analysis for complete and efficient library summarization. In: Yang, H. (ed.) ESOP 2017. LNCS, vol. 10201, pp. 880–908. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54434-1_33
Chapter MATH Google Scholar
Tang, H., Wang, X., Zhang, L., Xie, B., Zhang, L., Mei, H.: Summary-based context-sensitive data-dependence analysis in presence of callbacks. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 83–95 (2015)
Google Scholar
Xiao, X., Zhang, C.: Geometric encoding: forging the high performance context sensitive points-to analysis for Java. In: Proceedings of the 20th International Symposium on Software Testing and Analysis, ISSTA 2011, Toronto, ON, Canada, 17–21 July 2011, pp. 188–198 (2011)
Google Scholar
Xu, G., Rountev, A.: Merging equivalent contexts for scalable heap-cloning-based context-sensitive points-to analysis. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 225–236. ACM, New York (2008)
Google Scholar
Yan, H., Sui, Y., Chen, S., Xue, J.: Spatio-temporal context reduction: a pointer-analysis-based static approach for detecting use-after-free vulnerabilities. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, 27 May–03 June 2018, pp. 327–337. ACM (2018)
Google Scholar
Zhang, X., Mangal, R., Grigore, R., Naik, M., Yang, H.: On abstraction refinement for program analyses in datalog. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, Edinburgh, United Kingdom, 09–11 June 2014, pp. 239–248. ACM (2014)
Google Scholar

Download references

Acknowledgements

This work is supported in part by the National Key Research and Development Program of China No. 2019YFE0198100, National Natural Science Foundation of China under Grant Nos. 61922003, and a grant from ZTE-PKU Joint Laboratory for Foundation Software.

Author information

Authors and Affiliations

Key Laboratory of High Confidence Software Technologies, MoE, Department of Computer Science and Technology, EECS, Peking University, Beijing, China
Yifan Chen, Chenyang Yang, Xin Zhang, Yingfei Xiong & Lu Zhang
Alibaba Group, Hangzhou, China
Hao Tang
Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA
Xiaoyin Wang

Authors

Yifan Chen
View author publications
You can also search for this author in PubMed Google Scholar
Chenyang Yang
View author publications
You can also search for this author in PubMed Google Scholar
Xin Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yingfei Xiong
View author publications
You can also search for this author in PubMed Google Scholar
Hao Tang
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoyin Wang
View author publications
You can also search for this author in PubMed Google Scholar
Lu Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xin Zhang .

Editor information

Editors and Affiliations

Informal Systems, Inria, ENS, DI - Ecole Normale Supérieure, Paris, France
Cezara Drăgoi
Microsoft Corporation, Redmond, WA, USA
Suvam Mukherjee
Nokia Bell Labs, Murray Hill, NJ, USA
Kedar Namjoshi

Appendices

A Incremental Merging in N-Domain-Wise Cases

Here we describe how to define an incremental merging function for general cases of N-domain-wise merging.

Suppose we want to define a merging function on N domains $D_1, D_2, \ldots , D_N$. Without loss of generality, we suppose $D_i$ can only be a super-domain of $D_j$ when $i > j$. We define $\pi _i$ ($1 \le i \le N$) by induction:

When $i = 1$, we can define an arbitrary 1-domain-wise merging function on $D_1$.
Suppose we have defined a function $\pi _i : D_1\times \dots \times D_i \rightarrow \widehat{D_{1\ldots i}}$, we can define $\pi _{i+1} : D_1\times \dots \times D_i\times D_{i+1} \rightarrow \widehat{D_{1\ldots i+1}}$ with a helper function $\pi _{i+1}'$:
$$\begin{aligned} \pi _{i+1}' : \widehat{D_{1\ldots i}} \times \widehat{D_{i+1}} \rightarrow \widehat{D_{1\ldots i+1}} \end{aligned}$$
Here $\widehat{D_{i+1}}$ represents the changed domain of original $D_{i+1}$ after applying the merging $\pi _i$, and $\pi _{i+1}(d_1,\ldots ,d_i,d_{i+1}) = \pi _{i+1}'(\pi _i(d_1, \ldots , d_i), \pi _i(d_{i+1}))$. Note that $\pi _i(d_{i+1})$, which $d_{i+1}$ would become under $\pi _i$, is determined after applying $\pi _i$ to the Datalog program.

With $\pi = \pi _N$, we get an N-domain-wise merging function on $D_1 \times \dots \times D_N$.

B Proof: The Monotonicity of Mergings

Theorem 3 (Monotonicity)

Given a Datalog program C. If the merging $\pi _b$ of the domains $D_1, \dots , D_N$ is a finer merging than $\pi _a$, then applying $\pi _b$ to C will deduce no fewer results than $\pi _a$. It means

Proof

Given a Datalog program C, any derivation on the proof tree in has the form

$$\begin{aligned} R^0(\varPi _a(d^0_1), \varPi _a(d^0_2), \ldots ) :\,\!:= \ldots , R^j(\varPi _a(d^j_1), \varPi _a(d^j_2), \ldots ), \ldots . \end{aligned}$$

(note that the concrete values $\{d^j_i\}$ may not originally match with each other, but get matched after applying $\varPi _a$).

As is defined, for any domain D, $\forall x,y\in D, \varPi _a(x) = \varPi _a(y) \rightarrow \varPi _b(x) = \varPi _b(y)$, so in ’s proof tree,

$$\begin{aligned} R^0(\varPi _b(d^0_1), \varPi _b(d^0_2), \ldots ) :\,\!:= \ldots , R^j(\varPi _b(d^j_1), \varPi _b(d^j_2), \ldots ), \ldots . \end{aligned}$$

also holds as a renaming of the previous instantiation.

Therefore, .

C Detailed Performance of 4DM in Points-to Analysis on Large Projects

Table 2 presents the performance of 4DM by applying learnt heuristics from small projects to large projects. Each column of 4DM shows the result of applying a heuristic learnt from a training set of 12 projects. The partition of training set is the same as in Sect. 6.1.

Table 2. Performance of applying heuristics learnt by 4DM from small projects to large projects.

Full size table

D Detailed Performance of 4DM in Liveness Analysis

Table 3 shows the detailed results of applying 4DM to liveness analysis.

One practical concern in evaluation is that liveness analysis is usually very fast, hence random events in the processor could have a big influence on the measured run time of compiled executable. We tackle this problem by using Soufflé interpreter to run the analysis and obtain a more reliable record of analysis time.

We use a T-test to check whether the distribution of analysis time changes significantly after merging. The smaller the p-value, the stronger the evidence that two distributions are different. We perform 50 independent runs for each benchmark and analysis. The analysis time is the average of the 50 runs, and the p-value is calculated base on these data.

We measure the precision by the size of calculated live-variable set at all call sites. Note that since our approach is sound, this property is non-decreasing; and the less the size increases, the more precise the analysis is.

Table 3. Performance for liveness analysis. lib1 = {luindex, sunflow, hsqldb}, lib2 = {avrora, batik, bloat}, lib3 = {chart, lusearch, pmd}.

Full size table

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, Y. et al. (2021). Accelerating Program Analyses in Datalog by Merging Library Facts. In: Drăgoi, C., Mukherjee, S., Namjoshi, K. (eds) Static Analysis. SAS 2021. Lecture Notes in Computer Science(), vol 12913. Springer, Cham. https://doi.org/10.1007/978-3-030-88806-0_4

Download citation

DOI: https://doi.org/10.1007/978-3-030-88806-0_4
Published: 13 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88805-3
Online ISBN: 978-3-030-88806-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Accelerating Program Analyses in Datalog by Merging Library Facts

Abstract

Access this chapter

Subscribe and save

Buy Now