Abstract
Static program analysis uses sensitivity to balance between precision and scalability. However, finer sensitivity does not necessarily lead to more precise results but may reduce scalability. Recently, a number of approaches have been proposed to finely tune the sensitivity of different program parts. However, these approaches are usually designed for specific program analyses, and their abstraction adjustments are coarse-grained as they directly drop sensitivity elements.
In this paper, we propose a new technique, 4DM, to tune abstractions for program analyses in Datalog. 4DM merges values in a domain, allowing fine-grained sensitivity tuning. 4DM uses a data-driven algorithm for automatically learning a merging strategy for a library from a training set of programs. Unlike existing approaches that rely on the properties of a certain analysis, our learning algorithm works for a wide range of Datalog analyses. We have evaluated our approach on a points-to analysis and a liveness analysis, on the DaCapo benchmark suite. Our evaluation results suggest that our technique achieves a significant speedup and negligible precision loss, reaching a good balance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Bell number can be recursively calculated as \(B(n+1)=\varSigma _{k=0}^nC_n^kB(n)\).
References
Barbuti, R., Giacobazzi, R., Levi, G.: A general framework for semantics-based bottom-up abstract interpretation of logic programs. ACM Trans. Program. Lang. Syst. (TOPLAS) 15(1), 133–181 (1993)
Blackburn, S.M., et al.: The DaCapo benchmarks: Java benchmarking development and analysis. In: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA 2006, pp. 169–190. ACM, New York (2006)
Bravenboer, M., Smaragdakis, Y.: Strictly declarative specification of sophisticated points-to analyses. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, pp. 243–262 (2009)
Cortesi, A., Filé, G.: Abstract interpretation of logic programs: an abstract domain for groundness, sharing, freeness and compoundness analysis. ACM SIGPLAN Not. 26(9), 52–61 (1991). PEPM 1991
Cousot, P., Cousot, R.: Abstract interpretation and application to logic programs. J. Log. Program. 13(2), 103–179 (1992)
Debray, S.K.: Global optimization of logic programs (analysis, transformation, compilation) (1987)
Delzanno, G., Giacobazzi, R., Ranzato, F.: Static analysis, abstract interpretation and verification in (constraint logic) programming. In: Dovier, A., Pontelli, E. (eds.) A 25-Year Perspective on Logic Programming. LNCS, vol. 6125, pp. 136–158. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14309-0_7
Fähndrich, M., Foster, J.S., Su, Z., Aiken, A.: Partial online cycle elimination in inclusion constraint graphs. In: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, 17–19 June 1998, pp. 85–96 (1998)
Hardekopf, B., Lin, C.: The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In: Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, 10–13 June 2007, pp. 290–299 (2007)
Jeon, M., Jeong, S., Oh, H.: Precise and scalable points-to analysis via data-driven context tunneling. PACMPL 2(OOPSLA), 140:1–140:29 (2018)
Jeong, S., Jeon, M., Cha, S.D., Oh, H.: Data-driven context-sensitivity for points-to analysis. PACMPL 1(OOPSLA), 100:1–100:28 (2017)
Kulkarni, S., Mangal, R., Zhang, X., Naik, M.: Accelerating program analyses by cross-program training. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, Part of SPLASH 2016, Amsterdam, The Netherlands, 30 October–4 November 2016, pp. 359–377 (2016)
Lhoták, O., Hendren, L.J.: Context-sensitive points-to analysis: is it worth it? In: Compiler Construction, Proceedings of the 15th International Conference, CC 2006, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2006, Vienna, Austria, 30–31 March 2006, pp. 47–64 (2006)
Li, Y., Tan, T., Møller, A., Smaragdakis, Y.: Precision-guided context sensitivity for pointer analysis. PACMPL 2(OOPSLA), 141:1–141:29 (2018)
Li, Y., Tan, T., Møller, A., Smaragdakis, Y.: Scalability-first pointer analysis with self-tuning context-sensitivity. In: Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, 04–09 November 2018, pp. 129–140 (2018)
Liang, P., Tripp, O., Naik, M.: Learning minimal abstractions. In: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, 26–28 January 2011, pp. 31–42 (2011)
Nielson, F., Nielson, H.R., Hankin, C.: Principles of Program Analysis. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-662-03811-6
pjBench: parallel Java benchmarks (2014)
Shivers, O.: Control-flow analysis of higher-order languages (1991)
Smaragdakis, Y., Kastrinis, G., Balatsouras, G.: Introspective analysis: context-sensitivity, across the board. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, pp. 485–495. ACM, New York (2014)
Tan, T., Li, Y., Xue, J.: Efficient and precise points-to analysis: modeling the heap by merging equivalent automata. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 278–291. ACM, New York (2017)
Tang, H., Wang, D., Xiong, Y., Zhang, L., Wang, X., Zhang, L.: Conditional Dyck-CFL reachability analysis for complete and efficient library summarization. In: Yang, H. (ed.) ESOP 2017. LNCS, vol. 10201, pp. 880–908. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54434-1_33
Tang, H., Wang, X., Zhang, L., Xie, B., Zhang, L., Mei, H.: Summary-based context-sensitive data-dependence analysis in presence of callbacks. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, 15–17 January 2015, pp. 83–95 (2015)
Xiao, X., Zhang, C.: Geometric encoding: forging the high performance context sensitive points-to analysis for Java. In: Proceedings of the 20th International Symposium on Software Testing and Analysis, ISSTA 2011, Toronto, ON, Canada, 17–21 July 2011, pp. 188–198 (2011)
Xu, G., Rountev, A.: Merging equivalent contexts for scalable heap-cloning-based context-sensitive points-to analysis. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 225–236. ACM, New York (2008)
Yan, H., Sui, Y., Chen, S., Xue, J.: Spatio-temporal context reduction: a pointer-analysis-based static approach for detecting use-after-free vulnerabilities. In: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, 27 May–03 June 2018, pp. 327–337. ACM (2018)
Zhang, X., Mangal, R., Grigore, R., Naik, M., Yang, H.: On abstraction refinement for program analyses in datalog. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, Edinburgh, United Kingdom, 09–11 June 2014, pp. 239–248. ACM (2014)
Acknowledgements
This work is supported in part by the National Key Research and Development Program of China No. 2019YFE0198100, National Natural Science Foundation of China under Grant Nos. 61922003, and a grant from ZTE-PKU Joint Laboratory for Foundation Software.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendices
A Incremental Merging in N-Domain-Wise Cases
Here we describe how to define an incremental merging function for general cases of N-domain-wise merging.
Suppose we want to define a merging function on N domains \(D_1, D_2, \ldots , D_N\). Without loss of generality, we suppose \(D_i\) can only be a super-domain of \(D_j\) when \(i > j\). We define \(\pi _i\) (\(1 \le i \le N\)) by induction:
-
When \(i = 1\), we can define an arbitrary 1-domain-wise merging function on \(D_1\).
-
Suppose we have defined a function \(\pi _i : D_1\times \dots \times D_i \rightarrow \widehat{D_{1\ldots i}}\), we can define \(\pi _{i+1} : D_1\times \dots \times D_i\times D_{i+1} \rightarrow \widehat{D_{1\ldots i+1}}\) with a helper function \(\pi _{i+1}'\):
$$\begin{aligned} \pi _{i+1}' : \widehat{D_{1\ldots i}} \times \widehat{D_{i+1}} \rightarrow \widehat{D_{1\ldots i+1}} \end{aligned}$$Here \(\widehat{D_{i+1}}\) represents the changed domain of original \(D_{i+1}\) after applying the merging \(\pi _i\), and \(\pi _{i+1}(d_1,\ldots ,d_i,d_{i+1}) = \pi _{i+1}'(\pi _i(d_1, \ldots , d_i), \pi _i(d_{i+1}))\). Note that \(\pi _i(d_{i+1})\), which \(d_{i+1}\) would become under \(\pi _i\), is determined after applying \(\pi _i\) to the Datalog program.
With \(\pi = \pi _N\), we get an N-domain-wise merging function on \(D_1 \times \dots \times D_N\).
B Proof: The Monotonicity of Mergings
Theorem 3 (Monotonicity)
Given a Datalog program C. If the merging \(\pi _b\) of the domains \(D_1, \dots , D_N\) is a finer merging than \(\pi _a\), then applying \(\pi _b\) to C will deduce no fewer results than \(\pi _a\). It means
Proof
Given a Datalog program C, any derivation on the proof tree in has the form
(note that the concrete values \(\{d^j_i\}\) may not originally match with each other, but get matched after applying \(\varPi _a\)).
As is defined, for any domain D, \(\forall x,y\in D, \varPi _a(x) = \varPi _a(y) \rightarrow \varPi _b(x) = \varPi _b(y)\), so in ’s proof tree,
also holds as a renaming of the previous instantiation.
Therefore, .
C Detailed Performance of 4DM in Points-to Analysis on Large Projects
Table 2 presents the performance of 4DM by applying learnt heuristics from small projects to large projects. Each column of 4DM shows the result of applying a heuristic learnt from a training set of 12 projects. The partition of training set is the same as in Sect. 6.1.
D Detailed Performance of 4DM in Liveness Analysis
Table 3 shows the detailed results of applying 4DM to liveness analysis.
One practical concern in evaluation is that liveness analysis is usually very fast, hence random events in the processor could have a big influence on the measured run time of compiled executable. We tackle this problem by using Soufflé interpreter to run the analysis and obtain a more reliable record of analysis time.
We use a T-test to check whether the distribution of analysis time changes significantly after merging. The smaller the p-value, the stronger the evidence that two distributions are different. We perform 50 independent runs for each benchmark and analysis. The analysis time is the average of the 50 runs, and the p-value is calculated base on these data.
We measure the precision by the size of calculated live-variable set at all call sites. Note that since our approach is sound, this property is non-decreasing; and the less the size increases, the more precise the analysis is.
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Y. et al. (2021). Accelerating Program Analyses in Datalog by Merging Library Facts. In: Drăgoi, C., Mukherjee, S., Namjoshi, K. (eds) Static Analysis. SAS 2021. Lecture Notes in Computer Science(), vol 12913. Springer, Cham. https://doi.org/10.1007/978-3-030-88806-0_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-88806-0_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88805-3
Online ISBN: 978-3-030-88806-0
eBook Packages: Computer ScienceComputer Science (R0)