[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Compacting points-to sets through object clustering

Published: 15 October 2021 Publication History

Abstract

Inclusion-based set constraint solving is the most popular technique for whole-program points-to analysis whereby an analysis is typically formulated as repeatedly resolving constraints between points-to sets of program variables. The set union operation is central to this process. The number of points-to sets can grow as analyses become more precise and input programs become larger, resulting in more time spent performing unions and more space used storing these points-to sets. Most existing approaches focus on improving scalability of precise points-to analyses from an algorithmic perspective and there has been less research into improving the data structures behind the analyses.
Bit-vectors as one of the more popular data structures have been used in several mainstream analysis frameworks to represent points-to sets. To store memory objects in bit-vectors, objects need to mapped to integral identifiers. We observe that this object-to-identifier mapping is critical for a compact points-to set representation and the set union operation. If objects in the same points-to sets (co-pointees) are not given numerically close identifiers, points-to resolution can cost significantly more space and time. Without data on the unpredictable points-to relations which would be discovered by the analysis, an ideal mapping is extremely challenging.
In this paper, we present a new approach to inclusion-based analysis by compacting points-to sets through object clustering. Inspired by recent staged analysis where an auxiliary analysis produces results approximating a more precise main analysis, we formulate points-to set compaction as an optimisation problem solved by integer programming using constraints generated from the auxiliary analysis’s results in order to produce an effective mapping. We then develop a more approximate mapping, yet much more efficiently, using hierarchical clustering to compact bit-vectors. We also develop an improved representation of bit-vectors (called core bit-vectors) to fully take advantage of the newly produced mapping. Our approach requires no algorithmic change to the points-to analysis. We evaluate our object clustering on flow sensitive points-to analysis using 8 open-source programs (>3.1 million lines of LLVM instructions) and our results show that our approach can successfully improve the analysis with an up to 1.83× speed up and an up to 4.05× reduction in memory usage.

Supplementary Material

Auxiliary Presentation Video (oopsla21main-p426-p-video.mp4)
This is a video presentation of the work "Compacting Points-To Sets through Object Clustering" by Mohamad Barbar and Yulei Sui published at OOPSLA 2021.

References

[1]
Lars Ole Andersen. 1994. Program Analysis and Specialization for the C Programming Language. Ph.D. Dissertation. University of Copenhagen. Denmark.
[2]
George Balatsouras and Yannis Smaragdakis. 2016. Structure-Sensitive Points-To Analysis for C and C++. In International Static Analysis Symposium (SAS ’16). Springer, Germany. 84–104. https://doi.org/10.1007/978-3-662-53413-7_5
[3]
Mohamad Barbar and Yulei Sui. 2021. Compacting Points-To Sets through Object Clustering (Artifact). https://doi.org/10.5281/zenodo.5507442
[4]
Mohamad Barbar, Yulei Sui, and Shiping Chen. 2020. Flow-Sensitive Type-Based Heap Cloning. In 34th European Conference on Object-Oriented Programming (ECOOP ’18, Vol. 166). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Germany. 24:1–24:26. https://doi.org/10.4230/LIPIcs.ECOOP.2020.24
[5]
Mohamad Barbar, Yulei Sui, and Shiping Chen. 2021. Object Versioning for Flow-Sensitive Pointer Analysis. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’21). IEEE Computer Society, USA. 222–235. https://doi.org/10.1109/CGO51591.2021.9370334
[6]
Marc Berndl, Ondrej Lhoták, Feng Qian, Laurie Hendren, and Navindra Umanee. 2003. Points-to Analysis using BDDs. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI ’03). ACM, USA. 103–114. https://doi.org/10.1145/781131.781144
[7]
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’09). ACM, USA. 243–262. https://doi.org/10.1145/1640089.1640108
[8]
Fred C. Chow, Sun Chan, Shin-Ming Liu, Raymond Lo, and Mark Streich. 1996. Effective Representation of Aliases and Indirect Memory Operations in SSA Form. In Proceedings of the 6th International Conference on Compiler Construction (CC ’96). Springer, Germany. 253–267. https://doi.org/10.1007/3-540-61053-7_66
[9]
2021. crux-bitcode. https://github.com/mbarbar/crux-bitcode Last accessed on 14 September 2021.
[10]
Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su, and Alexander Aiken. 1998. Partial Online Cycle Elimination in Inclusion Constraint Graphs. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI ’98). ACM, USA. 85–96. https://doi.org/10.1145/277650.277667
[11]
Stephen J. Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. 2008. Effective Typestate Verification in the Presence of Aliasing. ACM Transactions on Software Engineering and Methodology, 17, 2 (2008), Article 9, May, 34 pages. https://doi.org/10.1145/1348250.1348255
[12]
Ben Hardekopf and Calvin Lin. 2007. The Ant and the Grasshopper: Fast and Accurate Pointer Analysis for Millions of Lines of Code. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’07). ACM, USA. 290–299. https://doi.org/10.1145/1250734.1250767
[13]
Ben Hardekopf and Calvin Lin. 2007. Exploiting Pointer and Location Equivalence to Optimize Pointer Analysis. In International Static Analysis Symposium (SAS ’07). Springer, Germany. 265–280. https://doi.org/10.1007/978-3-540-74061-2_17
[14]
Ben Hardekopf and Calvin Lin. 2009. Semi-Sparse Flow-Sensitive Pointer Analysis. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’09). ACM, USA. 226–238. https://doi.org/10.1145/1480881.1480911
[15]
Ben Hardekopf and Calvin Lin. 2011. Flow-Sensitive Pointer Analysis for Millions of Lines of Code. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’11). IEEE Computer Society, USA. 289–298. https://doi.org/10.1109/CGO.2011.5764696
[16]
Nevin Heintze and Olivier Tardieu. 2001. Ultra-Fast Aliasing Analysis Using CLA: A Million Lines of C Code in a Second. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (PLDI ’01). ACM, USA. 254––263. https://doi.org/10.1145/378795.378855
[17]
ISO/IEC. 2017. ISO/IEC 14882:2017 — Programming languages — C++ (fifth ed.). International Organization for Standardization, Switzerland.
[18]
Paul Jaccard. 1912. The Distribution of the Flora in the Alpine Zone. New Phytologist, 11, 2 (1912), 37–50. https://doi.org/10.1111/j.1469-8137.1912.tb05611.x
[19]
Minseok Jeon, Sehun Jeong, and Hakjoo Oh. 2018. Precise and Scalable Points-to Analysis via Data-Driven Context Tunneling. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018), Article 140, Oct., 29 pages. https://doi.org/10.1145/3276510
[20]
Minseok Jeon, Myungho Lee, and Hakjoo Oh. 2020. Learning Graph-Based Heuristics for Pointer Analysis without Handcrafting Application-Specific Features. Proceedings of the ACM on Programming Languages, 4, OOPSLA (2020), Article 179, Nov., 30 pages. https://doi.org/10.1145/3428247
[21]
Jakub Kuderski, Jorge A Navas, and Arie Gurfinkel. 2019. Unification-based Pointer Analysis without Oversharing. In 2019 Formal Methods in Computer Aided Design (FMCAD ’19). IEEE Computer Society, USA. 37–45. https://doi.org/10.23919/FMCAD.2019.8894275
[22]
Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO ’04). IEEE Computer Society, USA. 75. https://doi.org/10.1109/CGO.2004.1281665
[23]
Yuxiang Lei and Yulei Sui. 2019. Fast and Precise Handling of Positive Weight Cycles for Field-Sensitive Pointer Analysis. In International Static Analysis Symposium (SAS ’19). Springer, Germany. 27–47. https://doi.org/10.1007/978-3-030-32304-2_3
[24]
Ondrej Lhoták and Kwok-Chiang Andrew Chung. 2011. Points-to Analysis with Efficient Strong Updates. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). ACM, USA. 3–16. https://doi.org/10.1145/1926385.1926389
[25]
Ondřej Lhoták and Laurie Hendren. 2003. Scaling Java Points-to Analysis Using SPARK. In Proceedings of the 12th International Conference on Compiler Construction (CC ’03). Springer, Germany. 153–169.
[26]
Bozhen Liu, Jeff Huang, and Lawrence Rauchwerger. 2019. Rethinking Incremental and Parallel Pointer Analysis. ACM Transactions on Programming Languages and Systems, 41, 1 (2019), Article 6, March, 31 pages. https://doi.org/10.1145/3293606
[27]
Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto. Commun. ACM, 58, 2 (2015), 44–46. https://doi.org/10.1145/2644805
[28]
2021. https://llvm.org/doxygen/BitVector_8h_source.html Last accessed on 16 April 2021.
[29]
2021. https://llvm.org/doxygen/SparseBitVector_8h_source.html Last accessed on 16 April 2021.
[30]
Oded Maimon and Lior Rokach. 2005. Data Mining and Knowledge Discovery Handbook (1st ed.). Springer, USA. isbn:0387098224 https://doi.org/10.1007/b107408
[31]
Mario Méndez-Lojo, Augustine Mathew, and Keshav Pingali. 2010. Parallel Inclusion-Based Points-to Analysis. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’10). ACM, USA. 428–443. https://doi.org/10.1145/1869459.1869495
[32]
Fionn Murtagh. 1983. A Survey of Recent Advances in Hierarchical Clustering Algorithms. Comput. J., 26, 4 (1983), 11, 354–359. https://doi.org/10.1093/comjnl/26.4.354
[33]
Daniel Müllner. 2013. fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python. Journal of Statistical Software, Articles, 53, 9 (2013), 1–18. https://doi.org/10.18637/jss.v053.i09
[34]
Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, and Kwangkeun Yi. 2012. Design and Implementation of Sparse Global Analyses for C-like Languages. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’12). ACM, USA. 229––238. https://doi.org/10.1145/2254064.2254092
[35]
David J Pearce, Paul HJ Kelly, and Chris Hankin. 2003. Online cycle detection and difference propagation for pointer analysis. In Proceedings of the Third IEEE International Workshop on Source Code Analysis and Manipulation. IEEE Computer Society, USA. 3–12. https://doi.org/10.1109/SCAM.2003.1238026
[36]
David J. Pearce, Paul H.J. Kelly, and Chris Hankin. 2007. Efficient Field-Sensitive Pointer Analysis of C. ACM Transactions on Programming Languages and Systems, 30, 1 (2007), Nov., 4:1–4:42. https://doi.org/10.1145/1290520.1290524
[37]
Fernando Magno Quintao Pereira and Daniel Berlin. 2009. Wave Propagation and Deep Propagation for Pointer Analysis. In Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’09). IEEE Computer Society, USA. 126–135. https://doi.org/10.1109/CGO.2009.9
[38]
Atanas Rountev and Satish Chandra. 2000. Off-Line Variable Substitution for Scaling Points-to Analysis. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation (PLDI ’00). ACM, USA. 47––56. https://doi.org/10.1145/349299.349310
[39]
Philipp Dominik Schubert, Ben Hermann, and Eric Bodden. 2019. PhASAR: an inter-procedural static analysis framework for C/C++. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS ’19). Springer, Germany. 393–410. https://doi.org/10.1007/978-3-030-17465-1_22
[40]
Yannis Smaragdakis, George Balatsouras, and George Kastrinis. 2013. Set-Based Pre-Processing for Points-to Analysis. In Proceedings of the 28th Annual ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA ’13). ACM, USA. 253––270. https://doi.org/10.1145/2509136.2509524
[41]
Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-Sensitivity. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). ACM, USA. 17–30. https://doi.org/10.1145/1926385.1926390
[42]
Yulei Sui and Jingling Xue. 2016. On-Demand Strong Update Analysis via Value-Flow Refinement. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE ’16). ACM, USA. 460–473. https://doi.org/10.1145/2950290.2950296
[43]
Yulei Sui and Jingling Xue. 2016. SVF: Interprocedural Static Value-Flow Analysis in LLVM. In Proceedings of the 25th International Conference on Compiler Construction (CC 2016). ACM, USA. 265–266. https://doi.org/10.1145/2892208.2892235
[44]
Yulei Sui, Ding Ye, and Jingling Xue. 2012. Static Memory Leak Detection Using Full-Sparse Value-Flow Analysis. In Proceedings of the 2012 International Symposium on Software Testing and Analysis (ISSTA ’12). ACM, USA. 254–264. https://doi.org/10.1145/2338965.2336784
[45]
André Tavares, Benoit Boissinot, Fernando Pereira, and Fabrice Rastello. 2014. Parameterized Construction of Program Representations for Sparse Dataflow Analyses. In Proceedings of the 23rd International Conference on Compiler Construction. Springer, Germany. 18–39. https://doi.org/10.1007/978-3-642-54807-9_2
[46]
Hamid A Toussi and Ahmed Khademzadeh. 2013. Improving Bit-Vector Representation of Points-To Sets Using Class Hierarchy. International Journal of Computer Theory and Engineering, 5, 3 (2013), 494–499. https://doi.org/10.7763/IJCTE.2013.V5.736
[47]
2021. The T. J. Watson Libraries for Analysis (WALA). http://wala.sf.net/ Last accessed on 14 September 2021.
[48]
John Whaley. 2007. Context-Sensitive Pointer Analysis Using Binary Decision Diagrams. Ph.D. Dissertation. Stanford University. USA.
[49]
2021. Whole Program LLVM in Go. https://github.com/SRI-CSL/gllvm Last accessed on 14 September 2021.
[50]
Jianwen Zhu and Silvian Calman. 2004. Symbolic Pointer Analysis Revisited. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI ’04). ACM, USA. 145–157. https://doi.org/10.1145/996841.996860

Cited By

View all
  • (2025)Thread-sensitive fuzzing for concurrency bug detectionComputers & Security10.1016/j.cose.2024.104171148(104171)Online publication date: Jan-2025
  • (2024)SPATA: Effective OS Bug Detection with Summary-Based, Alias-Aware, and Path-Sensitive Typestate AnalysisACM Transactions on Computer Systems10.1145/369525042:3-4(1-40)Online publication date: 6-Sep-2024
  • (2024)Scaling Type-Based Points-to Analysis with SaturationProceedings of the ACM on Programming Languages10.1145/36564178:PLDI(990-1013)Online publication date: 20-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 5, Issue OOPSLA
October 2021
2001 pages
EISSN:2475-1421
DOI:10.1145/3492349
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2021
Published in PACMPL Volume 5, Issue OOPSLA

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. bit-vectors
  2. hierarchical clustering
  3. points-to sets
  4. staged points-to analysis

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)126
  • Downloads (Last 6 weeks)21
Reflects downloads up to 09 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Thread-sensitive fuzzing for concurrency bug detectionComputers & Security10.1016/j.cose.2024.104171148(104171)Online publication date: Jan-2025
  • (2024)SPATA: Effective OS Bug Detection with Summary-Based, Alias-Aware, and Path-Sensitive Typestate AnalysisACM Transactions on Computer Systems10.1145/369525042:3-4(1-40)Online publication date: 6-Sep-2024
  • (2024)Scaling Type-Based Points-to Analysis with SaturationProceedings of the ACM on Programming Languages10.1145/36564178:PLDI(990-1013)Online publication date: 20-Jun-2024
  • (2024)Generic Sensitivity: Generics-Guided Context Sensitivity for Pointer AnalysisIEEE Transactions on Software Engineering10.1109/TSE.2024.337764550:5(1144-1162)Online publication date: May-2024
  • (2023)Tai-e: A Developer-Friendly Static Analysis Framework for Java by Harnessing the Good Designs of ClassicsProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598120(1093-1105)Online publication date: 12-Jul-2023
  • (2022)Path-sensitive and alias-aware typestate analysis for detecting OS bugsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507770(859-872)Online publication date: 28-Feb-2022

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media