[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Heaps don't lie: countering unsoundness with heap snapshots

Published: 12 October 2017 Publication History

Abstract

Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion to the analysis logic. Our approach is reified in the HeapDL toolchain and consists in taking whole-heap snapshots during program execution, that are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior. The snapshots are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage. Heap information under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).

References

[1]
Edward E. Aftandilian, Sean Kelley, Connor Gramazio, Nathan Ricci, Sara L. Su, and Samuel Z. Guyer. 2010. Heapviz: Interactive Heap Visualization for Program Understanding and Debugging. In Proceedings of the 5th International Symposium on Software Visualization (SOFTVIS ’10). ACM, New York, NY, USA, 53–62.
[2]
Davide Ancona, Massimo Ancona, Antonio Cuni, and Nicholas D. Matsakis. 2007. RPython: A Step Towards Reconciling Dynamically and Statically Typed OO Languages. In Proceedings of the 2007 Symposium on Dynamic Languages (DLS ’07). ACM, New York, NY, USA, 53–64.
[3]
Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, and Geoffrey Washburn. 2015. Design and Implementation of the LogicBlox System. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). ACM, New York, NY, USA, 1371–1382.
[4]
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 259–269.
[5]
Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo benchmarks: Java benchmarking development and analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA ’06). ACM, New York, NY, USA, 169–190.
[6]
Sam Blackshear, Alexandra Gendreau, and Bor-Yuh Evan Chang. 2015. Droidel: A General Approach to Android Framework Modeling. In Proceedings of the 4th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP 2015). ACM, New York, NY, USA, 19–25.
[7]
Eric Bodden, Andreas Sewe, Jan Sinschek, Hela Oueslati, and Mira Mezini. 2011. Taming reflection: Aiding static analysis in the presence of reflection and custom class loaders. In Proceedings of the 33rd International Conference on Software Engineering (ICSE ’11). ACM, New York, NY, USA, 241–250.
[8]
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th Annual ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA ’09). ACM, New York, NY, USA, 243–261.
[9]
Eric Bruneton, Romain Lenglet, and Thierry Coupaye. 2002. ASM: a code manipulation tool to implement adaptable systems. Adaptable and extensible component systems 30, 19 (2002).
[10]
Patrick P. F. Chan, Lucas C. K. Hui, and S. M. Yiu. 2011. Dynamic Software Birthmark for Java Based on Heap Memory Analysis. Springer Berlin Heidelberg, Berlin, Heidelberg, 94–107.
[11]
Stephen J. Fink. 2015. T.J. Watson Libraries for Analysis (WALA). http://wala.sourceforge.net . (2015).
[12]
Michael Furr, Jong-hoon (David) An, and Jeffrey S. Foster. 2009. Profile-guided Static Typing for Dynamic Scripting Languages. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’09). ACM, New York, NY, USA, 283–300.
[13]
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’05). ACM, New York, NY, USA, 213–223.
[14]
Brian Goetz. 2010. One VM, Many Languages. https://gotocon.com/dl/jaoo- aarhus- 2010/slides/BrianGoetz_ OneVMManyLanguages.pdf . (2010).
[15]
Brian Goetz. 2016. Project Valhalla Update. http://www.oracle.com/technetwork/java/jvmls2016- goetz- 3126134.pdf . (August 2016). JVM Language Summit.
[16]
Neville Grech, Julian Rathke, and Bernd Fischer. 2013. Preemptive Type Checking in Dynamically Typed Languages. In Proceedings of the 10th International Colloquium on Theoretical Aspects of Computing (Lecture Notes in Computer Science), Vol. 8049. Springer, 195–212.
[17]
Neville Grech and Yannis Smaragdakis. 2017. P/Taint: Unified Points-to and Taint Analysis. Proceedings of the ACM on Programming Languages 1, OOPSLA (October 2017).
[18]
Martin Hirzel, Daniel Von Dincklage, Amer Diwan, and Michael Hind. 2007. Fast Online Pointer Analysis. ACM Transactions on Programming Languages and Systems 29, 2, Article 11 (April 2007), 55 pages.
[19]
Martin Hirzel, Amer Diwan, and Michael Hind. 2004. Pointer Analysis in the Presence of Dynamic Class Loading. In Proceedings of the 18th European Conference on Object-Oriented Programming (ECOOP ’04). Springer Berlin Heidelberg, Berlin, Heidelberg, 96–122.
[20]
Peter Hofer, David Gnedt, and Hanspeter Mössenböck. 2015. Efficient Dynamic Analysis of the Synchronization Performance of Java Applications. In Proceedings of the 13th International Workshop on Dynamic Analysis (WODA 2015). ACM, New York, NY, USA, 14–18.
[21]
IBM. 2017. Using the HPROF Profiler. https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.lnx. 80.doc/diag/tools/hprof.html . (2017).
[22]
Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. Springer International Publishing, Cham, 422–430.
[23]
George Kastrinis and Yannis Smaragdakis. 2013. Hybrid Context-Sensitivity for Points-To Analysis. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’13). ACM, New York, NY, USA, 423–434.
[24]
Davy Landman, Alexander Serebrenik, and Jurgen J. Vinju. 2017. Challenges for Static Analysis of Java Reflection – Literature Review and Empirical Study. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017.
[25]
Sungho Lee, Julian Dolby, and Sukyoung Ryu. 2016. HybriDroid: Static Analysis Framework for Android Hybrid Applications. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016). ACM, New York, NY, USA, 250–261.
[26]
Lian Li, Yi Lu, and Jingling Xue. 2017. Dynamic Symbolic Execution for Polymorphism. In Proceedings of the 26th International Conference on Compiler Construction (CC 2017). ACM, New York, NY, USA, 120–130.
[27]
Yue Li, Tian Tan, and Jingling Xue. 2015. Effective Soundness-Guided Reflection Analysis. Springer Berlin Heidelberg, Berlin, Heidelberg, 162–180.
[28]
Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2014. The Java Virtual Machine Specification, Java SE 8 Edition. (2014).
[29]
Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto. Commun. ACM 58, 2 (Jan. 2015), 44–46.
[30]
Evan K. Maxwell, Godmar Back, and Naren Ramakrishnan. 2010. Diagnosing Memory Leaks Using Graph Mining on Heap Dumps. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10). ACM, New York, NY, USA, 115–124.
[31]
Eric Mercer and Michael Jones. 2005. Model Checking Machine Code with the GNU Debugger. In Proceedings of the 12th International Conference on Model Checking Software (SPIN’05). Springer-Verlag, Berlin, Heidelberg, 251–265.
[32]
Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2005. Parameterized object sensitivity for points-to analysis for Java. ACM Transactions on Software Engineering and Methodology 14, 1 (2005), 1–41.
[33]
Oracle. 2014a. Java Platform, Standard Edition Java Flight Recorder Runtime Guide. https://docs.oracle.com/javacomponents/ jmc- 5- 4/jfr- runtime- guide/about.htm . (2014).
[34]
Oracle. 2014b. JSR 335: Lambda Expressions for the Java™ Programming Language. (2014). https://jcp.org/en/jsr/detail?id=335
[35]
Oracle. 2016a. HPROF: A Heap/CP U Profiling Tool. http://docs.oracle.com/javase/8/docs/technotes/samples/hprof.html . (2016).
[36]
Oracle. 2016b. jhat. (2016). https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jhat.html
[37]
Oracle. 2016c. VisualVM: Home. https://visualvm.github.io/ . (2016).
[38]
Oracle. 2017. JEP 280: Indify String Concatenation. (2017). http://openjdk.java.net/jeps/280
[39]
Alex Potanin, James Noble, and Robert Biddle. 2004. Checking Ownership and Confinement: Research Articles. Concurrency and Computation: Practice & Experience - Formal Techniques for Java-like Programs 16, 7 (June 2004), 671–687.
[40]
S. P. Reiss. 2009. Visualizing the Java heap to detect memory problems. In 2009 5th IEEE International Workshop on Visualizing Software for Understanding and Analysis. 73–80.
[41]
John R. Rose. 2009. Bytecodes Meet Combinators: Invokedynamic on the JVM. In Proceedings of the Third Workshop on Virtual Machines and Intermediate Languages (VMIL ’09). ACM, New York, NY, USA, Article 2, 11 pages.
[42]
Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A Concolic Unit Testing Engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE-13). ACM, New York, NY, USA, 263–272.
[43]
Micha Sharir and Amir Pnueli. 1981. Two Approaches to Interprocedural Data Flow Analysis. In Program flow analysis: theory and applications, Steven S. Muchnick and Neil D. Jones (Eds.). Prentice-Hall, Inc., Englewood Cliffs, NJ, Chapter 7, 189–233.
[44]
Yannis Smaragdakis, Martin Bravenboer, and Ondřej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-Sensitivity. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). ACM, New York, NY, USA, 17–30.
[45]
Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie J. Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative research (CASCON ’99). IBM Press, 125–135.
[46]
Noriaki Yoshiura and Wei Wei. 2014. Static Data Race Detection for Java Programs with Dynamic Class Loading. Springer International Publishing, Cham, 161–173.
[47]
YourKit. 2017. https://www.yourkit.com/ . (2017).
[48]
Yury Zhauniarovich, Maqsood Ahmad, Olga Gadyatskaya, Bruno Crispo, and Fabio Massacci. 2015. StaDynA: Addressing the Problem of Dynamic Code Updates in the Security Analysis of Android Applications. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy (CODASPY ’15). ACM, New York, NY, USA, 37–48.

Cited By

View all
  • (2024)Reducing Static Analysis Unsoundness with Approximate InterpretationProceedings of the ACM on Programming Languages10.1145/36564248:PLDI(1165-1188)Online publication date: 20-Jun-2024
  • (2023)Extensible and Scalable Architecture for Hybrid AnalysisProceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3589250.3596146(34-39)Online publication date: 6-Jun-2023
  • (2023)On Retrofitting Provenance for Transparent and Fair Software - Drivers and Challenges2023 IEEE/ACM International Workshop on Equitable Data & Technology (FairWare)10.1109/FairWare59297.2023.00007(14-21)Online publication date: May-2023
  • Show More Cited By

Index Terms

  1. Heaps don't lie: countering unsoundness with heap snapshots

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Programming Languages
    Proceedings of the ACM on Programming Languages  Volume 1, Issue OOPSLA
    October 2017
    1786 pages
    EISSN:2475-1421
    DOI:10.1145/3152284
    Issue’s Table of Contents
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 October 2017
    Published in PACMPL Volume 1, Issue OOPSLA

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. Heap Profiling
    2. Instrumentation
    3. Program Analysis
    4. Soundness

    Qualifiers

    • Research-article

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)88
    • Downloads (Last 6 weeks)16
    Reflects downloads up to 09 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Reducing Static Analysis Unsoundness with Approximate InterpretationProceedings of the ACM on Programming Languages10.1145/36564248:PLDI(1165-1188)Online publication date: 20-Jun-2024
    • (2023)Extensible and Scalable Architecture for Hybrid AnalysisProceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis10.1145/3589250.3596146(34-39)Online publication date: 6-Jun-2023
    • (2023)On Retrofitting Provenance for Transparent and Fair Software - Drivers and Challenges2023 IEEE/ACM International Workshop on Equitable Data & Technology (FairWare)10.1109/FairWare59297.2023.00007(14-21)Online publication date: May-2023
    • (2022)A theory of monitorsInformation and Computation10.1016/j.ic.2021.104704281:COnline publication date: 3-Jan-2022
    • (2022)Automatic inspection of program state in an uncooperative environmentSoftware: Practice and Experience10.1002/spe.314652:12(2727-2758)Online publication date: 31-Aug-2022
    • (2021)A Survey of Parametric Static AnalysisACM Computing Surveys10.1145/346445754:7(1-37)Online publication date: 18-Jul-2021
    • (2021)PyCGProceedings of the 43rd International Conference on Software Engineering10.1109/ICSE43902.2021.00146(1646-1657)Online publication date: 22-May-2021
    • (2020)Putting the semantics into semantic versioningProceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software10.1145/3426428.3426922(157-179)Online publication date: 18-Nov-2020
    • (2020)Identifying Java calls in native code via binary scanningProceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3395363.3397368(388-400)Online publication date: 18-Jul-2020
    • (2020)On the recall of static call graph construction in practiceProceedings of the ACM/IEEE 42nd International Conference on Software Engineering10.1145/3377811.3380441(1049-1060)Online publication date: 27-Jun-2020
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media