[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3377811.3380425acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

Pipelining bottom-up data flow analysis

Published: 01 October 2020 Publication History

Abstract

Bottom-up program analysis has been traditionally easy to parallelize because functions without caller-callee relations can be analyzed independently. However, such function-level parallelism is significantly limited by the calling dependence - functions with caller-callee relations have to be analyzed sequentially because the analysis of a function depends on the analysis results, a.k.a., function summaries, of its callees. We observe that the calling dependence can be relaxed in many cases and, as a result, the parallelism can be improved. In this paper, we present Coyote, a framework of bottom-up data flow analysis, in which the analysis task of each function is elaborately partitioned into multiple sub-tasks to generate pipelineable function summaries. These sub-tasks are pipelined and run in parallel, even though the calling dependence exists. We formalize our idea under the IFDS/IDE framework and have implemented an application to checking null-dereference bugs and taint issues in C/C++ programs. We evaluate Coyote on a series of standard benchmark programs and open-source software systems, which demonstrates significant speedup over a conventional parallel design.

References

[1]
Aws Albarghouthi, Rahul Kumar, Aditya V Nori, and Sriram K Rajamani. 2012. Parallelizing top-down interprocedural analyses. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, 217--228.
[2]
Nicholas Allen, Padmanabhan Krishnan, and Bernhard Scholz. 2015. Combining type-analysis with points-to analysis for analyzing Java library source-code. In Proceedings of the 4th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP '15). ACM, 13--18.
[3]
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '14). ACM, 259--269.
[4]
Domagoj Babic and Alan J. Hu. 2008. Calysto: Scalable and precise extended static checking. In Proceedings of the 30th International Conference on Software Engineering (ICSE '08). IEEE, 211--220.
[5]
Thomas Ball, Vladimir Levin, and Sriram K Rajamani. 2011. A decade of software model checking with SLAM. Commun. ACM 54, 7 (2011), 68--76.
[6]
Jiri Barnat, Lubos Brim, and Jitka Stříbrná. 2001. Distributed LTL model-checking in SPIN. In International SPIN Workshop on Model Checking of Software. Springer, 200--216.
[7]
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly declarative specification of sophisticated points-to analyses. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '09). ACM, 243--262.
[8]
Cristiano Calcagno, Dino Distefano, Peter W. O'Hearn, and Hongseok Yang. 2011. Compositional shape analysis by means of bi-abduction. J. ACM 58, 6 (2011), 26:1--26:66.
[9]
Sagar Chaki, Edmund M Clarke, Alex Groce, Somesh Jha, and Helmut Veith. 2004. Modular verification of software components in C. IEEE Transactions on Software Engineering 30, 6 (2004), 388--402.
[10]
Sigmund Cherem, Lonnie Princehouse, and Radu Rugina. 2007. Practical memory leak detection using guarded value-flow analysis. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07). ACM, 480--491.
[11]
Chia Yuan Cho, Vijay D'Silva, and Dawn Song. 2013. BLITZ: Compositional bounded model checking for real-world programs. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE '13). IEEE, 136--146.
[12]
Liviu Ciortea, Cristian Zamfir, Stefan Bucur, Vitaly Chipounov, and George Candea. 2010. Cloud9: A software testing service. ACM SIGOPS Operating Systems Review 43, 4 (2010), 5--10.
[13]
Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337--340.
[14]
Kyle Dewey, Vineeth Kashyap, and Ben Hardekopf. 2015. A parallel abstract interpreter for JavaScript. In 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO '15). IEEE, 34--45.
[15]
Isil Dillig, Thomas Dillig, and Alex Aiken. 2008. Sound, complete and scalable path-sensitive analysis. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08). ACM, 270--280.
[16]
Isil Dillig, Thomas Dillig, Alex Aiken, and Mooly Sagiv. 2011. Precise and compact modular procedure summaries for heap manipulating programs. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). ACM, 567--577.
[17]
Matthew B Dwyer, Sebastian Elbaum, Suzette Person, and Rahul Purandare. 2007. Parallel randomized state-space search. In Proceedings of the 29th International Conference on Software Engineering (ICSE '07). IEEE, 3--12.
[18]
Marcus Edvinsson, Jonas Lundberg, and Welf Löwe. 2011. Parallel points-to analysis for multi-core machines. In Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers. ACM, 45--54.
[19]
Stephen J Fink, Eran Yahav, Nurit Dor, G Ramalingam, and Emmanuel Geay. 2008. Effective typestate verification in the presence of aliasing. ACM Transactions on Software Engineering and Methodology (TOSEM) 17, 2 (2008), 9.
[20]
Sumit Ganguly, Avi Silberschatz, and Shalom Tsur. 1990. A Framework for the Parallel Processing of Datalog Queries. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90). ACM, 143--152.
[21]
Diego Garbervetsky, Edgardo Zoppi, and Benjamin Livshits. 2017. Toward full elasticity in distributed static analysis: the case of callgraph analysis. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (FSE '17). ACM, 442--453.
[22]
Orna Grumberg, Tamir Heyman, Nili Ifergan, and Assaf Schuster. 2005. Achieving speedups in distributed symbolic reachability analysis through asynchronous computation. In Advanced Research Working Conference on Correct Hardware Design and Verification Methods. Springer, 129--145.
[23]
Salvatore Guarnieri, Marco Pistoia, Omer Tripp, Julian Dolby, Stephen Teilhet, and Ryan Berg. 2011. Saving the world wide web from vulnerable JavaScript. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA '11). ACM, 177--187.
[24]
Ben Hardekopf and Calvin Lin. 2011. Flow-sensitive pointer analysis for millions of lines of code. In Code Generation and Optimization (CGO), 2011 9th Annual IEEE/ACM International Symposium on. IEEE, 289--298.
[25]
Behnaz Hassanshahi, Raghavendra Kagalavadi Ramesh, Padmanabhan Krishnan, Bernhard Scholz, and Yi Lu. 2017. An efficient tunable selective points-to analysis for large codebases. In Proceedings of the 6th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP '17). ACM, 13--18.
[26]
Gerard J Holzmann and Dragan Bosnacki. 2007. The design of a multicore extension of the SPIN model checker. IEEE Transactions on Software Engineering 33, 10 (2007), 659--674.
[27]
G. Hulin. 1989. Parallel Processing of Recursive Queries in Distributed Architectures. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB '89). Morgan Kaufmann Publishers Inc., 87--96.
[28]
Herbert Jordan, Pavle Subotić, David Zhao, and Bernhard Scholz. 2019. A specialized B-tree for concurrent datalog evaluation. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP '19). ACM, 327--339.
[29]
Yong-fong Lee and Barbara G Ryder. 1992. A comprehensive approach to parallel data flow analysis. In Proceedings of the 6th International Conference on Supercomputing. ACM, 236--247.
[30]
Jan Karel Lenstra and AHG Rinnooy Kan. 1978. Complexity of scheduling under precedence constraints. Operations Research 26, 1 (1978), 22--35.
[31]
Bozhen Liu, Jeff Huang, and Lawrence Rauchwerger. 2019. Rethinking Incremental and Parallel Pointer Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS) 41, 1 (2019), 6.
[32]
Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z Guyer, Uday P Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In defense of soundiness: a manifesto. Commun. ACM 58, 2 (2015), 44--46.
[33]
Nuno P Lopes and Andrey Rybalchenko. 2011. Distributed and predictable software model checking. In International Workshop on Verification, Model Checking, and Abstract Interpretation. Springer, 340--355.
[34]
Carlos Alberto Martínez-Angeles, Inês Dutra, Vítor Santos Costa, and Jorge Buenabad-Chávez. 2013. A datalog engine for gpus. In Declarative Programming and Knowledge Management. Springer, 152--168.
[35]
Scott McPeak, Charles-Henri Gros, and Murali Krishna Ramanathan. 2013. Scalable and incremental software bug detection. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE '13). ACM, 554--564.
[36]
Mario Mendez-Lojo, Martin Burtscher, and Keshav Pingali. 2012. A GPU implementation of inclusion-based points-to analysis. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '12). ACM, 107--116.
[37]
Mario Méndez-Lojo, Augustine Mathew, and Keshav Pingali. 2010. Parallel inclusion-based points-to analysis. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '10). ACM, 428--443.
[38]
David Monniaux. 2005. The parallel implementation of the Astrée static analyzer. In Asian Symposium on Programming Languages and Systems. Springer, 86--96.
[39]
Nomair A Naeem and Ondrej Lhotak. 2008. Typestate-like analysis of multiple interacting objects. In Proceedings of the 23rd ACM SIGPLAN Conference on Object-oriented Programming Systems Languages and Applications (OOPSLA '08). ACM, 347--366.
[40]
Nomair A Naeem and Ondrej Lhoták. 2009. Efficient alias set analysis using SSA form. In Proceedings of the 2009 International Symposium on Memory Management (ISMM '09). ACM, 79--88.
[41]
Vaivaswatha Nagaraj and R Govindarajan. 2013. Parallel flow-sensitive pointer analysis by graph-rewriting. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. IEEE, 19--28.
[42]
Damien Octeau, Patrick McDaniel, Somesh Jha, Alexandre Bartel, Eric Bodden, Jacques Klein, and Yves Le Traon. 2013. Effective inter-component communication mapping in android: An essential step towards holistic security analysis. In Presented as part of the 22nd USENIX Security Symposium (USENIX Security '13). USENIX Association, 543--558.
[43]
Tarun Prabhu, Shreyas Ramalingam, Matthew Might, and Mary Hall. 2011. EigenCFA: Accelerating flow analysis with GPUs. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '11). ACM, 511--522.
[44]
Sandeep Putta and Rupesh Nasre. 2012. Parallel replication-based points-to analysis. In International Conference on Compiler Construction (CC '12). Springer, 61--80.
[45]
Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '95). ACM, 49--61.
[46]
Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. 1994. Speeding up slicing. In Proceedings of the 2nd ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE '94). ACM, 11--20.
[47]
Noam Rinetzky, Mooly Sagiv, and Eran Yahav. 2005. Interprocedural shape analysis for cutpoint-free programs. In International Static Analysis Symposium. Springer, 284--302.
[48]
Jonathan Rodriguez and Ondřej Lhoták. 2011. Actor-based parallel dataflow analysis. In International Conference on Compiler Construction (CC '11). Springer, 179--197.
[49]
Atanas Rountev, Mariana Sharp, and Guoqing Xu. 2008. IDE dataflow analysis in the presence of large object-oriented libraries. In International Conference on Compiler Construction (CC '08). Springer, 53--68.
[50]
Mooly Sagiv, Thomas Reps, and Susan Horwitz. 1996. Precise interprocedural dataflow analysis with applications to constant propagation. Theoretical Computer Science 167, 1 (1996), 131--170.
[51]
Bernhard Scholz, Herbert Jordan, Pavle Subotić, and Till Westmann. 2016. On fast large-scale program analysis in datalog. In International Conference on Compiler Construction (CC '16). ACM, 196--206.
[52]
Jürgen Seib and Georg Lausen. 1991. Parallelizing Datalog programs by generalized pivoting. In Proceedings of the tenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. ACM, 241--251.
[53]
Marianne Shaw, Paraschos Koutris, Bill Howe, and Dan Suciu. 2012. Optimizing large-scale Semi-Naïve datalog evaluation in hadoop. In International Datalog 2.0 Workshop. Springer, 165--176.
[54]
Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. 2018. Pinpoint: Fast and precise sparse value flow analysis for million lines of code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '18). ACM, 693--706.
[55]
Sharon Shoham, Eran Yahav, Stephen J Fink, and Marco Pistoia. 2008. Static specification mining using automata-based abstractions. IEEE Transactions on Software Engineering 34, 5 (2008), 651--666.
[56]
Bjarne Steensgaard. 1996. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 32--41.
[57]
Yu Su, Ding Ye, and Jingling Xue. 2014. Parallel pointer analysis with CFL-reachability. In 2014 43rd International Conference on Parallel Processing. IEEE, 451--460.
[58]
Yulei Sui, Ding Ye, and Jingling Xue. 2014. Detecting memory leaks statically with full-sparse value-flow analysis. IEEE Transactions on Software Engineering 40, 2 (2014), 107--122.
[59]
Omer Tripp, Marco Pistoia, Patrick Cousot, Radhia Cousot, and Salvatore Guarnieri. 2013. Andromeda: Accurate and scalable security analysis of web applications. In International Conference on Fundamental Approaches to Software Engineering. Springer, 210--225.
[60]
Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, and Ardalan Amiri Sani. 2017. Graspan: A single-machine disk-based graph system for interprocedural static analyses of large-scale systems code. ACM SIGOPS Operating Systems Review 51, 2 (2017), 389--404.
[61]
Ouri Wolfson and Aya Ozeri. 1990. A New Paradigm for Parallel and Distributed Rule-processing. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90). ACM, 133--142.
[62]
Ouri Wolfson and Avi Silberschatz. 1988. Distributed Processing of Logic Programs. In Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data (SIGMOD '88). ACM, 329--336.
[63]
Yichen Xie and Alex Aiken. 2005. Context- and path-sensitive memory leak detection. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE '05). ACM, 115--125.
[64]
Yichen Xie and Alex Aiken. 2005. Scalable error detection using Boolean satisfiability. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '05). ACM, 351--363.
[65]
Hongseok Yang, Oukseh Lee, Josh Berdine, Cristiano Calcagno, Byron Cook, Dino Distefano, and Peter O'Hearn. 2008. Scalable shape analysis for systems code. In International Conference on Computer Aided Verification. Springer, 385--398.
[66]
Mohan Yang, Alexander Shkapsky, and Carlo Zaniolo. 2017. Scaling up the performance of more powerful Datalog systems on multicore machines. The VLDB Journal - The International Journal on Very Large Data Bases 26, 2 (2017), 229--248.
[67]
Greta Yorsh, Eran Yahav, and Satish Chandra. 2008. Generating precise and concise procedure summaries. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '08). ACM, 221--234.
[68]
Zhiqiang Zuo, John Thorpe, Yifei Wang, Qiuhong Pan, Shenming Lu, Kai Wang, Guoqing Harry Xu, Linzhang Wang, and Xuandong Li. 2019. Grapple: A graph system for static finite-state property checking of large-scale systems code. In Proceedings of the Fourteenth EuroSys Conference 2019 (EuroSys '19). ACM, 38.

Cited By

View all
  • (2024)Exploring Scalability of Value-Flow Graph ConstructionProceedings of the 2024 4th International Conference on Artificial Intelligence, Automation and High Performance Computing10.1145/3690931.3690951(112-117)Online publication date: 19-Jul-2024
  • (2024)Fast Graph Simplification for Path-Sensitive Typestate Analysis through Tempo-Spatial Multi-Point SlicingProceedings of the ACM on Software Engineering10.1145/36437491:FSE(494-516)Online publication date: 12-Jul-2024
  • (2024) Octopus: Scaling Value-Flow Analysis via Parallel Collection of Realizable Path ConditionsACM Transactions on Software Engineering and Methodology10.1145/363274333:3(1-33)Online publication date: 24-Jan-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN:9781450371216
DOI:10.1145/3377811
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • KIISE: Korean Institute of Information Scientists and Engineers
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. IFDS/IDE
  2. bottom-up analysis
  3. compositional program analysis
  4. data flow analysis
  5. modular program analysis

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong
  • MSRA

Conference

ICSE '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Upcoming Conference

ICSE 2025

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)44
  • Downloads (Last 6 weeks)5
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Exploring Scalability of Value-Flow Graph ConstructionProceedings of the 2024 4th International Conference on Artificial Intelligence, Automation and High Performance Computing10.1145/3690931.3690951(112-117)Online publication date: 19-Jul-2024
  • (2024)Fast Graph Simplification for Path-Sensitive Typestate Analysis through Tempo-Spatial Multi-Point SlicingProceedings of the ACM on Software Engineering10.1145/36437491:FSE(494-516)Online publication date: 12-Jul-2024
  • (2024) Octopus: Scaling Value-Flow Analysis via Parallel Collection of Realizable Path ConditionsACM Transactions on Software Engineering and Methodology10.1145/363274333:3(1-33)Online publication date: 24-Jan-2024
  • (2024)LibAlchemy: A Two-Layer Persistent Summary Design for Taming Third-Party Libraries in Static Bug-Finding SystemsProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639132(1-13)Online publication date: 20-May-2024
  • (2023)Hybrid Inlining: A Framework for Compositional and Context-Sensitive Static AnalysisProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598042(114-126)Online publication date: 12-Jul-2023
  • (2023)DStream: A Streaming-Based Highly Parallel IFDS FrameworkProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00208(2488-2500)Online publication date: 14-May-2023
  • (2022)Fast and precise application code analysis using a partial libraryProceedings of the 44th International Conference on Software Engineering10.1145/3510003.3510046(934-945)Online publication date: 21-May-2022
  • (2021)Path-sensitive sparse analysis without path conditionsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454086(930-943)Online publication date: 19-Jun-2021
  • (2021)Optimizing demand‐driven null dereference verification via merging branchesExpert Systems10.1111/exsy.1270739:6Online publication date: 27-May-2021
  • (2021)Accelerating Data-Flow Analysis with Full-Partitioning2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00184(1345-1352)Online publication date: Sep-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media