[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1109/ASE.2013.6693083acmconferencesArticle/Chapter ViewAbstractPublication PagesaseConference Proceedingsconference-collections
Article

SEDGE: symbolic example data generation for dataflow programs

Published: 11 November 2013 Publication History

Abstract

Exhaustive, automatic testing of dataflow (esp. mapreduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain.
In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks and (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.

References

[1]
A. Behm, V. R. Borkar, M. J. Carey, R. Grover, C. Li, N. Onose, R. Vernica, A. Deutsch, Y. Papakonstantinou, and V. J. Tsotras. Asterix: towards a scalable, semistructured data platform for evolving-world models. Distributed and Parallel Databases, 29(3):185-216, 2011.
[2]
C. Binnig, D. Kossmann, and E. Lo. Reverse query processing. In Proc. 23rd International Conference on Data Engineering (ICDE), pages 506-515. IEEE, Apr. 2007.
[3]
C. Binnig, D. Kossmann, E. Lo, and M. T. Özsu. QAGen: Generating query-aware test databases. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 341-352. ACM, June 2007.
[4]
M. Borges, M. d'Amorim, S. Anand, D. Bushnell, and C. S. Pasareanu. Symbolic execution with interval solving and meta-heuristic search. In Proceedings of the 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, ICST '12, pages 111-120, Washington, DC, USA, 2012. IEEE Computer Society.
[5]
R. E. Bryant, S. M. German, and M. N. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In Proc. 11th International Conference on Computer Aided Verification (CAV), pages 470-482. Springer, 1999.
[6]
J. R. Burch and D. L. Dill. Automatic verification of pipelined microprocessor control. In Proc. 6th International Conference on Computer Aided Verification (CAV), pages 68-80. Springer, 1994.
[7]
C. Cadar and D. R. Engler. Execution generated test cases: How to make systems code crash itself. In Proc. 12th International SPIN Workshop on Model Checking of Software, pages 2-23. Springer, Aug. 2005.
[8]
L. De Moura and N. Bjørner. Satisfiability modulo theories: introduction and applications. Commun. ACM, 54(9):69-77, Sept. 2011.
[9]
M. Emmi, R. Majumdar, and K. Sen. Dynamic test input generation for database applications. In Proc. ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), pages 151-162. ACM, July 2007.
[10]
P. Godefroid, N. Klarlund, and K. Sen. Dart: Directed automated random testing. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 213-223. ACM, June 2005.
[11]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proc. 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys), pages 59-72. ACM, 2007.
[12]
M. Islam and C. Csallner. Dsc+mock: a test case + mock class generator in support of coding against interfaces. In Proceedings of the Eighth International Workshop on Dynamic Analysis, WODA '10, pages 26-31, New York, NY, USA, 2010. ACM.
[13]
C. Li and C. Csallner. Dynamic symbolic database application testing. In Proc. 3rd International Workshop on Testing Database Systems (DBTest). ACM, June 2010.
[14]
S. Liang. Java Native Interface: Programmer's Guide and Specification. Prentice Hall, June 1999.
[15]
M. Marcozzi, W. Vanhoof, and J.-L. Hainaut. Test input generation for database programs using relational constraints. In Proc. 5th International Workshop on Testing Database Systems (DBTest). ACM, May 2012.
[16]
C. Olston, S. Chopra, and U. Srivastava. Generating example data for dataflow programs. In Proc. 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 245-256. ACM, 2009.
[17]
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: A not-so-foreign language for data processing. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1099-1110. ACM, 2008.
[18]
K. Pan, X. Wu, and T. Xie. Generating program inputs for database application testing. In Proc. 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 73-82. IEEE, Nov. 2011.
[19]
K. Sen and G. Agha. Cute and jCute: Concolic unit testing and explicit path model-checking tools. In Proc. 18th International Conference on Computer Aided Verification (CAV), pages 419-423. Springer, Aug. 2006.
[20]
Y. Smaragdakis and C. Csallner. Combining static and dynamic reasoning for bug detection. In Y. Gurevich and B. Meyer, editors, TAP, volume 4454 of Lecture Notes in Computer Science, pages 1-16. Springer, 2007.
[21]
Y. Smaragdakis, C. Csallner, and R. Subramanian. Scalable automatic test data generation from modeling diagrams. In Automated Software Engineering conference (ASE), pages 4-13. ACM Press, Nov. 2007.
[22]
A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - a warehousing solution over a mapreduce framework. PVLDB, 2(2):1626-1629, 2009.
[23]
N. Tillmann and J. De Halleux. Pex: White box test generation for .Net. In Proc. 2nd International Conference on Tests and Proofs (TAP), pages 134-153. Springer, 2008.
[24]
J. Tuya, M. J. Suárez-Cabal, and C. de la Riva. Full predicate coverage for testing SQL database queries. Software Testing, Verification & Reliability (STVR), 20(3):237-288, Sept. 2010.
[25]
M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In Proc. 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR), pages 425-446. Springer, Apr. 2010.
[26]
X. Xiao, T. Xie, N. Tillmann, and J. de Halleux. Precise identification of problems for structural test generation. In Proceedings of the 33rd International Conference on Software Engineering, ICSE '11, pages 611-620, New York, NY, USA, 2011. ACM.

Cited By

View all
  • (2023)Co-dependence Aware Fuzzing for Dataflow-Based Big Data AnalyticsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616298(1050-1061)Online publication date: 30-Nov-2023
  • (2021)Efficient fuzz testing for apache spark using framework abstractionProceedings of the 43rd International Conference on Software Engineering: Companion Proceedings10.1109/ICSE-Companion52605.2021.00036(61-64)Online publication date: 25-May-2021
  • (2020)A domain-specific language for filtering in application-level gatewaysProceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3425898.3426955(111-123)Online publication date: 16-Nov-2020
  • Show More Cited By

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ASE '13: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering
November 2013
765 pages
ISBN:9781479902156
  • General Chair:
  • Ewen Denney,
  • Program Chairs:
  • Tevfik Bultan,
  • Andreas Zeller

Sponsors

Publisher

IEEE Press

Publication History

Published: 11 November 2013

Check for updates

Qualifiers

  • Article

Conference

ASE '13
Sponsor:
ASE '13: Automated Software Engineering
November 11 - 15, 2013
CA, Silicon Valley, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Co-dependence Aware Fuzzing for Dataflow-Based Big Data AnalyticsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616298(1050-1061)Online publication date: 30-Nov-2023
  • (2021)Efficient fuzz testing for apache spark using framework abstractionProceedings of the 43rd International Conference on Software Engineering: Companion Proceedings10.1109/ICSE-Companion52605.2021.00036(61-64)Online publication date: 25-May-2021
  • (2020)A domain-specific language for filtering in application-level gatewaysProceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3425898.3426955(111-123)Online publication date: 16-Nov-2020
  • (2020)BigTestProceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings10.1145/3377812.3382145(61-64)Online publication date: 27-Jun-2020
  • (2020)BigFuzzProceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering10.1145/3324884.3416641(722-733)Online publication date: 21-Dec-2020
  • (2019)White-box testing of big data analytics with complex user-defined functionsProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3338906.3338953(290-301)Online publication date: 12-Aug-2019
  • (2016)Applying combinatorial test data generation to big data applicationsProceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering10.1145/2970276.2970325(637-647)Online publication date: 25-Aug-2016

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media