More Web Proxy on the site http://driver.im/

Article

SEDGE: symbolic example data generation for dataflow programs

Authors:

Christoph Reichenbach,

Yannis Smaragdakis,

Christoph CsallnerAuthors Info & Claims

ASE '13: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering

Pages 235 - 245

https://doi.org/10.1109/ASE.2013.6693083

Published: 11 November 2013 Publication History

Publisher Site Get Access

Abstract

Exhaustive, automatic testing of dataflow (esp. mapreduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain.

In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks and (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.

References

[1]

A. Behm, V. R. Borkar, M. J. Carey, R. Grover, C. Li, N. Onose, R. Vernica, A. Deutsch, Y. Papakonstantinou, and V. J. Tsotras. Asterix: towards a scalable, semistructured data platform for evolving-world models. Distributed and Parallel Databases, 29(3):185-216, 2011.

Digital Library

[2]

C. Binnig, D. Kossmann, and E. Lo. Reverse query processing. In Proc. 23rd International Conference on Data Engineering (ICDE), pages 506-515. IEEE, Apr. 2007.

[3]

C. Binnig, D. Kossmann, E. Lo, and M. T. Özsu. QAGen: Generating query-aware test databases. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 341-352. ACM, June 2007.

Digital Library

[4]

M. Borges, M. d'Amorim, S. Anand, D. Bushnell, and C. S. Pasareanu. Symbolic execution with interval solving and meta-heuristic search. In Proceedings of the 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, ICST '12, pages 111-120, Washington, DC, USA, 2012. IEEE Computer Society.

Digital Library

[5]

R. E. Bryant, S. M. German, and M. N. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In Proc. 11th International Conference on Computer Aided Verification (CAV), pages 470-482. Springer, 1999.

Digital Library

[6]

J. R. Burch and D. L. Dill. Automatic verification of pipelined microprocessor control. In Proc. 6th International Conference on Computer Aided Verification (CAV), pages 68-80. Springer, 1994.

Digital Library

[7]

C. Cadar and D. R. Engler. Execution generated test cases: How to make systems code crash itself. In Proc. 12th International SPIN Workshop on Model Checking of Software, pages 2-23. Springer, Aug. 2005.

Digital Library

[8]

L. De Moura and N. Bjørner. Satisfiability modulo theories: introduction and applications. Commun. ACM, 54(9):69-77, Sept. 2011.

Digital Library

[9]

M. Emmi, R. Majumdar, and K. Sen. Dynamic test input generation for database applications. In Proc. ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), pages 151-162. ACM, July 2007.

Digital Library

[10]

P. Godefroid, N. Klarlund, and K. Sen. Dart: Directed automated random testing. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 213-223. ACM, June 2005.

Digital Library

[11]

M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proc. 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys), pages 59-72. ACM, 2007.

Digital Library

[12]

M. Islam and C. Csallner. Dsc+mock: a test case + mock class generator in support of coding against interfaces. In Proceedings of the Eighth International Workshop on Dynamic Analysis, WODA '10, pages 26-31, New York, NY, USA, 2010. ACM.

Digital Library

[13]

C. Li and C. Csallner. Dynamic symbolic database application testing. In Proc. 3rd International Workshop on Testing Database Systems (DBTest). ACM, June 2010.

Digital Library

[14]

S. Liang. Java Native Interface: Programmer's Guide and Specification. Prentice Hall, June 1999.

Digital Library

[15]

M. Marcozzi, W. Vanhoof, and J.-L. Hainaut. Test input generation for database programs using relational constraints. In Proc. 5th International Workshop on Testing Database Systems (DBTest). ACM, May 2012.

Digital Library

[16]

C. Olston, S. Chopra, and U. Srivastava. Generating example data for dataflow programs. In Proc. 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 245-256. ACM, 2009.

Digital Library

[17]

C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: A not-so-foreign language for data processing. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1099-1110. ACM, 2008.

Digital Library

[18]

K. Pan, X. Wu, and T. Xie. Generating program inputs for database application testing. In Proc. 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 73-82. IEEE, Nov. 2011.

Digital Library

[19]

K. Sen and G. Agha. Cute and jCute: Concolic unit testing and explicit path model-checking tools. In Proc. 18th International Conference on Computer Aided Verification (CAV), pages 419-423. Springer, Aug. 2006.

Digital Library

[20]

Y. Smaragdakis and C. Csallner. Combining static and dynamic reasoning for bug detection. In Y. Gurevich and B. Meyer, editors, TAP, volume 4454 of Lecture Notes in Computer Science, pages 1-16. Springer, 2007.

Digital Library

[21]

Y. Smaragdakis, C. Csallner, and R. Subramanian. Scalable automatic test data generation from modeling diagrams. In Automated Software Engineering conference (ASE), pages 4-13. ACM Press, Nov. 2007.

Digital Library

[22]

A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - a warehousing solution over a mapreduce framework. PVLDB, 2(2):1626-1629, 2009.

Digital Library

[23]

N. Tillmann and J. De Halleux. Pex: White box test generation for .Net. In Proc. 2nd International Conference on Tests and Proofs (TAP), pages 134-153. Springer, 2008.

Digital Library

[24]

J. Tuya, M. J. Suárez-Cabal, and C. de la Riva. Full predicate coverage for testing SQL database queries. Software Testing, Verification & Reliability (STVR), 20(3):237-288, Sept. 2010.

Digital Library

[25]

M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In Proc. 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR), pages 425-446. Springer, Apr. 2010.

Digital Library

[26]

X. Xiao, T. Xie, N. Tillmann, and J. de Halleux. Precise identification of problems for structural test generation. In Proceedings of the 33rd International Conference on Software Engineering, ICSE '11, pages 611-620, New York, NY, USA, 2011. ACM.

Digital Library

Cited By

Humayun AKim MGulzar MChandra SBlincoe KTonella P(2023)Co-dependence Aware Fuzzing for Dataflow-Based Big Data AnalyticsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616298(1050-1061)Online publication date: 30-Nov-2023
https://dl.acm.org/doi/10.1145/3611643.3616298
Zhang QWang JGulzar MPadhye RKim MJuristo N(2021)Efficient fuzz testing for apache spark using framework abstractionProceedings of the 43rd International Conference on Software Engineering: Companion Proceedings10.1109/ICSE-Companion52605.2021.00036(61-64)Online publication date: 25-May-2021
https://dl.acm.org/doi/10.1109/ICSE-Companion52605.2021.00036
Balldin HReichenbach CErwig MGray J(2020)A domain-specific language for filtering in application-level gatewaysProceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3425898.3426955(111-123)Online publication date: 16-Nov-2020
https://dl.acm.org/doi/10.1145/3425898.3426955
Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ASE '13: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering

November 2013

765 pages

ISBN:9781479902156

General Chair:
Ewen Denney
SGT, NASA, Ames Research Center, Moffett Field
,
Program Chairs:
Tevfik Bultan
University of California, Santa Barbara
,
Andreas Zeller
Saarland University, Saarbrücken, Germany

Sponsors

NASA: National Aeronatics and Space Administration
SIGAI: ACM Special Interest Group on Artificial Intelligence
IEEE: IEEE Computer Society Technical Committee on Design Automation
SIGSOFT: ACM Special Interest Group on Software Engineering
Microsoft: Microsoft

Publisher

IEEE Press

Publication History

Published: 11 November 2013

Check for updates

Qualifiers

Article

Conference

ASE '13

Sponsor:

NASA
SIGAI
IEEE
SIGSOFT
Microsoft

ASE '13: Automated Software Engineering

November 11 - 15, 2013

CA, Silicon Valley, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

7
Total Citations
View Citations
31
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 13 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Humayun AKim MGulzar MChandra SBlincoe KTonella P(2023)Co-dependence Aware Fuzzing for Dataflow-Based Big Data AnalyticsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616298(1050-1061)Online publication date: 30-Nov-2023
https://dl.acm.org/doi/10.1145/3611643.3616298
Zhang QWang JGulzar MPadhye RKim MJuristo N(2021)Efficient fuzz testing for apache spark using framework abstractionProceedings of the 43rd International Conference on Software Engineering: Companion Proceedings10.1109/ICSE-Companion52605.2021.00036(61-64)Online publication date: 25-May-2021
https://dl.acm.org/doi/10.1109/ICSE-Companion52605.2021.00036
Balldin HReichenbach CErwig MGray J(2020)A domain-specific language for filtering in application-level gatewaysProceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3425898.3426955(111-123)Online publication date: 16-Nov-2020
https://dl.acm.org/doi/10.1145/3425898.3426955
Gulzar MMusuvathi MKim MRothermel GBae D(2020)BigTestProceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings10.1145/3377812.3382145(61-64)Online publication date: 27-Jun-2020
https://dl.acm.org/doi/10.1145/3377812.3382145
Zhang QWang JGulzar MPadhye RKim MGrundy JLe Goues CLo D(2020)BigFuzzProceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering10.1145/3324884.3416641(722-733)Online publication date: 21-Dec-2020
https://dl.acm.org/doi/10.1145/3324884.3416641
Gulzar MMardani SMusuvathi MKim MDumas MPfahl DApel SRusso A(2019)White-box testing of big data analytics with complex user-defined functionsProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3338906.3338953(290-301)Online publication date: 12-Aug-2019
https://dl.acm.org/doi/10.1145/3338906.3338953
Li NLei YKhan HLiu JGuo YLo DApel SKhurshid S(2016)Applying combinatorial test data generation to big data applicationsProceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering10.1145/2970276.2970325(637-647)Online publication date: 25-Aug-2016
https://dl.acm.org/doi/10.1145/2970276.2970325

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents