[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

String constraints with concatenation and transducers solved efficiently

Published: 27 December 2017 Publication History

Abstract

String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature.

Supplementary Material

WEBM File (concatenationandtransducers.webm)

References

[1]
Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In CAV. 150–166.
[2]
Davide Balzarotti, Marco Cova, Viktoria Felmetsger, Nenad Jovanovic, Engin Kirda, Christopher Kruegel, and Giovanni Vigna. 2008. Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications. In S&P. 387–401.
[3]
Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013).
[4]
Pablo Barceló, Leonid Libkin, A. W. Lin, and Peter T. Wood. 2012. Expressive Languages for Path Queries over GraphStructured Data. ACM Trans. Database Syst. 37, 4 (2012), 31.
[5]
Clark Barrett, Aaron Stump, and Cesare Tinelli. 2010. The SMT-LIB Standard: Version 2.0. In Proc. of SMT’10.
[6]
Clark W. Barrett, Cesare Tinelli, Morgan Deters, Tianyi Liang, Andrew Reynolds, and Nestan Tsiskaridze. 2016. Efficient solving of string constraints for security analysis. In Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, April 19-21, 2016. 4–6.
[7]
Daniel Le Berre and Anne Parrain. 2010. The Sat4j library, release 2.2. JSAT 7, 2-3 (2010), 59–6. http://jsat.ewi.tudelft.nl/ content/volume7/JSAT7_4_LeBerre.pdf
[8]
Jean Berstel. 1979. Transductions and Context-Free Languages. Teubner-Verlag.
[9]
Armin Biere, Keijo Heljanko, and Siert Wieringa. 2017. AIGER 1.9 and Beyond (Draft). http://fmv.jku.at/hwmcc11/beyond1.pdf (cited in 2017). (2017).
[10]
Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In TACAS. 307–321.
[11]
Aaron R. Bradley. 2012. Understanding IC3. In Theory and Applications of Satisfiability Testing - SAT 2012 - 15th International Conference, Trento, Italy, June 17-20, 2012. Proceedings. 1–14.
[12]
Robert Brayton and Alan Mishchenko. 2010. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification: 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings, Tayssir Touili, Byron Cook, and Paul Jackson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 24–40.
[13]
Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2008. EXE: Automatically Generating Inputs of Death. ACM Trans. Inf. Syst. Secur. 12, 2 (2008), 10:1–10:38.
[14]
Cristian Cadar, Patrice Godefroid, Sarfraz Khurshid, Corina S. Pasareanu, Koushik Sen, Nikolai Tillmann, and Willem Visser. 2011. Symbolic execution for software testing in practice: preliminary assessment. In Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21-28, 2011. 1066–1071.
[15]
Roberto Cavada, Alessandro Cimatti, Michele Dorigatti, Alberto Griggio, Alessandro Mariotti, Andrea Micheli, Sergio Mover, Marco Roveri, and Stefano Tonetta. 2014. The nuXmv Symbolic Model Checker. In CAV’14 (Lecture Notes in Computer Science), Vol. 8559. Springer, 334–342.
[16]
Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. 1999. Model Checking. The MIT Press, Cambridge, Massachusetts.
[17]
Google co. 2015. Google Closure Library (referred in Nov 2015). https://developers.google.com/closure/library/ . (2015).
[18]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.
[19]
Arlen Cox and Jason Leasure. 2017. Model Checking Regular Language Constraints. CoRR abs/1708.09073 (2017). arXiv: 1708.09073 http://arxiv.org/abs/1708.09073
[20]
Loris D’Antoni, Zachary Kincaid, and Fang Wang. 2016. A Symbolic Decision Procedure for Symbolic Alternating Finite Automata. CoRR abs/1610.01722 (2016). http://arxiv.org/abs/1610.01722
[21]
Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In VMCAI. 209–228.
[22]
Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77.
[23]
Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.
[24]
Laurent Doyen and Jean-François Raskin. 2010. Antichain Algorithms for Finite Automata. In TACAS’10 (Lecture Notes in Computer Science), Vol. 6015. Springer, 2–22.
[25]
Alain Finkel. 1987. A Generalization of the Procedure of Karp and Miller to Well Structured Transition Systems. In Automata, Languages and Programming, 14th International Colloquium, ICALP87, Karlsruhe, Germany, July 13-17, 1987, Proceedings (Lecture Notes in Computer Science), Thomas Ottmann (Ed.), Vol. 267. Springer, 499–508.
[26]
Xiang Fu and Chung-Chih Li. 2010. Modeling Regular Replacement for String Constraint Solving. In NFM. 67–76.
[27]
Xiang Fu, Michael C. Powell, Michael Bantegui, and Chung-Chih Li. 2013. Simple linear string constraints. Formal Asp. Comput. 25, 6 (2013), 847–891.
[28]
Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin Rinard. 2013. Word equations with length constraints: what’s decidable? In Hardware and Software: Verification and Testing. Springer, 209–226.
[29]
Graeme Gange, Jorge A. Navas, Peter J. Stuckey, Harald Søndergaard, and Peter Schachte. 2013. Unbounded Model-Checking with Interpolation for Regular Language Constraints. In TACAS’2013 (Lecture Notes in Computer Science), Vol. 7795. Springer, 277–291.
[30]
Seymour Ginsburg and Edwin H. Spanier. 1966. Semigroups, Presburger formulas, and languages. Pacific J. Math. 16, 2 (1966), 285–296. http://projecteuclid.org/euclid.pjm/1102994974
[31]
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005. 213–223.
[32]
Claudio Gutiérrez. 1998. Solving Equations in Strings: On Makanin’s Algorithm. In LATIN. 358–373.
[33]
Mario Heiderich, Jörg Schwenk, Tilman Frosch, Jonas Magazinius, and Edward Z. Yang. 2013. mXSS attacks: attacking well-secured web-applications by using innerHTML mutations. In CCS. 777–788.
[34]
Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In USENIX Security Symposium. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdf
[35]
Pieter Hooimeijer and Westley Weimer. 2012. StrSolve: solving string constraints lazily. Autom. Softw. Eng. 19, 4 (2012), 531–559.
[36]
Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51.
[37]
Scott Kausler and Elena Sherman. 2014. Evaluation of String Constraint Solvers in the Context of Symbolic Execution. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, New York, NY, USA, 259–270.
[38]
Christoph Kern. 2014. Securing the Tangled Web. Commun. ACM 57, 9 (Sept. 2014), 38–47.
[39]
Adam Kiezun and others. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25.
[40]
Nils Klarlund, Anders Møller, and Michael I. Schwartzbach. 2002. MONA Implementation Secrets. International Journal of Foundations of Computer Science 13, 4 (2002), 571–586.
[41]
Johannes Kloos, Rupak Majumdar, Filip Niksic, and Ruzica Piskac. 2013. Incremental, Inductive Coverability. In CAV’13 (Lecture Notes in Computer Science), Vol. 8044. Springer, 158–173.
[42]
Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.
[43]
Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In CAV. 646–662.
[44]
Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2016. An efficient SMT solver for string constraints. Formal Methods in System Design 48, 3 (2016), 206–234.
[45]
Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark Barrett. 2015. A Decision Procedure for Regular Membership and Length Constraints over Unbounded Strings. In Frontiers of Combining Systems - 10th International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, 2015. Proceedings. 135–150.
[46]
Anthony Widjaja Lin and Pablo Barceló. 2016. String solving with word equations and transducers: towards a logic for analysing mutation XSS. In POPL. 123–136.
[47]
Blake Loring, Duncan Mitchell, and Johannes Kinder. 2017. ExpoSE: Practical Symbolic Execution of Standalone JavaScript. In SPIN.
[48]
Gennady S Makanin. 1977. The problem of solvability of equations in a free semigroup. Sbornik: Mathematics 32, 2 (1977), 129–198.
[49]
John McCarthy. 1980. Circumscription - A Form of Non-Monotonic Reasoning. Artif. Intell. 13, 1-2 (1980), 27–39.
[50]
Kenneth L. McMillan. 2003. Interpolation and SAT-Based Model Checking. In Computer Aided Verification, 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003, Proceedings. 1–13.
[51]
Christophe Morvan. 2000. On Rational Graphs. In FoSSaCS. 252–266.
[52]
Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In LPAR’04 (LNCS), Vol. 3452. Springer, 36–50.
[53]
OWASP. 2013. https://www.owasp.org/images/f/f8/OWASP_Top_10_- _2013.pdf . (2013).
[54]
Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496.
[55]
Wojciech Plandowski. 2006. An efficient algorithm for solving word equations. In STOC. 467–476.
[56]
Gideon Redelinghuys, Willem Visser, and Jaco Geldenhuys. 2012. Symbolic execution of programs with strings. In SAICSIT. 139–148.
[57]
Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Proceedings, 15th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LNCS), Vol. 5330. Springer, 274–289.
[58]
Jacques Sakarovitch. 2009. Elements of automata theory. Cambridge University Press.
[59]
Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In S&P. 513–528.
[60]
Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. 488–498.
[61]
Mary Sheeran, Satnam Singh, and Gunnar Stålmarck. 2000. Checking Safety Properties Using Induction and a SAT-Solver. In FMCAD (LNCS), Vol. 1954. Springer, 108–125.
[62]
Deian Tabakov and Moshe Y. Vardi. 2005. Experimental Evaluation of Classical Automata Constructions. In Logic for Programming, Artificial Intelligence, and Reasoning, 12th International Conference, LPAR 2005, Montego Bay, Jamaica, December 2-6, 2005, Proceedings (Lecture Notes in Computer Science), Geoff Sutcliffe and Andrei Voronkov (Eds.), Vol. 3835. Springer, 396–411.
[63]
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In CCS. 1232–1243.
[64]
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. 218–240.
[65]
Moshe Y. Vardi. 1995. An Automata-Theoretic Approach to Linear Temporal Logic. In Logics for Concurrency - Structure versus Automata (8th Banff Higher Order Workshop, August 27 - September 3, 1995, Proceedings). 238–266.
[66]
Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, and Nikolaj Bjørner. 2012. Symbolic finite state transducers: algorithms and applications. In POPL. 137–150.
[67]
Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260.
[68]
Gary Wassermann, Dachuan Yu, Ajay Chander, Dinakar Dhurjati, Hiroshi Inamura, and Zhendong Su. 2008. Dynamic test input generation for web applications. In ISSTA. 249–260.
[69]
Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In ESORICS. 150–171.
[70]
Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In TACAS. 154–157. Benchmark can be found at http://www.cs.ucsb.edu/~vlab/stranger/ .
[71]
Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based symbolic string analysis for vulnerability detection. Formal Methods in System Design 44, 1 (2014), 44–70.
[72]
Fang Yu, Tevfik Bultan, and Oscar H. Ibarra. 2009. Symbolic String Verification: Combining String Analysis and Size Analysis. In TACAS. 322–336.
[73]
Fang Yu, Tevfik Bultan, and Oscar H. Ibarra. 2011. Relational String Verification Using Multi-Track Automata. Int. J. Found. Comput. Sci. 22, 8 (2011), 1909–1924.
[74]
Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In ESEC/SIGSOFT FSE. 114–124.

Cited By

View all
  • (2024)A Constraint Solving Approach to Parikh Images of Regular LanguagesProceedings of the ACM on Programming Languages10.1145/36498558:OOPSLA1(1235-1263)Online publication date: 29-Apr-2024
  • (2024)A decision procedure for string constraints with string/integer conversion and flat regular constraintsActa Informatica10.1007/s00236-023-00446-461:1(23-52)Online publication date: 1-Mar-2024
  • (2024)A Closer Look at the Expressive Power of Logics Based on Word EquationsTheory of Computing Systems10.1007/s00224-023-10154-868:3(322-379)Online publication date: 1-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 2, Issue POPL
January 2018
1961 pages
EISSN:2475-1421
DOI:10.1145/3177123
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2017
Published in PACMPL Volume 2, Issue POPL

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Alternating Finite Automata
  2. Decision Procedure
  3. IC3
  4. String Solving

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)139
  • Downloads (Last 6 weeks)26
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Constraint Solving Approach to Parikh Images of Regular LanguagesProceedings of the ACM on Programming Languages10.1145/36498558:OOPSLA1(1235-1263)Online publication date: 29-Apr-2024
  • (2024)A decision procedure for string constraints with string/integer conversion and flat regular constraintsActa Informatica10.1007/s00236-023-00446-461:1(23-52)Online publication date: 1-Mar-2024
  • (2024)A Closer Look at the Expressive Power of Logics Based on Word EquationsTheory of Computing Systems10.1007/s00224-023-10154-868:3(322-379)Online publication date: 1-Jun-2024
  • (2024)Mata: A Fast and Simple Finite Automata LibraryTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-031-57249-4_7(130-151)Online publication date: 5-Apr-2024
  • (2023)Solving String Constraints with Lengths by StabilizationProceedings of the ACM on Programming Languages10.1145/36228727:OOPSLA2(2112-2141)Online publication date: 16-Oct-2023
  • (2023)Black Ostrich: Web Application Scanning with String SolversProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security10.1145/3576915.3616582(549-563)Online publication date: 15-Nov-2023
  • (2023)On the Expressive Power of String ConstraintsProceedings of the ACM on Programming Languages10.1145/35712037:POPL(278-308)Online publication date: 11-Jan-2023
  • (2023)A symbolic algorithm for the case-split rule in solving word constraints with extensionsJournal of Systems and Software10.1016/j.jss.2023.111673201:COnline publication date: 1-Jul-2023
  • (2023)A Solver for Arrays with ConcatenationJournal of Automated Reasoning10.1007/s10817-022-09654-y67:1Online publication date: 7-Jan-2023
  • (2023)Reasoning About Regular Properties: A Comparative StudyAutomated Deduction – CADE 2910.1007/978-3-031-38499-8_17(286-306)Online publication date: 2-Sep-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media