ABSTRACT
We study the decidability of satisfiability for string logics with concatenation and finite-state transducers as atomic operations. Although restricting to one type of operation yields decidability, little is known about the decidability of the combined theory, which is especially relevant when analysing security vulnerabilities of dynamic web pages in a more realistic browser model. On the one hand, word equations (string logic with concatenation) cannot precisely capture sanitisation functions (e.g. htmlescape) and implicit browser transductions (e.g. innerHTML mutations). On the other hand, transducers have the reverse problem: they can model sanitisation functions and browser transductions, but not string concatenation. Naively combining word equations and transducers easily leads to an undecidable logic. Our main contribution is to show that the "straight-line fragment" of the logic is decidable (complexity ranges from PSPACE to EXPSPACE). The fragment can express the program logics of straight-line string-manipulating programs with concatenations and transductions as atomic operations, which arise in bounded model checking and dynamic symbolic execution. We demonstrate that the logic can naturally express the constraints required for analysing mutation XSS in web applications. Finally, the logic remains decidable in the presence of length, letter-counting, regular, indexOf, and disequality constraints.
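To make the notion of a straight-line string-manipulating program concrete, the following is a minimal illustrative sketch, not the decision procedure from the paper. It assumes Python's `html.escape` as a stand-in for a sanitisation transducer such as htmlescape, and it replaces the symbolic satisfiability check with a naive bounded enumeration of inputs. The names `sanitised_program`, `unsanitised_program`, and `find_witness` are hypothetical and chosen only for this example.

```python
import html
from itertools import product
from typing import Callable, Optional


def sanitised_program(x: str) -> str:
    """Straight-line program: one transduction followed by one concatenation."""
    y = html.escape(x, quote=True)   # transduction (assumed stand-in for htmlescape)
    z = "<b>" + y + "</b>"           # concatenation with constant strings
    return z


def unsanitised_program(x: str) -> str:
    """Same program without the transduction step, for comparison."""
    return "<b>" + x + "</b>"


def find_witness(
    program: Callable[[str], str],
    alphabet: str,
    max_len: int,
    is_unsafe: Callable[[str], bool],
) -> Optional[str]:
    """Brute-force search for an input whose output violates the constraint.

    This enumerates all inputs up to max_len, shortest first; it only
    illustrates the satisfiability question and is not a decision procedure.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            x = "".join(letters)
            if is_unsafe(program(x)):
                return x
    return None


def opens_s_tag(z: str) -> bool:
    """Toy regular constraint: the output contains the substring '<s'."""
    return "<s" in z


if __name__ == "__main__":
    print(find_witness(unsanitised_program, "<s", 2, opens_s_tag))
    print(find_witness(sanitised_program, "<s", 2, opens_s_tag))
```

The bounded search can only refute safety up to the chosen input length; the point of a decidable logic is to answer the same question for all inputs without enumeration.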
String solving with word equations and transducers: towards a logic for analysing mutation XSS
POPL '16