research-article

String solving with word equations and transducers: towards a logic for analysing mutation XSS

Authors:
Anthony W. Lin, Yale-NUS College, Singapore
Pablo Barceló, University of Chile, Chile
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2016, Pages 123–136. https://doi.org/10.1145/2837614.2837641

Published: 11 January 2016

ABSTRACT

We study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to one type of operation yields decidability, little is known about the decidability of their combined theory, which is especially relevant when analysing security vulnerabilities of dynamic web pages in a more realistic browser model. On the one hand, word equations (string logic with concatenations) cannot precisely capture sanitisation functions (e.g. htmlescape) and implicit browser transductions (e.g. innerHTML mutations). On the other hand, transducers suffer from the reverse problem of being able to model sanitisation functions and browser transductions, but not string concatenations. Naively combining word equations and transducers easily leads to an undecidable logic. Our main contribution is to show that the "straight-line fragment" of the logic is decidable (complexity ranges from PSPACE to EXPSPACE). The fragment can express the program logics of straight-line string-manipulating programs with concatenations and transductions as atomic operations, which arise when performing bounded model checking or dynamic symbolic execution. We demonstrate that the logic can naturally express constraints required for analysing mutation XSS in web applications. Finally, the logic remains decidable in the presence of length, letter-counting, regular, indexOf, and disequality constraints.
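To make the shape of a straight-line constraint concrete, the following is a minimal Python sketch. It is not the decision procedure from the paper: it only enumerates inputs up to a length bound, whereas the paper's result concerns exact decidability. All names (`ESCAPE`, `html_escape`, `safe_program`, `unsafe_program`, `find_witness`) are hypothetical and chosen for illustration, and the escaping table is a simplified stand-in for a real sanitiser such as htmlescape.

```python
import re
from itertools import product

# Hypothetical one-state transducer: each input letter maps to an output string.
# Simplified stand-in for a sanitiser such as htmlescape (assumption).
ESCAPE = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def html_escape(s: str) -> str:
    """Apply the transduction letter by letter; unlisted letters are copied."""
    return "".join(ESCAPE.get(c, c) for c in s)


def safe_program(x: str) -> str:
    """Straight-line program: one transduction, then one concatenation."""
    y = html_escape(x)          # transduction
    z = "<b>" + y + "</b>"      # concatenation
    return z


def unsafe_program(x: str) -> str:
    """Same program without the sanitising transduction."""
    z = "<b>" + x + "</b>"      # concatenation only
    return z


# Regular constraint on the final variable: an injected <i> tag appears.
ATTACK = re.compile(r"<i>")


def find_witness(program, alphabet: str, max_len: int):
    """Bounded brute-force search for an input whose output matches ATTACK.

    Returns the first such input in length-then-alphabet order, or None.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            x = "".join(letters)
            if ATTACK.search(program(x)):
                return x
    return None


if __name__ == "__main__":
    print(find_witness(unsafe_program, "<i>", 4))
    print(find_witness(safe_program, "<i>", 4))
```

Each variable is assigned exactly once from earlier variables, which is the sense of "straight-line" used here; the regular constraint on the last variable plays the role of the attack pattern. The bounded search can only report that no witness exists up to the chosen length, so a `None` result is not a proof of safety.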

Index Terms

1. Theory of computation
  1. Logic
  2. Semantics and reasoning
    1. Program reasoning

Published in
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2016
815 pages
ISBN:9781450335492
DOI:10.1145/2837614
General Chair: Rastislav Bodik, UC Berkeley, USA
Program Chair: Rupak Majumdar, MPI-SWS, Germany
ACM SIGPLAN Notices Volume 51, Issue 1
POPL '16
January 2016
815 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2914770
Editor: Andy Gill, University of Kansas, Lawrence, KS
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
XSS
string analysis
transducers
word equations