Symbolic String Transformations with Regular Lookahead and Rollback

  • Conference paper
Perspectives of System Informatics (PSI 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8974)

Abstract

Implementing string transformation routines, such as encoders, decoders, and sanitizers, correctly and efficiently is a difficult and error-prone task. Such routines are often used in security-critical settings and process large amounts of data, so they must be both correct and fast. We introduce a new declarative language called Bex that builds on elements of regular expressions, symbolic automata, and transducers, and enables a compilation scheme into C, C#, or JavaScript that avoids many of the potential sources of errors that arise when such routines are implemented directly. The approach allows correctness analysis using symbolic automata theory that is not possible at the level of the generated code. Moreover, our case studies show that the generated code consistently outperforms hand-optimized code.
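To make the idea of a symbolic transducer concrete, here is a minimal Python sketch, assuming a deterministic single-pass transducer whose transitions carry a guard predicate and an output function instead of a single input character. It is not Bex syntax, not the code Bex generates, and it omits the regular lookahead and rollback the paper adds; the names `SymbolicTransducer`, `Rule`, and `ws_collapse` are hypothetical. The example rules collapse runs of whitespace into one space, a small sanitizer-style normalization.

```python
from dataclasses import dataclass
from typing import Callable, List

Guard = Callable[[int], bool]          # predicate over a character code
Output = Callable[[int], List[int]]    # maps the input code to output codes


@dataclass
class Rule:
    source: int
    guard: Guard
    output: Output
    target: int


class SymbolicTransducer:
    """Deterministic, single-pass transducer with predicate-guarded rules."""

    def __init__(self, initial: int, rules: List[Rule]):
        self.initial = initial
        self.rules = rules

    def run(self, text: str) -> str:
        state = self.initial
        out: List[int] = []
        for ch in text:
            code = ord(ch)
            # Take the first rule from the current state whose guard holds.
            for rule in self.rules:
                if rule.source == state and rule.guard(code):
                    out.extend(rule.output(code))
                    state = rule.target
                    break
            else:
                raise ValueError(f"no rule for {ch!r} in state {state}")
        return "".join(chr(c) for c in out)


def is_ws(c: int) -> bool:
    return c in (9, 10, 13, 32)  # tab, LF, CR, space


# Hypothetical rules: state 0 = previous char was not whitespace, state 1 = it was.
ws_collapse = SymbolicTransducer(0, [
    Rule(0, is_ws, lambda c: [32], 1),                  # first whitespace -> one space
    Rule(0, lambda c: not is_ws(c), lambda c: [c], 0),  # copy other characters
    Rule(1, is_ws, lambda c: [], 1),                    # drop further whitespace
    Rule(1, lambda c: not is_ws(c), lambda c: [c], 0),  # copy and leave the run
])

print(ws_collapse.run("a  b\t\n c"))  # a b c
```

Because each guard covers a whole class of characters, four rules describe behavior that an explicit finite transducer would spell out with one transition per character and state.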

Notes

  1.

    The variable order of the BDD is the reverse bit order of the binary representation of a number; thus, the most significant bit has the lowest ordinal.

  2.

    Regular Expression Language - Quick Reference: http://msdn.microsoft.com/en-us/library/az24scfc.aspx.

  3.

    Observe that \(\mathfrak{D}^0 = \{[]\}\) and \(\mathfrak{D}^1 = \{[a] \mid a \in \mathfrak{D}\}\).

  4.

    No semantic distinction is made between characters and their numeric codes. Thus, for example, the character '0' and the number 48 both denote the number 48.

  5.

    Predicates in \(\varPsi_{\mathcal{U}}\) are denoted by regex character classes or individual characters. The predicate \(\bot\) is denoted by the empty character class [].
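The notes above treat character predicates as regex character classes (with [] as \(\bot\)), identify characters with their numeric codes, and fix a BDD variable order in which the most significant bit comes first. As a minimal sketch under our own assumptions, the Python below models a predicate as a set of inclusive code intervals rather than the BDDs the paper uses; `CharPred`, `char_class`, and `bdd_order_bits` are hypothetical names, not the API of Bex or of any library, and the default 16-bit width assumes UTF-16 code units.

```python
from typing import Iterable, List, Tuple, Union

Interval = Tuple[int, int]  # inclusive range of character codes


class CharPred:
    """A character predicate stored as sorted, disjoint, non-adjacent
    inclusive intervals of character codes. No intervals means bottom."""

    def __init__(self, intervals: Iterable[Interval]):
        merged: List[Interval] = []
        for lo, hi in sorted(intervals):
            if lo > hi:
                continue  # skip empty ranges
            if merged and lo <= merged[-1][1] + 1:
                # overlapping or adjacent: extend the previous interval
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        self.intervals = merged

    def __contains__(self, code: int) -> bool:
        return any(lo <= code <= hi for lo, hi in self.intervals)

    def conj(self, other: "CharPred") -> "CharPred":
        """Conjunction (intersection) via pairwise interval overlap."""
        return CharPred(
            (max(a_lo, b_lo), min(a_hi, b_hi))
            for a_lo, a_hi in self.intervals
            for b_lo, b_hi in other.intervals
        )

    def is_bottom(self) -> bool:
        return not self.intervals


def char_class(*items: Union[str, Tuple[str, str]]) -> CharPred:
    """Build a predicate from single characters or (first, last) ranges,
    roughly like a regex class such as [0-9a-f]; char_class() plays []."""
    ranges = []
    for item in items:
        first, last = (item, item) if isinstance(item, str) else item
        ranges.append((ord(first), ord(last)))
    return CharPred(ranges)


def bdd_order_bits(code: int, width: int = 16) -> List[int]:
    """Bits of `code` listed by BDD variable ordinal: ordinal 0 is the
    most significant bit and ordinal width-1 the least significant."""
    return [(code >> (width - 1 - k)) & 1 for k in range(width)]


DIGIT = char_class(("0", "9"))
HEX = char_class(("0", "9"), ("a", "f"), ("A", "F"))
EMPTY = char_class()

assert ord("0") == 48 and 48 in DIGIT            # characters and codes coincide
print(HEX.conj(char_class(("a", "z"))).intervals)  # [(97, 102)]
print(EMPTY.is_bottom())                           # True
print(bdd_order_bits(48, 8))                       # [0, 0, 1, 1, 0, 0, 0, 0]
```

Interval lists keep conjunction and emptiness checks simple for small alphabets; the BDD encoding with the most significant bit first is the paper's choice, and `bdd_order_bits` only illustrates that variable ordering.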

Author information

Corresponding author

Correspondence to Margus Veanes.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Veanes, M. (2015). Symbolic String Transformations with Regular Lookahead and Rollback. In: Voronkov, A., Virbitskaite, I. (eds) Perspectives of System Informatics. PSI 2014. Lecture Notes in Computer Science, vol 8974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46823-4_27

  • DOI: https://doi.org/10.1007/978-3-662-46823-4_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46822-7

  • Online ISBN: 978-3-662-46823-4

  • eBook Packages: Computer Science, Computer Science (R0)
