Robustness Testing of Intermediate Verifiers

Chen, YuTing; Furia, Carlo A.

doi:10.1007/978-3-030-01090-4_6

YuTing Chen¹⁵ &
Carlo A. Furia¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 11138))

Included in the following conference series:

International Symposium on Automated Technology for Verification and Analysis

1259 Accesses
3 Citations

Abstract

Program verifiers are not exempt from the bugs that affectnearly every piece of software. In addition, they often exhibit brittle behavior: their performance changes considerably with details of how the input program is expressed—details that should be irrelevant, such as the order of independent declarations. Such a lack of robustness frustrates users who have to spend considerable time figuring out a tool’s idiosyncrasies before they can use it effectively. This paper introduces a technique to detect lack of robustness of program verifiers; the technique is lightweight and fully automated, as it is based on testing methods (such as mutation testing and metamorphic testing). The key idea is to generate many simple variants of a program that initially passes verification. All variants are, by construction, equivalent to the original program; thus, any variant that fails verification indicates lack of robustness in the verifier. We implemented our technique in a tool called \(\mu \) gie, which operates on programs written in the popular Boogie language for verification—used as intermediate representation in numerous program verifiers. Experiments targeting 135 Boogie programs indicate that brittle behavior occurs fairly frequently (16 programs) and is not hard to trigger. Based on these results, the paper discusses the main sources of brittle behavior and suggests means of improving robustness.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
In this paper, the term “verification” also designates validation techniques such as testing.
2.
http://boogie-docs.readthedocs.io/en/latest/#front-ends-that-emit-boogie-ivl.
3.
By an anonymous reviewer of FM 2018.
4.
[6] describes some experiments with seeds that fail verification. Unsurprisingly, random mutations are unlikely to turn an unverified program into a verified one—therefore, the main paper focuses on using verified programs as seeds.
5.
See Sect. 5 for a discussion of how robustness testing differs from traditional mutation testing.
6.
For clarity, we initially focus on Boogie 4.5.0, and later discuss differences with other versions.
7.
Additionally, Why3 times out on 51 mutants of 2 seeds in group S; this seems to reflect an ineffective translation performed by b2w [1] rather than brittleness of Why3.

References

Ameri, M., Furia, C.A.: Why just Boogie? In: Ábrahám, E., Huisman, M. (eds.) IFM 2016. LNCS, vol. 9681, pp. 79–95. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-33693-0_6
Chapter Google Scholar
AutoProof verified code repository. http://tiny.cc/autoproof-repo
Barr, E.T., Harman, M., McMinn, P., Shahbaz, M., Yoo, S.: The oracle problem in software testing: a survey. IEEE Trans. Softw. Eng. 41(5), 507–525 (2015)
Article Google Scholar
Chen, T.Y., Cheung, S.C., Yiu, S.M.: Metamorphic testing: a new approach for generating next test cases. Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology (1998)
Google Scholar
Chen, Y.T., Furia, C.A.: Triggerless happy. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 295–311. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66845-1_19
Chapter Google Scholar
Chen, Y.T., Furia, C.A.: Robustness testing of intermediate verifiers. http://arxiv.org/abs/1805.03296 (2018)
Claessen, K., Hughes, J.: Quickcheck: a lightweight tool for random testing of Haskell programs. In: ICFP, pp. 268–279. ACM (2000)
Google Scholar
Dafny examples and tests. https://github.com/Microsoft/dafny/tree/master/Test
Filliâtre, J.-C., Paskevich, A.: Why3—where programs meet provers. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 125–128. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_8
Chapter Google Scholar
Furia, C.A., Meyer, B., Velder, S.: Loop invariants: analysis, classification, and examples. ACM Comput. Surv. 46(3) (2014)
Article Google Scholar
Furia, C.A., Nordio, M., Polikarpova, N., Tschannen, J.: AutoProof: auto-active functional verification of object-oriented programs. STTT 19(6), 697–716 (2016)
Article Google Scholar
Godefroid, P., Levin, M.Y., Molnar, D.A.: SAGE: whitebox fuzzing for security testing. Commun. ACM 55(3), 40–44 (2012)
Article Google Scholar
Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts, M.L., Setty, S.T.V., Zill, B.: IronFleet: proving practical distributed systems correct. In: SOSP, pp. 1–17. ACM (2015)
Google Scholar
Hawblitzel, C., Howell, J., Lorch, J.R., Narayan, A., Parno, B., Zhang, D., Zill, B.: Ironclad Apps: end-to-end security via automated full-system verification. In: USENIX OSDI, pp. 165–181. USENIX Association (2014)
Google Scholar
Hierons, R.M., et al.: Using formal specifications to support testing. ACM Comput. Surv. 41(2), 9:1–9:76 (2009)
Article Google Scholar
Jia, Y., Harman, M.: An analysis and survey of the development of mutation testing. IEEE Trans. Softw. Eng. 37(5), 649–678 (2011)
Article Google Scholar
Leino, K.R.M.: This is Boogie 2 (2008). http://goo.gl/QsH6g
Leino, K., Rustan, M.: Dafny: an automatic program verifier for functional correctness. In: Clarke, E.M., Voronkov, A. (eds.) LPAR 2010. LNCS (LNAI), vol. 6355, pp. 348–370. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17511-4_20
Chapter Google Scholar
Leino, K.R.M., Pit-Claudel, C.: Trigger selection strategies to stabilize program verifiers. In: CAV, pp. 361–381. Springer, Berlin (2016)
Google Scholar
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
Article Google Scholar
Liew, D., Cadar, C., Donaldson, A.F.: Symbooglix: A symbolic execution engine for boogie programs. In: ICST, pp. 45–56. IEEE Computer Society (2016)
Google Scholar
McKeeman, W.M.: Differential testing for software. Digit. Tech. J. 10(1), 100–107 (1998)
Google Scholar
\(\mu \)gie repository. https://emptylambda.github.io/mu-gie/
Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test generation. In: ICSE, pp. 75–84. IEEE Computer Society (2007)
Google Scholar
Polikarpova, N., Furia, C.A., West, S.: To run what no one has run before: executing an intermediate verification language. In: Legay, A., Bensalem, S. (eds.) RV 2013. LNCS, vol. 8174, pp. 251–268. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40787-1_15
Chapter Google Scholar
Segura, S., Fraser, G., Sanchez, A.B., Ruiz-Cortés, A.: A survey on metamorphic testing. IEEE Trans. Softw. Eng. 42(9), 805–824 (2016)
Article Google Scholar
Tange, O.: GNU parallel—the command-line power tool. Login: USENIX Mag. 36, 42–47 (2011)
Google Scholar
Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and understanding bugs in C compilers. ACM SIGPLAN Not. ACM 46, 283–294 (2011)
Article Google Scholar
Zeller, A., Hildebrandt, R.: Simplifying and isolating failure-inducing input. IEEE Trans. Softw. Eng. 28(2), 183–200 (2002)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Chalmers University of Technology, Gothenburg, Sweden
YuTing Chen & Carlo A. Furia

Authors

YuTing Chen
View author publications
You can also search for this author in PubMed Google Scholar
Carlo A. Furia
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to YuTing Chen .

Editor information

Editors and Affiliations

Microsoft Research, Redmond, WA, USA
Shuvendu K. Lahiri
University of Southern California, Los Angeles, CA, USA
Chao Wang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, Y., Furia, C.A. (2018). Robustness Testing of Intermediate Verifiers. In: Lahiri, S., Wang, C. (eds) Automated Technology for Verification and Analysis. ATVA 2018. Lecture Notes in Computer Science(), vol 11138. Springer, Cham. https://doi.org/10.1007/978-3-030-01090-4_6

Download citation

DOI: https://doi.org/10.1007/978-3-030-01090-4_6
Published: 30 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01089-8
Online ISBN: 978-3-030-01090-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics