
Probabilistic Lipschitz Analysis of Neural Networks

  • Conference paper
  • In: Static Analysis (SAS 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12389)

Abstract

We are interested in algorithmically proving the robustness of neural networks. Notions of robustness have been discussed in the literature; we are interested in probabilistic notions of robustness that assume it is feasible to construct a statistical model of the process generating the inputs of a neural network. We find this a reasonable assumption given the rapid advances in algorithms for learning generative models of data. A neural network f is then defined to be probabilistically robust if, for a randomly generated pair of inputs, f is likely to demonstrate k-Lipschitzness, i.e., the distance between the outputs computed by f is upper-bounded by the \(k^{th}\) multiple of the distance between the pair of inputs. We name this property probabilistic Lipschitzness.
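In symbols, one common way to state such a property is the following (a schematic rendering of the informal description above, not a verbatim quotation of the paper's definition; the constant k, the threshold \(1 - \epsilon \), and any restriction to nearby input pairs are parameters of the property):

$$\begin{aligned} \underset{x,x' \sim D}{\Pr }\big (\Vert f(x) - f(x')\Vert \leqslant k \cdot \Vert x - x'\Vert \big ) \ge 1 - \epsilon \end{aligned}$$

where D is the input distribution induced by the generative model.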

We model generative models and neural networks, together, as programs in a simple, first-order, imperative, probabilistic programming language, \(pcat\). Inspired by a large body of existing literature, we define a denotational semantics for this language. Then we develop a sound local Lipschitzness analysis for \(cat\), a non-probabilistic sublanguage of \(pcat\). This analysis can compute an upper bound on the “Lipschitzness” of a neural network in a bounded region of the input set. We next present a provably correct algorithm, \(\mathtt{PROLIP}\), that analyzes the behavior of a neural network in a user-specified box-shaped input region and computes (i) lower bounds on the probability mass of such a region with respect to the generative model, and (ii) upper bounds on the Lipschitz constant of the neural network in this region, with the help of the local Lipschitzness analysis. Finally, we present a sketch of a proof-search algorithm that uses \(\mathtt{PROLIP}\) as a primitive for finding proofs of probabilistic Lipschitzness. We implement the \(\mathtt{PROLIP}\) algorithm and empirically evaluate the computational complexity of \(\mathtt{PROLIP}\).
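To make the role of \(\mathtt{PROLIP}\) in the proof search concrete, here is a minimal Python-style sketch of a driver loop around it. The names prolip and candidate_boxes, and the way results are reported, are illustrative assumptions; the paper only sketches the proof-search algorithm, and the signature \((k_U, d, vol) = \mathtt{PROLIP}(p, z_B)\) is taken from Theorem 11 below.

    # Hypothetical driver loop around PROLIP (an assumption for illustration,
    # not the paper's actual interface).
    def certify_boxes(p, k, candidate_boxes, prolip):
        certified = []      # boxes on which the network is provably k-Lipschitz
        total_mass = 0.0    # lower bound on probability mass of those boxes,
                            # assuming the candidate boxes are pairwise disjoint
        for z_box in candidate_boxes:
            k_U, d, vol = prolip(p, z_box)  # d is returned by PROLIP but unused here
            if k_U <= k:
                certified.append((z_box, vol))
                total_mass += vol
        # Combining these per-box guarantees into a proof of probabilistic
        # Lipschitzness follows the paper's proof-search sketch (not shown here).
        return certified, total_mass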


Notes

  1. Recent work has tried to combine loss functions with logical constraints [27].

  2. \(pcat\) has no \(\mathtt{observe}\) or \(\mathtt{score}\) construct and cannot be used for Bayesian reasoning.

References

  1. Albarghouthi, A., D’Antoni, L., Drews, S., Nori, A.V.: FairSquare: probabilistic verification of program fairness. Proc. ACM Program. Lang. 1(OOPSLA), 80:1–80:30 (2017)


  2. Alzantot, M., Sharma, Y., Elgohary, A., Ho, B.J., Srivastava, M., Chang, K.W.: Generating natural language adversarial examples. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2890–2896. Association for Computational Linguistics, Brussels (October 2018)


  3. Baluta, T., Shen, S., Shinde, S., Meel, K.S., Saxena, P.: Quantitative verification of neural networks and its security applications. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, pp. 1249–1264. Association for Computing Machinery, London (November 2019)


  4. Bárány, I., Füredi, Z.: Computing the volume is difficult. Discret. Comput. Geom. 2(4), 319–326 (1987)


  5. Barthe, G., D’Argenio, P., Rezk, T.: Secure information flow by self-composition. In: Proceedings of 17th IEEE Computer Security Foundations Workshop, 2004, pp. 100–114 (June 2004)


  6. Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 200–214. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21437-0_17


  7. Barthe, G., Espitau, T., Ferrer Fioriti, L.M., Hsu, J.: Synthesizing probabilistic invariants via Doob’s decomposition. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9779, pp. 43–61. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41528-4_3


  8. Barthe, G., Espitau, T., Gaboardi, M., Grégoire, B., Hsu, J., Strub, P.-Y.: An assertion-based program logic for probabilistic programs. In: Ahmed, A. (ed.) ESOP 2018. LNCS, vol. 10801, pp. 117–144. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89884-1_5


  9. Barthe, G., Espitau, T., Grégoire, B., Hsu, J., Strub, P.Y.: Proving expected sensitivity of probabilistic programs. Proc. ACM Program. Lang. 2(POPL), 57:1–57:29 (2017)


  10. Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., Criminisi, A.: Measuring neural net robustness with constraints. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 2613–2621. Curran Associates, Inc. (2016). http://papers.nips.cc/paper/6339-measuring-neural-net-robustness-with-constraints.pdf

  11. Bastani, O., Zhang, X., Solar-Lezama, A.: Probabilistic verification of fairness properties via concentration. Proc. ACM Program. Lang. 3(OOPSLA), 118:1–118:27 (2019)


  12. Benton, N.: Simple relational correctness proofs for static analyses and program transformations. In: Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, pp. 14–25. Association for Computing Machinery, Venice (January 2004)


  13. Carlini, N., et al.: Hidden voice commands. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 513–530 (2016). https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini

  14. Carlini, N., Wagner, D.: Audio adversarial examples: targeted attacks on speech-to-text. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 1–7 (May 2018)


  15. Chakarov, A., Sankaranarayanan, S.: Probabilistic program analysis with martingales. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 511–526. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_34


  16. Chaudhuri, S., Gulwani, S., Lublinerman, R., Navidpour, S.: Proving programs robust. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE 2011, pp. 102–112. Association for Computing Machinery, Szeged (September 2011)


  17. Chen, A.: Aaron-xichen/pytorch-playground (May 2020). https://github.com/aaron-xichen/pytorch-playground

  18. Clarkson, M.R., Schneider, F.B.: Hyperproperties. In: 2008 21st IEEE Computer Security Foundations Symposium, pp. 51–65 (June 2008)


  19. Combettes, P.L., Pesquet, J.C.: Lipschitz certificates for neural network structures driven by averaged activation operators. arXiv:1903.01014 (2019)

  20. Cousins, B., Vempala, S.: Gaussian cooling and \(O^{*}(n^{3})\) algorithms for volume and Gaussian volume. SIAM J. Comput. 47(3), 1237–1273 (2018)


  21. Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL 1978, pp. 84–96. Association for Computing Machinery, Tucson (January 1978)


  22. Cousot, P., Monerau, M.: Probabilistic abstract interpretation. In: Seidl, H. (ed.) ESOP 2012. LNCS, vol. 7211, pp. 169–193. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28869-2_9


  23. Dyer, M.E., Frieze, A.M.: On the complexity of computing the volume of a polyhedron. SIAM J. Comput. 17(5), 967–974 (1988)


  24. Dyer, M., Frieze, A., Kannan, R.: A random polynomial-time algorithm for approximating the volume of convex bodies. J. ACM 38(1), 1–17 (1991)


  25. Elekes, G.: A geometric inequality and the complexity of computing volume. Discret. Comput. Geom. 1(4), 289–292 (1986)


  26. Fazlyab, M., Robey, A., Hassani, H., Morari, M., Pappas, G.: Efficient and accurate estimation of lipschitz constants for deep neural networks. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 11427–11438. Curran Associates, Inc. (2019). http://papers.nips.cc/paper/9319-efficient-and-accurate-estimation-of-lipschitz-constants-for-deep-neural-networks.pdf

  27. Fischer, M., Balunovic, M., Drachsler-Cohen, D., Gehr, T., Zhang, C., Vechev, M.: DL2: training and querying neural networks with logic. In: International Conference on Machine Learning, pp. 1931–1941 (May 2019). http://proceedings.mlr.press/v97/fischer19a.html

  28. Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18 (May 2018)


  29. Geldenhuys, J., Dwyer, M.B., Visser, W.: Probabilistic symbolic execution. In: Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, pp. 166–176. Association for Computing Machinery, Minneapolis (July 2012)


  30. Ghorbal, K., Goubault, E., Putot, S.: The zonotope abstract domain Taylor1+. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 627–633. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_47


  31. Gibbons, J.: APLicative programming with Naperian functors. In: Yang, H. (ed.) ESOP 2017. LNCS, vol. 10201, pp. 556–583. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54434-1_21


  32. Goodfellow, I., et al.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

  33. Gouk, H., Frank, E., Pfahringer, B., Cree, M.: Regularisation of neural networks by enforcing lipschitz continuity. arXiv:1804.04368 (September 2018). http://arxiv.org/abs/1804.04368

  34. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning, ICML 2015, vol. 37, pp. 448–456. JMLR.org, Lille (July 2015)


  35. Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2021–2031. Association for Computational Linguistics, Copenhagen (September 2017)


  36. Katoen, J.-P., McIver, A.K., Meinicke, L.A., Morgan, C.C.: Linear-invariant generation for probabilistic programs. In: Cousot, R., Martel, M. (eds.) SAS 2010. LNCS, vol. 6337, pp. 390–406. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15769-1_24


  37. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) Computer Aided Verification, CAV 2017. Lecture Notes in Computer Science, vol. 10426. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5

  38. Katz, G., et al.: The Marabou framework for verification and analysis of deep neural networks. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 443–452. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_26


  39. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (May 2014). http://arxiv.org/abs/1312.6114

  40. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  41. Lamport, L.: Proving the correctness of multiprocess programs. IEEE Trans. Softw. Eng. 3(2), 125–143 (1977)


  42. Latorre, F., Rolland, P., Cevher, V.: Lipschitz constant estimation of neural networks via sparse polynomial optimization. arXiv:2004.08688 (April 2020). http://arxiv.org/abs/2004.08688

  43. Liu, C., Arnon, T., Lazarus, C., Barrett, C., Kochenderfer, M.J.: Algorithms for verifying deep neural networks. arXiv:1903.06758 (March 2019). http://arxiv.org/abs/1903.06758

  44. Mangal, R., Nori, A.V., Orso, A.: Robustness of neural networks: a probabilistic and practical approach. In: Proceedings of the 41st International Conference on Software Engineering: New Ideas and Emerging Results, ICSE-NIER 2019, pp. 93–96. IEEE Press, Montreal (May 2019)


  45. Qin, Y., Carlini, N., Cottrell, G., Goodfellow, I., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: International Conference on Machine Learning, pp. 5231–5240 (May 2019). http://proceedings.mlr.press/v97/qin19a.html

  46. Sampson, A., Panchekha, P., Mytkowicz, T., McKinley, K.S., Grossman, D., Ceze, L.: Expressing and verifying probabilistic assertions. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, pp. 112–122. Association for Computing Machinery, Edinburgh (June 2014)


  47. Sankaranarayanan, S., Chakarov, A., Gulwani, S.: Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, pp. 447–458. Association for Computing Machinery, Seattle (June 2013)


  48. Singh, C.: Csinva/gan-vae-pretrained-pytorch (May 2020). https://github.com/csinva/gan-vae-pretrained-pytorch

  49. Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. Proc. ACM Program. Lang. 3(POPL), 41:1–41:30 (2019)


  50. Slepak, J., Shivers, O., Manolios, P.: An array-oriented language with static rank polymorphism. In: Shao, Z. (ed.) ESOP 2014. LNCS, vol. 8410, pp. 27–46. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54833-8_3


  51. Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014). http://arxiv.org/abs/1312.6199

  52. Tsuzuku, Y., Sato, I., Sugiyama, M.: Lipschitz-margin training: scalable certification of perturbation invariance for deep neural networks. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, pp. 6542–6551. Curran Associates Inc., Montréal (December 2018)


  53. Virmaux, A., Scaman, K.: Lipschitz regularity of deep neural networks: analysis and efficient estimation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 3835–3844. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7640-lipschitz-regularity-of-deep-neural-networks-analysis-and-efficient-estimation.pdf

  54. Wang, D., Hoffmann, J., Reps, T.: PMAF: an algebraic framework for static analysis of probabilistic programs. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pp. 513–528. Association for Computing Machinery, Philadelphia (June 2018)


  55. Webb, S., Rainforth, T., Teh, Y.W., Kumar, M.P.: A statistical approach to assessing neural network robustness. In: International Conference on Learning Representations (September 2018). https://openreview.net/forum?id=S1xcx3C5FX

  56. Weng, L., et al.: PROVEN: verifying robustness of neural networks with a probabilistic approach. In: International Conference on Machine Learning, pp. 6727–6736 (May 2019). http://proceedings.mlr.press/v97/weng19a.html

  57. Weng, L., et al.: Towards fast computation of certified robustness for ReLU networks. In: International Conference on Machine Learning, pp. 5276–5285 (July 2018). http://proceedings.mlr.press/v80/weng18a.html

  58. Weng, T.W., et al.: Evaluating the robustness of neural networks: an extreme value theory approach. In: International Conference on Learning Representations (February 2018). https://openreview.net/forum?id=BkUHlMZ0b

  59. Yuan, X., He, P., Zhu, Q., Li, X.: Adversarial examples: attacks and defenses for deep learning. arXiv:1712.07107 (July 2018). http://arxiv.org/abs/1712.07107

  60. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (June 2010)


  61. Zhang, H., Zhang, P., Hsieh, C.J.: RecurJac: an efficient recursive algorithm for bounding Jacobian matrix of neural networks and its applications. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 5757–5764 (2019)


  62. Zhang, J.M., Harman, M., Ma, L., Liu, Y.: Machine learning testing: survey, landscapes and horizons. IEEE Trans. Softw. Eng. 1 (2020)



Author information

Corresponding author: Ravi Mangal.

Appendices

A Proof of Lemma 3

Lemma 3

(Equivalence of semantics)

Proof

We prove this by induction on the structure of statements in \(s^-\).

We first consider the base cases:

  1. (i)

    By definition, for any state \(\sigma \),

  2. (ii)

    Again, by definition, for any state \(\sigma \),

Next, we consider the inductive cases:

  1. (iii)
  2. (iv)

   \(\blacksquare \)

B Proof of Corollary 4

Corollary 4

Proof

By definition,

Now suppose, . Then, continuing from above,

   \(\blacksquare \)

C Proof of Theorem 6

We first prove a lemma needed for the proof.

Lemma 12

(Soundness of abstract conditional checks)

where

\(\gamma _C(\mathbf {tt}) = \{\mathbf {tt}\}\), \(\gamma _C(\mathbf {ff}) = \{\mathbf {ff}\}\), \(\gamma _C(\top ) = \{\mathbf {tt}, \mathbf {ff}\}\)
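As a concrete illustration of this abstract boolean domain (a minimal sketch, not taken from the paper's implementation), the three abstract values and the conjunction and negation used in the inductive cases of the proof below can be modeled as follows.

    # Three-valued abstract booleans: TT, FF, and TOP ("unknown").
    TT, FF, TOP = "tt", "ff", "top"

    def gamma_C(a):
        # Concretization, as in the lemma statement.
        return {TT: {True}, FF: {False}, TOP: {True, False}}[a]

    def abs_and(a, b):
        # Abstract conjunction: any TOP operand makes the result TOP,
        # mirroring case (iv) of the proof.
        if a == TOP or b == TOP:
            return TOP
        return TT if (a == TT and b == TT) else FF

    def abs_not(a):
        # Abstract negation: TOP stays TOP, mirroring case (v).
        return TOP if a == TOP else (FF if a == TT else TT)

Soundness in this toy setting means \(\{x \wedge y \mid x \in \gamma _C(a), y \in \gamma _C(b)\} \subseteq \gamma _C(\mathtt{abs\_and}(a,b))\), and similarly for negation.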

Proof

We prove this by induction on the structure of the boolean expressions in b.

We first consider the base cases:

  1. (i)

    By definition, \({[\![\varvec{\pi }(x,m) \ge \varvec{\pi }(y,n) ]\!]}_{_L}({\sigma }^{_L}) = {[\![\varvec{\pi }(x,m) \ge \varvec{\pi }(y,n) ]\!]}_{_B}({\sigma }^{_L}_1)\). Consider the case where \({[\![\varvec{\pi }(x,m) \ge \varvec{\pi }(y,n) ]\!]}_{_B}({\sigma }^{_L}_1) = \mathbf {tt}\). Then, by the semantics described in Fig. 6, we know that,

    $$\begin{aligned} ({\sigma }^{_L}_1(x)_m)_1 \ge ({\sigma }^{_L}_1(y)_n)_2 \end{aligned}$$
    (1)

    By the definition of \(\gamma _L\) (Definition 5), we also know that,

    $$\begin{aligned} \forall {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}).\; ({\sigma }^{_L}_1(x)_1 \leqslant {\sigma }^{_D}_1(x) \leqslant {\sigma }^{_L}_1(x)_2) \wedge ({\sigma }^{_L}_1(y)_1 \leqslant {\sigma }^{_D}_1(y) \leqslant {\sigma }^{_L}_1(y)_2) \end{aligned}$$
    (2)

    where the comparisons are performed pointwise for every element in the vector.

    From 1 and 2, we can conclude that,

    $$\begin{aligned} \forall {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}).\; {\sigma }^{_D}_1(y)_n \leqslant ({\sigma }^{_L}_1(y)_n)_2 \leqslant ({\sigma }^{_L}_1(x)_m)_1 \leqslant {\sigma }^{_D}_1(x)_m \end{aligned}$$
    (3)

    Now,

    (4)

    From 3 and 4, we can conclude that, , or in other words, when the analysis returns \(\mathbf {tt}\). We can similarly prove the case when the analysis returns \(\mathbf {ff}\). If the analysis returns \(\top \), the required subset containment is trivially true since \(\gamma _C(\top ) = \{\mathbf {tt},\mathbf {ff}\}\).

  2. (ii)

    The proof is very similar to the first case, and we skip the details.

  3. (iii)

    The proof is very similar to the first case, and we skip the details.

We next consider the inductive cases:

  1. (iv)

    By the inductive hypothesis, we know that, If \({[\![b_1 ]\!]}_{_L}({\sigma }^{_L}) = \top \vee {[\![b_2 ]\!]}_{_L}({\sigma }^{_L}) = \top \), then, as per the semantics in Fig. 6, \({[\![b_1 \wedge b_2 ]\!]}_{_L}({\sigma }^{_L}) = \top \), and the desired property trivially holds. However, if \({[\![b_1 ]\!]}_{_L}({\sigma }^{_L}) \ne \top \,\wedge \, {[\![b_2 ]\!]}_{_L}({\sigma }^{_L}) \ne \top \), then using the inductive hypotheses, we know that for all \({\sigma }^{_D}\in \gamma _L({\sigma }^{_L})\), evaluates to the same boolean value as \({[\![b_1 ]\!]}_{_L}({\sigma }^{_L})\). We can make the same deduction for \(b_2\). So, evaluating also yields the same boolean value for all \({\sigma }^{_D}\in \gamma _L({\sigma }^{_L})\), and this value is equal to \({[\![b_1 \wedge b_2 ]\!]}_{_L}({\sigma }^{_L})\).

  2. (v)

    By the inductive hypothesis, we know that, If \({[\![b ]\!]}_{_L}({\sigma }^{_L}) = \mathbf {tt}\), then . So, , and we can conclude that, . We can similarly argue about the case when \({[\![b ]\!]}_{_L}({\sigma }^{_L}) = \mathbf {ff}\), and as stated previously, the case with, \({[\![b ]\!]}_{_L}({\sigma }^{_L}) = \top \) trivially holds.

   \(\blacksquare \)

Theorem 13

(Soundness of Jacobian analysis)

Proof

We prove this by induction on the structure of statements in \(s^-\).

We first consider the base cases:

  1. (i)

    By definition, for any state \({\sigma }^{_L}\),

    $$\begin{aligned} {[\![\mathbf {skip} ]\!]}_{_L}({\sigma }^{_L}) = {\sigma }^{_L}\end{aligned}$$
    (5)
    (6)

    From Eqs. 5 and 6,

    (7)
  2. (ii)

    We first observe that when multiplying an interval \((l, u)\) with a constant c, if \(c \ge 0\), then the result is simply given by the interval \((c \cdot l, c \cdot u)\). But if \(c < 0\), then the result is in the interval \((c \cdot u, c \cdot l)\), i.e., the use of the lower bounds and upper bounds gets flipped. Similarly, when computing the dot product of an abstract vector v with a constant vector w, for each multiplication operation \(v_i \cdot w_i\), we use the same reasoning as above. Then, the lower bound and upper bound of the dot product result are given by, \((\sum \limits _{i=1 \wedge w_i \ge 0}^{n} w_i \cdot (v_i)_1 \,+\, \sum \limits _{i=1 \wedge w_i< 0}^{n} w_i \cdot (v_i)_2,\, \sum \limits _{i=1 \wedge w_i \ge 0}^{n} w_i \cdot (v_i)_2 \,+\, \sum \limits _{i=1 \wedge w_i < 0}^{n} w_i \cdot (v_i)_1)\) where \((v_i)_1\) represents the lower bound of the \(i^{th}\) element of v, \((v_i)_2\) represents the upper bound of the \(i^{th}\) element of v, and we assume \(\mathbf {dim}(w) = \mathbf {dim}(v) = n\). We do not provide the rest of the formal proof for this case since it just involves using the definitions (a small sketch of these dot-product bounds appears after this list).
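The following sketch illustrates the interval reasoning above (an illustration only, not the paper's code): it computes the dot-product bounds described in this case and checks them against concrete vectors sampled from the intervals.

    import random

    def interval_dot(v_lo, v_hi, w):
        # Bounds on dot(v, w) when each v[i] lies in [v_lo[i], v_hi[i]] and w is
        # a constant vector: nonnegative weights keep the bound orientation,
        # negative weights flip lower and upper bounds.
        lo = sum(w_i * (l if w_i >= 0 else u) for w_i, l, u in zip(w, v_lo, v_hi))
        hi = sum(w_i * (u if w_i >= 0 else l) for w_i, l, u in zip(w, v_lo, v_hi))
        return lo, hi

    # Sanity check on random concrete vectors drawn from the intervals.
    v_lo, v_hi, w = [-1.0, 0.0, 2.0], [1.0, 0.5, 3.0], [2.0, -4.0, 0.5]
    lo, hi = interval_dot(v_lo, v_hi, w)
    for _ in range(1000):
        v = [random.uniform(l, u) for l, u in zip(v_lo, v_hi)]
        d = sum(x * y for x, y in zip(v, w))
        assert lo - 1e-9 <= d <= hi + 1e-9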

Next, we consider the inductive cases:

  1. (iii)

    From the inductive hypothesis, we know,

    (8)
    (9)

    From Eqs. 8 and 9, we conclude,

    (10)

    Rewriting, we get,

    (11)

    and this can be simplified further as,

    (12)
  2. (iv)

    From the inductive hypothesis, we know,

    (13)
    (14)

    The conditional check can result in three different outcomes while performing the analysis - \(\mathbf {tt}\), \(\mathbf {ff}\), or \(\top \). From Lemma 12, we know that the abstract boolean checks are sound. We analyze each of the three cases separately.

    1. (a)

      Since we only consider the true case, we can write,

      $$\begin{aligned} {[\![\mathbf {if}\; b\; \mathbf {then}\; s_1^-\; \mathbf {else}\; s_2^- ]\!]}_{_L}({\sigma }^{_L}) = {[\![s_1^- ]\!]}_{_L}({\sigma }^{_L}) \end{aligned}$$
      (15)

      Also, from Lemma 12,

      (16)

      From 13, 15, and 16,

      (17)
    2. (b)

      Similar to the \(\mathbf {tt}\) case, for the \(\mathbf {ff}\) case, we can show,

      (18)
    3. (c)

      We first prove the following about the join (\(\bigsqcup _L\)) operation,

      $$\begin{aligned} \gamma _L({\sigma }^{_L}) \cup \gamma _L({\tilde{\sigma }}^{_L}) \subseteq \gamma _L({\sigma }^{_L}\sqcup _L {\tilde{\sigma }}^{_L}) \end{aligned}$$
      (19)

      By definition of \(\gamma _L\),

      $$\begin{aligned} \begin{aligned} \gamma _L({\sigma }^{_L}) = \{ {\sigma }^{_D}\mid (\bigwedge _{v \in V}. {\sigma }^{_L}_1(v)_1 \leqslant {\sigma }^{_D}_1(v) \leqslant {\sigma }^{_L}_1(v)_2) \,\wedge \\ (\bigwedge _{v \in V}. ({\sigma }^{_L}_2(v)_1)_1 \leqslant {\sigma }^{_D}_2(v)_1 \leqslant ({\sigma }^{_L}_2(v)_1)_2) \,\wedge \\ {\sigma }^{_D}_2(v)_2 \in \gamma _V({\sigma }^{_L}_2(v)_2)\} \end{aligned} \end{aligned}$$
      (20)

      \(\gamma _L({\tilde{\sigma }}^{_L})\) can be defined similarly.

      The join operation combines corresponding intervals in the abstract states by taking the smaller of the two lower bounds and the larger of the two upper bounds. We do not prove the following formally, but from the definitions of \(\gamma _L\) and \(\bigsqcup _L\), one can see that the intended property holds (a small sketch of this interval join appears after the proof). Next, we consider the \(\mathbf {assert}\) statements that appear in the abstract denotational semantics for the \(\top \) case. Let \({\sigma }^{_L}_1 ={[\![\mathbf {assert}\; b ]\!]}_{_L}({\sigma }^{_L})\) and \({\sigma }^{_L}_2 ={[\![\mathbf {assert}\; \lnot b ]\!]}_{_L}({\sigma }^{_L})\). From the inductive hypotheses (13 and 14), we know,

      (21)
      (22)

      From 19, 21, and 22,

      $$\begin{aligned} L_1 \,\cup \, L_2 \subseteq \gamma _L({[\![s_1^- ]\!]}_{_L}({\sigma }^{_L}_1)) \,\cup \, \gamma _L({[\![s_2^- ]\!]}_{_L}({\sigma }^{_L}_2)) \subseteq \gamma _L({[\![s_1^- ]\!]}_{_L}({\sigma }^{_L}_1) \,\sqcup \, {[\![s_2^- ]\!]}_{_L}({\sigma }^{_L}_2)) \end{aligned}$$
      (23)

      Then, if we can show that,

      $$\begin{aligned} \{{\sigma }^{_D}\mid {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}) \wedge {[\![b ]\!]}({\sigma }^{_D}) = \mathbf {tt}\} \subseteq \gamma _L({\sigma }^{_L}_1) \end{aligned}$$
      (24)
      $$\begin{aligned} \{{\sigma }^{_D}\mid {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}) \wedge {[\![b ]\!]}({\sigma }^{_D}) = \mathbf {ff}\} \subseteq \gamma _L({\sigma }^{_L}_2) \end{aligned}$$
      (25)

      then, from 21, 22, 23, 24, 25, and the semantics of \(\mathbf {if}\; b\; \mathbf {then}\; s_1^-\; \mathbf {else}\; s_2^-\), we can say,

      (26)

      Now, we need to show that 24 and 25 are true. The \(\mathbf {assert}\) statements either behave as identity or produce a modified abstract state (see Fig. 6). When \(\mathbf {assert}\) behaves as identity, 24 and 25 are obviously true. We skip the proof of the case when \(\mathbf {assert}\) produces a modified abstract state.

   \(\blacksquare \)
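As promised in case (c) above, here is a minimal sketch of the interval join and the containment property of Eq. 19, reduced to a single interval per variable (an illustration only, not the paper's implementation).

    def join_interval(a, b):
        # Join of intervals a = (a_lo, a_hi) and b = (b_lo, b_hi):
        # smaller of the two lower bounds, larger of the two upper bounds.
        return (min(a[0], b[0]), max(a[1], b[1]))

    def in_gamma(x, iv):
        # x belongs to the concretization of the interval iv.
        return iv[0] <= x <= iv[1]

    # Containment property mirroring Eq. 19: gamma(a) U gamma(b) is contained
    # in gamma(a join b).
    a, b = (0.0, 2.0), (1.5, 5.0)
    j = join_interval(a, b)
    for x in [-1.0, 0.0, 1.0, 1.5, 2.0, 3.0, 5.0, 6.0]:
        if in_gamma(x, a) or in_gamma(x, b):
            assert in_gamma(x, j)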

D Proof of Corollary 8

Corollary 8

(Upper bound of Jacobian operator norm)

Proof

From Theorem 6, we know that for any \(p \in s^-, {\sigma }^{_L}\in {\varSigma }^{_L}\),

(1)

Let us define, . This is the set of all Jacobian matrices associated with the variable v after executing p on the set of input states, \(\gamma _L({\sigma }^{_L})\). Note that the set \(D_V\) does not distinguish the Jacobians on the basis of the input that we are differentiating with respect to.

Let \(D_V^L = \{({\tilde{\sigma }}^{_D}_2(v))_1 \mid {\tilde{\sigma }}^{_D}\in \gamma _L({[\![p ]\!]}_{_L}({\sigma }^{_L}))\}\), and \(J = (({[\![p ]\!]}_{_L}({\sigma }^{_L}))_2(v))_1\).

Using Definition 5 of \(\gamma _L\), we can show,

$$\begin{aligned} \forall d \in D_V^L. \; J_1 \leqslant d \leqslant J_2 \end{aligned}$$
(2)

where \(\leqslant \) is defined pointwise on the matrices, and \(J_1\)(\(J_2\)) refers to the matrix of lower(upper) bounds.

Then, from 1 and definitions of \(D_V\) and \(D_V^L\), we can deduce that,

$$\begin{aligned} D_V \subseteq D_V^L \end{aligned}$$
(3)

From 2 and 3,

$$\begin{aligned} \forall d \in D_V. \; J_1 \leqslant d \leqslant J_2 \end{aligned}$$
(4)

Let \(J' = [\mathbf {max}\{|(J_{k,l})_1|, |(J_{k,l})_2|\} \mid k \in \{1,...,m\},\, l \in \{1,...,n\}]\). Then,

$$\begin{aligned} \forall d \in D_V. \; |d| \leqslant J' \end{aligned}$$
(5)

where \(|\cdot |\) applies pointwise on matrices d.

Using the definition of the operator norm, one can show that,

$$\begin{aligned} M_1 \leqslant M_2 \Longrightarrow \Vert M_1\Vert \leqslant \Vert M_2\Vert \end{aligned}$$
(6)

where \(M_1\) and \(M_2\) are matrices with \(\leqslant \) applied pointwise.

Finally, from 5 and 6, we conclude,

$$\begin{aligned} \forall d \in D_V. \; \Vert d\Vert \leqslant \Vert J'\Vert = \Vert J\Vert \end{aligned}$$
(7)

Unrolling the definitions,

(8)

   \(\blacksquare \)
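To illustrate the chain of inequalities above numerically, the following sketch builds \(J'\) from an interval Jacobian and checks that its norm bounds the norm of matrices drawn from the interval. This is an illustration only; the paper leaves the concrete choice of operator norm abstract, and we assume the spectral norm here.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 4, 6
    J_lo = rng.normal(size=(m, n)) - 1.0          # elementwise lower bounds
    J_hi = J_lo + rng.uniform(0.0, 2.0, (m, n))   # elementwise upper bounds

    # J' takes, entrywise, the larger absolute value of the two bounds (Eq. 5).
    J_prime = np.maximum(np.abs(J_lo), np.abs(J_hi))
    bound = np.linalg.norm(J_prime, 2)            # spectral (operator) norm

    # Any matrix lying entrywise between J_lo and J_hi has norm <= bound.
    for _ in range(1000):
        d = rng.uniform(J_lo, J_hi)
        assert np.linalg.norm(d, 2) <= bound + 1e-9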

E Proof of Theorem 11

Theorem 11

(Soundness of \(\mathtt{PROLIP}\))

Let where \(g,\,f \in s^-\), \((k_U,d,vol) = \mathtt{PROLIP}(p,z_B)\), \(z \notin \mathbf {outv}(g), \, z \notin \mathbf {outv}(f), \,x \in \mathbf {inv}(f),\,\) and \(y \in \mathbf {outv}(f)\) then, \(\forall \sigma _0 \in \varSigma . \)

\(\underset{\sigma ,\sigma ' \sim {[\![p ]\!]}(\sigma _0)}{\Pr }((\Vert \sigma (y) - \sigma '(y)\Vert \leqslant k_U \cdot \Vert \sigma (x) - \sigma '(x)\Vert ) \wedge (\sigma (z),\sigma '(z) \in \gamma (z_B))) \ge vol\)

Proof

We prove this theorem in two parts.

First, let us define the set \(\varSigma _P\) as \(\varSigma _P = \{\sigma \mid \sigma \in \gamma _B(({[\![f ]\!]}_{_L}({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B])))_1)\}\).

In words, \(\varSigma _P\) is the concretization of the abstract box produced by abstractly “interpreting” \(g;f\) on the input box \(z_B\). Assuming that z is not written to by g or f, it is easy to see from the definitions of the abstract semantics in Figs. 6 and 4 that \(({[\![f ]\!]}_{_L}({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B])))_1(z) = z_B\), i.e., the final abstract value of z is the same as the initial value \(z_B\). Moreover, from Corollary 8, we know that the operator norm of the abstract Jacobian matrix, \({\Vert J\Vert }_{_L}\), upper bounds the operator norm of every Jacobian of f for variable y with respect to x (since \(x \in \mathbf {inv}(f), y \in \mathbf {outv}(f)\)) for every input in \(\gamma _B({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B]))\), which itself is an upper bound on the local Lipschitz constant in the same region.

In other words, we can say that,

\(\forall \sigma ,\sigma ' \in \varSigma _P.\; \sigma (z), \sigma '(z) \in \gamma (z_B) \wedge \Vert \sigma (y) - \sigma '(y)\Vert \leqslant k_U \cdot \Vert \sigma (x) - \sigma '(x)\Vert \).

To complete the proof, we need to show that, \(\underset{\sigma ,\sigma ' \sim {[\![p ]\!]}(\sigma _0)}{\Pr }(\sigma ,\sigma ' \in \varSigma _P) \ge vol\). We show this in the second part of this proof.

Using the semantic definition of \(pcat\) (Fig. 2), we know that,

We first analyze . Again using the semantic definition of \(pcat\), we write,

(1)

We are interested in the volume of the set \(\varSigma _z\), defined as, \(\varSigma _z = \{\sigma \mid \sigma (z) \in z_B\}\). Using the expression for from above, we can now compute the required probability as follows,

(2)

This shows that starting from any \(\sigma _0 \in \varSigma \), after executing the first statement of p, the probability that the value stored at z lies in the box \(z_B\) is \(vol'\).

Next, we analyze . In particular, we are interested in the volume of the set, (which is notational abuse for the set ). We can lower bound this volume as follows,

(3)

We can similarly show that,

(4)

Now, \({\sigma }^{_B}[z \mapsto z_B]\) defined on line 2 of Algorithm 1 is such that \(\gamma ({\sigma }^{_B}[z \mapsto z_B]) = \varSigma _z\). From Theorem 10, we can conclude that,

(5)

Similarly, from Theorem 6, we can conclude that,

(6)

From 4 and 6, we conclude that,

$$\begin{aligned} \underset{\sigma \sim {[\![p ]\!]}(\sigma _0)}{\Pr } (\sigma \in \gamma ({[\![f ]\!]}_{_L}({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B])))_1) \ge vol' \end{aligned}$$
(7)

Consequently,

$$\begin{aligned} \underset{\sigma , \sigma ' \sim {[\![p ]\!]}(\sigma _0)}{\Pr } (\sigma ,\sigma ' \in \gamma ({[\![f ]\!]}_{_L}({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B])))_1) \ge vol' \times vol' = vol \end{aligned}$$
(8)

since each act of sampling is independent.    \(\blacksquare \)

F Translating Neural Networks into \(pcat\)

NNs are often described as a sequential composition of “layers”, with each layer describing the computation to be performed on an incoming vector. Many commonly used layers can be expressed in the \(pcat\) language. For instance, [28] describes the translation of maxpool, convolution, ReLU, and fully connected layers into the \(cat\) language. Here, we describe the translation of two other common layers, namely, the batchnorm layer [34] and the transposed convolution layer (also referred to as the deconvolution layer) [60].

Batchnorm Layer. A batchnorm layer typically expects an input \(x \in \mathbb {R}^{C \times H \times W}\), which we flatten, using row-major order, into \(x' \in \mathbb {R}^{C \cdot H \cdot W}\), where, historically, C denotes the number of channels in the input, H denotes the height, and W denotes the width. For instance, given an RGB image of dimensions 28 \(\times \) 28 pixels, \(H = 28\), \(W = 28\), and \(C = 3\).

A batchnorm layer is associated with vectors m and v such that \(\mathbf {dim}(m) = \mathbf {dim}(v) = C\) where \(\mathbf {dim}(\cdot )\) returns the dimension of a vector. m and v represent the running mean and running variance of the values in each channel observed during the training of the NN. A batchnorm layer is also associated with a scaling vector \(s^1\) and a shift vector \(s^2\), both also of dimension C. For a particular element \(x_{i,j,k}\) in the input, the corresponding output element is \(s^1_i\cdot (\frac{x_{i,j,k}-m_i}{\sqrt{v_i+\epsilon }})+s^2_i\) where \(\epsilon \) is a constant that is added for numerical stability (commonly set to \(10^{-5}\)). Note that the batchnorm operation produces an output of the same dimensions as the input. We can represent the batchnorm operation by the statement \(y \leftarrow w \cdot x' + \beta \), where \(x'\) is the flattened input, w is a weight matrix of dimension \(C \cdot H \cdot W \times C \cdot H \cdot W\), and \(\beta \) is a bias vector of dimension \(C \cdot H \cdot W\), such that,

\(w = I \cdot [\frac{s^1_{\lfloor i/(H\cdot W) \rfloor }}{\sqrt{v_{\lfloor i/(H\cdot W) \rfloor }+\epsilon }} \mid i \in \{1,...,C \cdot H \cdot W\}] \)

\(\beta = [-\frac{s^1_{\lfloor i/(H\cdot W) \rfloor }\cdot m_{\lfloor i/(H\cdot W) \rfloor }}{\sqrt{v_{\lfloor i/(H\cdot W) \rfloor }+\epsilon }} + s^2_{\lfloor i/(H\cdot W) \rfloor } \mid i \in \{1,...,C \cdot H \cdot W\}]\)

where I is the identity matrix with dimension \((C \cdot H \cdot W, C \cdot H \cdot W)\), \(\lfloor \cdot \rfloor \) is the floor operation that rounds down to an integer, and \([ \; \mid \; ]\) is the list builder/comprehension notation.
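The following sketch constructs w and \(\beta \) for a small batchnorm layer and checks them against the elementwise formula above. It is an illustration under two stated assumptions, not the paper's implementation: we read \(w\) as the diagonal matrix whose entries are the per-channel scale factors, and we use 0-based flattened indices (so the channel of index i is \(\lfloor i/(H\cdot W) \rfloor \)).

    import numpy as np

    C, H, W = 3, 4, 4
    eps = 1e-5
    rng = np.random.default_rng(1)
    m  = rng.normal(size=C)                 # running mean per channel
    v  = rng.uniform(0.5, 2.0, size=C)      # running variance per channel
    s1 = rng.normal(size=C)                 # scaling vector
    s2 = rng.normal(size=C)                 # shift vector

    # Channel of each flattened (row-major) index, 0-based.
    chan = np.arange(C * H * W) // (H * W)

    scale = s1[chan] / np.sqrt(v[chan] + eps)
    w = np.diag(scale)                      # w is diagonal: per-channel scaling
    beta = -scale * m[chan] + s2[chan]

    # Check y <- w . x' + beta against the elementwise batchnorm definition.
    x = rng.normal(size=(C, H, W))
    x_flat = x.reshape(-1)                  # row-major flattening
    y = w @ x_flat + beta
    y_ref = (s1[:, None, None] * (x - m[:, None, None])
             / np.sqrt(v[:, None, None] + eps) + s2[:, None, None]).reshape(-1)
    assert np.allclose(y, y_ref)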

Transposed Convolution Layer. A convolution layer applies a kernel or a filter on the input vector and typically compresses this vector so that the output vector is of a smaller dimension. A deconvolution or transposed convolution layer does the opposite - it applies the kernel in a manner that produces a larger output vector. A transposed convolution layer expects an input \(x \in \mathbb {R}^{C_{in} \times H_{in} \times W_{in}}\) and applies a kernel \(k \in \mathbb {R}^{C_{out} \times C_{in} \times K_h \times K_w}\) using a stride S. For simplicity of presentation, we assume that \(K_h = K_w = K\) and \(W_{in}=H_{in}\). In \(pcat\), the transposed convolution layer can be expressed by the statement, \(y \leftarrow w \cdot x'\), where \(x'\) is the flattened version of input x, w is a weight matrix that we derive from the parameters associated with the transposed convolution layer, and the bias vector, \(\beta \), is a zero vector in this case. To compute the dimensions of the weight matrix, we first calculate the height (\(H_{out}\)) and width (\(W_{out}\)) of each channel in the output using the formulae \(H_{out}=H_{in}\cdot S + K\) and \(W_{out}=W_{in}\cdot S + K\). Since we assume \(W_{in}=H_{in}\), we have \(W_{out} = H_{out}\) here. Then, the dimension of w is \({C_{out} \cdot H_{out} \cdot W_{out}\times C_{in} \cdot H_{in} \cdot W_{in}}\), and the definition of w is as follows,

where \(I = \{1,...,C_{out} \cdot H_{out} \cdot W_{out}\}\) and \(J = \{1,...,C_{in} \cdot H_{in} \cdot W_{in}\}\)
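Since the transposed convolution is a linear map (with zero bias), its weight matrix w can also be recovered generically by applying the layer to each standard basis vector. The sketch below does this with PyTorch's ConvTranspose2d as an illustration only; note that PyTorch's default output size is \((H_{in}-1)\cdot S + K\), which differs slightly from the simplified size formula used above.

    import torch

    C_in, C_out, K, S, H_in, W_in = 2, 3, 3, 2, 4, 4
    layer = torch.nn.ConvTranspose2d(C_in, C_out, K, stride=S, bias=False)

    n_in = C_in * H_in * W_in
    with torch.no_grad():
        cols = []
        for j in range(n_in):
            e = torch.zeros(n_in)
            e[j] = 1.0                                  # j-th standard basis vector
            out = layer(e.reshape(1, C_in, H_in, W_in))
            cols.append(out.reshape(-1))                # column j of w
        w = torch.stack(cols, dim=1)

        # y <- w . x' reproduces the layer on the flattened input x'.
        x = torch.randn(1, C_in, H_in, W_in)
        y_ref = layer(x).reshape(-1)
        y = w @ x.reshape(-1)
        assert torch.allclose(y, y_ref, atol=1e-4)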

G Details of Box Analysis

Fig. 6. \(cat\) abstract semantics for box analysis (figure not reproduced here)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Mangal, R., Sarangmath, K., Nori, A.V., Orso, A. (2020). Probabilistic Lipschitz Analysis of Neural Networks. In: Pichardie, D., Sighireanu, M. (eds) Static Analysis. SAS 2020. Lecture Notes in Computer Science, vol 12389. Springer, Cham. https://doi.org/10.1007/978-3-030-65474-0_13


  • DOI: https://doi.org/10.1007/978-3-030-65474-0_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65473-3

  • Online ISBN: 978-3-030-65474-0

