Abstract
We are interested in algorithmically proving the robustness of neural networks. Various notions of robustness have been discussed in the literature; we focus on probabilistic notions of robustness that assume it is feasible to construct a statistical model of the process generating the inputs of a neural network. We find this a reasonable assumption given the rapid advances in algorithms for learning generative models of data. A neural network f is then defined to be probabilistically robust if, for a randomly generated pair of inputs, f is likely to demonstrate k-Lipschitzness, i.e., the distance between the outputs computed by f is upper-bounded by k times the distance between the pair of inputs. We call this property probabilistic Lipschitzness.
We model generative models and neural networks, together, as programs in a simple, first-order, imperative, probabilistic programming language, \(pcat\). Inspired by a large body of existing literature, we define a denotational semantics for this language. We then develop a sound local Lipschitzness analysis for \(cat\), a non-probabilistic sublanguage of \(pcat\). This analysis computes an upper bound on the “Lipschitzness” of a neural network in a bounded region of the input set. We next present a provably correct algorithm, \(\mathtt{PROLIP}\), that analyzes the behavior of a neural network in a user-specified box-shaped input region and computes (i) lower bounds on the probability mass of such a region with respect to the generative model, and (ii) upper bounds on the Lipschitz constant of the neural network in this region, with the help of the local Lipschitzness analysis. Finally, we present a sketch of a proof-search algorithm that uses \(\mathtt{PROLIP}\) as a primitive for finding proofs of probabilistic Lipschitzness. We implement \(\mathtt{PROLIP}\) and empirically evaluate its computational cost.
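For orientation, the property can be stated compactly as follows; this is a sketch based on the description above, with the generative model \(G\), the probability threshold \(1-\epsilon \), and the norm \(\Vert \cdot \Vert \) serving as assumed placeholders rather than the paper's exact formalization.
$$\begin{aligned} \underset{x,\,x' \sim G}{\Pr }\big (\Vert f(x) - f(x')\Vert \leqslant k \cdot \Vert x - x'\Vert \big ) \ge 1 - \epsilon \end{aligned}$$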
Notes
- 1.
Recent work has tried to combine loss functions with logical constraints [27].
- 2.
\(pcat\) has no \(\mathtt{observe}\) or \(\mathtt{score}\) construct and cannot be used for Bayesian reasoning.
References
Albarghouthi, A., D’Antoni, L., Drews, S., Nori, A.V.: FairSquare: probabilistic verification of program fairness. Proc. ACM Program. Lang. 1(OOPSLA), 80:1–80:30 (2017)
Alzantot, M., Sharma, Y., Elgohary, A., Ho, B.J., Srivastava, M., Chang, K.W.: Generating natural language adversarial examples. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2890–2896. Association for Computational Linguistics, Brussels (October 2018)
Baluta, T., Shen, S., Shinde, S., Meel, K.S., Saxena, P.: Quantitative verification of neural networks and its security applications. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, pp. 1249–1264. Association for Computing Machinery, London (November 2019)
Bárány, I., Füredi, Z.: Computing the volume is difficult. Discret. Comput. Geom. 2(4), 319–326 (1987)
Barthe, G., D’Argenio, P., Rezk, T.: Secure information flow by self-composition. In: Proceedings of 17th IEEE Computer Security Foundations Workshop, 2004, pp. 100–114 (June 2004)
Barthe, G., Crespo, J.M., Kunz, C.: Relational verification using product programs. In: Butler, M., Schulte, W. (eds.) FM 2011. LNCS, vol. 6664, pp. 200–214. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21437-0_17
Barthe, G., Espitau, T., Ferrer Fioriti, L.M., Hsu, J.: Synthesizing probabilistic invariants via Doob’s decomposition. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9779, pp. 43–61. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41528-4_3
Barthe, G., Espitau, T., Gaboardi, M., Grégoire, B., Hsu, J., Strub, P.-Y.: An assertion-based program logic for probabilistic programs. In: Ahmed, A. (ed.) ESOP 2018. LNCS, vol. 10801, pp. 117–144. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89884-1_5
Barthe, G., Espitau, T., Grégoire, B., Hsu, J., Strub, P.Y.: Proving expected sensitivity of probabilistic programs. Proc. ACM Program. Lang. 2(POPL), 57:1–57:29 (2017)
Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., Criminisi, A.: Measuring neural net robustness with constraints. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 2613–2621. Curran Associates, Inc. (2016). http://papers.nips.cc/paper/6339-measuring-neural-net-robustness-with-constraints.pdf
Bastani, O., Zhang, X., Solar-Lezama, A.: Probabilistic verification of fairness properties via concentration. Proc. ACM Program. Lang. 3(OOPSLA), 118:1–118:27 (2019)
Benton, N.: Simple relational correctness proofs for static analyses and program transformations. In: Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, pp. 14–25. Association for Computing Machinery, Venice (January 2004)
Carlini, N., et al.: Hidden voice commands. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 513–530 (2016). https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini
Carlini, N., Wagner, D.: Audio adversarial examples: targeted attacks on speech-to-text. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 1–7 (May 2018)
Chakarov, A., Sankaranarayanan, S.: Probabilistic program analysis with martingales. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 511–526. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_34
Chaudhuri, S., Gulwani, S., Lublinerman, R., Navidpour, S.: Proving programs robust. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE 2011, pp. 102–112. Association for Computing Machinery, Szeged (September 2011)
Chen, A.: Aaron-xichen/pytorch-playground (May 2020). https://github.com/aaron-xichen/pytorch-playground
Clarkson, M.R., Schneider, F.B.: Hyperproperties. In: 2008 21st IEEE Computer Security Foundations Symposium, pp. 51–65 (June 2008)
Combettes, P.L., Pesquet, J.C.: Lipschitz certificates for neural network structures driven by averaged activation operators. arXiv:1903.01014 (2019)
Cousins, B., Vempala, S.: Gaussian cooling and \(O^*(n^3)\) algorithms for volume and Gaussian volume. SIAM J. Comput. 47(3), 1237–1273 (2018)
Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL 1978, pp. 84–96. Association for Computing Machinery, Tucson (January 1978)
Cousot, P., Monerau, M.: Probabilistic abstract interpretation. In: Seidl, H. (ed.) ESOP 2012. LNCS, vol. 7211, pp. 169–193. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28869-2_9
Dyer, M.E., Frieze, A.M.: On the complexity of computing the volume of a polyhedron. SIAM J. Comput. 17(5), 967–974 (1988)
Dyer, M., Frieze, A., Kannan, R.: A random polynomial-time algorithm for approximating the volume of convex bodies. J. ACM 38(1), 1–17 (1991)
Elekes, G.: A geometric inequality and the complexity of computing volume. Discret. Comput. Geom. 1(4), 289–292 (1986)
Fazlyab, M., Robey, A., Hassani, H., Morari, M., Pappas, G.: Efficient and accurate estimation of Lipschitz constants for deep neural networks. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 11427–11438. Curran Associates, Inc. (2019). http://papers.nips.cc/paper/9319-efficient-and-accurate-estimation-of-lipschitz-constants-for-deep-neural-networks.pdf
Fischer, M., Balunovic, M., Drachsler-Cohen, D., Gehr, T., Zhang, C., Vechev, M.: DL2: training and querying neural networks with logic. In: International Conference on Machine Learning, pp. 1931–1941 (May 2019). http://proceedings.mlr.press/v97/fischer19a.html
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18 (May 2018)
Geldenhuys, J., Dwyer, M.B., Visser, W.: Probabilistic symbolic execution. In: Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, pp. 166–176. Association for Computing Machinery, Minneapolis (July 2012)
Ghorbal, K., Goubault, E., Putot, S.: The zonotope abstract domain Taylor1+. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 627–633. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_47
Gibbons, J.: APLicative programming with Naperian functors. In: Yang, H. (ed.) ESOP 2017. LNCS, vol. 10201, pp. 556–583. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54434-1_21
Goodfellow, I., et al.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Gouk, H., Frank, E., Pfahringer, B., Cree, M.: Regularisation of neural networks by enforcing Lipschitz continuity. arXiv:1804.04368 (September 2018). http://arxiv.org/abs/1804.04368
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning, ICML 2015, vol. 37, pp. 448–456. JMLR.org, Lille (July 2015)
Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2021–2031. Association for Computational Linguistics, Copenhagen (September 2017)
Katoen, J.-P., McIver, A.K., Meinicke, L.A., Morgan, C.C.: Linear-invariant generation for probabilistic programs. In: Cousot, R., Martel, M. (eds.) SAS 2010. LNCS, vol. 6337, pp. 390–406. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15769-1_24
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
Katz, G., et al.: The Marabou framework for verification and analysis of deep neural networks. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 443–452. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_26
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (May 2014). http://arxiv.org/abs/1312.6114
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Lamport, L.: Proving the correctness of multiprocess programs. IEEE Trans. Softw. Eng. 3(2), 125–143 (1977)
Latorre, F., Rolland, P., Cevher, V.: Lipschitz constant estimation of neural networks via sparse polynomial optimization. arXiv:2004.08688 (April 2020). http://arxiv.org/abs/2004.08688
Liu, C., Arnon, T., Lazarus, C., Barrett, C., Kochenderfer, M.J.: Algorithms for verifying deep neural networks. arXiv:1903.06758 (March 2019). http://arxiv.org/abs/1903.06758
Mangal, R., Nori, A.V., Orso, A.: Robustness of neural networks: a probabilistic and practical approach. In: Proceedings of the 41st International Conference on Software Engineering: New Ideas and Emerging Results, ICSE-NIER 2019, pp. 93–96. IEEE Press, Montreal (May 2019)
Qin, Y., Carlini, N., Cottrell, G., Goodfellow, I., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: International Conference on Machine Learning, pp. 5231–5240 (May 2019). http://proceedings.mlr.press/v97/qin19a.html
Sampson, A., Panchekha, P., Mytkowicz, T., McKinley, K.S., Grossman, D., Ceze, L.: Expressing and verifying probabilistic assertions. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, pp. 112–122. Association for Computing Machinery, Edinburgh (June 2014)
Sankaranarayanan, S., Chakarov, A., Gulwani, S.: Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, pp. 447–458. Association for Computing Machinery, Seattle (June 2013)
Singh, C.: Csinva/gan-vae-pretrained-pytorch (May 2020). https://github.com/csinva/gan-vae-pretrained-pytorch
Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. Proc. ACM Program. Lang. 3(POPL), 41:1–41:30 (2019)
Slepak, J., Shivers, O., Manolios, P.: An array-oriented language with static rank polymorphism. In: Shao, Z. (ed.) ESOP 2014. LNCS, vol. 8410, pp. 27–46. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54833-8_3
Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014). http://arxiv.org/abs/1312.6199
Tsuzuku, Y., Sato, I., Sugiyama, M.: Lipschitz-margin training: scalable certification of perturbation invariance for deep neural networks. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, pp. 6542–6551. Curran Associates Inc., Montréal (December 2018)
Virmaux, A., Scaman, K.: Lipschitz regularity of deep neural networks: analysis and efficient estimation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 3835–3844. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7640-lipschitz-regularity-of-deep-neural-networks-analysis-and-efficient-estimation.pdf
Wang, D., Hoffmann, J., Reps, T.: PMAF: an algebraic framework for static analysis of probabilistic programs. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pp. 513–528. Association for Computing Machinery, Philadelphia (June 2018)
Webb, S., Rainforth, T., Teh, Y.W., Kumar, M.P.: A statistical approach to assessing neural network robustness. In: International Conference on Learning Representations (September 2018). https://openreview.net/forum?id=S1xcx3C5FX
Weng, L., et al.: PROVEN: verifying robustness of neural networks with a probabilistic approach. In: International Conference on Machine Learning, pp. 6727–6736 (May 2019). http://proceedings.mlr.press/v97/weng19a.html
Weng, L., et al.: Towards fast computation of certified robustness for ReLU networks. In: International Conference on Machine Learning, pp. 5276–5285 (July 2018). http://proceedings.mlr.press/v80/weng18a.html
Weng, T.W., et al.: Evaluating the robustness of neural networks: an extreme value theory approach. In: International Conference on Learning Representations (February 2018). https://openreview.net/forum?id=BkUHlMZ0b
Yuan, X., He, P., Zhu, Q., Li, X.: Adversarial examples: attacks and defenses for deep learning. arXiv:1712.07107 (July 2018). http://arxiv.org/abs/1712.07107
Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (June 2010)
Zhang, H., Zhang, P., Hsieh, C.J.: RecurJac: an efficient recursive algorithm for bounding Jacobian matrix of neural networks and its applications. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 5757–5764 (2019)
Zhang, J.M., Harman, M., Ma, L., Liu, Y.: Machine learning testing: survey, landscapes and horizons. IEEE Trans. Softw. Eng. 1 (2020)
Appendices
A Proof of Lemma 3
Lemma 3
(Equivalence of semantics)

Proof
We prove this by induction on the structure of statements in \(s^-\).
We first consider the base cases:
-
(i)
By definition, for any state \(\sigma \),
-
(ii)
Again, by definition, for any state \(\sigma \),
Next, we consider the inductive cases:
-
(iii)
-
(iv)
\(\blacksquare \)
B Proof of Corollary 4
Corollary 4

Proof
By definition,

Now suppose, . Then, continuing from above,

\(\blacksquare \)
C Proof of Theorem 6
We first prove a lemma needed for the proof.
Lemma 12
(Soundness of abstract conditional checks)
where
\(\gamma _C(\mathbf {tt}) = \{\mathbf {tt}\}\), \(\gamma _C(\mathbf {ff}) = \{\mathbf {ff}\}\), \(\gamma _C(\top ) = \{\mathbf {tt}, \mathbf {ff}\}\)
Proof
We prove this by induction on the structure of the boolean expressions in b.
We first consider the base cases:
-
(i)
By definition, \({[\![\varvec{\pi }(x,m) \ge \varvec{\pi }(y,n) ]\!]}_{_L}({\sigma }^{_L}) = {[\![\varvec{\pi }(x,m) \ge \varvec{\pi }(y,n) ]\!]}_{_B}({\sigma }^{_L}_1)\). Consider the case where \({[\![\varvec{\pi }(x,m) \ge \varvec{\pi }(y,n) ]\!]}_{_B}({\sigma }^{_L}_1) = \mathbf {tt}\); then, by the semantics described in Fig. 6, we know that,
$$\begin{aligned} ({\sigma }^{_L}_1(x)_m)_1 \ge ({\sigma }^{_L}_1(y)_n)_2 \end{aligned}$$
(1)
By the definition of \(\gamma _L\) (Definition 5), we also know that,
$$\begin{aligned} \forall {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}).\; ({\sigma }^{_L}_1(x)_1 \leqslant {\sigma }^{_D}_1(x) \leqslant {\sigma }^{_L}_1(x)_2) \wedge ({\sigma }^{_L}_1(y)_1 \leqslant {\sigma }^{_D}_1(y) \leqslant {\sigma }^{_L}_1(y)_2) \end{aligned}$$
(2)
where the comparisons are performed pointwise for every element in the vector.
From 1 and 2, we can conclude that,
$$\begin{aligned} \forall {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}).\; {\sigma }^{_D}_1(y)_n \leqslant ({\sigma }^{_L}_1(y)_n)_2 \leqslant ({\sigma }^{_L}_1(x)_m)_1 \leqslant {\sigma }^{_D}_1(x)_m \end{aligned}$$
(3)
Now,
(4)
From 3 and 4, we can conclude that,
, or in other words,
when the analysis returns \(\mathbf {tt}\). We can similarly prove the case when the analysis returns \(\mathbf {ff}\). In case the analysis returns \(\top \), the required subset containment is trivially true since \(\gamma _C(\top ) = \{\mathbf {tt},\mathbf {ff}\}\).
-
(ii)
The proof is very similar to the first case, and we skip the details.
-
(iii)
The proof is very similar to the first case, and we skip the details.
We next consider the inductive cases:
-
(iv)
By the inductive hypothesis, we know that,
If \({[\![b_1 ]\!]}_{_L}({\sigma }^{_L}) = \top \vee {[\![b_2 ]\!]}_{_L}({\sigma }^{_L}) = \top \), then, as per the semantics in Fig. 6, \({[\![b_1 \wedge b_2 ]\!]}_{_L}({\sigma }^{_L}) = \top \), and the desired property trivially holds. However, if \({[\![b_1 ]\!]}_{_L}({\sigma }^{_L}) \ne \top \,\wedge \, {[\![b_2 ]\!]}_{_L}({\sigma }^{_L}) \ne \top \), then using the inductive hypotheses, we know that for all \({\sigma }^{_D}\in \gamma _L({\sigma }^{_L})\),
evaluates to the same boolean value as \({[\![b_1 ]\!]}_{_L}({\sigma }^{_L})\). We can make the same deduction for \(b_2\). So, evaluating
also yields the same boolean value for all \({\sigma }^{_D}\in \gamma _L({\sigma }^{_L})\), and this value is equal to \({[\![b_1 \wedge b_2 ]\!]}_{_L}({\sigma }^{_L})\).
-
(v)
By the inductive hypothesis, we know that,
If \({[\![b ]\!]}_{_L}({\sigma }^{_L}) = \mathbf {tt}\), then
. So,
, and we can conclude that,
. We can similarly argue about the case when \({[\![b ]\!]}_{_L}({\sigma }^{_L}) = \mathbf {ff}\), and as stated previously, the case with, \({[\![b ]\!]}_{_L}({\sigma }^{_L}) = \top \) trivially holds.
\(\blacksquare \)
Theorem 13
(Soundness of Jacobian analysis)

Proof
We prove this by induction on the structure of statements in \(s^-\).
We first consider the base cases:
-
(i)
By definition, for any state \({\sigma }^{_L}\),
$$\begin{aligned} {[\![\mathbf {skip} ]\!]}_{_L}({\sigma }^{_L}) = {\sigma }^{_L}\end{aligned}$$
(5)
(6)
(7)
-
(ii)
We first observe that when multiplying an interval (l, u) with a constant c, if \(c \ge 0\), then the result is simply given by the interval \((c \cdot l, c \cdot u)\). But if \(c < 0\), then the result is given by the interval \((c \cdot u, c \cdot l)\), i.e., the roles of the lower and upper bounds are flipped. Similarly, when computing the dot product of an abstract vector v with a constant vector w, for each multiplication operation \(v_i \cdot w_i\), we use the same reasoning as above. Then, the lower bound and upper bound of the dot product result are given by \((\sum \limits _{i=1 \wedge w_i \ge 0}^{n} w_i \cdot (v_i)_1 \,+\, \sum \limits _{i=1 \wedge w_i< 0}^{n} w_i \cdot (v_i)_2,\, \sum \limits _{i=1 \wedge w_i \ge 0}^{n} w_i \cdot (v_i)_2 \,+\, \sum \limits _{i=1 \wedge w_i < 0}^{n} w_i \cdot (v_i)_1)\), where \((v_i)_1\) represents the lower bound of the \(i^{th}\) element of v, \((v_i)_2\) represents the upper bound of the \(i^{th}\) element of v, and we assume \(\mathbf {dim}(w) = \mathbf {dim}(v) = n\). We do not provide the rest of the formal proof for this case since it just involves using the definitions (a small numeric sketch of this interval dot-product reasoning appears after this proof).
Next, we consider the inductive cases:
-
(iii)
From the inductive hypothesis, we know,
(8)
(9)
From Eqs. 8 and 9, we conclude,
(10)
Rewriting, we get,
(11)
and this can be simplified further as,
(12)
-
(iv)
From the inductive hypothesis, we know,
(13)
(14)
The conditional check can result in three different outcomes while performing the analysis: \(\mathbf {tt}\), \(\mathbf {ff}\), or \(\top \). From Lemma 12, we know that the abstract boolean checks are sound. We analyze each of the three cases separately.
-
(a)
Since we only consider the true case, we can write,
$$\begin{aligned} {[\![\mathbf {if}\; b\; \mathbf {then}\; s_1^-\; \mathbf {else}\; s_2^- ]\!]}_{_L}({\sigma }^{_L}) = {[\![s_1^- ]\!]}_{_L}({\sigma }^{_L}) \end{aligned}$$
(15)
Also, from Lemma 12,
(16)
(17)
-
(b)
Similar to the \(\mathbf {tt}\) case, for the \(\mathbf {ff}\) case, we can show,
(18) -
(c)
We first prove the following about the join (\(\bigsqcup _L\)) operation,
$$\begin{aligned} \gamma _L({\sigma }^{_L}) \cup \gamma _L({\tilde{\sigma }}^{_L}) \subseteq \gamma _L({\sigma }^{_L}\sqcup _L {\tilde{\sigma }}^{_L}) \end{aligned}$$
(19)
By definition of \(\gamma _L\),
$$\begin{aligned} \begin{aligned} \gamma _L({\sigma }^{_L}) = \{ {\sigma }^{_D}\mid (\bigwedge _{v \in V}. {\sigma }^{_L}_1(v)_1 \leqslant {\sigma }^{_D}_1(v) \leqslant {\sigma }^{_L}_1(v)_2) \,\wedge \\ (\bigwedge _{v \in V}. ({\sigma }^{_L}_2(v)_1)_1 \leqslant {\sigma }^{_D}_2(v)_1 \leqslant ({\sigma }^{_L}_2(v)_1)_2) \,\wedge \\ {\sigma }^{_D}_2(v)_2 \in \gamma _V({\sigma }^{_L}_2(v)_2)\} \end{aligned} \end{aligned}$$
(20)
\(\gamma _L({\tilde{\sigma }}^{_L})\) can be defined similarly.
The join operation combines corresponding intervals in the abstract states by taking the smaller of the two lower bounds and the larger of the two upper bounds. We do not prove the following formally, but from the definitions of \(\gamma _L\) and \(\bigsqcup _L\), one can see that the intended property holds. Next, we consider the \(\mathbf {assert}\) statements that appear in the abstract denotational semantics for the \(\top \) case. Let \({\sigma }^{_L}_1 ={[\![\mathbf {assert}\; b ]\!]}_{_L}({\sigma }^{_L})\) and \({\sigma }^{_L}_2 ={[\![\mathbf {assert}\; \lnot b ]\!]}_{_L}({\sigma }^{_L})\). From the inductive hypotheses (13 and 14), we know,
(21)
(22)
$$\begin{aligned} L_1 \,\cup \, L_2 \subseteq \gamma _L({[\![s_1^- ]\!]}_{_L}({\sigma }^{_L}_1)) \,\cup \, \gamma _L({[\![s_2^- ]\!]}_{_L}({\sigma }^{_L}_2)) \subseteq \gamma _L({[\![s_1^- ]\!]}_{_L}({\sigma }^{_L}_1) \,\sqcup \, {[\![s_2^- ]\!]}_{_L}({\sigma }^{_L}_2)) \end{aligned}$$
(23)
Then, if we can show that,
$$\begin{aligned} \{{\sigma }^{_D}\mid {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}) \wedge {[\![b ]\!]}({\sigma }^{_D}) = \mathbf {tt}\} \subseteq \gamma _L({\sigma }^{_L}_1) \end{aligned}$$
(24)
$$\begin{aligned} \{{\sigma }^{_D}\mid {\sigma }^{_D}\in \gamma _L({\sigma }^{_L}) \wedge {[\![b ]\!]}({\sigma }^{_D}) = \mathbf {ff}\} \subseteq \gamma _L({\sigma }^{_L}_2) \end{aligned}$$
(25)
then, from 21, 22, 23, 24, 25, and the semantics of \(\mathbf {if}\; b\; \mathbf {then}\; s_1^-\; \mathbf {else}\; s_2^-\), we can say,
(26)
Now, we need to show that 24 and 25 are true. The \(\mathbf {assert}\) statements either behave as identity or produce a modified abstract state (see Fig. 6). When \(\mathbf {assert}\) behaves as identity, 24 and 25 are obviously true. We skip the proof of the case when \(\mathbf {assert}\) produces a modified abstract state.
\(\blacksquare \)
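As referenced in case (ii) of the proof above, the interval dot-product bound can be checked numerically. The following is a minimal Python sketch (not the paper's implementation); the concrete vectors are illustrative values chosen only to exercise both signs of \(w_i\).

```python
import numpy as np

# Interval dot-product bound from case (ii): for an interval-valued vector v
# with elementwise lower bounds lo and upper bounds hi, and a constant vector
# w, the lower bound of w . v uses hi wherever w_i < 0 (and lo wherever
# w_i >= 0), and symmetrically for the upper bound.

lo = np.array([-1.0, 0.5, 2.0])   # (v_i)_1, elementwise lower bounds
hi = np.array([ 1.0, 1.5, 3.0])   # (v_i)_2, elementwise upper bounds
w  = np.array([ 2.0, -1.0, 0.5])

pos, neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
dot_lo = pos @ lo + neg @ hi      # lower bound of the dot product
dot_hi = pos @ hi + neg @ lo      # upper bound of the dot product

# The extremes of a linear function over a box are attained at its corners,
# so a brute-force check over all corners confirms the bounds.
corner_vals = [w @ np.where(mask, hi, lo) for mask in np.ndindex(2, 2, 2)]
print(np.isclose(dot_lo, min(corner_vals)), np.isclose(dot_hi, max(corner_vals)))
```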
D Proof of Corollary 8
Corollary 8
(Upper bound of Jacobian operator norm)

Proof
From Theorem 6, we know that for any \(p \in s^-, {\sigma }^{_L}\in {\varSigma }^{_L}\),

Let us define, . This is the set of all Jacobian matrices associated with the variable v after executing p on the set of input states, \(\gamma _L({\sigma }^{_L})\). Note that the set \(D_V\) does not distinguish the Jacobians on the basis of the input that we are differentiating with respect to.
Let \(D_V^L = \{({\tilde{\sigma }}^{_D}_2(v))_1 \mid {\tilde{\sigma }}^{_D}\in \gamma _L({[\![p ]\!]}_{_L}({\sigma }^{_L}))\}\), and \(J = (({[\![p ]\!]}_{_L}({\sigma }^{_L}))_2(v))_1\).
Using Definition 5 of \(\gamma _L\), we can show,
where \(\leqslant \) is defined pointwise on the matrices, and \(J_1\) (resp. \(J_2\)) refers to the matrix of lower (resp. upper) bounds.
Then, from 1 and definitions of \(D_V\) and \(D_V^L\), we can deduce that,
Let \(J' = [\mathbf {max}\{|(J_{k,l})_1|, |(J_{k,l})_2|\} \mid k \in \{1,...,m\},\, l \in \{1,...,n\}]\). Then,
where \(|\cdot |\) is applied pointwise on matrices.
Using definition of operator norm, one can show that,
where \(M_1\) and \(M_2\) are matrices with \(\leqslant \) applied pointwise.
Finally, from 5 and 6, we conclude,
Unrolling the definitions,

\(\blacksquare \)
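The construction of \(J'\) in the proof above is easy to exercise numerically. The following Python sketch (illustrative only; the interval bounds are randomly generated, not taken from the paper) confirms that the operator norm of \(J'\), built from the entrywise maxima of the absolute lower and upper bounds, upper-bounds the operator norm of concrete Jacobians sampled from the interval.

```python
import numpy as np

# For an interval Jacobian with entrywise lower bounds J1 and upper bounds J2,
# the matrix J' = max(|J1|, |J2|) (entrywise) has an operator (spectral) norm
# that upper-bounds the norm of every concrete J with J1 <= J <= J2 pointwise.

rng = np.random.default_rng(0)
J1 = rng.normal(size=(4, 6)) - 0.5                 # entrywise lower bounds
J2 = J1 + rng.uniform(0.0, 1.0, size=J1.shape)     # entrywise upper bounds

J_prime = np.maximum(np.abs(J1), np.abs(J2))
bound = np.linalg.norm(J_prime, ord=2)             # operator norm of J'

# Sample concrete Jacobians from the interval and compare operator norms.
ok = all(
    np.linalg.norm(J1 + rng.uniform(size=J1.shape) * (J2 - J1), ord=2) <= bound + 1e-9
    for _ in range(1000)
)
print(ok)  # expected: True
```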
E Proof of Theorem 11
Theorem 11
(Soundness of \(\mathtt{PROLIP}\))
Let where \(g,\,f \in s^-\), \((k_U,d,vol) = \mathtt{PROLIP}(p,z_B)\), \(z \notin \mathbf {outv}(g), \, z \notin \mathbf {outv}(f), \,x \in \mathbf {inv}(f),\,\) and \(y \in \mathbf {outv}(f)\) then, \(\forall \sigma _0 \in \varSigma . \)
\(\underset{\sigma ,\sigma ' \sim {[\![p ]\!]}(\sigma _0)}{\Pr }((\Vert \sigma (y) - \sigma '(y)\Vert \leqslant k_U \cdot \Vert \sigma (x) - \sigma '(x)\Vert ) \wedge (\sigma (z),\sigma '(z) \in \gamma (z_B))) \ge vol\)
Proof
We prove this theorem in two parts.
First, let us define set \(\varSigma _P\) as, \(\varSigma _P = \{\sigma \mid \sigma \in \gamma _B(({[\![f ]\!]}_{_L}({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B])))_1)\}\)
In words, \(\varSigma _P\) is the concretization of the abstract box produced by abstractly “interpreting” g; f on the input box \(z_B\). Assuming that z is not written to by g or f, it is easy to see from the definitions of the abstract semantics in Figs. 6 and 4 that \(({[\![f ]\!]}_{_L}({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B])))_1(z) = z_B\), i.e., the final abstract value of z is the same as the initial value \(z_B\). Moreover, from Corollary 8, we know that the operator norm of the abstract Jacobian matrix, \({\Vert J\Vert }_{_L}\), upper bounds the operator norm of every Jacobian of f for variable y with respect to x (since \(x \in \mathbf {inv}(f), y \in \mathbf {outv}(f)\)) for every input in \(\gamma _B({[\![g ]\!]}_{_B}({\sigma }^{_B}[z \mapsto z_B]))\), which itself is an upper bound on the local Lipschitz constant in the same region.
In other words, we can say that,
\(\forall \sigma ,\sigma ' \in \varSigma _P.\; \sigma (z), \sigma '(z) \in \gamma (z_B) \wedge \Vert \sigma (y) - \sigma '(y)\Vert \leqslant k_U \cdot \Vert \sigma (x) - \sigma '(x)\Vert \).
To complete the proof, we need to show that, \(\underset{\sigma ,\sigma ' \sim {[\![p ]\!]}(\sigma _0)}{\Pr }(\sigma ,\sigma ' \in \varSigma _P) \ge vol\). We show this in the second part of this proof.
Using the semantic definition of \(pcat\) (Fig. 2), we know that,

We first analyze . Again using the semantic definition of \(pcat\), we write,

We are interested in the volume of the set \(\varSigma _z\), defined as, \(\varSigma _z = \{\sigma \mid \sigma (z) \in z_B\}\). Using the expression for from above, we can now compute the required probability as follows,

This shows that starting from any \(\sigma _0 \in \varSigma \), after executing the first statement of p, the probability that the value stored at z lies in the box \(z_B\) is \(vol'\).
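This probability depends on the distribution from which the first statement of p samples z. The Python sketch below assumes, purely for illustration, an i.i.d. standard normal latent (as is typical for GAN/VAE generators); under that assumption the mass of a box factorizes into one-dimensional CDF differences. The box bounds are illustrative, and the function name is ours, not the paper's.

```python
from scipy.stats import norm

# Probability that a latent z, drawn i.i.d. from a standard normal (an
# assumption for this sketch), lies in the axis-aligned box z_B. For a box,
# the mass is a product of one-dimensional CDF differences.

def box_prob_std_normal(lower, upper):
    p = 1.0
    for l, u in zip(lower, upper):
        p *= norm.cdf(u) - norm.cdf(l)
    return p

z_lower = [-0.5, -0.5, -1.0]   # illustrative box z_B, lower corner
z_upper = [ 0.5,  0.5,  1.0]   # illustrative box z_B, upper corner

vol_prime = box_prob_std_normal(z_lower, z_upper)
# Two independently sampled latents both land in the box with probability
# vol_prime ** 2, matching the independence argument at the end of the proof.
print(vol_prime, vol_prime ** 2)
```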
Next, we analyze . In particular, we are interested in the volume of the set,
(which is notational abuse for the set
). We can lower bound this volume as follows,

We can similarly show that,

Now, \({\sigma }^{_B}[z \mapsto z_B]\) defined on line 2 of Algorithm 1 is such that
\(\gamma ({\sigma }^{_B}[z \mapsto z_B]) = \varSigma _z\). From Theorem 10, we can conclude that,

Similarly, from Theorem 6, we can conclude that,

From 4 and 6, we conclude that,
Consequently,
since each act of sampling is independent. \(\blacksquare \)
F Translating Neural Networks into \(pcat\)
NNs are often described as a sequential composition of “layers”, with each layer describing the computation to be performed on an incoming vector. Many commonly used layers can be expressed in the \(pcat\) language. For instance, [28] describes the translation of maxpool, convolution, ReLU, and fully connected layers into the \(cat\) language. Here, we describe the translation of two other common layers, namely, the batchnorm layer [34] and the transposed convolution layer (also referred to as the deconvolution layer) [60].
Batchnorm Layer. A batchnorm layer typically expects an input \(x \in \mathbb {R}^{C \times H \times W}\), which we flatten, in row-major order, into \(x' \in \mathbb {R}^{C \cdot H \cdot W}\), where, historically, C denotes the number of channels in the input, H denotes the height, and W denotes the width. For instance, given an RGB image of dimensions 28 \(\times \) 28 pixels, \(H = 28\), \(W = 28\), and \(C = 3\).
A batchnorm layer is associated with vectors m and v such that \(\mathbf {dim}(m) = \mathbf {dim}(v) = C\), where \(\mathbf {dim}(\cdot )\) returns the dimension of a vector. m and v represent the running mean and running variance of the values in each channel observed during the training of the NN. A batchnorm layer is also associated with a scaling vector \(s^1\) and a shift vector \(s^2\), both also of dimension C. For a particular element \(x_{i,j,k}\) in the input, the corresponding output element is \(s^1_i\cdot (\frac{x_{i,j,k}-m_i}{\sqrt{v_i+\epsilon }})+s^2_i\), where \(\epsilon \) is a constant added for numerical stability (commonly set to \(10^{-5}\)). Note that the batchnorm operation produces an output of the same dimensions as the input. We can represent the batchnorm operation by the statement \(y \leftarrow w \cdot x' + \beta \), where \(x'\) is the flattened input, w is a weight matrix of dimension \(C \cdot H \cdot W \times C \cdot H \cdot W\), and \(\beta \) is a bias vector of dimension \(C \cdot H \cdot W\), such that,
\(w = I \cdot [\frac{s^1_{\lfloor i/H\cdot W \rfloor }}{\sqrt{v_{\lfloor i/H\cdot W \rfloor }+\epsilon }} \mid i \in \{1,...,C \cdot H \cdot W\}] \)
\(\beta = [-\frac{s^1_{\lfloor i/H\cdot W \rfloor }\cdot m_{\lfloor i/H\cdot W \rfloor }}{\sqrt{v_{\lfloor i/H\cdot W \rfloor }+\epsilon }} + s^2_{\lfloor i/H\cdot W \rfloor } \mid i \in \{1,...,C \cdot H \cdot W\}]\)
where I is the identity matrix with dimension \((C \cdot H \cdot W, C \cdot H \cdot W)\), \(\lfloor \cdot \rfloor \) is the floor operation that rounds down to an integer, and \([ \; \mid \; ]\) is the list builder/comprehension notation.
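The following numpy sketch (not the paper's implementation; the running statistics and parameters are random illustrative values) builds w and \(\beta \) for a small batchnorm layer and checks the affine form against the per-element definition above.

```python
import numpy as np

# Fold batchnorm into a single affine map y = w @ x_flat + beta over the
# row-major flattened input, one scale/shift per channel.

C, H, W = 3, 4, 4
eps = 1e-5
m  = np.random.rand(C)          # running mean, one entry per channel
v  = np.random.rand(C) + 0.5    # running variance
s1 = np.random.randn(C)         # scaling vector
s2 = np.random.randn(C)         # shift vector

chan  = np.repeat(np.arange(C), H * W)      # channel index of each flattened position
scale = s1[chan] / np.sqrt(v[chan] + eps)   # diagonal of w
w     = np.diag(scale)                      # weight matrix, shape (C*H*W, C*H*W)
beta  = -scale * m[chan] + s2[chan]         # bias vector, shape (C*H*W,)

# Check against the direct per-element definition.
x = np.random.randn(C, H, W)
direct = s1[:, None, None] * (x - m[:, None, None]) / np.sqrt(v + eps)[:, None, None] \
         + s2[:, None, None]
print(np.allclose(w @ x.reshape(-1) + beta, direct.reshape(-1)))  # expected: True
```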
Transposed Convolution Layer. A convolution layer applies a kernel (or filter) to the input vector and typically compresses it, so that the output vector is of a smaller dimension. A deconvolution or transposed convolution layer does the opposite: it applies the kernel in a manner that produces a larger output vector. A transposed convolution layer expects an input \(x \in \mathbb {R}^{C_{in} \times H_{in} \times W_{in}}\) and applies a kernel \(k \in \mathbb {R}^{C_{out} \times C_{in} \times K_h \times K_w}\) using a stride S. For simplicity of presentation, we assume that \(K_h = K_w = K\) and \(W_{in}=H_{in}\). In \(pcat\), the transposed convolution layer can be expressed by the statement \(y \leftarrow w \cdot x'\), where \(x'\) is the flattened version of input x, w is a weight matrix that we derive from the parameters associated with the transposed convolution layer, and the bias vector \(\beta \) is a zero vector in this case. To compute the dimensions of the weight matrix, we first calculate the height (\(H_{out}\)) and width (\(W_{out}\)) of each channel in the output using the formulae \(H_{out}=H_{in}\cdot S + K\) and \(W_{out}=W_{in}\cdot S + K\). Since we assume \(W_{in}=H_{in}\), we have \(W_{out} = H_{out}\) here. Then, the dimension of w is \({C_{out} \cdot H_{out} \cdot W_{out}\times C_{in} \cdot H_{in} \cdot W_{in}}\), and the definition of w is as follows,

where \(I = \{1,...,C_{out} \cdot H_{out} \cdot W_{out}\}\) and \(J = \{1,...,C_{in} \cdot H_{in} \cdot W_{in}\}\)
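Because a transposed convolution (with no bias) is a linear map, the matrix w can also be extracted mechanically by applying the layer to basis vectors, which sidesteps spelling out the index formula. The PyTorch sketch below is illustrative only; it uses PyTorch's default size and padding conventions, which may differ slightly from the formulae above, and checks the linearity claim rather than the exact index formula.

```python
import torch

# Extract the weight matrix of a transposed convolution by feeding it the
# standard basis: column j of w is the flattened output on basis vector e_j.

C_in, H_in, W_in = 2, 4, 4
C_out, K, S = 3, 3, 2
layer = torch.nn.ConvTranspose2d(C_in, C_out, kernel_size=K, stride=S, bias=False)

with torch.no_grad():
    n_in = C_in * H_in * W_in
    basis = torch.eye(n_in).reshape(n_in, C_in, H_in, W_in)
    w = layer(basis).reshape(n_in, -1).T          # shape: (n_out, n_in)

    # Sanity check on a random input: the matrix form matches the layer itself.
    x = torch.randn(1, C_in, H_in, W_in)
    y_layer = layer(x).reshape(-1)
    y_matrix = w @ x.reshape(-1)
    print(torch.allclose(y_layer, y_matrix, atol=1e-5))  # expected: True
```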
G Details of Box Analysis
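As a generic illustration of a box (interval) analysis step, the following Python sketch propagates elementwise bounds through an affine layer followed by a ReLU; this is the standard interval-arithmetic construction, offered only as a sketch under that assumption and not necessarily the paper's exact transfer functions.

```python
import numpy as np

# One step of a box analysis: push an axis-aligned box, given by elementwise
# lower/upper bounds, through y = W x + b and then through a ReLU.

def box_affine(lo, hi, W, b):
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius      # interval arithmetic: radii scale by |W|
    return c - r, c + r

def box_relu(lo, hi):
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Illustrative use with a random affine layer.
rng = np.random.default_rng(1)
W, b = rng.normal(size=(3, 2)), rng.normal(size=3)
lo, hi = np.array([-1.0, 0.0]), np.array([1.0, 2.0])
lo, hi = box_relu(*box_affine(lo, hi, W, b))
print(lo, hi)
```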