Probabilistic robustness estimates for feed-forward neural networks
Introduction
Deep neural networks have proven very effective in practice at performing highly complex learning tasks (Goodfellow, Bengio, & Courville, 2016). Owing to this success, they have received a great deal of attention in recent years and have been applied widely. However, they have also been found to be very sensitive to data uncertainties (Fawzi et al., 2017, Szegedy et al., 2014), to the point that a whole research community is now addressing so-called network attacks, studying and designing input noise that can fool the network decision. Attacks can be random, when data are corrupted by some random noise, or adversarial, when the noise is specifically designed to alter the network output (Szegedy et al., 2014). Even though both types of attacks are related, since both concern the robustness of the network, in this article we focus only on the random case. Most data are uncertain, either because they arise from naturally noisy phenomena of which only some statistics are available, or because measurement devices do not have sufficient accuracy to record the data precisely. In this study, we therefore assume that the network input data are corrupted by some additive bounded random noise.
Robustness to bounded input perturbations has been analyzed in the past few years. Most authors have addressed the problem through regularization techniques (Finlay et al., 2018, Gouk et al., 2018, Oberman and Calder, 2018, Virmaux and Scaman, 2018). The main idea is to view the neural network as a Lipschitz map between the input and output data. The Lipschitz constant of the network is then estimated or upper bounded by the norm of the layer-by-layer product of the weights. This estimate captures the expansion or contraction capability of the network and is then used to regularize the loss during training. There is often a price to pay: the expressiveness of the network may be reduced, especially if the weights are too constrained or constrained layer by layer instead of across layers (Couellan, 2021). Such strategies enforce robustness but provide no guarantees or estimates on the level of robustness achieved. In the case of adversarial perturbations, some authors have proposed methods for certifying robustness (Boopathy et al., 2018, Kolter and Wong, 2017). Recently, a probabilistic approach has also been proposed in the case of random noise for convolutional neural networks (Weng, Chen, Nguyen, Squillante, Boopathy, Oseledets, & Daniel, 2019). Pointing out that the threat of random noise may have been overlooked by the research community in favor of adversarial attacks, the authors proposed probabilistic bounds based on the idea that the output of the network can be lower and upper bounded by two linear functions. The work proposed here is along the same lines but distinct in several respects. It combines upper bounds on tail probabilities, obtained by deriving a specific Cramér–Chernoff concentration inequality for the propagation of uncertainty through the network, with a network sensitivity estimate based on the gradient of the network with respect to its inputs.
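For reference, the generic Cramér–Chernoff bound that the paper's specific inequality builds on can be stated as follows (the notation \(\psi_Z\) is the standard textbook one, not taken from the paper): for a real random variable \(Z\) and any threshold \(t\),

```latex
\psi_Z(\lambda) := \log \mathbb{E}\!\left[e^{\lambda Z}\right],
\qquad
\mathbb{P}(Z \ge t) \;\le\; \exp\!\Big(-\sup_{\lambda \ge 0}\big(\lambda t - \psi_Z(\lambda)\big)\Big).
```

The contribution then lies in bounding \(\psi_Z\) for the specific random variable obtained by propagating bounded input noise through the network layers.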
The network gradient is computed by automatic differentiation and estimates the local variation of the output with respect to the input of the network. The estimation is carried out and averaged over the complete training set. A maximum component-wise gradient variation is also calculated in order to give probabilistic certificates rather than estimates. The certificates can be used in place of estimates whenever guaranteed upper bounds are needed; however, they are often less tight, since they are based on variation bounds rather than averages. For the specific case of piece-wise linear activation functions, we also propose an alternative bound based on the calculation of an average activation operator matrix computed at each layer, also using the training samples. We then discuss the use of the derived bounds and estimates to regularize the neural network during training in order to reach regions of the weight space with stronger robustness properties. Finally, we design experiments to assess the probabilistic robustness estimates under various regularization strategies.
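The average-versus-maximum distinction above can be illustrated with a minimal NumPy sketch (the one-hidden-layer ReLU network and random weights below are stand-ins, not the paper's models; in practice the gradient would come from automatic differentiation rather than this closed form):

```python
import numpy as np

# For y = W2 relu(W1 x + b1), the input Jacobian is available in closed form:
# J(x) = W2 @ diag(relu'(W1 x + b1)) @ W1.
# The estimate averages its magnitude over training samples; the
# certificate-style quantity takes the component-wise maximum instead.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 3)), rng.standard_normal(8)
W2 = rng.standard_normal((2, 8))

def input_jacobian(x):
    z = W1 @ x + b1
    return W2 @ np.diag((z > 0).astype(float)) @ W1   # 2x3 Jacobian at x

X = rng.standard_normal((100, 3))                      # stand-in training set
J = np.stack([np.abs(input_jacobian(x)) for x in X])
avg_sensitivity = J.mean(axis=0)   # component-wise estimate (average case)
max_sensitivity = J.max(axis=0)    # component-wise bound (certificate case)
```

As the text notes, `max_sensitivity` dominates `avg_sensitivity` component-wise, which is why certificates built from it are guaranteed but usually looser.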
The article is organized as follows: Section 2 derives the specific neural network concentration inequality using the Cramér–Chernoff method, together with the calculation of the network gradient estimate and of the average activation operator for the case of piece-wise linear activations. Section 4 deals with training the neural network and regularizing it to increase its robustness. Section 5 reports an empirical evaluation of neural network robustness on various public datasets. Finally, Section 6 concludes the article.
Section snippets
Probabilistic certificates of robustness
Consider feed-forward neural networks represented as a successive composition of linear weighted combinations of functions, such that \(x_{l+1} = \sigma_l(W_l x_l + b_l)\) for \(l = 0, \ldots, L-1\), where \(x_l\) is the input of the \(l\)-th layer, the function \(\sigma_l\) is the \(K_l\)-Lipschitz continuous activation function at layer \(l\), and \(W_l\) and \(b_l\) are the weight matrix and bias vector between layers \(l\) and \(l+1\) that define the model parameters we want to estimate during training. The network can be seen
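The layer recursion can be sketched directly (the shapes, all-ones weights, and zero biases below are purely illustrative assumptions):

```python
import numpy as np

# Each layer applies x_{l+1} = sigma_l(W_l x_l + b_l); ReLU is 1-Lipschitz.
relu = lambda v: np.maximum(v, 0.0)
layers = [(np.ones((4, 3)), np.zeros(4), relu),
          (np.ones((2, 4)), np.zeros(2), lambda v: v)]  # linear output layer

def forward(x, layers):
    for W, b, sigma in layers:
        x = sigma(W @ x + b)
    return x

y = forward(np.array([1.0, -1.0, 2.0]), layers)  # -> array([8., 8.])
```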
General neural network activations
Remember that the bound derived above relies on the fact that we have considered the linear upper bound of the neural network response. Therefore, inequality (2) applied to the multi-layer case yields a bound involving the product of the per-layer Lipschitz constants \(K_l\). Even if \(K_l\) is known for all layers (e.g., \(K_l = 1\) for all \(l\) if all network activations are ReLU), their product may be a loose bound for the network Lipschitz constant. This means that the Chernoff bound proposed above may be tight with respect to
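The looseness of the layer-wise product can be checked numerically with a small sketch (random weights, not from the paper): for a ReLU network all \(K_l = 1\), so the product of the layers' spectral norms upper-bounds the network's Lipschitz constant, and comparing it with the Jacobian norms actually realized at sample points exposes the gap.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((8, 3)), rng.standard_normal((2, 8))

# Layer-wise bound: product of the largest singular values.
lip_upper = np.prod([np.linalg.svd(W, compute_uv=False)[0] for W in (W1, W2)])

def jacobian(x):                       # Jacobian of x -> W2 relu(W1 x)
    d = (W1 @ x > 0).astype(float)
    return W2 @ np.diag(d) @ W1

# Largest Jacobian spectral norm observed over sample points: never exceeds
# lip_upper, and is typically noticeably smaller.
realized = max(np.linalg.svd(jacobian(x), compute_uv=False)[0]
               for x in rng.standard_normal((200, 3)))
```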
Controlling the bound during training
In this section, we are interested in exploiting the bounds derived above during the training of the neural network. The main idea is to ensure that the optimal weights after training satisfy the bound constraint (11) or (13). Naturally, this could be formulated as a constrained optimization training problem, and stochastic projected gradient techniques (Lacoste-Julien et al., 2012, Nedic and Lee, 2014) could be used to solve it. However, in the general case, the
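One common alternative to projection is a penalty formulation; the sketch below is a hypothetical illustration (the function `g`, the weight `mu`, and the threshold `c` are stand-ins, not the paper's exact choices): the penalty is zero while a bound of the form \(g(W) \le c\) holds and grows quadratically once it is violated, so plain stochastic gradient descent can be used unchanged.

```python
import numpy as np

def penalized_loss(data_loss, weights, c, mu=0.1):
    """Data loss plus a quadratic penalty on violating g(W) <= c."""
    # g(W): product of the layers' spectral norms (a Lipschitz upper bound).
    g = np.prod([np.linalg.svd(W, compute_uv=False)[0] for W in weights])
    return data_loss + mu * max(0.0, g - c) ** 2

# Inside the constraint set the penalty vanishes:
penalized_loss(1.0, [np.eye(2)], c=10.0)        # -> 1.0
# Outside it, the loss is inflated in proportion to the violation:
penalized_loss(1.0, [2.0 * np.eye(2)], c=1.0)   # -> 1.1
```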
Experiments
In order to assess the quality of the estimated probability bounds, experiments are conducted on two types of datasets (regression and classification data). The neural network and its training and testing are implemented in the Python (Team, 2015) environment using the Keras (Chollet et al., 2015) library with a TensorFlow (Abadi et al., 2015) backend. Results for the general network gradient strategy and the activation operator strategy presented in Section 3 are reported next.
Conclusions
In this study, we have proposed analytical probabilistic robustness estimates (and certificates) for feed-forward neural networks. The idea combines a tail probability bound obtained via the Cramér–Chernoff scheme with an estimate of the network's local variation. The network gradient computation uses the automatic differentiation procedure available in many neural network training packages and is carried out only at the training samples, which requires little extra computational cost. In the
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Our work has benefited from the AI Interdisciplinary Institute ANITI. ANITI is funded by the French “Investing for the Future - PIA3” program under the Grant agreement # ANR-19-PI3A-0004.
References (30)
- et al. Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management (1978).
- Universality of deep convolutional neural networks. Applied and Computational Harmonic Analysis (2020).
- et al. TensorFlow: Large-scale machine learning on heterogeneous systems (2015).
- Numerical analysis and optimization (2007).
- et al. CNN-cert: An efficient framework for certifying robustness of convolutional neural networks (2018).
- et al. Concentration inequalities: A nonasymptotic theory of independence (2013).
- Keras (2015).
- The coupling effect of Lipschitz regularization in deep neural networks. SN Computer Science (2021).
- Bounds on the extreme eigenvalues of positive-definite Toeplitz matrices. IEEE Transactions on Information Theory (1988).
- The robustness of deep networks: A geometrical perspective. IEEE Signal Processing Magazine.
- Improved robustness to adversarial examples using Lipschitz regularization of the loss.
- Deep learning.
- Regularisation of neural networks by enforcing Lipschitz continuity.
- Adam: A method for stochastic optimization.