Learn NNV in 3 Minutes¶
Why Neural Network Verification?¶
Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications, such as image classification [He et al., 2016] and natural language processing [Vaswani et al., 2017]. However, despite their wide applications, both traditional ML models and DNNs [Goodfellow et al., 2015, Szegedy et al., 2013] are vulnerable to adversarial evasion attacks where carefully crafted adversarial examples—inputs with adversarial perturbations—can mislead ML models to make arbitrarily incorrect predictions [Athalye et al., 2018, Brendel et al., 2018].
The existence of adversarial attacks leads to great safety concerns for DNN-based applications, especially in safety-critical scenarios such as autonomous driving [Eykholt et al., 2018]. While many empirical defenses have been proposed to improve robustness, many can be adaptively attacked again by sophisticated attackers [Athalye et al., 2018]. This everlasting competition between attackers and defenders motivates studies on certifiably robust approaches for DNNs, which include both robustness verification and robust training [Cohen et al., 2019, Katz et al., 2017, Weng et al., 2018].
What is Neural Network Verification?¶
Neural network verification (NNV) is the process of checking whether a neural network model satisfies a given property. Robustness verification approaches aim to evaluate DNN robustness by computing a theoretically certified lower bound on robustness under certain perturbation constraints, which guarantees robustness against any possible future attack within those constraints.
Commonly, we want to check whether the output of a neural network satisfies a given property for a specified set of input data. For example, you can check whether the output of a neural network is always positive for all perturbed input data.
Tip
Although the property to verify may also constrain the inputs that produce a given output, in this document we only consider properties on the outputs for a given set of inputs. There has been some research on verifying input-output relations of neural networks, but that is not the primary focus here.
Example
To make this concrete, consider a function \(f(x)\) and an input set \(\mathcal{X} = \{ x \mid x \in [-1,1]\}\). We want to check whether the output of \(f(x)\) is always positive for all inputs \(x \in \mathcal{X}\).
This is a very simple example, but it captures the basic concept of neural network verification. For neural networks, the function \(f\) is a neural network model and the input \(x\) is a data point; commonly, \(x\) is a high-dimensional vector and \(f\) is a deep model. The condition we want to check is called the property, and the set of input data is called the input domain.
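As a minimal illustration of this idea, the sketch below checks the property for a hypothetical affine function (the coefficients are made up for illustration); because the function is affine, its exact output range over an interval input can be computed in closed form, so the property can be checked without enumerating inputs.

```python
# A minimal sketch for the toy example above, assuming a hypothetical
# affine function f(x) = 0.5 * x + 1 (the coefficients are made up).

def output_bounds_affine(w: float, b: float, lo: float, hi: float):
    """Exact output interval of f(x) = w * x + b for x in [lo, hi]."""
    candidates = (w * lo + b, w * hi + b)
    return min(candidates), max(candidates)

lower, upper = output_bounds_affine(w=0.5, b=1.0, lo=-1.0, hi=1.0)
print(f"f([-1, 1]) is contained in [{lower}, {upper}]")   # [0.5, 1.5]
print("Property 'f(x) > 0' holds:", lower > 0)            # True
```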
Threat Model: \((\ell_p, \epsilon)\)-Adversary¶
Existing studies on certified robustness mainly aim to defend against white-box evasion attacks, which represent the strongest adversaries who have full knowledge of the target model, including its parameters and architecture. In particular, the adversary carefully crafts a bounded perturbation to the original input, generating an adversarial example [Szegedy et al., 2013] to fool the model into making incorrect predictions.
We formally define the \((\ell_p, \epsilon)\)-adversary:
Definition: \((\ell_p, \epsilon)\)-Adversary
For a given input \((x_0, y_0)\), where \(x_0 \in \mathcal{X}\) is the input instance and \(y_0\) is its true label, the \((\ell_p, \epsilon)\)-adversary will generate a perturbed input \(x \in B_{p,\epsilon}(x_0)\), such that \(F_\theta(x) \neq y_0\), where
\[B_{p,\epsilon}(x_0) = \{ x \mid \| x - x_0 \|_p \le \epsilon \}\]
is the perturbation region, and \(\epsilon\) is called the perturbation radius.
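As a small sketch of this definition (not part of the original text), the helpers below check whether a candidate input lies inside \(B_{p,\epsilon}(x_0)\) and whether it constitutes a successful attack; the `predict` function and the arrays are hypothetical placeholders.

```python
import numpy as np

def in_perturbation_region(x, x0, eps, p=np.inf):
    """Check x is in B_{p,eps}(x0), i.e. ||x - x0||_p <= eps."""
    return np.linalg.norm((x - x0).ravel(), ord=p) <= eps

def is_successful_attack(predict, x, x0, y0, eps, p=np.inf):
    """An (lp, eps)-adversary succeeds if x stays in the region but flips the label."""
    return in_perturbation_region(x, x0, eps, p) and predict(x) != y0
```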
Common \(\ell_p\) Norms:
\(\ell_\infty\) (maximum difference): each dimension can change by at most \(\epsilon\); for images, each pixel stays within \(\epsilon\) of the original.
\(\ell_2\) (Euclidean distance): the total Euclidean distance is at most \(\epsilon\), giving a spherical perturbation region.
\(\ell_1\) (Manhattan distance): the sum of absolute changes is at most \(\epsilon\), giving a diamond-shaped perturbation region.
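To see how the choice of norm changes the size of the same perturbation, here is a small numeric illustration with made-up values:

```python
import numpy as np

# Hypothetical perturbation vector; the same delta has different sizes under
# different lp norms, so the choice of p changes which perturbations the
# adversary is allowed to apply for a fixed eps.
delta = np.array([0.02, -0.01, 0.03, 0.00])

print("l_inf:", np.linalg.norm(delta, ord=np.inf))  # max |delta_i|    = 0.03
print("l_2:  ", np.linalg.norm(delta, ord=2))       # Euclidean length ~ 0.0374
print("l_1:  ", np.linalg.norm(delta, ord=1))       # sum |delta_i|    = 0.06
```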
Difficulties of NNV¶
What makes this problem difficult is that
the neural network is a high-dimensional non-linear function and
the input domain is a high-dimensional space containing infinitely many points.
So, we cannot check all points in the input domain by brute force. Instead, we need an analysis method that reasons about the entire input domain at once.
Formal Problem Formulation¶
We can view robustness verification from an optimization perspective. The goal is to verify whether the model prediction remains correct for all perturbed inputs within the threat region.
Problem: Robustness Verification as Optimization
Given a neural network \(f_\theta: \mathcal{X} \to \mathbb{R}^C\), input instance \(x_0 \in \mathcal{X}\), ground-truth label \(y_0\), any other label \(y'\), and radius \(\epsilon > 0\), we define the worst-case margin
\[\mathcal{M}(y_0, y') = \min_{x \in B_{p,\epsilon}(x_0)} \big( f_\theta(x)_{y_0} - f_\theta(x)_{y'} \big).\]
If \(\mathcal{M}(y_0, y') > 0\) for all \(y' \neq y_0\), then \(f_\theta\) is certifiably robust at \(x_0\) within radius \(\epsilon\).
Intuitively, this formulation searches for the minimum margin between the model confidence for the true class \(y_0\) and any other class \(y'\). If this margin is always positive, the model will always predict \(y_0\) for any perturbed input.
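One way to build intuition for this formulation is to estimate the margin empirically. The sketch below samples random points in the \(\ell_\infty\) ball; note that sampling only yields an upper bound on the true minimum margin, so it can falsify robustness but can never certify it (the logits function `f`, `x0`, `y0`, and `eps` are hypothetical placeholders).

```python
import numpy as np

def sampled_margin_upper_bound(f, x0, y0, eps, num_samples=1000, seed=0):
    """Estimate the minimum margin over the l_inf ball by random sampling.

    Returns an UPPER bound on M: a positive value proves nothing, while a
    non-positive value disproves robustness (a counterexample was found).
    """
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(num_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)  # a point in the ball
        logits = f(x)
        others = np.delete(logits, y0)
        best = min(best, logits[y0] - np.max(others))   # margin at this sample
    return best
```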
Complete vs. Incomplete Verification:
Complete verification: exactly solves the optimization problem.
✅ Soundness (no false positives)
✅ Completeness (no false negatives)
❌ NP-complete, so computationally intractable for large networks
Incomplete verification: conservatively provides a lower bound of \(\mathcal{M}\).
✅ More scalable to large networks
✅ Faster computation
❌ Bounds may be too loose (false negatives possible)
The fundamental challenge is the scalability-tightness trade-off: we can achieve either scalability or tightness, but not both simultaneously.
Approaches for NNV¶
To solve this problem, approaches that give a precise answer (what we call complete approaches) are often computationally too expensive, so in many cases an approximate answer (from what we call incomplete approaches) is sufficient.
One simple incomplete approach is interval arithmetic, but its bounds are not tight enough for deep neural networks, especially with non-linear activations and many layers. Bound propagation is a more advanced approach than interval arithmetic because it propagates symbolic (linear) bounds rather than just scalar values, and it has become a popular approach in neural network verification due to its high efficiency and scalability.
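To make interval arithmetic concrete, here is a minimal interval bound propagation (IBP) sketch for a small fully connected ReLU network; the weights and the input box are made up for illustration, and real verifiers (e.g. CROWN-style bound propagation) compute much tighter bounds.

```python
import numpy as np

def ibp_affine(W, b, l, u):
    """Propagate the box [l, u] through an affine layer W x + b."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    lower = W_pos @ l + W_neg @ u + b
    upper = W_pos @ u + W_neg @ l + b
    return lower, upper

def ibp_forward(layers, l, u):
    """Propagate input bounds through affine + ReLU layers (no ReLU after the last)."""
    for i, (W, b) in enumerate(layers):
        l, u = ibp_affine(W, b, l, u)
        if i < len(layers) - 1:
            l, u = np.maximum(l, 0), np.maximum(u, 0)  # ReLU is monotone
    return l, u

# Toy 2-2-2 network and an input box x0 +/- eps (all values made up).
layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)),
          (np.array([[2.0, 1.0], [-1.0, 1.0]]), np.zeros(2))]
x0, eps = np.array([0.5, 0.5]), 0.1
lb, ub = ibp_forward(layers, x0 - eps, x0 + eps)
print("certified output lower bounds:", lb)  # [1.0, 0.8]
print("certified output upper bounds:", ub)  # [1.9, 1.5]
```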
Local Robustness for Classification¶
We focus on verifying the local robustness of neural networks for classification tasks. Local robustness means that the predicted class of the network does not change when the input is perturbed slightly: the logit of the correct label must remain the highest among all labels for every perturbed input, i.e., the correct logit must be greater than every other logit.
Hint
Local robustness is commonly used to evaluate the effectiveness and efficiency of a verification tool because it
considers all perturbations of the high-dimensional input and
covers all output properties that can be defined in a linear form.
For input perturbation, we commonly use the \(\epsilon\)-ball around the input data \(x\) as the set of perturbed inputs. The \(\epsilon\)-ball is defined as the \(\ell_{\infty}\) norm ball with radius \(\epsilon\) around the input data \(x\):
\[B_{\epsilon}(x) = \{ x' \mid \| x' - x \|_{\infty} \le \epsilon \}.\]
For image data, the \(\ell_{\infty}\) norm is the maximum absolute difference between the pixel values of two images; that is, each pixel value of the perturbed image is within \(\epsilon\) of the original image.
Then, we can define the local robustness property as follows:
\[\forall x' \in B_{\epsilon}(x): \quad f(x')_{y_0} - f(x')_{y'} > 0 \quad \text{for all } y' \neq y_0,\]
where \(y_0\) is the correct label. This means that, for every perturbed input, the output of the neural network is highest for the correct label, i.e., the margin between the correct label and every other label is positive.
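As a sketch of how this property is checked in practice, suppose a sound method (such as the IBP sketch above) has produced certified lower and upper bounds on the logits over the \(\epsilon\)-ball; then a sufficient condition for local robustness is that the lower bound of the correct logit exceeds the upper bound of every other logit. The bound values below are hypothetical.

```python
import numpy as np

def certified_locally_robust(lb, ub, true_label):
    """Sound but conservative check: lb[y0] > max_{y' != y0} ub[y'] implies robustness."""
    other_ub = np.delete(ub, true_label)
    worst_case_margin = lb[true_label] - np.max(other_ub)
    return worst_case_margin > 0

# Hypothetical certified logit bounds over the epsilon-ball.
lb = np.array([2.1, -0.3, 0.7])
ub = np.array([3.0,  0.4, 1.8])
print(certified_locally_robust(lb, ub, true_label=0))  # True: 2.1 - 1.8 > 0
```

Note that bounding each logit separately and then comparing is looser than bounding the margin \(f(x')_{y_0} - f(x')_{y'}\) directly; more advanced verifiers bound the margin itself.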
Further Reading
This introduction provides a high-level overview of neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:
Adversarial Examples and Attacks:
The field of neural network verification was motivated by the discovery of adversarial examples [Goodfellow et al., 2015, Szegedy et al., 2013]. The FGSM attack [Goodfellow et al., 2015] and C&W attack [Carlini and Wagner, 2017] demonstrated that deep neural networks are vulnerable to small, carefully crafted perturbations. The PGD attack [Madry et al., 2018] showed that adversarial training could improve robustness, while Athalye et al. [2018] revealed that many defenses provide only obfuscated gradients rather than true robustness.
Verification Methods:
Modern verification methods aim to provide formal guarantees against adversarial perturbations. Complete verification approaches include Reluplex [Katz et al., 2017] and MILP-based methods [Tjeng et al., 2019]. Incomplete but scalable verification approaches include CROWN [Weng et al., 2018], DeepPoly [Singh et al., 2019], and scalable verification [Gowal et al., 2019]. SDP-based approaches [Raghunathan et al., 2018] and Lipschitz-based methods [Fazlyab et al., 2019] provide alternative relaxation techniques.
Robust Training:
Certified adversarial training combines verification with training to improve certified robustness [Gowal et al., 2021, Madry et al., 2018, Wong and Kolter, 2018]. Randomized smoothing [Cohen et al., 2019] provides probabilistic certification guarantees with better scalability.
Surveys and Benchmarks:
For comprehensive treatments of the field, see the systematization of knowledge on certified robustness and related surveys on adversarial robustness.
Next Guide
Continue to Threat Models in Neural Network Verification to learn more about \(\ell_p\) norm perturbations and adversarial attack models in detail.
Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In International Conference on Machine Learning, 274–283. 2018.
Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In International Conference on Learning Representations. 2018.
Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE, 2017.
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1625–1634. 2018.
Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations. 2015.
Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.
Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, and Timothy A Mann. Improving robustness using generated data. Advances in Neural Information Processing Systems, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778. 2016.
Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. 2018.
Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, 5998–6008. 2017.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.
Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 5286–5295. 2018.