Learn NNV in 3 Minutes¶
Why Neural Network Verification?¶
Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications, such as image classification [He et al., 2016] and natural language processing [Vaswani et al., 2017]. However, despite their wide applications, both traditional ML models and DNNs [Goodfellow et al., 2015, Szegedy et al., 2013] are vulnerable to adversarial evasion attacks where carefully crafted adversarial examples—inputs with adversarial perturbations—can mislead ML models to make arbitrarily incorrect predictions [Athalye et al., 2018, Brendel et al., 2018].
The existence of adversarial attacks leads to great safety concerns for DNN-based applications, especially in safety-critical scenarios such as autonomous driving [Eykholt et al., 2018]. While many empirical defenses have been proposed to improve robustness, many can be adaptively attacked again by sophisticated attackers [Athalye et al., 2018]. This everlasting competition between attackers and defenders motivates studies on certifiably robust approaches for DNNs, which include both robustness verification and robust training [Cohen et al., 2019, Katz et al., 2017, Weng et al., 2018].
What is Neural Network Verification?¶
Neural network verification (NNV) is the process of checking whether a neural network model satisfies a given property. Robustness verification approaches aim to evaluate DNN robustness by computing a theoretically certified lower bound on robustness under certain perturbation constraints, which guarantees robustness against any possible future attack within those constraints.
Commonly, we want to check whether the output of a neural network satisfies a given property for a specified set of input data. For example, you can check whether the output of a neural network is always positive for all perturbed input data.
Tip
Although the property to verify may also constrain the inputs that produce a given output, in this document we only consider properties on the outputs for a given set of inputs. There has been some research on verifying input-output relations of neural networks, but that is not the primary focus here.
Example
To make this concrete, consider a function \(f(x)\) and an input set \(\mathcal{X} = \{ x \mid x \in [-1,1]\}\). We want to check whether the output of \(f(x)\) is always positive for all inputs \(x \in \mathcal{X}\).
This is a very simple example, but it captures the basic concept of neural network verification. For neural networks, the function \(f\) is a neural network model and the input \(x\) is a data point; commonly, \(x\) is a high-dimensional vector and \(f\) is a deep model. The condition we want to check is called the property, and the set of input data is called the input domain.
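As a minimal illustration of this idea, the sketch below checks the property for a hypothetical affine function (the coefficients are made up for illustration); because the function is affine, its exact output range over an interval input can be computed in closed form, so the property can be checked without enumerating inputs.

```python
# A minimal sketch for the toy example above, assuming a hypothetical
# affine function f(x) = 0.5 * x + 1 (the coefficients are made up).

def output_bounds_affine(w: float, b: float, lo: float, hi: float):
    """Exact output interval of f(x) = w * x + b for x in [lo, hi]."""
    candidates = (w * lo + b, w * hi + b)
    return min(candidates), max(candidates)

lower, upper = output_bounds_affine(w=0.5, b=1.0, lo=-1.0, hi=1.0)
print(f"f([-1, 1]) is contained in [{lower}, {upper}]")   # [0.5, 1.5]
print("Property 'f(x) > 0' holds:", lower > 0)            # True
```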
Threat Model: \((\ell_p, \epsilon)\)-Adversary¶
Existing studies on certified robustness mainly aim to defend against white-box evasion attacks, which represent the strongest adversaries who have full knowledge of the target model, including its parameters and architecture. In particular, the adversary carefully crafts a bounded perturbation to the original input, generating an adversarial example [Szegedy et al., 2013] to fool the model into making incorrect predictions.
We formally define the \((\ell_p, \epsilon)\)-adversary:
Definition: \((\ell_p, \epsilon)\)-Adversary
For a given input \((x_0, y_0)\), where \(x_0 \in \mathcal{X}\) is the input instance and \(y_0\) is its true label, the \((\ell_p, \epsilon)\)-adversary will generate a perturbed input \(x \in B_{p,\epsilon}(x_0)\), such that \(F_\theta(x) \neq y_0\), where
\[B_{p,\epsilon}(x_0) = \{ x \mid \| x - x_0 \|_p \le \epsilon \}\]
is the perturbation region, and \(\epsilon\) is called the perturbation radius.
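As a small sketch of this definition (not part of the original text), the helpers below check whether a candidate input lies inside \(B_{p,\epsilon}(x_0)\) and whether it constitutes a successful attack; the `predict` function and the arrays are hypothetical placeholders.

```python
import numpy as np

def in_perturbation_region(x, x0, eps, p=np.inf):
    """Check x is in B_{p,eps}(x0), i.e. ||x - x0||_p <= eps."""
    return np.linalg.norm((x - x0).ravel(), ord=p) <= eps

def is_successful_attack(predict, x, x0, y0, eps, p=np.inf):
    """An (lp, eps)-adversary succeeds if x stays in the region but flips the label."""
    return in_perturbation_region(x, x0, eps, p) and predict(x) != y0
```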
Common \(\ell_p\) Norms:
\(\ell_\infty\) (maximum difference): each dimension can change by at most \(\epsilon\); for images, each pixel stays within \(\epsilon\) of the original.
\(\ell_2\) (Euclidean distance): the total Euclidean distance is at most \(\epsilon\), giving a spherical perturbation region.
\(\ell_1\) (Manhattan distance): the sum of absolute changes is at most \(\epsilon\), giving a diamond-shaped perturbation region.
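To see how the choice of norm changes the size of the same perturbation, here is a small numeric illustration with made-up values:

```python
import numpy as np

# Hypothetical perturbation vector; the same delta has different sizes under
# different lp norms, so the choice of p changes which perturbations the
# adversary is allowed to apply for a fixed eps.
delta = np.array([0.02, -0.01, 0.03, 0.00])

print("l_inf:", np.linalg.norm(delta, ord=np.inf))  # max |delta_i|    = 0.03
print("l_2:  ", np.linalg.norm(delta, ord=2))       # Euclidean length ~ 0.0374
print("l_1:  ", np.linalg.norm(delta, ord=1))       # sum |delta_i|    = 0.06
```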
Difficulties of NNV¶
What makes this problem difficult is that
the neural network is a high-dimensional non-linear function and
the input domain is a high-dimensional space containing infinitely many points.
So, we cannot check all points in the input domain by brute force. Instead, we need an analysis method that reasons about the entire input domain at once.
Formal Problem Formulation¶
We can view robustness verification from an optimization perspective. The goal is to verify whether the model prediction remains correct for all perturbed inputs within the threat region.
Problem: Robustness Verification as Optimization
Given a neural network \(f_\theta: \mathcal{X} \to \mathbb{R}^C\), input instance \(x_0 \in \mathcal{X}\), ground-truth label \(y_0\), any other label \(y'\), and radius \(\epsilon > 0\), we define the worst-case margin
\[\mathcal{M}(y_0, y') = \min_{x \in B_{p,\epsilon}(x_0)} \big( f_\theta(x)_{y_0} - f_\theta(x)_{y'} \big).\]
If \(\mathcal{M}(y_0, y') > 0\) for all \(y' \neq y_0\), then \(f_\theta\) is certifiably robust at \(x_0\) within radius \(\epsilon\).
Intuitively, this formulation searches for the minimum margin between the model confidence for the true class \(y_0\) and any other class \(y'\). If this margin is always positive, the model will always predict \(y_0\) for any perturbed input.
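One way to build intuition for this formulation is to estimate the margin empirically. The sketch below samples random points in the \(\ell_\infty\) ball; note that sampling only yields an upper bound on the true minimum margin, so it can falsify robustness but can never certify it (the logits function `f`, `x0`, `y0`, and `eps` are hypothetical placeholders).

```python
import numpy as np

def sampled_margin_upper_bound(f, x0, y0, eps, num_samples=1000, seed=0):
    """Estimate the minimum margin over the l_inf ball by random sampling.

    Returns an UPPER bound on M: a positive value proves nothing, while a
    non-positive value disproves robustness (a counterexample was found).
    """
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(num_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)  # a point in the ball
        logits = f(x)
        others = np.delete(logits, y0)
        best = min(best, logits[y0] - np.max(others))   # margin at this sample
    return best
```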
Complete vs. Incomplete Verification:
Complete verification: exactly solves the optimization problem.
✅ Soundness (no false positives)
✅ Completeness (no false negatives)
❌ NP-complete, so computationally intractable for large networks
Incomplete verification: conservatively provides a lower bound of \(\mathcal{M}\).
✅ More scalable to large networks
✅ Faster computation
❌ Bounds may be too loose (false negatives possible)
The fundamental challenge is the scalability-tightness trade-off: we can achieve either scalability or tightness, but not both simultaneously.
Approaches for NNV¶
To solve this problem, approaches that give a precise answer (what we call complete approaches) are often computationally too expensive, so in many cases an approximate answer (from what we call incomplete approaches) is sufficient.
One simple incomplete approach is interval arithmetic, but its bounds are not tight enough for deep neural networks, especially with non-linear activations and many layers. Bound propagation is a more advanced approach than interval arithmetic because it propagates symbolic (linear) bounds rather than just scalar values, and it has become a popular approach in neural network verification due to its high efficiency and scalability.
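To make interval arithmetic concrete, here is a minimal interval bound propagation (IBP) sketch for a small fully connected ReLU network; the weights and the input box are made up for illustration, and real verifiers (e.g. CROWN-style bound propagation) compute much tighter bounds.

```python
import numpy as np

def ibp_affine(W, b, l, u):
    """Propagate the box [l, u] through an affine layer W x + b."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    lower = W_pos @ l + W_neg @ u + b
    upper = W_pos @ u + W_neg @ l + b
    return lower, upper

def ibp_forward(layers, l, u):
    """Propagate input bounds through affine + ReLU layers (no ReLU after the last)."""
    for i, (W, b) in enumerate(layers):
        l, u = ibp_affine(W, b, l, u)
        if i < len(layers) - 1:
            l, u = np.maximum(l, 0), np.maximum(u, 0)  # ReLU is monotone
    return l, u

# Toy 2-2-2 network and an input box x0 +/- eps (all values made up).
layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)),
          (np.array([[2.0, 1.0], [-1.0, 1.0]]), np.zeros(2))]
x0, eps = np.array([0.5, 0.5]), 0.1
lb, ub = ibp_forward(layers, x0 - eps, x0 + eps)
print("certified output lower bounds:", lb)  # [1.0, 0.8]
print("certified output upper bounds:", ub)  # [1.9, 1.5]
```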
Local Robustness for Classification¶
We focus on verifying the local robustness of neural networks for classification tasks. Local robustness means that the predicted class of the network does not change when the input is perturbed slightly: the logit of the correct label must remain the highest among all labels for every perturbed input, i.e., the correct logit must be greater than every other logit.
Hint
Local robustness is commonly used to evaluate the effectiveness and efficiency of a verification tool because it
considers all perturbations of the high-dimensional input and
covers all output properties that can be defined in a linear form.
For input perturbation, we commonly use the \(\epsilon\)-ball around the input data \(x\) as the set of perturbed inputs. The \(\epsilon\)-ball is defined as the \(\ell_{\infty}\) norm ball with radius \(\epsilon\) around the input data \(x\):
\[B_{\epsilon}(x) = \{ x' \mid \| x' - x \|_{\infty} \le \epsilon \}.\]
For image data, the \(\ell_{\infty}\) norm is the maximum absolute difference between the pixel values of two images; that is, each pixel value of the perturbed image is within \(\epsilon\) of the original image.
Then, we can define the local robustness property as follows:
\[\forall x' \in B_{\epsilon}(x): \quad f(x')_{y_0} - f(x')_{y'} > 0 \quad \text{for all } y' \neq y_0,\]
where \(y_0\) is the correct label. This means that, for every perturbed input, the output of the neural network is highest for the correct label, i.e., the margin between the correct label and every other label is positive.
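As a sketch of how this property is checked in practice, suppose a sound method (such as the IBP sketch above) has produced certified lower and upper bounds on the logits over the \(\epsilon\)-ball; then a sufficient condition for local robustness is that the lower bound of the correct logit exceeds the upper bound of every other logit. The bound values below are hypothetical.

```python
import numpy as np

def certified_locally_robust(lb, ub, true_label):
    """Sound but conservative check: lb[y0] > max_{y' != y0} ub[y'] implies robustness."""
    other_ub = np.delete(ub, true_label)
    worst_case_margin = lb[true_label] - np.max(other_ub)
    return worst_case_margin > 0

# Hypothetical certified logit bounds over the epsilon-ball.
lb = np.array([2.1, -0.3, 0.7])
ub = np.array([3.0,  0.4, 1.8])
print(certified_locally_robust(lb, ub, true_label=0))  # True: 2.1 - 1.8 > 0
```

Note that bounding each logit separately and then comparing is looser than bounding the margin \(f(x')_{y_0} - f(x')_{y'}\) directly; more advanced verifiers bound the margin itself.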
Further Reading
This introduction provides a high-level overview of neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:
Adversarial Examples and Attacks:
The field of neural network verification was motivated by the discovery of adversarial examples [Goodfellow et al., 2015, Szegedy et al., 2013]. The FGSM attack [Goodfellow et al., 2015] and C&W attack [Carlini and Wagner, 2017] demonstrated that deep neural networks are vulnerable to small, carefully crafted perturbations. The PGD attack [Madry et al., 2018] showed that adversarial training could improve robustness, while Athalye et al. [2018] revealed that many defenses provide only obfuscated gradients rather than true robustness.
Verification Methods:
Modern verification methods aim to provide formal guarantees against adversarial perturbations. Complete verification approaches include Reluplex [Katz et al., 2017] and MILP-based methods [Tjeng et al., 2019]. Incomplete but scalable verification approaches include CROWN [Weng et al., 2018], DeepPoly [Singh et al., 2019], and scalable verification [Gowal et al., 2019]. SDP-based approaches [Raghunathan et al., 2018] and Lipschitz-based methods [Fazlyab et al., 2019] provide alternative relaxation techniques.
Robust Training:
Certified adversarial training combines verification with training to improve certified robustness [Gowal et al., 2021, Madry et al., 2018, Wong and Kolter, 2018]. Randomized smoothing [Cohen et al., 2019] provides probabilistic certification guarantees with better scalability.
Surveys and Benchmarks:
For comprehensive treatments of the field, see the systematization of knowledge on certified robustness and related surveys on adversarial robustness.
Next Guide
Continue to Threat Models in Neural Network Verification to learn more about \(\ell_p\) norm perturbations and adversarial attack models in detail.
Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In International Conference on Machine Learning, 274–283. 2018.
Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In International Conference on Learning Representations. 2018.
Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE, 2017.
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1625–1634. 2018.
Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations. 2015.
Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.
Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, and Timothy A Mann. Improving robustness using generated data. Advances in Neural Information Processing Systems, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778. 2016.
Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. 2018.
Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, 5998–6008. 2017.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.
Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 5286–5295. 2018.