Learn NNV in 3 Minutes
A quick introduction to neural network verification, adversarial robustness, and certified defense concepts.
Why Neural Network Verification?
Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications, such as image classification and natural language processing. However, despite their wide applications, both traditional ML models and DNNs are vulnerable to adversarial evasion attacks where carefully crafted adversarial examples---inputs with adversarial perturbations---can mislead ML models to make arbitrarily incorrect predictions.
The existence of adversarial attacks leads to great safety concerns for DNN-based applications, especially in safety-critical scenarios such as autonomous driving. While many empirical defenses have been proposed to improve robustness, many can be adaptively attacked again by sophisticated attackers. This everlasting competition between attackers and defenders motivates studies on certifiably robust approaches for DNNs, which include both robustness verification and robust training.
What is Neural Network Verification?
Neural network verification (NNV) is a process to check whether a neural network model satisfies a given property. Robustness verification approaches aim to evaluate DNN robustness by providing a theoretically certified lower bound of robustness under certain perturbation constraints, providing guarantees against any possible future attacks.
Commonly, we want to check whether the output of a neural network satisfies a given property for a specified set of input data. For example, you can check whether the output of a neural network is always positive for all perturbed input data.
Tip: Even though the property to verify may also be about inputs with a given output, we only consider the output for a given input in this document. There have been some researches on the verification of the input-output relation of neural networks, but this is not the primary focus here.
Example
To understand it, we give a small example of a given function and a set of input . We want to check whether the output of the function is always positive for all input .
This is a very simple example, but it is a basic concept of neural network verification. For the case of neural networks, the function is a neural network model and the input is a data point. Commonly, the input is a high-dimensional vector and the function is a deep neural network model. The property we want to check is called a property and the set of input data is called an input domain.
Threat Model: -Adversary
Existing studies on certified robustness mainly aim to defend against white-box evasion attacks, which represent the strongest adversaries who have full knowledge of the target model, including its parameters and architecture. In particular, the adversary carefully crafts a bounded perturbation to the original input, generating an adversarial example to fool the model into making incorrect predictions.
What’s the threat model? An attacker gets to modify the input within a budget (epsilon = ε). Different ways of measuring the budget:
- L-infinity (pixel-level): “Each pixel value can change by up to ε” (practical for images—sticker attacks on stop signs)
- L2 (total distance): “The overall perturbation magnitude (Euclidean distance) is at most ε” (more mathematically clean)
- L1 (cumulative change): “Add up all the changes; the sum can’t exceed ε” (rarely used, but possible)
The attacker works within this budget. Your job: prove the network never fails even when attacked with that budget.
Why It’s Hard
Two fundamental obstacles:
- The network is complex — high-dimensional, highly nonlinear. Behavior in one region has no relation to behavior in another.
- The input space is infinite — for a 224×224×3 image, there are countless possible perturbations. You can’t test them all.
Brute force checking fails instantly. Need something smarter.
The Core Challenge
What does verification actually need to prove? That the network’s confidence in the correct class is always higher than its confidence in any wrong class, no matter what perturbation the attacker applies (within budget).
Think of it like this: the attacker is hunting for an adversarial example. The verifier is trying to prove such an example can’t exist. It’s a game: attacker searches, verifier proves impossibility.
Two Strategies, Two Tradeoffs:
Exact (but slow): Precisely solve the problem. Guaranteed sound — if verification succeeds, the network is truly robust. But the math gets exponentially harder as networks grow. Small networks? Fine. Large networks? Intractable.
Approximate (but fast): Don’t solve exactly. Instead, ask: “What’s the worst-case bound I can prove quickly?” Accept that sometimes you can’t answer (maybe the network is robust, maybe it isn’t — you just couldn’t prove it in time). But scales to real networks.
This is the core tension: precise analysis is slow; fast analysis is conservative.
How Verification Works
The practical approach: bound propagation. Instead of asking “what exact values can outputs take?” ask “what range of values?”
For each neuron, maintain bounds: “this activation will be between -5 and 10.” Propagate these bounds forward through the network. If the final layer bounds guarantee correct class > wrong class, you’ve proven robustness.
Simple but powerful. Scales to real networks.
What We Actually Verify
Most often: local robustness for classification. Simple goal: “If I perturb an image within my threat budget, does the network still predict the same class?”
For a stop sign (correct label), the network outputs confidence scores for all classes. Local robustness means: the stop-sign score stays highest compared to all other class scores, no matter what perturbation (within budget) you apply.
That’s it. Sounds simple. But proving it for a real network with millions of parameters is the hard part.
Next Guide
Continue to Threat Models to learn how attackers measure their budgets and what perturbation strategies matter.