Phase 1: Foundations Guide 1

Learn NNV in 3 Minutes

A quick introduction to neural network verification, adversarial robustness, and certified defense concepts.

Why Neural Network Verification?

Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications, such as image classification and natural language processing. However, despite this wide adoption, both traditional ML models and DNNs are vulnerable to adversarial evasion attacks, where carefully crafted adversarial examples---inputs with adversarial perturbations---can mislead ML models into making arbitrarily incorrect predictions.

The existence of adversarial attacks leads to great safety concerns for DNN-based applications, especially in safety-critical scenarios such as autonomous driving. While many empirical defenses have been proposed to improve robustness, many can be adaptively attacked again by sophisticated attackers. This everlasting competition between attackers and defenders motivates studies on certifiably robust approaches for DNNs, which include both robustness verification and robust training.

What is Neural Network Verification?

Neural network verification (NNV) is the process of checking whether a neural network model satisfies a given property. Robustness verification approaches evaluate DNN robustness by computing a theoretically certified lower bound on robustness under certain perturbation constraints, which provides guarantees against any possible future attack.

Commonly, we want to check whether the output of a neural network satisfies a given property for a specified set of input data. For example, you can check whether the output of a neural network is always positive for all perturbed input data.

Tip: Although a property to verify may also constrain the inputs that produce a given output, in this document we only consider properties of the output for a given set of inputs. There has been some research on verifying input-output relations of neural networks, but that is not the primary focus here.

Example

To build intuition, consider a small example: a function $f(x)$ and an input set $\mathcal{X} = \{ x \mid x \in [-1,1]\}$. We want to check whether the output of $f(x)$ is always positive for all inputs $x \in \mathcal{X}$.

This is a very simple example, but it captures the basic idea of neural network verification. For neural networks, the function $f$ is a (typically deep) neural network model and the input $x$ is a data point, commonly a high-dimensional vector. The condition we want to check is called a property, and the set of input data is called the input domain.
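As a toy illustration (with a hypothetical function chosen only for this sketch, not taken from the guide), the snippet below evaluates $f(x) = x^2 - x + 1$ on a finite grid of $\mathcal{X} = [-1, 1]$. Sampling can only search for counterexamples; finding none is evidence, not a proof, that the property holds.

```python
import numpy as np

# Hypothetical example function; it is always positive on [-1, 1]
# (its minimum is 0.75, attained at x = 0.5).
def f(x):
    return x**2 - x + 1

# Evaluate f on a finite grid of the input domain X = [-1, 1].
xs = np.linspace(-1.0, 1.0, 10_001)
values = f(xs)

print("minimum sampled value:", values.min())                 # positive, but only on the grid
print("violation found on grid:", bool((values <= 0).any()))  # False here
# A finite sample can refute the property by exhibiting a counterexample,
# but it can never prove it for the infinitely many points in [-1, 1].
```

Verification methods instead reason about the whole input set at once, for example with the interval arithmetic sketched later in this guide.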

Threat Model: The $(\ell_p, \epsilon)$-Adversary

Existing studies on certified robustness mainly aim to defend against white-box evasion attacks, which represent the strongest adversaries who have full knowledge of the target model, including its parameters and architecture. In particular, the adversary carefully crafts a bounded perturbation to the original input, generating an adversarial example to fool the model into making incorrect predictions.

We formally define the $(\ell_p, \epsilon)$-adversary:

Definition: $(\ell_p, \epsilon)$-Adversary

For a given input $(x_0, y_0)$, where $x_0 \in \mathcal{X}$ is the input instance and $y_0$ is its true label, the $(\ell_p, \epsilon)$-adversary will generate a perturbed input $x \in B_{p,\epsilon}(x_0)$, such that $F_\theta(x) \neq y_0$, where:

$$B_{p,\epsilon}(x_0) := \{ x : \|x - x_0\|_p < \epsilon \}$$

is the perturbation region, and $\epsilon$ is called the perturbation radius.

Common $\ell_p$ Norms:

$\ell_\infty$ Norm --- Maximum difference. Each dimension can change by at most $\epsilon$. For images: each pixel stays within $\epsilon$ of its original value.

$\ell_2$ Norm --- Euclidean distance. The total Euclidean distance is at most $\epsilon$. Spherical perturbation region.

$\ell_1$ Norm --- Manhattan distance. The sum of absolute changes is at most $\epsilon$. Diamond-shaped perturbation region.
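To make the threat model concrete, here is a minimal sketch (assuming numpy and a made-up stand-in classifier `predict` in place of $F_\theta$) that computes the perturbation size under the three norms above and checks whether a candidate input is a valid $(\ell_p, \epsilon)$-adversarial example.

```python
import numpy as np

def lp_norm(delta, p):
    """l_p norm of a perturbation, for p in {1, 2, np.inf}."""
    delta = np.ravel(delta)
    if p == np.inf:
        return np.abs(delta).max()                    # maximum absolute change
    return np.sum(np.abs(delta) ** p) ** (1.0 / p)    # l_1, l_2, ...

def is_adversarial(predict, x0, y0, x, p, eps):
    """True if x lies in B_{p,eps}(x0) and flips the predicted label away from y0."""
    inside_ball = lp_norm(x - x0, p) < eps
    misclassified = predict(x) != y0
    return inside_ball and misclassified

x0 = np.zeros(4)
x = np.array([0.02, -0.01, 0.0, 0.03])
for p in (1, 2, np.inf):
    print(f"l_{p} norm of the perturbation: {lp_norm(x - x0, p):.4f}")

predict = lambda z: int(z.sum() > 0)  # toy stand-in classifier, not a real model
print("valid adversarial example:", is_adversarial(predict, x0, y0=0, x=x, p=np.inf, eps=0.05))
```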

Difficulties of NNV

What makes this problem difficult is that

  • the neural network is a high-dimensional non-linear function and
  • the input domain is a high-dimensional space containing infinitely many points.

So, we cannot just check all points in the input domain by brute force. Instead, we need analysis methods that reason about the entire input domain at once.

Formal Problem Formulation

We can view robustness verification from an optimization perspective. The goal is to verify whether the model prediction remains correct for all perturbed inputs within the threat region.

Problem: Robustness Verification as Optimization

Given a neural network $f_\theta: \mathcal{X} \to \mathbb{R}^C$, an input instance $x_0 \in \mathcal{X}$, the ground-truth label $y_0$, any other label $y'$, and a radius $\epsilon > 0$, we define:

$$\mathcal{M}(y_0, y') = \min_{x \in B_{p,\epsilon}(x_0)} \left( f_\theta(x)_{y_0} - f_\theta(x)_{y'} \right)$$

If $\mathcal{M}(y_0, y') > 0$ for all $y' \neq y_0$, then $f_\theta$ is certifiably robust at $x_0$ within radius $\epsilon$.

Intuitively, this formulation searches for the minimum margin between the model confidence for the true class $y_0$ and any other class $y'$. If this margin is always positive, the model will always predict $y_0$ for any perturbed input.
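Solving this minimization exactly is generally intractable, but we can estimate it empirically. The sketch below (hypothetical code with a tiny made-up stand-in for $f_\theta$) samples random points in the $\ell_\infty$ ball and reports the smallest observed margin. This is an attack-style estimate: an upper bound on $\mathcal{M}$, not a certified lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_theta(x):
    """Stand-in logit function: a fixed linear map to 3 classes (for illustration only)."""
    W = np.array([[1.0, -0.5], [0.2, 0.8], [-0.3, 0.1]])
    b = np.array([0.1, 0.0, -0.2])
    return W @ x + b

def empirical_margin(x0, y0, eps, n_samples=10_000):
    """Smallest margin f(x)_{y0} - max_{j != y0} f(x)_j over random points in the l_inf ball.
    This only upper-bounds M(y0, .); it is not a robustness certificate."""
    worst = np.inf
    for _ in range(n_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)  # random point in B_{inf,eps}(x0)
        logits = f_theta(x)
        worst = min(worst, logits[y0] - np.delete(logits, y0).max())
    return worst

x0 = np.array([1.0, 0.5])
print("empirical margin:", empirical_margin(x0, y0=0, eps=0.1))
# Positive: no counterexample was found, but this does NOT certify robustness.
# Negative: a concrete adversarial example exists among the samples.
```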

Complete vs. Incomplete Verification:

Complete Verification --- Exactly solves the optimization problem. Sound (no false positives) and complete (no false negatives). But the problem is NP-complete, so it is computationally intractable for large networks.

Incomplete Verification --- Conservatively provides a lower bound on $\mathcal{M}$. More scalable to large networks and faster to compute. But the bound may be too loose (false negatives possible).

The fundamental challenge is the scalability-tightness trade-off: pushing for more scalability typically sacrifices tightness, and vice versa.

Approaches for NNV

To solve this problem, giving an exact answer (what we call a complete approach) is sometimes impossible in practice or computationally too expensive; in such cases, an approximate answer (what we call an incomplete approach) is often sufficient.

One simple incomplete approach is interval arithmetic, but it is often not tight enough for deep neural networks, especially with non-linear activations and many layers. Bound propagation is a more advanced approach than interval arithmetic because it propagates symbolic bounds over the input variables rather than just scalar values. Bound propagation has become a popular approach in neural network verification due to its high efficiency and scalability.
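To make the interval-arithmetic idea concrete, here is a minimal sketch of propagating an input interval through a small fully connected ReLU network (the weights are made up for illustration). A linear layer maps bounds by splitting its weights into positive and negative parts; ReLU, being monotone, simply clips both bounds at zero.

```python
import numpy as np

def linear_interval(l, u, W, b):
    """Propagate elementwise bounds [l, u] through y = W x + b.
    Positive weights keep the same side of the interval; negative weights swap sides."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b

def relu_interval(l, u):
    """ReLU is monotone, so [l, u] maps to [max(l, 0), max(u, 0)]."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# A made-up 2-layer ReLU network mapping R^2 to 3 logits.
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, -0.5])
W2 = np.array([[1.0, 0.3], [-0.4, 1.0], [0.2, -0.6]])
b2 = np.array([0.1, 0.0, -0.1])

# l_inf ball of radius eps around the input x0.
x0, eps = np.array([0.5, 0.2]), 0.05
l, u = x0 - eps, x0 + eps

l, u = linear_interval(l, u, W1, b1)
l, u = relu_interval(l, u)
l, u = linear_interval(l, u, W2, b2)

print("certified logit lower bounds:", l)
print("certified logit upper bounds:", u)
```

These interval bounds are sound but can be loose; bound-propagation methods such as CROWN and DeepPoly tighten them by keeping symbolic linear relationships to the input.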

Local Robustness for Classification

We focus on verifying the local robustness of neural networks for classification tasks. Local robustness is the property that a neural network does not change its predicted class when the input is perturbed slightly.

For classification, this means the logit of the correct label should remain the highest among all labels for every perturbed input, i.e., the correct label's logit should be greater than every other label's logit.

Hint: Local robustness is commonly used to evaluate the effectiveness and efficiency of a verification tool because it

  • considers all perturbations of the high-dimensional input and
  • covers all output properties that can be expressed in a linear form.

For input perturbations, we commonly use the $\epsilon$-ball around the input data $x$ as the set of perturbed inputs. The $\epsilon$-ball is defined as an $\ell_\infty$ norm ball with radius $\epsilon$ around the input data $x$:

$$\{ x' \mid \| x' - x \|_\infty \leq \epsilon \}.$$

For image data, the $\ell_\infty$ norm is the maximum absolute difference between the pixel values of two images. That is, each pixel value of the perturbed image is within $\epsilon$ of the corresponding pixel of the original image.

Then, we can define the property as follows:

$$f(x) = \arg\max_{i} f_i(x) \quad \Rightarrow \quad \forall x' \in \{ x' \mid \|x' - x\|_\infty \leq \epsilon \}, \;\; \forall j \neq i, \; f_i(x') - f_j(x') > 0.$$

This means that, for every perturbed input, the logit of the correct class should remain the highest, i.e., the difference between the correct class's logit and every other class's logit should stay positive.
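Given certified per-class output bounds such as those from the interval sketch above, a simple and conservative way to check this property is to verify that the lower bound of the correct class's logit exceeds the upper bound of every other class's logit. This is only a sufficient condition; if it fails, the network may still be robust and a tighter verification method may be needed. A minimal sketch:

```python
import numpy as np

def certified_by_bounds(lower, upper, y0):
    """Sufficient check: the true class's certified lower bound beats every other
    class's certified upper bound, so f_{y0}(x') - f_j(x') > 0 for all perturbed x'."""
    return bool(lower[y0] > np.delete(upper, y0).max())

# Hypothetical certified logit bounds for a 3-class problem with true class y0 = 0.
lower = np.array([2.1, -0.3, 0.4])
upper = np.array([2.9, 1.8, 1.2])
print(certified_by_bounds(lower, upper, y0=0))  # True: 2.1 > max(1.8, 1.2)
```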

Further Reading

This introduction provides a high-level overview of neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:

Adversarial Examples and Attacks:

The field of neural network verification was motivated by the discovery of adversarial examples. The FGSM attack and C&W attack demonstrated that deep neural networks are vulnerable to small, carefully crafted perturbations. The PGD attack showed that adversarial training could improve robustness, while other work revealed that many defenses provide only obfuscated gradients rather than true robustness.

Verification Methods:

Modern verification methods aim to provide formal guarantees against adversarial perturbations. Complete verification approaches include Reluplex and MILP-based methods. Incomplete but scalable approaches include CROWN, DeepPoly, and other scalable verification methods. SDP-based and Lipschitz-based approaches provide alternative relaxation techniques.

Robust Training:

Certified adversarial training combines verification with training to improve certified robustness. Randomized smoothing provides probabilistic certification guarantees with better scalability.

Surveys and Benchmarks:

For comprehensive treatments of the field, see the systematization of knowledge on certified robustness and related surveys on adversarial robustness.

Next Guide

Continue to Threat Models to learn more about $\ell_p$ norm perturbations and adversarial attack models in detail.