Learn NNV in 3 Minutes

Why Neural Network Verification?

Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications, such as image classification and natural language processing. However, despite their wide applications, both traditional ML models and DNNs are vulnerable to adversarial evasion attacks where carefully crafted adversarial examples---inputs with adversarial perturbations---can mislead ML models to make arbitrarily incorrect predictions.

The existence of adversarial attacks leads to great safety concerns for DNN-based applications, especially in safety-critical scenarios such as autonomous driving. While many empirical defenses have been proposed to improve robustness, many can be adaptively attacked again by sophisticated attackers. This everlasting competition between attackers and defenders motivates studies on certifiably robust approaches for DNNs, which include both robustness verification and robust training.

What is Neural Network Verification?

Neural network verification (NNV) is a process to check whether a neural network model satisfies a given property. Robustness verification approaches aim to evaluate DNN robustness by providing a theoretically certified lower bound of robustness under certain perturbation constraints, providing guarantees against any possible future attacks.

Commonly, we want to check whether the output of a neural network satisfies a given property for a specified set of input data. For example, you can check whether the output of a neural network is always positive for all perturbed input data.

Tip: Even though the property to verify may also be about inputs with a given output, we only consider the output for a given input in this document. There have been some researches on the verification of the input-output relation of neural networks, but this is not the primary focus here.

Example

To understand it, we give a small example of a given function $f(x)$ and a set of input $\mathcal{X} = \{ x \mid x \in [-1,1]\}$ . We want to check whether the output of the function $f(x)$ is always positive for all input $x \in \mathcal{X}$ .

This is a very simple example, but it is a basic concept of neural network verification. For the case of neural networks, the function $f$ is a neural network model and the input $x$ is a data point. Commonly, the input $x$ is a high-dimensional vector and the function $f$ is a deep neural network model. The property we want to check is called a property and the set of input data is called an input domain.

Threat Model: $(\ell_p, \epsilon)$ -Adversary

Existing studies on certified robustness mainly aim to defend against white-box evasion attacks, which represent the strongest adversaries who have full knowledge of the target model, including its parameters and architecture. In particular, the adversary carefully crafts a bounded perturbation to the original input, generating an adversarial example to fool the model into making incorrect predictions.

We formally define the $(\ell_p, \epsilon)$ -adversary:

Definition: $(\ell_p, \epsilon)$ -Adversary

For a given input $(x_0, y_0)$ , where $x_0 \in \mathcal{X}$ is the input instance and $y_0$ is its true label, the $(\ell_p, \epsilon)$ -adversary will generate a perturbed input $x \in B_{p,\epsilon}(x_0)$ , such that $F_\theta(x) \neq y_0$ , where:

$B_{p,\epsilon}(x_0) := \{ x : \|x - x_0\|_p < \epsilon \}$

is the perturbation region, and $\epsilon$ is called the perturbation radius.

Common $\ell_p$ Norms:

$\ell_\infty$ Norm --- Maximum difference. Each dimension can change by at most $\epsilon$ . For images: each pixel within $\epsilon$ of original.

$\ell_2$ Norm --- Euclidean distance. Total Euclidean distance at most $\epsilon$ . Spherical perturbation region.

$\ell_1$ Norm --- Manhattan distance. Sum of absolute changes at most $\epsilon$ . Diamond-shaped perturbation region.

Difficulties on NNV

What makes this problem difficult is that

the neural network is a high-dimensional non-linear function and
the input domain is a high-dimensional space containing infinite points.

So, we cannot just check all points in the input domain by brute force. Instead, we need another analysis method to check the property.

Formal Problem Formulation

We can view robustness verification from an optimization perspective. The goal is to verify whether the model prediction remains correct for all perturbed inputs within the threat region.

Problem: Robustness Verification as Optimization

Given a neural network $f_\theta: \mathcal{X} \to \mathbb{R}^C$ , input instance $x_0 \in \mathcal{X}$ , ground-truth label $y_0$ , any other label $y'$ , and radius $\epsilon > 0$ , we define:

$\mathcal{M}(y_0, y') = \min_{x \in B_{p,\epsilon}(x_0)} f_\theta(x)_{y_0} - f_\theta(x)_{y'}$

If $\mathcal{M}(y_0, y') > 0$ for all $y' \neq y_0$ , then $f_\theta$ is certifiably robust at $x_0$ within radius $\epsilon$ .

Intuitively, this formulation searches for the minimum margin between the model confidence for the true class $y_0$ and any other class $y'$ . If this margin is always positive, the model will always predict $y_0$ for any perturbed input.

Complete vs. Incomplete Verification:

Complete Verification --- Exactly solves the optimization problem. Soundness (no false positives). Completeness (no false negatives). But NP-complete --- computationally intractable for large networks.

Incomplete Verification --- Conservatively provides a lower bound of $\mathcal{M}$ . More scalable to large networks. Faster computation. But may be too loose (false negatives possible).

The fundamental challenge is the scalability-tightness trade-off: we can achieve either scalability or tightness, but not both simultaneously.

Approaches for NNV

To solve this problem, sometimes we cannot give a precise answer (what we call complete approaches) or it is computationally too expensive, but an approximate answer (what we call incomplete approaches) is sufficient.

One easy incomplete approach is to use interval arithmetic, but it is not tight enough for deep neural networks, especially with non-linear activations and deep layers. Bound propagation is a more advanced approach than interval arithmetic because it considers symbolic variables rather than just scalar values. Bound propagation has become a popular approach in neural network verification due to its high efficiency and scalability.

Local Robustness for Classification

We focus on the verification of local robustness of neural networks for classification tasks. Local robustness is a property that the output of a neural network does not change its predicted class when the input is perturbed slightly.

For classification tasks, the local robustness is a property that the output of a neural network does not change its class when the input is perturbed slightly. The logits of the correct label should be the highest among all labels for all perturbed input data, i.e., the correct label should be greater than all other labels.

Hint: Local robustness is commonly a property to evaluate the effectiveness and efficiency of a verification tool because the local robustness

considers all perturbation of the high-dimensional input and

cover all output properties that can be defined by a linear form.

For input perturbation, we commonly use the $\epsilon$ -ball around the input data $x$ as the perturbed input data. The $\epsilon$ -ball is defined as a $l_{\infty}$ norm ball with radius $\epsilon$ around the input data $x$ :

$\{ x' \mid \| x' - x \| \leq \epsilon \}.$

For a image data, the $l_{\infty}$ norm is defined as the maximum absolute difference between the pixel values of two images. That is to say each pixel value of the perturbed image is within $\epsilon$ of the original image.

Then, we can define the property as follows:

$f(x) = \arg\max_{i} f_i(x) \quad \Rightarrow \quad \forall j \neq i, f_i(x) - f_j(x) > 0.$

This means that the output of the neural network should be the highest for the correct answer and the difference between the correct answer and the other answers should be positive.