The Verification Problem: Mathematical Formulation

Neural network verification can feel abstract—talking about “proving properties” and “certified robustness” without clear mathematical grounding. But verification has a precise mathematical formulation. Understanding this formulation clarifies what verification means, why it’s hard, and how different methods approach the problem.

This guide presents the formal mathematical foundations of neural network verification: problem definitions, complexity results, and the fundamental tradeoffs that shape verification research and practice.

The Core Verification Problem

At its heart, verification asks: does a property hold for all inputs in a specified region?

Formal Problem Statement

Given:

  • A neural network \(f_\theta: \mathbb{R}^n \to \mathbb{R}^m\) with parameters \(\theta\)

  • An input region \(\mathcal{X} \subseteq \mathbb{R}^n\)

  • A property \(\phi: \mathbb{R}^m \to \{\text{true}, \text{false}\}\)

The verification problem is to determine whether:

\[\forall x \in \mathcal{X}, \quad \phi(f_\theta(x)) = \text{true}\]

The verifier must return one of three answers:

  • VERIFIED: The property holds for all \(x \in \mathcal{X}\)

  • FALSIFIED: A counterexample \(x^* \in \mathcal{X}\) exists where \(\phi(f_\theta(x^*)) = \text{false}\)

  • UNKNOWN: Unable to determine (for incomplete methods)

This formulation is general. Different instantiations of \(\mathcal{X}\) and \(\phi\) yield different verification problems. The challenge is computing this universal quantification (\(\forall x \in \mathcal{X}\)) efficiently.
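The three-outcome interface above can be made concrete with a small sketch. The code below is illustrative only: `sample_falsify`, `Result`, and the toy linear "network" are hypothetical names, and random sampling can only falsify a property over a continuous region, never verify it (which is exactly why sound verifiers are needed).

```python
import enum
import numpy as np  # assumption: numpy is available

class Result(enum.Enum):
    VERIFIED = "verified"
    FALSIFIED = "falsified"
    UNKNOWN = "unknown"

def sample_falsify(f, phi, x0, eps, n_samples=1000, seed=0):
    """Search for a counterexample in the l_inf ball B(x0, eps) by random
    sampling.  A found counterexample is real (sound falsification), but
    failing to find one proves nothing, so we return UNKNOWN, not VERIFIED."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)
        if not phi(f(x)):
            return Result.FALSIFIED, x  # real counterexample
    return Result.UNKNOWN, None  # absence of evidence, not a proof

# Toy example: f is a linear "network"; phi asks that class 0 wins.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
f = lambda x: W @ x
phi = lambda y: np.argmax(y) == 0
res, cex = sample_falsify(f, phi, x0=np.array([1.0, 0.0]), eps=0.1)
```

On this toy region class 0 always wins, so sampling finds no counterexample and correctly reports UNKNOWN rather than VERIFIED.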

Robustness Verification as Optimization

The most common verification problem—robustness verification—can be formulated as an optimization problem [Weng et al., 2018].

Local Robustness: Given an input \(x_0\) with ground-truth label \(y_0\), verify that all inputs within a perturbation ball maintain the same classification:

\[\forall x \in B_{p,\epsilon}(x_0), \quad \arg\max_c f_\theta(x)_c = y_0\]

This universal quantification is equivalent to solving:

\[\min_{x \in B_{p,\epsilon}(x_0)} \quad f_\theta(x)_{y_0} - \max_{c \neq y_0} f_\theta(x)_c\]

If the minimum value is positive, the network maintains classification for all \(x \in B_{p,\epsilon}(x_0)\). If negative, a counterexample exists.
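This minimization can be approximated from above by search: any sampled point gives an upper bound on the true minimum, so a negative value exhibits a counterexample while a positive value proves nothing. A minimal sketch, assuming numpy; `margin` and `estimate_min_margin` are hypothetical helper names:

```python
import numpy as np  # assumption: numpy is available

def margin(f, x, y0):
    """f(x)_{y0} - max_{c != y0} f(x)_c : positive iff x is classified y0."""
    y = f(x)
    return y[y0] - np.delete(y, y0).max()

def estimate_min_margin(f, x0, y0, eps, n_samples=5000, seed=0):
    """Monte-Carlo upper bound on min_{x in B_inf(x0, eps)} margin(x).
    This is falsification, not verification: a negative result is a real
    counterexample, a positive result certifies nothing."""
    rng = np.random.default_rng(seed)
    best = margin(f, x0, y0)
    for _ in range(n_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)
        best = min(best, margin(f, x, y0))
    return best

# Toy 2-class linear model: the margin shrinks over the ball but stays positive.
W = np.array([[2.0, 0.0], [0.0, 1.0]])
f = lambda x: W @ x
m = estimate_min_margin(f, np.array([1.0, 1.0]), y0=0, eps=0.2)
```

For this linear toy model the true minimum margin over the ball is 0.4, so the estimate stays strictly positive.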

Margin-based formulation: For a given pair of classes \((y_0, y')\), define the margin optimization problem:

Problem: Robustness Verification as Optimization

Given a neural network \(f_\theta: \mathcal{X} \to \mathbb{R}^C\), input instance \(x_0 \in \mathcal{X}\), ground-truth label \(y_0 \in [C]\), any other label \(y' \in [C]\), and radius \(\epsilon > 0\):

\[\mathcal{M}(y_0,y') = \min_{x}\, f_\theta(x)_{y_0} - f_\theta(x)_{y'} \quad \text{ s.t. } \quad x \in B_{p,\epsilon}(x_0)\]

If \(\mathcal{M}(y_0, y') > 0\) for all \(y' \neq y_0\), the network is verified robust at \(x_0\).

The margin \(\mathcal{M}(y_0, y')\) represents the minimum confidence difference between the true class \(y_0\) and alternative class \(y'\) over all perturbed inputs. For robustness, we need \(\mathcal{M}(y_0, y') > 0\) for every \(y' \neq y_0\).

Computational challenge: This optimization is non-convex (neural networks are non-convex functions) and high-dimensional. Finding the global minimum is hard—this is the source of verification’s computational difficulty.

Computational Complexity

The hardness of verification isn’t just empirical—it’s fundamentally hard in a computational complexity sense.

NP-Completeness: Verifying properties of ReLU neural networks is NP-complete [Katz et al., 2017]. This means that unless P=NP, no polynomial-time algorithm exists that solves all verification instances.

Inapproximability: Even approximating the certified robustness radius within any constant factor is NP-hard [Weng et al., 2018]. You can’t even efficiently approximate how robust a network is, let alone verify exact robustness.

Complexity Implications

The NP-completeness result [Katz et al., 2017, Weng et al., 2018] has profound implications:

  • Complete verification methods (those always returning verified/falsified) have worst-case exponential time

  • Scalability-tightness tradeoff is fundamental—you can’t have both perfect tightness and polynomial-time verification

  • Incomplete methods (accepting “unknown”) are necessary for practical scalability

This isn’t an engineering limitation—it’s a mathematical fact. All verification methods face these barriers.

Why it’s hard: The difficulty stems from ReLU’s piecewise-linear nature. A ReLU network with \(k\) ReLU activations partitions input space into up to \(2^k\) linear regions. Verification must reason about behavior across all these regions—exponentially many regions lead to exponential complexity.
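The region-counting intuition can be checked empirically: each input induces an activation pattern (which ReLUs are on), and distinct patterns correspond to distinct linear regions. A small sketch, assuming numpy; the function names are hypothetical and sampling gives only a lower bound on the region count:

```python
import numpy as np  # assumption: numpy is available

def activation_pattern(W1, b1, x):
    """Sign pattern of the pre-activations: which of the k ReLUs are active."""
    return tuple((W1 @ x + b1) > 0)

def count_patterns_by_sampling(W1, b1, n_samples=20000, seed=0):
    """Lower-bound the number of linear regions of a one-hidden-layer ReLU
    net by sampling inputs and collecting distinct activation patterns.
    With k hidden ReLUs there are at most 2**k patterns."""
    rng = np.random.default_rng(seed)
    dim = W1.shape[1]
    return len({activation_pattern(W1, b1, rng.standard_normal(dim))
                for _ in range(n_samples)})

rng = np.random.default_rng(0)
k = 6  # 6 ReLUs -> at most 2**6 = 64 linear regions
W1, b1 = rng.standard_normal((k, 2)), rng.standard_normal(k)
n = count_patterns_by_sampling(W1, b1)
assert n <= 2**k  # the 2^k bound always holds
```

Complete verification must, in the worst case, reason separately about each such pattern, which is the source of the exponential blowup.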

Complete vs Incomplete Verification

The computational hardness forces a fundamental choice: complete methods that always give definitive answers but take exponential time, or incomplete methods that run in polynomial time but sometimes return “unknown.”

Complete Verification Methods

Definition: A complete verification method always returns either VERIFIED or FALSIFIED (with counterexample). It never returns UNKNOWN.

Approaches: SMT solving [Katz et al., 2017, Katz et al., 2019], mixed-integer linear programming (MILP) [Tjeng et al., 2019], and branch-and-bound [Wang et al., 2021, Zhang et al., 2022].

Cost: Worst-case exponential time. For large networks, complete methods often time out or exhaust memory.

When to use: Small networks, critical properties requiring absolute certainty, when computational budget allows exhaustive search.

Incomplete Verification Methods

Definition: An incomplete verification method may return VERIFIED, FALSIFIED, or UNKNOWN. When it returns UNKNOWN, the property may or may not hold.

Approaches: bound propagation [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018], abstract interpretation [Gehr et al., 2018, Singh et al., 2019], and linear programming relaxations [Salman et al., 2019].

Cost: Polynomial time. Scales to large networks.

Tradeoff: Soundness vs completeness. Incomplete methods are sound (if they say VERIFIED, it’s correct) but incomplete (they might say UNKNOWN when the property actually holds).

When to use: Large networks, rapid iteration, when accepting “unknown” is tolerable.

Table 14 Complete vs Incomplete Methods

| Method Type | Time Complexity | Scalability | Answers | Guarantee |
|---|---|---|---|---|
| Complete (SMT, MILP, B&B) | Exponential (worst-case) | Small networks | Verified / Falsified | Definitive |
| Incomplete (Bound prop, LP) | Polynomial | Large networks | Verified / Falsified / Unknown | Sound but incomplete |

Sound vs Unsound Approximations

Verification methods that use approximations must remain sound: their answers must be correct even though their analysis is approximate.

Sound approximation: If the approximation says “verified,” the true answer is “verified.” Approximations can be conservative (saying “unknown” when the true answer is “verified”) but never incorrect.

Unsound approximation: Might say “verified” when the true answer is “falsified.” This is unacceptable for formal verification—you’re providing false guarantees.

\[\begin{aligned}\text{Sound:} &\quad \text{Approx-Verified} \Rightarrow \text{True-Verified}\\\text{Unsound:} &\quad \text{Approx-Verified} \not\Rightarrow \text{True-Verified}\end{aligned}\]

All formal verification methods must be sound. Testing and adversarial attacks cannot provide verification guarantees—failing to find counterexamples doesn’t prove robustness. However, adversarial attacks are sound falsifiers: when they find a counterexample, it’s real and demonstrates the property is violated. Formal verifiers use sound approximations—they might be loose (incomplete), but they’re never incorrect.

Bound Computation as Core Primitive

Most verification methods reduce to computing bounds on network outputs given input constraints.

Problem: Given input constraints \(x \in \mathcal{X}\), compute bounds on output:

\[\underline{y}_i \leq f_\theta(x)_i \leq \overline{y}_i \quad \forall x \in \mathcal{X}\]

Tighter bounds enable stronger verification. Loose bounds lead to “unknown” results.

Layer-by-layer propagation: For feedforward networks, compute bounds layer by layer:

\[\begin{aligned} \text{Layer } \ell: \quad & \underline{z}^{(\ell)} \leq W^{(\ell)} h^{(\ell-1)} + b^{(\ell)} \leq \overline{z}^{(\ell)}, \\ & \underline{h}^{(\ell)} = \sigma(\underline{z}^{(\ell)}) \leq h^{(\ell)} \leq \sigma(\overline{z}^{(\ell)}) = \overline{h}^{(\ell)} \quad \text{(for monotone } \sigma\text{, e.g. ReLU)} \end{aligned}\]

Different methods compute these bounds differently: interval bound propagation [Gowal et al., 2019] keeps cheap elementwise intervals; linear bound propagation [Weng et al., 2018, Zhang et al., 2018] tracks linear functions of the input for tighter bounds; linear programming relaxations [Salman et al., 2019] are tighter still but more expensive.

The tightness-speed tradeoff is fundamental: tighter bounds require more computation.
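The layer-by-layer recurrence above is easiest to see for the simplest instance, interval bound propagation. A minimal sketch, assuming numpy; it is sound (the box contains every reachable output) but loose, with slack that grows with depth:

```python
import numpy as np  # assumption: numpy is available

def interval_bound_propagation(layers, x_lo, x_hi):
    """Propagate elementwise interval bounds through an MLP with ReLU
    activations.  `layers` is a list of (W, b) pairs; ReLU is applied
    after every layer except the last."""
    lo, hi = x_lo, x_hi
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        # An affine map attains its extremes at interval endpoints:
        # positive weights pull from the matching bound, negative from the other.
        new_lo = W_pos @ lo + W_neg @ hi + b
        new_hi = W_pos @ hi + W_neg @ lo + b
        lo, hi = new_lo, new_hi
        if i < len(layers) - 1:  # ReLU is monotone, so apply it to both bounds
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# Toy check: 2-layer net, input box [x0 - eps, x0 + eps].
layers = [(np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)),
          (np.array([[1.0, 1.0]]), np.zeros(1))]
x0, eps = np.array([1.0, 0.0]), 0.1
lo, hi = interval_bound_propagation(layers, x0 - eps, x0 + eps)  # lo=[1.2], hi=[1.8]
```

To verify a margin property, one would apply the same propagation to the difference of output rows and check that the resulting lower bound is positive.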

Certified Radius Computation

Rather than verifying at a fixed \(\epsilon\), you might ask: what is the maximum \(\epsilon\) for which robustness holds?

\[\epsilon_{\text{cert}} = \max \{\epsilon : \forall x \in B_{p,\epsilon}(x_0), \arg\max_c f_\theta(x)_c = y_0\}\]

This certified radius quantifies robustness. Larger certified radius means stronger robustness.

Binary search: Most methods compute certified radius via binary search over \(\epsilon\):

  1. Start with \(\epsilon_{\min}, \epsilon_{\max}\)

  2. Test \(\epsilon_{\text{mid}} = (\epsilon_{\min} + \epsilon_{\max})/2\)

  3. If verified, increase \(\epsilon_{\min} = \epsilon_{\text{mid}}\); else decrease \(\epsilon_{\max} = \epsilon_{\text{mid}}\)

  4. Repeat until convergence
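The four steps above can be sketched directly. The `verify` callback here is a stand-in for any sound verifier (with a known threshold for demonstration); because the verifier may be incomplete, the result is a lower bound on the true certified radius:

```python
def certified_radius_binary_search(verify, eps_max, tol=1e-3):
    """Binary search for the largest eps that `verify(eps)` certifies.
    `verify` must be sound: it returns True only if robustness at radius
    eps is proven.  Returns a lower bound on the certified radius."""
    lo, hi = 0.0, eps_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if verify(mid):
            lo = mid  # verified: the certified radius is at least mid
        else:
            hi = mid  # unverified (or falsified): shrink the search range
    return lo

# Stand-in verifier with a known certification threshold at 0.25:
r = certified_radius_binary_search(lambda e: e <= 0.25, eps_max=1.0)  # r ~= 0.25
```

Note that an UNKNOWN answer from an incomplete verifier is treated like a failure here, which is why the returned radius can underestimate the true one.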

Direct computation: Some methods (randomized smoothing [Cohen et al., 2019]) compute certified radius directly without binary search.

Probabilistic Certification

An alternative to deterministic verification is probabilistic certification [Cohen et al., 2019]: prove that properties hold with high probability under randomness.

Randomized smoothing provides probabilistic robustness guarantees:

\[g(x) = \arg\max_c \mathbb{P}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}[f_\theta(x + \epsilon) = c]\]

The smoothed classifier \(g\) has a provable certified radius \(r\) derived from statistical estimates. The guarantee is: with high probability, \(g(x') = g(x)\) for all \(\|x' - x\|_2 \leq r\).

Difference from deterministic verification: The radius \(r\) is deterministically certified based on statistical estimates. The randomness is in estimation, not in the final guarantee.
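The radius computation can be sketched with standard-library tools only. This uses the simplified bound \(r = \sigma \, \Phi^{-1}(\underline{p_A})\); for brevity the lower confidence bound \(\underline{p_A}\) comes from a Hoeffding inequality, whereas Cohen et al. [2019] use the tighter Clopper-Pearson interval. Function and parameter names are hypothetical:

```python
from math import log, sqrt
from statistics import NormalDist

def smoothing_radius(n_correct, n_total, sigma, alpha=0.001):
    """Certified l2 radius for randomized smoothing via the simplified
    bound r = sigma * Phi^{-1}(p_lower), where p_lower is a one-sided
    lower confidence bound (here Hoeffding, for simplicity) on the
    probability that noisy samples agree with the top class."""
    p_hat = n_correct / n_total
    p_lower = p_hat - sqrt(log(1 / alpha) / (2 * n_total))  # Hoeffding bound
    if p_lower <= 0.5:
        return 0.0  # abstain: cannot certify any radius
    return sigma * NormalDist().inv_cdf(p_lower)  # Phi^{-1} = Gaussian quantile

# 990 of 1000 noisy samples agree with the top class, sigma = 0.5:
r = smoothing_radius(990, 1000, sigma=0.5)
```

More samples tighten \(\underline{p_A}\) toward \(\hat{p}\) and so enlarge the certified radius, which is why randomized smoothing trades sample count for guarantee strength.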

Scalability Barriers and Solutions

Verification scales poorly with network size and input dimension. Understanding why helps appreciate current methods.

Curse of dimensionality: For an input space of dimension \(n\), a complete method that partitions the space at resolution \(k\) per dimension requires \(O(k^n)\) cells. This worst-case exponential growth is why exhaustive approaches do not scale with input dimension.

Accumulation of approximation error: In bound propagation, errors accumulate through layers. A network with \(L\) layers might have approximation error \(O(L \cdot \epsilon_{\text{layer}})\), growing linearly with depth.

Non-linearity complexity: Each ReLU introduces a non-linear decision. With \(k\) ReLUs, there are up to \(2^k\) linear regions. Reasoning about all regions requires exponential effort.

Solutions:

  • Incomplete methods: Accept “unknown” to achieve polynomial time

  • GPU acceleration [Xu et al., 2020]: Parallelize bound computation

  • Hybrid methods [Wang et al., 2021, Zhang et al., 2022]: Use incomplete methods for initial pruning, complete methods for refinement

  • Architecture-aware methods: Exploit specific architectures (CNNs, residual networks) for tighter bounds

Final Thoughts

The verification problem has a clean mathematical formulation, but solving it is fundamentally hard. NP-completeness [Katz et al., 2017, Weng et al., 2018] establishes that no efficient general algorithm exists for complete verification. This forces the field toward incomplete methods [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018], which scale by accepting “unknown” results, and hybrid approaches [Wang et al., 2021, Zhang et al., 2022], which balance completeness and efficiency.

Understanding the mathematical foundations clarifies what’s possible and what’s not. It explains why verification is hard, why tradeoffs exist, and why different methods make different choices. Armed with this understanding, you can choose verification methods intelligently and interpret results correctly.

Further Reading

This guide provides the mathematical foundations of the neural network verification problem. For readers interested in diving deeper, we recommend the following resources organized by topic:

Fundamental Complexity:

The NP-completeness of neural network verification was proven for ReLU networks [Katz et al., 2017], establishing that complete verification has worst-case exponential complexity. This result was strengthened to show that even approximating the certified robustness radius is NP-hard [Weng et al., 2018], explaining the fundamental scalability-tightness tradeoff all verification methods face.

Complete Verification Methods:

Complete verification methods always return definitive answers. SMT-based approaches [Katz et al., 2017, Katz et al., 2019] encode networks as logical constraints. MILP-based methods [Tjeng et al., 2019] formulate verification as mixed-integer programming. Branch-and-bound approaches [Wang et al., 2021, Zhang et al., 2022] partition the input space and verify each partition. All complete methods face exponential worst-case complexity due to NP-completeness.

Incomplete Verification Methods:

To achieve scalability, incomplete methods accept “unknown” results. Bound propagation approaches [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018] compute output bounds layer-by-layer with varying tightness. Abstract interpretation [Singh et al., 2019, Gehr et al., 2018] uses abstract domains to approximate network behavior soundly. Linear programming relaxations [Salman et al., 2019] relax non-convex verification to convex optimization.

Probabilistic Certification:

Randomized smoothing [Cohen et al., 2019] provides an alternative formulation: instead of deterministic verification, prove probabilistic guarantees through statistical estimation. This approach scales to large networks where deterministic verification becomes intractable.

Hybrid Approaches:

State-of-the-art verifiers [Wang et al., 2021, Zhang et al., 2022] combine incomplete and complete methods: using fast incomplete verification for initial bounds, then applying branch-and-bound selectively where needed. GPU acceleration [Xu et al., 2020] enables practical implementation of these hybrid strategies.

Related Topics:

For understanding how to specify properties formally in the verification problem, see Formal Specifications. For the distinction between sound and complete verification, see Soundness and Completeness. For practical verification workflows that instantiate this mathematical formulation, see Robustness Testing Guide.

Next Guide

Continue to Soundness and Completeness to understand the fundamental properties of soundness and completeness in verification methods.

[1] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.

[2] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI²: safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), 3–18. IEEE, 2018.

[3] Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[4] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[5] Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.

[6] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.

[7] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[8] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.

[9] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[10] Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.

[11] Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[12] Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.

[13] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948. 2018.