The Verification Problem: Mathematical Formulation

Neural network verification can feel abstract—talking about “proving properties” and “certified robustness” without clear mathematical grounding. But verification has a precise mathematical formulation. Understanding this formulation clarifies what verification means, why it’s hard, and how different methods approach the problem.

This guide presents the formal mathematical foundations of neural network verification: problem definitions, complexity results, and the fundamental tradeoffs that shape verification research and practice.

The Core Verification Problem

At its heart, verification asks: does a property hold for all inputs in a specified region?

Formal Problem Statement

Given:

  • A neural network \(f_\theta: \mathbb{R}^n \to \mathbb{R}^m\) with parameters \(\theta\)

  • An input region \(\mathcal{X} \subseteq \mathbb{R}^n\)

  • A property \(\phi: \mathbb{R}^m \to \{\text{true}, \text{false}\}\)

The verification problem is to determine whether:

\[\forall x \in \mathcal{X}, \quad \phi(f_\theta(x)) = \text{true}\]

The verifier must return one of three answers:

  • VERIFIED: The property holds for all \(x \in \mathcal{X}\)

  • FALSIFIED: A counterexample \(x^* \in \mathcal{X}\) exists where \(\phi(f_\theta(x^*)) = \text{false}\)

  • UNKNOWN: Unable to determine (for incomplete methods)

This formulation is general. Different instantiations of \(\mathcal{X}\) and \(\phi\) yield different verification problems. The challenge is computing this universal quantification (\(\forall x \in \mathcal{X}\)) efficiently.
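The three-outcome interface above can be made concrete with a small sketch. The code below is illustrative only: `sample_falsify`, `Result`, and the toy linear "network" are hypothetical names, and random sampling can only falsify a property over a continuous region, never verify it (which is exactly why sound verifiers are needed).

```python
import enum
import numpy as np  # assumption: numpy is available

class Result(enum.Enum):
    VERIFIED = "verified"
    FALSIFIED = "falsified"
    UNKNOWN = "unknown"

def sample_falsify(f, phi, x0, eps, n_samples=1000, seed=0):
    """Search for a counterexample in the l_inf ball B(x0, eps) by random
    sampling.  A found counterexample is real (sound falsification), but
    failing to find one proves nothing, so we return UNKNOWN, not VERIFIED."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)
        if not phi(f(x)):
            return Result.FALSIFIED, x  # real counterexample
    return Result.UNKNOWN, None  # absence of evidence, not a proof

# Toy example: f is a linear "network"; phi asks that class 0 wins.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
f = lambda x: W @ x
phi = lambda y: np.argmax(y) == 0
res, cex = sample_falsify(f, phi, x0=np.array([1.0, 0.0]), eps=0.1)
```

On this toy region class 0 always wins, so sampling finds no counterexample and correctly reports UNKNOWN rather than VERIFIED.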

Robustness Verification as Optimization

The most common verification problem—robustness verification—can be formulated as an optimization problem [Weng et al., 2018].

Local Robustness: Given an input \(x_0\) with ground-truth label \(y_0\), verify that all inputs within a perturbation ball maintain the same classification:

\[\forall x \in B_{p,\epsilon}(x_0), \quad \arg\max_c f_\theta(x)_c = y_0\]

This universal quantification is equivalent to solving:

\[\min_{x \in B_{p,\epsilon}(x_0)} \quad f_\theta(x)_{y_0} - \max_{c \neq y_0} f_\theta(x)_c\]

If the minimum value is positive, the network maintains classification for all \(x \in B_{p,\epsilon}(x_0)\). If negative, a counterexample exists.
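This minimization can be approximated from above by search: any sampled point gives an upper bound on the true minimum, so a negative value exhibits a counterexample while a positive value proves nothing. A minimal sketch, assuming numpy; `margin` and `estimate_min_margin` are hypothetical helper names:

```python
import numpy as np  # assumption: numpy is available

def margin(f, x, y0):
    """f(x)_{y0} - max_{c != y0} f(x)_c : positive iff x is classified y0."""
    y = f(x)
    return y[y0] - np.delete(y, y0).max()

def estimate_min_margin(f, x0, y0, eps, n_samples=5000, seed=0):
    """Monte-Carlo upper bound on min_{x in B_inf(x0, eps)} margin(x).
    This is falsification, not verification: a negative result is a real
    counterexample, a positive result certifies nothing."""
    rng = np.random.default_rng(seed)
    best = margin(f, x0, y0)
    for _ in range(n_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)
        best = min(best, margin(f, x, y0))
    return best

# Toy 2-class linear model: the margin shrinks over the ball but stays positive.
W = np.array([[2.0, 0.0], [0.0, 1.0]])
f = lambda x: W @ x
m = estimate_min_margin(f, np.array([1.0, 1.0]), y0=0, eps=0.2)
```

For this linear toy model the true minimum margin over the ball is 0.4, so the estimate stays strictly positive.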

Margin-based formulation: For a given pair of classes \((y_0, y')\), define the margin optimization problem:

Problem: Robustness Verification as Optimization

Given a neural network \(f_\theta: \mathcal{X} \to \mathbb{R}^C\), input instance \(x_0 \in \mathcal{X}\), ground-truth label \(y_0 \in [C]\), any other label \(y' \in [C]\), and radius \(\epsilon > 0\):

\[\mathcal{M}(y_0,y') = \min_{x}\, f_\theta(x)_{y_0} - f_\theta(x)_{y'} \quad \text{ s.t. } \quad x \in B_{p,\epsilon}(x_0)\]

If \(\mathcal{M}(y_0, y') > 0\) for all \(y' \neq y_0\), the network is verified robust at \(x_0\).

The margin \(\mathcal{M}(y_0, y')\) represents the minimum confidence difference between the true class \(y_0\) and alternative class \(y'\) over all perturbed inputs. For robustness, we need \(\mathcal{M}(y_0, y') > 0\) for every \(y' \neq y_0\).

Computational challenge: This optimization is non-convex (neural networks are non-convex functions) and high-dimensional. Finding the global minimum is hard—this is the source of verification’s computational difficulty.

Computational Complexity

The hardness of verification isn’t just empirical—it’s fundamentally hard in a computational complexity sense.

NP-Completeness: Verifying properties of ReLU neural networks is NP-complete [Katz et al., 2017]. This means that unless P=NP, no polynomial-time algorithm exists that solves all verification instances.

Inapproximability: Even approximating the certified robustness radius within any constant factor is NP-hard [Weng et al., 2018]. You can’t even efficiently approximate how robust a network is, let alone verify exact robustness.

Complexity Implications

The NP-completeness result [Katz et al., 2017, Weng et al., 2018] has profound implications:

  • Complete verification methods (those always returning verified/falsified) have worst-case exponential time

  • Scalability-tightness tradeoff is fundamental—you can’t have both perfect tightness and polynomial-time verification

  • Incomplete methods (accepting “unknown”) are necessary for practical scalability

This isn’t an engineering limitation—it’s a mathematical fact. All verification methods face these barriers.

Why it’s hard: The difficulty stems from ReLU’s piecewise-linear nature. A ReLU network with \(k\) ReLU activations partitions input space into up to \(2^k\) linear regions. Verification must reason about behavior across all these regions—exponentially many regions lead to exponential complexity.
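The region-counting intuition can be checked empirically: each input induces an activation pattern (which ReLUs are on), and distinct patterns correspond to distinct linear regions. A small sketch, assuming numpy; the function names are hypothetical and sampling gives only a lower bound on the region count:

```python
import numpy as np  # assumption: numpy is available

def activation_pattern(W1, b1, x):
    """Sign pattern of the pre-activations: which of the k ReLUs are active."""
    return tuple((W1 @ x + b1) > 0)

def count_patterns_by_sampling(W1, b1, n_samples=20000, seed=0):
    """Lower-bound the number of linear regions of a one-hidden-layer ReLU
    net by sampling inputs and collecting distinct activation patterns.
    With k hidden ReLUs there are at most 2**k patterns."""
    rng = np.random.default_rng(seed)
    dim = W1.shape[1]
    return len({activation_pattern(W1, b1, rng.standard_normal(dim))
                for _ in range(n_samples)})

rng = np.random.default_rng(0)
k = 6  # 6 ReLUs -> at most 2**6 = 64 linear regions
W1, b1 = rng.standard_normal((k, 2)), rng.standard_normal(k)
n = count_patterns_by_sampling(W1, b1)
assert n <= 2**k  # the 2^k bound always holds
```

Complete verification must, in the worst case, reason separately about each such pattern, which is the source of the exponential blowup.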

Complete vs Incomplete Verification

The computational hardness forces a fundamental choice: complete methods that always give definitive answers but take exponential time, or incomplete methods that run in polynomial time but sometimes return “unknown.”

Complete Verification Methods

Definition: A complete verification method always returns either VERIFIED or FALSIFIED (with counterexample). It never returns UNKNOWN.

Approaches: SMT solving [Katz et al., 2017, Katz et al., 2019], mixed-integer linear programming (MILP) [Tjeng et al., 2019], and branch-and-bound [Wang et al., 2021, Zhang et al., 2022].

Cost: Worst-case exponential time. For large networks, complete methods often time out or exhaust memory.

When to use: Small networks, critical properties requiring absolute certainty, when computational budget allows exhaustive search.

Incomplete Verification Methods

Definition: An incomplete verification method may return VERIFIED, FALSIFIED, or UNKNOWN. When it returns UNKNOWN, the property may or may not hold.

Approaches: bound propagation [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018], abstract interpretation [Gehr et al., 2018, Singh et al., 2019], and linear programming relaxations [Salman et al., 2019].

Cost: Polynomial time. Scales to large networks.

Tradeoff: Soundness vs completeness. Incomplete methods are sound (if they say VERIFIED, it’s correct) but incomplete (they might say UNKNOWN when the property actually holds).

When to use: Large networks, rapid iteration, when accepting “unknown” is tolerable.

Table 14 Complete vs Incomplete Methods

| Method Type | Time Complexity | Scalability | Answers | Guarantee |
|---|---|---|---|---|
| Complete (SMT, MILP, B&B) | Exponential (worst-case) | Small networks | Verified / Falsified | Definitive |
| Incomplete (Bound prop, LP) | Polynomial | Large networks | Verified / Falsified / Unknown | Sound but incomplete |

Sound vs Unsound Approximations

Verification methods that use approximations must remain sound: their answers must be correct even though their analysis is approximate.

Sound approximation: If the approximation says “verified,” the true answer is “verified.” Approximations can be conservative (saying “unknown” when the true answer is “verified”) but never incorrect.

Unsound approximation: Might say “verified” when the true answer is “falsified.” This is unacceptable for formal verification—you’re providing false guarantees.

\[\begin{aligned}\text{Sound:} &\quad \text{Approx-Verified} \Rightarrow \text{True-Verified}\\\text{Unsound:} &\quad \text{Approx-Verified} \not\Rightarrow \text{True-Verified}\end{aligned}\]

All formal verification methods must be sound. Testing and adversarial attacks cannot provide verification guarantees—failing to find counterexamples doesn’t prove robustness. However, adversarial attacks are sound falsifiers: when they find a counterexample, it’s real and demonstrates the property is violated. Formal verifiers use sound approximations—they might be loose (incomplete), but they’re never incorrect.

Bound Computation as Core Primitive

Most verification methods reduce to computing bounds on network outputs given input constraints.

Problem: Given input constraints \(x \in \mathcal{X}\), compute bounds on output:

\[\underline{y}_i \leq f_\theta(x)_i \leq \overline{y}_i \quad \forall x \in \mathcal{X}\]

Tighter bounds enable stronger verification. Loose bounds lead to “unknown” results.

Layer-by-layer propagation: For feedforward networks, compute bounds layer by layer:

\[\begin{aligned} \text{Layer } \ell: \quad & \underline{z}^{(\ell)} \leq W^{(\ell)} h^{(\ell-1)} + b^{(\ell)} \leq \overline{z}^{(\ell)}, \\ & \underline{h}^{(\ell)} = \sigma(\underline{z}^{(\ell)}) \leq h^{(\ell)} \leq \sigma(\overline{z}^{(\ell)}) = \overline{h}^{(\ell)} \quad \text{(for monotone } \sigma\text{, e.g. ReLU)} \end{aligned}\]

Different methods compute these bounds differently: interval bound propagation [Gowal et al., 2019] keeps cheap elementwise intervals; linear bound propagation [Weng et al., 2018, Zhang et al., 2018] tracks linear functions of the input for tighter bounds; linear programming relaxations [Salman et al., 2019] are tighter still but more expensive.

The tightness-speed tradeoff is fundamental: tighter bounds require more computation.
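The layer-by-layer recurrence above is easiest to see for the simplest instance, interval bound propagation. A minimal sketch, assuming numpy; it is sound (the box contains every reachable output) but loose, with slack that grows with depth:

```python
import numpy as np  # assumption: numpy is available

def interval_bound_propagation(layers, x_lo, x_hi):
    """Propagate elementwise interval bounds through an MLP with ReLU
    activations.  `layers` is a list of (W, b) pairs; ReLU is applied
    after every layer except the last."""
    lo, hi = x_lo, x_hi
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        # An affine map attains its extremes at interval endpoints:
        # positive weights pull from the matching bound, negative from the other.
        new_lo = W_pos @ lo + W_neg @ hi + b
        new_hi = W_pos @ hi + W_neg @ lo + b
        lo, hi = new_lo, new_hi
        if i < len(layers) - 1:  # ReLU is monotone, so apply it to both bounds
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# Toy check: 2-layer net, input box [x0 - eps, x0 + eps].
layers = [(np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)),
          (np.array([[1.0, 1.0]]), np.zeros(1))]
x0, eps = np.array([1.0, 0.0]), 0.1
lo, hi = interval_bound_propagation(layers, x0 - eps, x0 + eps)  # lo=[1.2], hi=[1.8]
```

To verify a margin property, one would apply the same propagation to the difference of output rows and check that the resulting lower bound is positive.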

Certified Radius Computation

Rather than verifying at a fixed \(\epsilon\), you might ask: what is the maximum \(\epsilon\) for which robustness holds?

\[\epsilon_{\text{cert}} = \max \{\epsilon : \forall x \in B_{p,\epsilon}(x_0), \arg\max_c f_\theta(x)_c = y_0\}\]

This certified radius quantifies robustness. Larger certified radius means stronger robustness.

Binary search: Most methods compute certified radius via binary search over \(\epsilon\):

  1. Start with \(\epsilon_{\min}, \epsilon_{\max}\)

  2. Test \(\epsilon_{\text{mid}} = (\epsilon_{\min} + \epsilon_{\max})/2\)

  3. If verified, increase \(\epsilon_{\min} = \epsilon_{\text{mid}}\); else decrease \(\epsilon_{\max} = \epsilon_{\text{mid}}\)

  4. Repeat until convergence
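The four steps above can be sketched directly. The `verify` callback here is a stand-in for any sound verifier (with a known threshold for demonstration); because the verifier may be incomplete, the result is a lower bound on the true certified radius:

```python
def certified_radius_binary_search(verify, eps_max, tol=1e-3):
    """Binary search for the largest eps that `verify(eps)` certifies.
    `verify` must be sound: it returns True only if robustness at radius
    eps is proven.  Returns a lower bound on the certified radius."""
    lo, hi = 0.0, eps_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if verify(mid):
            lo = mid  # verified: the certified radius is at least mid
        else:
            hi = mid  # unverified (or falsified): shrink the search range
    return lo

# Stand-in verifier with a known certification threshold at 0.25:
r = certified_radius_binary_search(lambda e: e <= 0.25, eps_max=1.0)  # r ~= 0.25
```

Note that an UNKNOWN answer from an incomplete verifier is treated like a failure here, which is why the returned radius can underestimate the true one.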

Direct computation: Some methods (randomized smoothing [Cohen et al., 2019]) compute certified radius directly without binary search.

Probabilistic Certification

An alternative to deterministic verification is probabilistic certification [Cohen et al., 2019]: prove that properties hold with high probability under randomness.

Randomized smoothing provides probabilistic robustness guarantees:

\[g(x) = \arg\max_c \mathbb{P}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}[f_\theta(x + \epsilon) = c]\]

The smoothed classifier \(g\) has a provable certified radius \(r\) derived from statistical estimates. The guarantee is: with high probability, \(g(x') = g(x)\) for all \(\|x' - x\|_2 \leq r\).

Difference from deterministic verification: The radius \(r\) is deterministically certified based on statistical estimates. The randomness is in estimation, not in the final guarantee.
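The radius computation can be sketched with standard-library tools only. This uses the simplified bound \(r = \sigma \, \Phi^{-1}(\underline{p_A})\); for brevity the lower confidence bound \(\underline{p_A}\) comes from a Hoeffding inequality, whereas Cohen et al. [2019] use the tighter Clopper-Pearson interval. Function and parameter names are hypothetical:

```python
from math import log, sqrt
from statistics import NormalDist

def smoothing_radius(n_correct, n_total, sigma, alpha=0.001):
    """Certified l2 radius for randomized smoothing via the simplified
    bound r = sigma * Phi^{-1}(p_lower), where p_lower is a one-sided
    lower confidence bound (here Hoeffding, for simplicity) on the
    probability that noisy samples agree with the top class."""
    p_hat = n_correct / n_total
    p_lower = p_hat - sqrt(log(1 / alpha) / (2 * n_total))  # Hoeffding bound
    if p_lower <= 0.5:
        return 0.0  # abstain: cannot certify any radius
    return sigma * NormalDist().inv_cdf(p_lower)  # Phi^{-1} = Gaussian quantile

# 990 of 1000 noisy samples agree with the top class, sigma = 0.5:
r = smoothing_radius(990, 1000, sigma=0.5)
```

More samples tighten \(\underline{p_A}\) toward \(\hat{p}\) and so enlarge the certified radius, which is why randomized smoothing trades sample count for guarantee strength.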

Scalability Barriers and Solutions

Verification scales poorly with network size and input dimension. Understanding why helps appreciate current methods.

Curse of dimensionality: For an input space of dimension \(n\), a complete method that partitions the space at resolution \(k\) per dimension requires \(O(k^n)\) cells. This worst-case exponential growth is why exhaustive approaches do not scale with input dimension.

Accumulation of approximation error: In bound propagation, errors accumulate through layers. A network with \(L\) layers might have approximation error \(O(L \cdot \epsilon_{\text{layer}})\), growing linearly with depth.

Non-linearity complexity: Each ReLU introduces a non-linear decision. With \(k\) ReLUs, there are up to \(2^k\) linear regions. Reasoning about all regions requires exponential effort.

Solutions:

  • Incomplete methods: Accept “unknown” to achieve polynomial time

  • GPU acceleration [Xu et al., 2020]: Parallelize bound computation

  • Hybrid methods [Wang et al., 2021, Zhang et al., 2022]: Use incomplete methods for initial pruning, complete methods for refinement

  • Architecture-aware methods: Exploit specific architectures (CNNs, residual networks) for tighter bounds

Final Thoughts

The verification problem has a clean mathematical formulation, but solving it is fundamentally hard. NP-completeness [Katz et al., 2017, Weng et al., 2018] establishes that no efficient general algorithm exists for complete verification. This forces the field toward incomplete methods [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018], which scale by accepting “unknown” results, and hybrid approaches [Wang et al., 2021, Zhang et al., 2022], which balance completeness and efficiency.

Understanding the mathematical foundations clarifies what’s possible and what’s not. It explains why verification is hard, why tradeoffs exist, and why different methods make different choices. Armed with this understanding, you can choose verification methods intelligently and interpret results correctly.

Further Reading

This guide provides the mathematical foundations of the neural network verification problem. For readers interested in diving deeper, we recommend the following resources organized by topic:

Fundamental Complexity:

The NP-completeness of neural network verification was proven for ReLU networks [Katz et al., 2017], establishing that complete verification has worst-case exponential complexity. This result was strengthened to show that even approximating the certified robustness radius is NP-hard [Weng et al., 2018], explaining the fundamental scalability-tightness tradeoff all verification methods face.

Complete Verification Methods:

Complete verification methods always return definitive answers. SMT-based approaches [Katz et al., 2017, Katz et al., 2019] encode networks as logical constraints. MILP-based methods [Tjeng et al., 2019] formulate verification as mixed-integer programming. Branch-and-bound approaches [Wang et al., 2021, Zhang et al., 2022] partition the input space and verify each partition. All complete methods face exponential worst-case complexity due to NP-completeness.

Incomplete Verification Methods:

To achieve scalability, incomplete methods accept “unknown” results. Bound propagation approaches [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018] compute output bounds layer-by-layer with varying tightness. Abstract interpretation [Singh et al., 2019, Gehr et al., 2018] uses abstract domains to approximate network behavior soundly. Linear programming relaxations [Salman et al., 2019] relax non-convex verification to convex optimization.

Probabilistic Certification:

Randomized smoothing [Cohen et al., 2019] provides an alternative formulation: instead of deterministic verification, prove probabilistic guarantees through statistical estimation. This approach scales to large networks where deterministic verification becomes intractable.

Hybrid Approaches:

State-of-the-art verifiers [Wang et al., 2021, Zhang et al., 2022] combine incomplete and complete methods: using fast incomplete verification for initial bounds, then applying branch-and-bound selectively where needed. GPU acceleration [Xu et al., 2020] enables practical implementation of these hybrid strategies.

Related Topics:

For understanding how to specify properties formally in the verification problem, see Formal Specifications. For the distinction between sound and complete verification, see Soundness and Completeness. For practical verification workflows that instantiate this mathematical formulation, see Robustness Testing Guide.

Next Guide

Continue to Soundness and Completeness to understand the fundamental properties of soundness and completeness in verification methods.

[1] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.

[2] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI²: safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), 3–18. IEEE, 2018.

[3] Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[4] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[5] Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.

[6] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.

[7] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[8] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.

[9] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[10] Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.

[11] Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[12] Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.

[13] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948. 2018.