Theoretical Barriers in Neural Network Verification

Verification methods keep improving—faster solvers, tighter bounds, better heuristics. But fundamental limits constrain what’s achievable. Neural network verification faces mathematical barriers: NP-completeness means exponential worst-case complexity, approximation hardness means you can’t even efficiently approximate certified robustness, and inherent tradeoffs between accuracy and robustness limit what’s trainable.

Understanding these barriers clarifies what verification can and cannot achieve. They’re not engineering limitations—they’re mathematical facts. No algorithmic breakthrough will make complete verification polynomial-time (unless P=NP). No clever trick will eliminate the accuracy-robustness tradeoff.

This guide explores the fundamental theoretical barriers in neural network verification: what they are, why they exist, and what they mean for research and practice.

NP-Completeness of Verification

The foundational barrier: verifying properties of ReLU neural networks is NP-complete [Katz et al., 2017].

The Verification Problem

Decision problem: Given a neural network \(f_\theta\), an input region \(\mathcal{X}\), and a property \(\phi\), does \(\phi(f_\theta(x))\) hold for all \(x \in \mathcal{X}\)?

This is NP-complete for ReLU networks [Katz et al., 2017]. Specifically, the problem of determining whether a ReLU network maintains a specific classification on all inputs in a polytope is NP-complete.

What NP-Completeness Means

Formal statement: Verification is in NP, and it is at least as hard as every problem in NP. No polynomial-time algorithm is known, and finding one would prove P=NP (resolving a Millennium Prize Problem).

Practical implications:

  1. Worst-case exponential time: Any complete verification algorithm must take exponential time in the worst case

  2. No efficient general algorithm: Can’t develop a verifier that runs in polynomial time for all instances

  3. Heuristics necessary: Must use approximations, relaxations, or instance-specific optimizations

  4. Fundamental barrier: Not an engineering limitation but mathematical impossibility (assuming P≠NP)

Why Verification is NP-Complete

Reduction from satisfiability: The proof [Katz et al., 2017] reduces Boolean satisfiability (SAT) to network verification.

ReLU creates decisions: Each ReLU activation creates a binary choice (active or inactive). With \(k\) ReLUs, there are \(2^k\) possible activation patterns.

Exponential combinations: Determining which pattern occurs for inputs in a region requires exploring exponentially many cases in the worst case.

Result: Verification inherits the hardness of combinatorial optimization.
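The pattern explosion is easy to observe empirically. The sketch below (an illustrative toy; the layer sizes, weights, and sampling scheme are arbitrary choices of mine) samples inputs to small random ReLU layers and counts the distinct activation patterns encountered. The combinatorial ceiling is \(2^k\); with only two inputs the realizable count is smaller, but it still grows quickly with \(k\):

```python
import random

def hidden_pattern(x, W, b):
    """Which hidden ReLUs are active (pre-activation > 0) for input x."""
    return tuple(int(sum(wi * xi for wi, xi in zip(row, x)) + bi > 0)
                 for row, bi in zip(W, b))

random.seed(0)
for k in (2, 4, 8, 12):
    # random layer with 2 inputs and k hidden ReLU units
    W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(k)]
    b = [random.uniform(-0.5, 0.5) for _ in range(k)]
    patterns = {hidden_pattern((random.uniform(-1, 1), random.uniform(-1, 1)), W, b)
                for _ in range(20000)}
    print(f"k={k:2d}: {len(patterns):4d} distinct patterns seen (ceiling 2^k = {2**k})")
```

A complete verifier must, in the worst case, reason about every realizable pattern separately, which is the source of the exponential blowup.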

Implications for Verification Methods

Complete methods must be exponential: Marabou [Katz et al., 2019], MILP [Tjeng et al., 2019], and branch-and-bound [Wang et al., 2021] all have exponential worst-case complexity. This is unavoidable.

Incomplete methods sacrifice completeness: CROWN [Weng et al., 2018, Zhang et al., 2018], IBP [Gowal et al., 2019], and other polynomial-time methods avoid NP-completeness by accepting “unknown” results. They trade completeness for efficiency.

No breakthrough coming: Unless P=NP (extremely unlikely), no polynomial-time complete verifier will ever exist. Research focuses on:

  • Better average-case performance (even if worst-case remains exponential)

  • Tighter incomplete methods

  • Identifying tractable special cases

Table 17 Complexity Hierarchy

Problem                      | Complexity     | Implication
---------------------------- | -------------- | ---------------------------
Verification (ReLU networks) | NP-complete    | Exponential worst-case
Linear program solving       | P (polynomial) | Efficient
Semi-definite program        | P (polynomial) | Efficient (but high degree)
Mixed-integer program        | NP-complete    | Exponential worst-case
Boolean satisfiability       | NP-complete    | Exponential worst-case

Approximation Hardness

Can’t solve exactly? What about approximation?

Bad news: Even approximating the certified robustness radius within any constant factor is NP-hard [Weng et al., 2018].

The Approximation Problem

Certified radius: For input \(x_0\) with true label \(y_0\), the certified radius is:

\[r_{\text{cert}}(x_0) = \sup \{ r : \forall x \in \mathcal{B}_r(x_0), \arg\max_c f_\theta(x)_c = y_0 \}\]

Approximation question: Can we efficiently compute \(\tilde{r}\) such that:

\[\frac{1}{c} r_{\text{cert}} \leq \tilde{r} \leq r_{\text{cert}}\]

for some constant \(c\)?

Answer: No (unless P=NP). The problem of approximating certified radius within any constant factor is NP-hard [Weng et al., 2018].

What This Means

Can’t even approximate: Not only is finding the exact certified radius hard, even getting a constant-factor approximation is hard.

All methods are conservative: IBP [Gowal et al., 2019], CROWN [Zhang et al., 2018], SDP [Raghunathan et al., 2018]—all provide lower bounds on certified radius. They might be arbitrarily loose.

No guarantee of tightness: Even if a method seems tight on benchmarks, no theoretical guarantee says it’s within any factor of optimal.

Practical impact: The gap between what verification methods certify and true robustness may be large, with no efficient way to close it.
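For a concrete sketch of this conservativeness (a toy construction of my own, not from the cited papers): the network below computes \(f(x) = \text{ReLU}(x) - \text{ReLU}(x) + 1\), which is identically 1, so its true certified radius is infinite. Interval bound propagation, treating the two ReLU terms as independent, certifies only \(r < 1\):

```python
def relu(z): return max(0.0, z)

def ibp_lower(x0, r):
    """IBP lower bound of f(x) = ReLU(x) - ReLU(x) + 1 on [x0-r, x0+r].
    The two (identical) ReLU terms are bounded independently, so the
    cancellation is invisible and the bound is loose."""
    l, u = x0 - r, x0 + r
    h_lo, h_hi = relu(l), relu(u)      # interval for ReLU(x), used twice
    return 1.0 + h_lo - h_hi           # lower(term1) - upper(term2) + 1

# Binary-search the largest radius IBP can still certify (lower bound > 0):
lo, hi = 0.0, 100.0
for _ in range(60):
    mid = (lo + hi) / 2
    if ibp_lower(0.0, mid) > 0:
        lo = mid
    else:
        hi = mid
print(f"IBP-certified radius at x0=0: {lo:.4f}  (true radius: unbounded)")
```

The certified radius converges to 1.0, arbitrarily far below the true (infinite) radius, illustrating that lower bounds from incomplete methods carry no tightness guarantee.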

Curse of Dimensionality

Verification difficulty grows exponentially with input dimension—the classic curse of dimensionality.

Volume of High-Dimensional Balls

\(\ell_\infty\) ball volume: For an \(\ell_\infty\) ball with radius \(\epsilon\) in \(d\) dimensions:

\[\text{Vol}(\mathcal{B}_\infty^\epsilon) = (2\epsilon)^d\]

This scales exponentially with dimension \(d\): for \(2\epsilon > 1\) the volume explodes, and for \(2\epsilon < 1\) it shrinks, but either way the number of grid points needed to cover the ball at any fixed resolution \(\delta < 2\epsilon\) is \((2\epsilon/\delta)^d\), which is exponential in \(d\).

Implication: For images (e.g., CIFAR-10 with \(d = 32 \times 32 \times 3 = 3072\)), an \(\ell_\infty\) ball contains a continuum of points, and even a coarse discretization of it is astronomically large. Complete verification must reason about all of them.

Activation Pattern Explosion

ReLU activation patterns: A network with \(k\) ReLU activations has up to \(2^k\) distinct linear regions.

For deep networks: A typical ResNet might have 100K ReLU activations. That’s \(2^{100000}\) potential patterns—more than atoms in the universe.

Combinatorial explosion: Reasoning about which patterns are possible for inputs in a region requires exploring this vast space. Complete methods must systematically search; incomplete methods approximate.

Sampling is Insufficient

Random sampling: A single test point says nothing about any neighborhood around it, so coverage must be measured at some resolution \(\delta\): covering an \(\ell_\infty\) ball of side \(2\epsilon\) at resolution \(\delta\) requires roughly \((2\epsilon/\delta)^d\) points.

For high \(d\): Even billions of samples cover a negligible fraction of such a grid. In a 3072-dimensional \(\ell_\infty\) ball with \(\epsilon = 8/255\), a grid with just 10 points per dimension has \(10^{3072}\) cells; \(10^9\) samples touch essentially none of them.

Conclusion: Empirical testing can’t provide guarantees in high dimensions. Formal verification is necessary but faces exponential difficulty.
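A quick back-of-the-envelope calculation makes the gap concrete (the covering resolution \(\delta\) below is an arbitrary illustrative choice):

```python
import math

d = 32 * 32 * 3          # CIFAR-10 input dimension
eps = 8 / 255            # l_inf radius
delta = eps / 10         # illustrative covering resolution

# Grid points needed to cover the ball at spacing delta: (2*eps/delta)^d.
# The count itself overflows floats, so work in log10.
log10_cover = d * math.log10(2 * eps / delta)
print(f"log10(grid points needed) ~ {log10_cover:.0f}")   # a number with ~4000 digits

# By comparison, a billion random samples:
print(f"log10(samples) = 9  ->  shortfall of ~10^{log10_cover - 9:.0f} points")
```

No feasible sampling budget closes a gap of thousands of orders of magnitude, which is why formal reasoning over the whole region is required.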

The Convex Barrier: Fundamental Limit of Linear Relaxation

Beyond computational complexity, there exist mathematical limits on how tight linear approximations of activation functions can be. The convex barrier theorem establishes a fundamental precision ceiling for single-neuron verification methods.

The Triangle Relaxation for ReLU

The triangle relaxation [Ehlers, 2017] is a fundamental technique for convex approximation of ReLU activations. For ReLU activation \(a_i = \max(0, x_i)\) with input bounds \(x_i \in [l, u]\) where \(l < 0 < u\) (the crossing or unstable case), the standard linear relaxation uses:

\[\begin{split}\begin{aligned} \text{Lower bound: } & a_i \geq 0 \\ \text{Upper bound: } & a_i \leq \frac{u}{u - l}(x_i - l) \end{aligned}\end{split}\]

The upper bound is the secant line connecting points \((l, 0)\) and \((u, u)\), forming a triangular region between the bounds and the true ReLU function.

Theorem (Convex Barrier for ReLU): The triangle relaxation above is the optimal convex hull of the ReLU function over the interval \([l, u]\) with \(l < 0 < u\). No sound single-neuron linear relaxation can produce tighter bounds without sacrificing convexity or soundness.

Proof Sketch:

  1. Convex hull construction: The ReLU function \(\max(0, x)\) over \([l, u]\) has exactly two linear pieces: \(y = 0\) for \(x \in [l, 0]\) and \(y = x\) for \(x \in [0, u]\).

  2. Endpoints of convex hull: The convex hull of these two segments connects the extreme points \((l, 0)\) and \((u, u)\), which defines the secant line.

  3. Lower bound optimality: The lower bound \(a_i \geq 0\) is exact for \(x_i \in [l, 0]\) and lies on the ReLU for \(x_i > 0\). Any tighter lower bound would violate soundness (pass below the true function).

  4. Upper bound optimality: The secant line is the unique line connecting the convex hull endpoints. Any line with smaller slope would fail to upper-bound the ReLU for some \(x \in [l, u]\).

  5. Conclusion: The triangle region is the minimal convex over-approximation. QED.
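The relaxation is small enough to write down and sanity-check directly. This sketch (function names and the sampling-based check are mine) builds the triangle bounds for a crossing ReLU and verifies soundness on random points:

```python
import random

def triangle_bounds(l, u):
    """Triangle relaxation of ReLU on [l, u] with l < 0 < u.
    Returns (lower, upper) bound functions of x."""
    slope = u / (u - l)                     # slope of the secant through (l,0),(u,u)
    lower = lambda x: 0.0                   # a >= 0
    upper = lambda x: slope * (x - l)       # a <= u/(u-l) * (x - l)
    return lower, upper

random.seed(1)
l, u = -1.5, 2.0
lower, upper = triangle_bounds(l, u)
for _ in range(10000):
    x = random.uniform(l, u)
    y = max(0.0, x)                         # true ReLU
    assert lower(x) <= y <= upper(x) + 1e-12, "relaxation must be sound"
print("triangle relaxation sound on 10,000 samples")
```

The secant touches the ReLU exactly at both interval endpoints, which is the geometric content of the optimality argument above: any flatter upper line would dip below the function near \(x = u\).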

Implications for Verification Methods:

  • DeepZ [Singh et al., 2018], Fast-Lin [Weng et al., 2018], CROWN [Zhang et al., 2018] all use triangle relaxation for single-neuron ReLU analysis

  • These methods cannot improve beyond triangle tightness for individual ReLU neurons

  • The “highest precision among all relevant works” (as stated in the survey) for single-neuron ReLU approximation is bounded by the convex hull

  • Multi-neuron methods (PRIMA [Müller et al., 2022], k-ReLU [Singh et al., 2019]) must analyze multiple ReLUs jointly to exceed this barrier

Why the Barrier Exists

The convex barrier arises from the piecewise-linear nature of ReLU. The function has a non-differentiable kink at \(x = 0\), creating two distinct linear regions. Any convex over-approximation must “bridge” these regions with a straight line, which is exactly the secant.

To get tighter bounds, methods must either:

  1. Abandon single-neuron analysis (multi-neuron methods analyze correlations)

  2. Abandon linear bounds (quadratic methods like SDP)

  3. Abandon soundness (empirical methods)

All polynomial-time incomplete methods using linear single-neuron relaxation hit this fundamental wall.

Tightness Ordering and Incomparability of Verification Methods

The verification literature contains dozens of methods with varying precision. How do they compare? Can we establish a formal ordering?

The Tightness Ordering Problem

Question: Given two incomplete verification methods \(M_1\) and \(M_2\), is one always at least as tight as the other?

Formal definition: Method \(M_1\) is at least as tight as \(M_2\) (written \(M_1 \sqsupseteq M_2\)) if:

\[\begin{split}\forall \text{ networks } f, \forall \text{ input regions } X, \forall \text{ properties } \phi: \\ M_2 \text{ certifies } \phi \Rightarrow M_1 \text{ certifies } \phi\end{split}\]

In other words, \(M_1\) certifies at least as many properties as \(M_2\) across all instances.

Deterministic Method Rankings: T1–T7

The survey establishes a tightness hierarchy for deterministic incomplete methods based on their abstract domains and relaxation techniques. Methods are ranked T1 (loosest) through T7 (tightest):

T1 (Interval / Box Domain):

  • Methods: Basic interval arithmetic [Cheng et al., 2017]

  • Precision: Loosest; loses all correlation information between neurons

  • Advantage: Extremely fast (\(O(n)\) per layer)

T2 (Zonotope Domain):

  • Methods: DeepZ [Singh et al., 2018], DiffAI

  • Precision: Better than intervals; tracks affine dependencies

  • Abstract domain: Symmetric polytopes (zonotopes)

  • Advantage: Same computational cost as intervals (\(O(n)\)) but maintains some symbolic information

T3 (Hybrid Zonotope):

  • Methods: RefineZono [Singh et al., 2019]

  • Precision: Zonotopes + MILP refinement for unstable neurons

  • Advantage: Improves zonotope precision without full polyhedra cost

T4 (Single-Neuron Polyhedra):

  • Methods: DeepPoly [Singh et al., 2019], CROWN [Zhang et al., 2018]

  • Precision: Polyhedral constraints per neuron; maximum tightness for single-neuron linear relaxation

  • Convex barrier: These methods hit the triangle relaxation optimality limit

  • Advantage: Polynomial time (\(O(n^2)\) per layer) with near-optimal single-neuron bounds

T5 (Multi-Neuron Polyhedra - Pairs):

  • Methods: k-ReLU with \(k=2\) [Singh et al., 2019], PRIMA [Müller et al., 2022]

  • Precision: Analyzes pairs of neurons jointly; breaks through convex barrier

  • Improvement: Can certify properties that single-neuron methods cannot

  • Cost: Higher computational cost (\(O(n^3)\) to \(O(n^4)\) depending on implementation)

T6 (Multi-Neuron Polyhedra - k neurons):

  • Methods: k-ReLU with \(k > 2\) [Singh et al., 2019]

  • Precision: Analyzes \(k\) neurons simultaneously

  • Cost: Exponential in \(k\): \(O(2^k \cdot n^2)\) constraints

  • Tradeoff: Precision vs. tractability

T7 (Quadratic/SDP Constraints):

  • Methods: LipSDP [Fazlyab et al., 2020], SDP-based methods [Raghunathan et al., 2018]

  • Precision: Tightest among incomplete methods; captures quadratic relations

  • Cost: Very high (\(O(n^6)\) for interior point SDP solvers)

  • Limitation: Practical only for small networks

Formal Ordering:

\[T_1 \sqsubseteq T_2 \sqsubseteq T_3 \sqsubseteq T_4 \sqsubseteq T_5 \sqsubseteq T_6 \sqsubseteq T_7\]

where \(\sqsubseteq\) denotes “at least as loose as.”

Proof Technique for Ordering

To prove \(M_1 \sqsupseteq M_2\), we show that \(M_1\)’s abstract domain includes \(M_2\)’s domain as a special case. For example:

  • DeepPoly ⊇ DeepZ: Every zonotope can be represented as a polyhedron (polyhedra are more general)

  • PRIMA ⊇ DeepPoly: PRIMA includes DeepPoly’s single-neuron constraints plus additional multi-neuron constraints

This inclusion relationship guarantees \(M_1\) is at least as tight as \(M_2\).

Incomparability Results

Not all methods can be formally ordered. Some methods are incomparable—each may be tighter on different instances.

Example: CROWN vs. Fast-Lin

  • Both use single-neuron polyhedral relaxation (both in T4 class)

  • CROWN uses backward bound propagation with optimized dual variables

  • Fast-Lin uses forward propagation with fixed relaxation parameters

  • Neither dominates the other across all networks

Result: \(\text{CROWN} \not\sqsupseteq \text{Fast-Lin}\) and \(\text{Fast-Lin} \not\sqsupseteq \text{CROWN}\). They are incomparable.

Practical Implication: Method selection depends on network structure and property. Benchmarking on specific instances is necessary.

Stochastic Method Rankings: ST1–ST4

For probabilistic certification methods (randomized smoothing variants), a separate ranking exists:

ST1: Basic randomized smoothing [Cohen et al., 2019] (Gaussian noise)

ST2: Improved smoothing with noise optimization

ST3: Smoothing with adversarial training

ST4: Hybrid deterministic-stochastic methods

Note: Stochastic and deterministic methods are incomparable (one provides probabilistic guarantees, the other deterministic).

Dependency Loss in Compositional Reasoning

Incomplete methods that propagate bounds layer-by-layer face a subtle but critical challenge: dependency loss. As bounds propagate through network layers, correlation information between neurons is progressively lost, causing bounds to inflate exponentially with depth.

The Interval Arithmetic Problem

Interval arithmetic (also known as basic bound propagation) treats each neuron independently:

For affine node \(i\) with \(J\) incoming connections:

\[x_i = b_i + \sum_{j \in [J]} w_{ij} \cdot a_j\]

Given bounds \(a_j \in [a_j^{\ell}, a_j^{u}]\), interval arithmetic computes:

\[\begin{split}\begin{aligned} x_i^{\ell} &= b_i + \sum_{j} \left( \max(0, w_{ij}) \cdot a_j^{\ell} + \min(0, w_{ij}) \cdot a_j^{u} \right) \\ x_i^{u} &= b_i + \sum_{j} \left( \max(0, w_{ij}) \cdot a_j^{u} + \min(0, w_{ij}) \cdot a_j^{\ell} \right) \end{aligned}\end{split}\]

The problem: This formula assumes \(a_j\) and \(a_k\) are independent for \(j \neq k\). But they’re not—both may depend on the same input variables \(x_0^{(m)}\).
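As a sketch, the interval formula above translates directly into code (the function name and the tiny example are mine, not from any library):

```python
def interval_affine(W, b, lo, hi):
    """Interval bounds for x = b + W a, given elementwise a_j in [lo_j, hi_j].
    Implements the max(0, w)/min(0, w) split from the text."""
    out_lo, out_hi = [], []
    for row, bi in zip(W, b):
        xl = bi + sum(max(0.0, w) * l + min(0.0, w) * h
                      for w, l, h in zip(row, lo, hi))
        xh = bi + sum(max(0.0, w) * h + min(0.0, w) * l
                      for w, l, h in zip(row, lo, hi))
        out_lo.append(xl)
        out_hi.append(xh)
    return out_lo, out_hi

# x1 = 1 + 2*a1 - 3*a2 with a1, a2 in [-1, 1]  ->  x1 in [1-2-3, 1+2+3]
lo, hi = interval_affine([[2.0, -3.0]], [1.0], [-1.0, -1.0], [1.0, 1.0])
print(lo, hi)  # [-4.0] [6.0]
```

Note how the formula prices each \(a_j\) at its own worst case; that per-coordinate independence is exactly where the problem described above enters.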

Theorem (Dependency Loss in Interval Arithmetic): Interval arithmetic loses precision exponentially with network depth when neurons have correlated dependencies on input variables.

Proof Sketch:

Consider a simple case: two parallel paths in a network both depending on input \(x_0 \in [-1, 1]\):

  • Path 1: \(y_1 = x_0\)

  • Path 2: \(y_2 = x_0\)

True combined range: Since \(y_1 = y_2\), we have \(y_1 + y_2 = 2x_0 \in [-2, 2]\) (exact).

Interval arithmetic computes:

  • \(y_1 \in [-1, 1]\)

  • \(y_2 \in [-1, 1]\)

  • \(y_1 + y_2 \in [-1-1, 1+1] = [-2, 2]\)

In this simple case, interval arithmetic is exact. But now add a ReLU and another layer:

  • \(z_1 = \text{ReLU}(y_1 + y_2)\)

  • \(z_2 = \text{ReLU}(y_1 - y_2)\)

True range: Since \(y_1 = y_2\), we have \(y_1 + y_2 = 2x_0 \in [-2, 2]\) and \(y_1 - y_2 = 0\) (always).

  • \(z_1 = \max(0, 2x_0) \in [0, 2]\)

  • \(z_2 = \max(0, 0) = 0\) (exact!)

Interval arithmetic computes:

  • \(y_1 + y_2 \in [-2, 2]\) (correct)

  • \(y_1 - y_2 \in [-1-1, 1-(-1)] = [-2, 2]\) (wrong! Should be 0)

  • \(z_1 \in [0, 2]\) (correct)

  • \(z_2 \in [0, 2]\) (wrong! Should be 0)

The error doubles because interval arithmetic can’t detect that \(y_1 - y_2 = 0\) always.

As depth increases, such errors compound multiplicatively, leading to exponential bound inflation.
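The worked example can be reproduced in a few lines of plain interval arithmetic (helper names are illustrative):

```python
def relu_iv(lo, hi):
    """Interval image of ReLU."""
    return (max(0.0, lo), max(0.0, hi))

def affine_iv(w, b, ivs):
    """Interval for b + sum_j w_j * a_j, treating each a_j in ivs[j] as independent."""
    lo = b + sum(wj * (iv[0] if wj >= 0 else iv[1]) for wj, iv in zip(w, ivs))
    hi = b + sum(wj * (iv[1] if wj >= 0 else iv[0]) for wj, iv in zip(w, ivs))
    return (lo, hi)

x0 = (-1.0, 1.0)
y1 = y2 = x0                                 # two copies of the same input
s = affine_iv([1.0, 1.0], 0.0, [y1, y2])     # y1 + y2: true range [-2, 2]
d = affine_iv([1.0, -1.0], 0.0, [y1, y2])    # y1 - y2: true value 0, always
z2 = relu_iv(*d)
print("y1 + y2:", s)    # (-2.0, 2.0)  correct
print("y1 - y2:", d)    # (-2.0, 2.0)  should be exactly 0
print("z2     :", z2)   # (0.0, 2.0)   should be exactly 0
```

The spurious width in `y1 - y2` is precisely the dependency loss: the arithmetic cannot see that both operands are the same symbol.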

Survey Quote: “Interval arithmetic does not take symbolic dependency into account so it tends to generate a larger interval for a deep neural network when the number of layers increases.”

Symbolic Methods Mitigate Dependency Loss

DeepPoly and other polyhedral methods avoid this problem by maintaining symbolic expressions:

Instead of just bounds \([a_j^{\ell}, a_j^{u}]\), DeepPoly tracks:

\[a_j = \sum_{k} \alpha_{jk} \cdot x_0^{(k)} + \beta_j\]

This symbolic representation reveals that \(a_1\) and \(a_2\) both depend on \(x_0\), allowing tight bound computation.

Backtracking: Some methods symbolically compose through layers, then backtrack to simplify expressions in terms of input variables. This removes redundant terms and tightens bounds.

Cost: Symbolic methods cost more—\(O(n^2)\) per layer vs. \(O(n)\) for interval arithmetic—but avoid exponential precision loss.
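A minimal toy affine-form class (my own illustration, not DeepPoly's actual implementation) shows how symbolic coefficients expose the cancellation that interval arithmetic misses:

```python
class Affine:
    """Symbolic affine form sum_k alpha_k * x0_k + beta over named inputs."""
    def __init__(self, coeffs, beta=0.0):
        self.coeffs, self.beta = dict(coeffs), beta

    def __add__(self, other):
        c = dict(self.coeffs)
        for k, v in other.coeffs.items():
            c[k] = c.get(k, 0.0) + v
        return Affine(c, self.beta + other.beta)

    def __sub__(self, other):
        c = dict(self.coeffs)
        for k, v in other.coeffs.items():
            c[k] = c.get(k, 0.0) - v
        return Affine(c, self.beta - other.beta)

    def bounds(self, box):
        """Concrete interval given per-input ranges {name: (lo, hi)}."""
        lo = hi = self.beta
        for k, a in self.coeffs.items():
            l, u = box[k]
            lo += a * (l if a >= 0 else u)
            hi += a * (u if a >= 0 else l)
        return (lo, hi)

y1 = Affine({"x0": 1.0})
y2 = Affine({"x0": 1.0})
box = {"x0": (-1.0, 1.0)}
print((y1 + y2).bounds(box))   # (-2.0, 2.0): exact
print((y1 - y2).bounds(box))   # (0.0, 0.0):  cancellation detected
```

Because \(y_1 - y_2\) simplifies symbolically to the zero form, the concretized bound is exact, at the cost of carrying coefficient vectors through every layer.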

Practical Impact

For shallow networks (2-5 layers), interval arithmetic may suffice.

For deep networks (20+ layers), dependency loss makes pure interval arithmetic unusable. Symbolic tracking (DeepPoly, CROWN) becomes essential.

Rule of thumb: If certified radius drops sharply with depth, suspect dependency loss. Upgrade to symbolic methods.

Fundamental Accuracy-Robustness Tradeoff

Beyond computational barriers and mathematical precision limits, inherent tradeoffs between clean accuracy and robust accuracy limit what’s achievable.

Empirical Observations

Consistent phenomenon: Robust training (adversarial or certified) reduces clean accuracy:

Worse at larger ε: Larger robustness radius → larger accuracy drop.

Not just an implementation issue: This occurs across different training methods, architectures, and datasets.

Theoretical Explanations

Sample complexity: Robust learning requires more samples than standard learning. For \(\epsilon\)-robust learning on \(d\)-dimensional data:

\[N_{\text{robust}} = \Omega\left(\frac{d}{\epsilon^2}\right)\]

compared to standard learning's \(N_{\text{standard}} = O(1/\epsilon^2)\). The dimension \(d\) appears only in the robust case: a factor-of-\(d\) increase in the data needed for high-dimensional problems.

Concentration of measure: In high dimensions, most volume of a ball is near its surface. Robust classifiers must be constant on balls, requiring simple decision boundaries. Complex boundaries (needed for high accuracy) can’t be robust.

Information-theoretic limits: Recent work suggests fundamental information-theoretic limits on accuracy-robustness tradeoffs—you can’t have both arbitrarily high accuracy and robustness simultaneously.

The Robustness-Accuracy Dilemma

Clean accuracy requires:

  • Complex decision boundaries

  • Fitting subtle patterns in data

  • Exploiting all available features

Robustness requires:

  • Simple, smooth decision boundaries

  • Ignoring fragile features

  • Only using stable patterns

These are fundamentally in tension. High accuracy exploits everything; high robustness ignores fragile signals.

Separable vs Inseparable Problems

Separable: If natural and adversarial examples come from truly different distributions, high accuracy and robustness are achievable simultaneously.

Inseparable: If adversarial perturbations merely explore the natural data manifold (moving within distribution), accuracy and robustness trade off fundamentally.

Reality: Natural images likely have complex structure; adversarial examples may or may not be “off-manifold.” Debate continues, but evidence suggests some inherent tradeoff exists.

Verification-Specific Barriers

Beyond general computational barriers, verification faces domain-specific challenges:

ReLU Non-Linearity

Piecewise linear: ReLU networks are piecewise linear—linear within each region, but exponentially many regions.

Relaxation necessity: Incomplete methods must relax ReLU constraints. Triangle relaxation, convex hull, abstract domains—all lose tightness.

Tightness vs efficiency: Tighter relaxations (SDP [Raghunathan et al., 2018], multi-neuron [Müller et al., 2022]) cost more. Polynomial-time methods must accept some looseness.

Depth Amplifies Difficulty

Error accumulation: Bound propagation methods accumulate approximation error through layers. With \(L\) layers, per-layer looseness compounds multiplicatively, so bounds can inflate exponentially in \(L\) in the worst case.

Deep networks: Modern architectures (ResNet, transformers) have 50-100+ layers. Even small per-layer looseness compounds.

Mitigation: Multi-neuron relaxations [Müller et al., 2022], tighter propagation [Singh et al., 2019] help but don’t eliminate the fundamental challenge.
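The compounding can be watched directly: push a tight input box through a stack of random ReLU layers with plain interval propagation and log the bound widths. The architecture, weight scale, and depth below are arbitrary illustrative choices:

```python
import random

def relu(z): return max(0.0, z)

def layer_iv(W, lo, hi):
    """Interval propagation through one affine layer followed by ReLU
    (no dependency tracking between neurons)."""
    new_lo, new_hi = [], []
    for row in W:
        l = sum(w * (a if w >= 0 else b) for w, a, b in zip(row, lo, hi))
        u = sum(w * (b if w >= 0 else a) for w, a, b in zip(row, lo, hi))
        new_lo.append(relu(l))
        new_hi.append(relu(u))
    return new_lo, new_hi

random.seed(0)
n = 8
lo, hi = [-0.01] * n, [0.01] * n        # tight input box
widths = []
for depth in range(1, 11):
    W = [[random.gauss(0.0, 0.5) for _ in range(n)] for _ in range(n)]
    lo, hi = layer_iv(W, lo, hi)
    widths.append(max(h - l for l, h in zip(lo, hi)))
    print(f"depth {depth:2d}: max interval width = {widths[-1]:.4g}")
```

Under these settings the widths tend to grow geometrically with depth even though each layer's relaxation is individually mild; symbolic propagation avoids much of this inflation.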

Non-Convexity

ReLU non-convexity: The ReLU function \(\max(0, z)\) is non-convex. Composition of non-convexities is highly non-convex.

Verification as non-convex optimization: Finding worst-case perturbation is solving a non-convex optimization. No guarantee of finding global optimum efficiently.

Relaxation necessity: LP [Salman et al., 2019], SDP [Raghunathan et al., 2018] relax to convex problems. Completeness requires exploring non-convex space—exponential effort.

What Barriers Mean for Practice

Realistic Expectations

Complete verification won’t scale to huge networks: ResNets with millions of parameters, transformers with billions—complete verification is intractable. Accept this and use incomplete methods.

Certified accuracy will lag clean accuracy: Don’t expect 95% certified accuracy on CIFAR-10. Current state-of-the-art (~70%) is impressive given fundamental barriers.

Approximation gaps are unavoidable: The gap between what verification certifies and true robustness exists and can’t be eliminated efficiently.

Strategic Choices

When to use complete methods:

  • Small networks (hundreds to few thousand neurons)

  • Safety-critical applications where guarantees essential

  • When computational budget allows days of solving time

When to accept incompleteness:

  • Large networks (tens of thousands of neurons and up)

  • Rapid iteration required

  • Applications where “probably robust” suffices

Design for verifiability:

  • Train networks to be easier to verify (spectral normalization, certified training [Gowal et al., 2019, Zhang et al., 2020])

  • Use architectures amenable to verification (avoid complex activations, limit depth)

  • Accept accuracy loss for certified robustness

Research Directions Despite Barriers

While fundamental barriers exist, research finds ways to push boundaries:

Better Average-Case Performance

Heuristics within exponential worst-case: Branch-and-bound [Wang et al., 2021, Zhang et al., 2022] is exponential worst-case but often polynomial average-case through smart pruning.

Instance-dependent complexity: Some networks/properties are easier than others. Identify and exploit tractable special cases.
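A toy illustration of why branching helps (my own construction, not from the cited papers): on the identically-one network \(f(x) = \text{ReLU}(x) - \text{ReLU}(x) + 1\), plain interval propagation fails, but a single case split on the ReLU's sign makes each branch exactly linear and restores a tight bound:

```python
def relu(z): return max(0.0, z)

def bound_ibp(l, u):
    """Plain IBP lower bound: the two ReLU terms are bounded independently."""
    return 1.0 + relu(l) - relu(u)

def bound_with_split(l, u):
    """Branch on the sign of the shared ReLU input. Within each branch
    ReLU(x) is linear, so the two terms cancel exactly."""
    branches = []
    if l < 0:                      # inactive branch: ReLU(x) = 0 on [l, min(u, 0)]
        branches.append(1.0)       # f = 0 - 0 + 1
    if u > 0:                      # active branch: ReLU(x) = x on [max(l, 0), u]
        branches.append(1.0)       # f = x - x + 1
    return min(branches)           # property must hold on every branch

l, u = -2.0, 2.0
print("IBP lower bound:     ", bound_ibp(l, u))         # -1.0: cannot certify
print("after one ReLU split:", bound_with_split(l, u))  # 1.0: certified
```

Each split doubles the number of subproblems in the worst case, which is the exponential cost; pruning branches whose relaxed bound already certifies the property is what makes branch-and-bound fast on typical instances.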

Tighter Incomplete Methods

Multi-neuron relaxations [Müller et al., 2022]: Capture correlations for tighter bounds without complete verification.

Optimized bound propagation: α-β-CROWN [Wang et al., 2021, Xu et al., 2020] optimizes relaxation parameters for much tighter bounds.

GPU acceleration [Xu et al., 2020]: Massively parallel bound computation enables practical verification at scale.

Probabilistic Guarantees

Randomized smoothing [Cohen et al., 2019]: Accept probabilistic rather than deterministic guarantees. Achieves better certified radius through statistical concentration.

Tradeoff: Statistical confidence instead of absolute certainty. For many applications, 99.9% confidence suffices.
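In its simplified form (runner-up class probability bounded by \(1 - \underline{p_A}\)), the Cohen et al. [2019] certified \(\ell_2\) radius is \(R = \sigma\,\Phi^{-1}(\underline{p_A})\), computable with the Python standard library alone. The \(\sigma\) and probability values below are illustrative:

```python
from statistics import NormalDist

def smoothing_radius(p_a, sigma):
    """Simplified Cohen et al. (2019) certified l2 radius: sigma * Phi^{-1}(p_A).
    p_a is a statistical lower bound on the smoothed classifier's top-class
    probability under Gaussian noise of scale sigma."""
    return sigma * NormalDist().inv_cdf(p_a)

# Higher confidence in the top class under noise -> larger certified radius.
for p in (0.6, 0.9, 0.99, 0.999):
    print(f"p_A = {p:5}: radius = {smoothing_radius(p, sigma=0.5):.3f}")
```

The radius grows only through \(\Phi^{-1}\), so pushing the confidence from 0.99 to 0.999 buys relatively little; in practice \(\underline{p_A}\) itself is estimated from samples, which is where the statistical (rather than absolute) guarantee enters.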

Co-Design of Training and Verification

Certified training [Gowal et al., 2019, Zhang et al., 2020]: Train networks to maximize verified robustness. Result: networks easier to verify, higher certified accuracy.

Architecture search for verifiability: Design architectures specifically for verification (e.g., bounded Lipschitz, fewer uncertain ReLUs).

Regularization: Lipschitz regularization [Fazlyab et al., 2019], margin maximization encourage properties that aid verification.

Fundamental vs Practical Barriers

Fundamental barriers (won’t change):

  • NP-completeness of verification

  • Approximation hardness

  • Curse of dimensionality (for complete methods)

  • Some accuracy-robustness tradeoff

Practical barriers (improving over time):

  • Tightness of incomplete methods (multi-neuron, SDP improving)

  • Speed of solvers (GPU acceleration, better heuristics)

  • Certified accuracy achieved (70%+ on CIFAR-10, up from 30% five years ago)

  • Network sizes verifiable (now tens of thousands, growing)

Research improves practice within fundamental limits. We won’t eliminate NP-completeness, but we can make verification practical for increasingly large networks through cleverer algorithms and better engineering.

Final Thoughts

Theoretical barriers—NP-completeness [Katz et al., 2017, Weng et al., 2018], approximation hardness, dimensionality curses, accuracy-robustness tradeoffs—are not obstacles to overcome but realities to understand. They define what’s possible and what’s not.

Complete verification will never be polynomial-time (assuming P≠NP). Certified accuracy will never match clean accuracy without fundamental changes to machine learning. The gap between certified and true robustness can’t be eliminated efficiently.

But within these constraints, remarkable progress continues. Tighter incomplete methods [Singh et al., 2019, Müller et al., 2022], better heuristics for complete methods [Wang et al., 2021, Zhang et al., 2022], probabilistic alternatives [Cohen et al., 2019], and co-design of training and verification [Gowal et al., 2019, Zhang et al., 2020] push the boundaries of what’s practical.

Understanding theoretical barriers clarifies where to invest research effort: not chasing polynomial-time complete verification (impossible) but developing better average-case heuristics, tighter incomplete bounds, and methods that work within fundamental limits. This realism guides productive research and sets appropriate expectations for verification’s capabilities.

Further Reading

This guide provides comprehensive coverage of theoretical barriers in neural network verification. For readers interested in diving deeper into the mathematical foundations, we recommend the following resources organized by topic:

NP-Completeness of Verification:

The foundational result [Katz et al., 2017] proved that verifying properties of ReLU neural networks is NP-complete through reduction from Boolean satisfiability. This established that complete verification cannot be polynomial-time unless P=NP. Further work [Weng et al., 2018] showed that even approximating certified robustness within any constant factor is NP-hard, demonstrating that the difficulty extends beyond exact verification.

Complexity Theory Foundations:

Understanding computational complexity theory provides context for these results. The P vs NP problem represents one of computer science’s deepest unsolved questions, and neural network verification’s NP-completeness places it among fundamentally hard problems.

Accuracy-Robustness Tradeoffs:

The tension between clean accuracy and robust accuracy has been documented empirically across many studies. PGD adversarial training [Madry et al., 2018] consistently shows accuracy drops when optimizing for robustness. Theoretical work explores information-theoretic and sample complexity explanations for these tradeoffs.

Complete Verification Methods and Complexity:

Complete methods—Marabou [Katz et al., 2019], MILP [Tjeng et al., 2019], branch-and-bound [Wang et al., 2021, Zhang et al., 2022]—all face exponential worst-case complexity due to NP-completeness. Understanding their algorithms clarifies how they navigate this complexity barrier through heuristics and pruning.

Incomplete Methods as Response:

The NP-completeness barrier motivated development of incomplete methods that sacrifice completeness for polynomial-time complexity. IBP [Gowal et al., 2019], CROWN [Weng et al., 2018, Zhang et al., 2018], and DeepPoly [Singh et al., 2019] accept “unknown” results to achieve scalability.

Tighter Incomplete Methods:

Recent advances—multi-neuron relaxations [Müller et al., 2022], SDP [Raghunathan et al., 2018], optimized bound propagation [Xu et al., 2020]—push incomplete methods toward tighter bounds without sacrificing polynomial-time complexity. This represents practical progress within theoretical constraints.

Probabilistic Certification:

Randomized smoothing [Cohen et al., 2019] offers an alternative approach, accepting probabilistic rather than deterministic guarantees to achieve better scalability and tighter certified radii. This sidesteps some deterministic verification barriers through statistical concentration.

Certified Training:

Training for verifiability [Gowal et al., 2019, Zhang et al., 2020] demonstrates that networks can be designed to be easier to verify, achieving higher certified accuracy within the same computational budget. This represents co-design of training and verification within fundamental barriers.

GPU Acceleration:

Massive parallelization [Xu et al., 2020] enables practical verification at scales previously intractable. While it doesn’t change worst-case complexity, it dramatically improves constant factors and average-case performance.

Related Topics:

For understanding verification methods within these barriers, see Verification Taxonomy: A Systematic View. For the verification problem’s mathematical formulation, see The Verification Problem: Mathematical Formulation. For soundness and completeness tradeoffs, see Soundness and Completeness. For practical implications of these barriers, see Robustness Testing Guide.

Next Phase

Congratulations on completing Phase 1: Foundations! Continue to Phase 2: Core Verification Techniques to explore core verification techniques including bound propagation, LP/SDP methods, and complete verification approaches.

[1]

Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer, 2017.

[2] (1,2,3,4)

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.

[3]

Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 269–286. Springer, 2017.

[4]

Mahyar Fazlyab, Manfred Morari, and George J Pappas. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. IEEE Transactions on Automatic Control, 2020.

[5]

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.

[6]

Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[7]

Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[8]

Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.

[9]

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. 2018.

[10]

Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.

[11]

Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.

[12]

Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.

[13]

Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[14]

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, 10802–10813. 2018.

[15]

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.

[16]

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Boosting robustness certification of neural networks. In International Conference on Learning Representations. 2019.

[17]

Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.

[18]

Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[19]

Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.

[20]

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[21]

Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.

[22]

Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.

[23]

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948. 2018.