Theoretical Barriers in Neural Network Verification

Verification methods keep improving—faster solvers, tighter bounds, better heuristics. But fundamental limits constrain what’s achievable. Neural network verification faces mathematical barriers: NP-completeness means exponential worst-case complexity, approximation hardness means you can’t even efficiently approximate certified robustness, and inherent tradeoffs between accuracy and robustness limit what’s trainable.

Understanding these barriers clarifies what verification can and cannot achieve. They’re not engineering limitations—they’re mathematical facts. No algorithmic breakthrough will make complete verification polynomial-time (unless P=NP). No clever trick will eliminate the accuracy-robustness tradeoff.

This guide explores the fundamental theoretical barriers in neural network verification: what they are, why they exist, and what they mean for research and practice.

NP-Completeness of Verification

The foundational barrier: verifying properties of ReLU neural networks is NP-complete [Katz et al., 2017].

The Verification Problem

Decision problem: Given a neural network \(f_\theta\), an input region \(\mathcal{X}\), and a property \(\phi\), does \(\phi(f_\theta(x))\) hold for all \(x \in \mathcal{X}\)?

This is NP-complete for ReLU networks [Katz et al., 2017]. Specifically, the problem of determining whether a ReLU network maintains a specific classification on all inputs in a polytope is NP-complete.

What NP-Completeness Means

Formal statement: Verification is in NP, and it is at least as hard as every problem in NP. No polynomial-time algorithm is known, and finding one would prove P=NP (resolving a Millennium Prize Problem).

Practical implications:

  1. Worst-case exponential time: Any complete verification algorithm must take exponential time in the worst case

  2. No efficient general algorithm: Can’t develop a verifier that runs in polynomial time for all instances

  3. Heuristics necessary: Must use approximations, relaxations, or instance-specific optimizations

  4. Fundamental barrier: Not an engineering limitation but mathematical impossibility (assuming P≠NP)

Why Verification is NP-Complete

Reduction from satisfiability: The proof [Katz et al., 2017] reduces Boolean satisfiability (SAT) to network verification.

ReLU creates decisions: Each ReLU activation creates a binary choice (active or inactive). With \(k\) ReLUs, there are \(2^k\) possible activation patterns.

Exponential combinations: Determining which pattern occurs for inputs in a region requires exploring exponentially many cases in the worst case.

Result: Verification inherits the hardness of combinatorial optimization.
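The pattern explosion is easy to observe empirically. The sketch below (an illustrative toy; the layer sizes, weights, and sampling scheme are arbitrary choices of mine) samples inputs to small random ReLU layers and counts the distinct activation patterns encountered. The combinatorial ceiling is \(2^k\); with only two inputs the realizable count is smaller, but it still grows quickly with \(k\):

```python
import random

def hidden_pattern(x, W, b):
    """Which hidden ReLUs are active (pre-activation > 0) for input x."""
    return tuple(int(sum(wi * xi for wi, xi in zip(row, x)) + bi > 0)
                 for row, bi in zip(W, b))

random.seed(0)
for k in (2, 4, 8, 12):
    # random layer with 2 inputs and k hidden ReLU units
    W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(k)]
    b = [random.uniform(-0.5, 0.5) for _ in range(k)]
    patterns = {hidden_pattern((random.uniform(-1, 1), random.uniform(-1, 1)), W, b)
                for _ in range(20000)}
    print(f"k={k:2d}: {len(patterns):4d} distinct patterns seen (ceiling 2^k = {2**k})")
```

A complete verifier must, in the worst case, reason about every realizable pattern separately, which is the source of the exponential blowup.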

Implications for Verification Methods

Complete methods must be exponential: Marabou [Katz et al., 2019], MILP [Tjeng et al., 2019], and branch-and-bound [Wang et al., 2021] all have exponential worst-case complexity. This is unavoidable.

Incomplete methods sacrifice completeness: CROWN [Weng et al., 2018, Zhang et al., 2018], IBP [Gowal et al., 2019], and other polynomial-time methods avoid NP-completeness by accepting “unknown” results. They trade completeness for efficiency.

No breakthrough coming: Unless P=NP (extremely unlikely), no polynomial-time complete verifier will ever exist. Research focuses on:

  • Better average-case performance (even if worst-case remains exponential)

  • Tighter incomplete methods

  • Identifying tractable special cases

Table 17 Complexity Hierarchy

Problem                      | Complexity     | Implication
---------------------------- | -------------- | ---------------------------
Verification (ReLU networks) | NP-complete    | Exponential worst-case
Linear program solving       | P (polynomial) | Efficient
Semi-definite program        | P (polynomial) | Efficient (but high degree)
Mixed-integer program        | NP-complete    | Exponential worst-case
Boolean satisfiability       | NP-complete    | Exponential worst-case

Approximation Hardness

Can’t solve exactly? What about approximation?

Bad news: Even approximating the certified robustness radius within any constant factor is NP-hard [Weng et al., 2018].

The Approximation Problem

Certified radius: For input \(x_0\) with true label \(y_0\), the certified radius is:

\[r_{\text{cert}}(x_0) = \sup \{ r : \forall x \in \mathcal{B}_r(x_0), \arg\max_c f_\theta(x)_c = y_0 \}\]

Approximation question: Can we efficiently compute \(\tilde{r}\) such that:

\[\frac{1}{c} r_{\text{cert}} \leq \tilde{r} \leq r_{\text{cert}}\]

for some constant \(c\)?

Answer: No (unless P=NP). The problem of approximating certified radius within any constant factor is NP-hard [Weng et al., 2018].

What This Means

Can’t even approximate: Not only is finding the exact certified radius hard, even getting a constant-factor approximation is hard.

All methods are conservative: IBP [Gowal et al., 2019], CROWN [Zhang et al., 2018], SDP [Raghunathan et al., 2018]—all provide lower bounds on certified radius. They might be arbitrarily loose.

No guarantee of tightness: Even if a method seems tight on benchmarks, no theoretical guarantee says it’s within any factor of optimal.

Practical impact: The gap between what verification methods certify and true robustness may be large, with no efficient way to close it.
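For a concrete sketch of this conservativeness (a toy construction of my own, not from the cited papers): the network below computes \(f(x) = \text{ReLU}(x) - \text{ReLU}(x) + 1\), which is identically 1, so its true certified radius is infinite. Interval bound propagation, treating the two ReLU terms as independent, certifies only \(r < 1\):

```python
def relu(z): return max(0.0, z)

def ibp_lower(x0, r):
    """IBP lower bound of f(x) = ReLU(x) - ReLU(x) + 1 on [x0-r, x0+r].
    The two (identical) ReLU terms are bounded independently, so the
    cancellation is invisible and the bound is loose."""
    l, u = x0 - r, x0 + r
    h_lo, h_hi = relu(l), relu(u)      # interval for ReLU(x), used twice
    return 1.0 + h_lo - h_hi           # lower(term1) - upper(term2) + 1

# Binary-search the largest radius IBP can still certify (lower bound > 0):
lo, hi = 0.0, 100.0
for _ in range(60):
    mid = (lo + hi) / 2
    if ibp_lower(0.0, mid) > 0:
        lo = mid
    else:
        hi = mid
print(f"IBP-certified radius at x0=0: {lo:.4f}  (true radius: unbounded)")
```

The certified radius converges to 1.0, arbitrarily far below the true (infinite) radius, illustrating that lower bounds from incomplete methods carry no tightness guarantee.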

Curse of Dimensionality

Verification difficulty grows exponentially with input dimension—the classic curse of dimensionality.

Volume of High-Dimensional Balls

\(\ell_\infty\) ball volume: For an \(\ell_\infty\) ball with radius \(\epsilon\) in \(d\) dimensions:

\[\text{Vol}(\mathcal{B}_\infty^\epsilon) = (2\epsilon)^d\]

This scales exponentially with dimension \(d\): for \(2\epsilon > 1\) the volume explodes, and for \(2\epsilon < 1\) it shrinks, but either way the number of grid points needed to cover the ball at any fixed resolution \(\delta < 2\epsilon\) is \((2\epsilon/\delta)^d\), which is exponential in \(d\).

Implication: For images (e.g., CIFAR-10 with \(d = 32 \times 32 \times 3 = 3072\)), an \(\ell_\infty\) ball contains a continuum of points, and even a coarse discretization of it is astronomically large. Complete verification must reason about all of them.

Activation Pattern Explosion

ReLU activation patterns: A network with \(k\) ReLU activations has up to \(2^k\) distinct linear regions.

For deep networks: A typical ResNet might have 100K ReLU activations. That’s \(2^{100000}\) potential patterns—more than atoms in the universe.

Combinatorial explosion: Reasoning about which patterns are possible for inputs in a region requires exploring this vast space. Complete methods must systematically search; incomplete methods approximate.

Sampling is Insufficient

Random sampling: A single test point says nothing about any neighborhood around it, so coverage must be measured at some resolution \(\delta\): covering an \(\ell_\infty\) ball of side \(2\epsilon\) at resolution \(\delta\) requires roughly \((2\epsilon/\delta)^d\) points.

For high \(d\): Even billions of samples cover a negligible fraction of such a grid. In a 3072-dimensional \(\ell_\infty\) ball with \(\epsilon = 8/255\), a grid with just 10 points per dimension has \(10^{3072}\) cells; \(10^9\) samples touch essentially none of them.

Conclusion: Empirical testing can’t provide guarantees in high dimensions. Formal verification is necessary but faces exponential difficulty.
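A quick back-of-the-envelope calculation makes the gap concrete (the covering resolution \(\delta\) below is an arbitrary illustrative choice):

```python
import math

d = 32 * 32 * 3          # CIFAR-10 input dimension
eps = 8 / 255            # l_inf radius
delta = eps / 10         # illustrative covering resolution

# Grid points needed to cover the ball at spacing delta: (2*eps/delta)^d.
# The count itself overflows floats, so work in log10.
log10_cover = d * math.log10(2 * eps / delta)
print(f"log10(grid points needed) ~ {log10_cover:.0f}")   # a number with ~4000 digits

# By comparison, a billion random samples:
print(f"log10(samples) = 9  ->  shortfall of ~10^{log10_cover - 9:.0f} points")
```

No feasible sampling budget closes a gap of thousands of orders of magnitude, which is why formal reasoning over the whole region is required.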

The Convex Barrier: Fundamental Limit of Linear Relaxation

Beyond computational complexity, there exist mathematical limits on how tight linear approximations of activation functions can be. The convex barrier theorem establishes a fundamental precision ceiling for single-neuron verification methods.

The Triangle Relaxation for ReLU

The triangle relaxation [Ehlers, 2017] is a fundamental technique for convex approximation of ReLU activations. For ReLU activation \(a_i = \max(0, x_i)\) with input bounds \(x_i \in [l, u]\) where \(l < 0 < u\) (the crossing or unstable case), the standard linear relaxation uses:

\[\begin{split}\begin{aligned} \text{Lower bound: } & a_i \geq 0 \\ \text{Upper bound: } & a_i \leq \frac{u}{u - l}(x_i - l) \end{aligned}\end{split}\]

The upper bound is the secant line connecting points \((l, 0)\) and \((u, u)\), forming a triangular region between the bounds and the true ReLU function.

Theorem (Convex Barrier for ReLU): The triangle relaxation above is the optimal convex hull of the ReLU function over the interval \([l, u]\) with \(l < 0 < u\). No sound single-neuron linear relaxation can produce tighter bounds without sacrificing convexity or soundness.

Proof Sketch:

  1. Convex hull construction: The ReLU function \(\max(0, x)\) over \([l, u]\) has exactly two linear pieces: \(y = 0\) for \(x \in [l, 0]\) and \(y = x\) for \(x \in [0, u]\).

  2. Endpoints of convex hull: The convex hull of these two segments connects the extreme points \((l, 0)\) and \((u, u)\), which defines the secant line.

  3. Lower bound optimality: The lower bound \(a_i \geq 0\) is exact for \(x_i \in [l, 0]\) and lies on the ReLU for \(x_i > 0\). Any tighter lower bound would violate soundness (pass below the true function).

  4. Upper bound optimality: The secant line is the unique line connecting the convex hull endpoints. Any line with smaller slope would fail to upper-bound the ReLU for some \(x \in [l, u]\).

  5. Conclusion: The triangle region is the minimal convex over-approximation. QED.
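The relaxation is small enough to write down and sanity-check directly. This sketch (function names and the sampling-based check are mine) builds the triangle bounds for a crossing ReLU and verifies soundness on random points:

```python
import random

def triangle_bounds(l, u):
    """Triangle relaxation of ReLU on [l, u] with l < 0 < u.
    Returns (lower, upper) bound functions of x."""
    slope = u / (u - l)                     # slope of the secant through (l,0),(u,u)
    lower = lambda x: 0.0                   # a >= 0
    upper = lambda x: slope * (x - l)       # a <= u/(u-l) * (x - l)
    return lower, upper

random.seed(1)
l, u = -1.5, 2.0
lower, upper = triangle_bounds(l, u)
for _ in range(10000):
    x = random.uniform(l, u)
    y = max(0.0, x)                         # true ReLU
    assert lower(x) <= y <= upper(x) + 1e-12, "relaxation must be sound"
print("triangle relaxation sound on 10,000 samples")
```

The secant touches the ReLU exactly at both interval endpoints, which is the geometric content of the optimality argument above: any flatter upper line would dip below the function near \(x = u\).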

Implications for Verification Methods:

  • DeepZ [Singh et al., 2018], Fast-Lin [Weng et al., 2018], CROWN [Zhang et al., 2018] all use triangle relaxation for single-neuron ReLU analysis

  • These methods cannot improve beyond triangle tightness for individual ReLU neurons

  • The “highest precision among all relevant works” (as stated in the survey) for single-neuron ReLU approximation is bounded by the convex hull

  • Multi-neuron methods (PRIMA [Müller et al., 2022], k-ReLU [Singh et al., 2019]) must analyze multiple ReLUs jointly to exceed this barrier

Why the Barrier Exists

The convex barrier arises from the piecewise-linear nature of ReLU. The function has a non-differentiable kink at \(x = 0\), creating two distinct linear regions. Any convex over-approximation must “bridge” these regions with a straight line, which is exactly the secant.

To get tighter bounds, methods must either:

  1. Abandon single-neuron analysis (multi-neuron methods analyze correlations)

  2. Abandon linear bounds (quadratic methods like SDP)

  3. Abandon soundness (empirical methods)

All polynomial-time incomplete methods using linear single-neuron relaxation hit this fundamental wall.

Tightness Ordering and Incomparability of Verification Methods

The verification literature contains dozens of methods with varying precision. How do they compare? Can we establish a formal ordering?

The Tightness Ordering Problem

Question: Given two incomplete verification methods \(M_1\) and \(M_2\), is one always at least as tight as the other?

Formal definition: Method \(M_1\) is at least as tight as \(M_2\) (written \(M_1 \sqsupseteq M_2\)) if:

\[\begin{split}\forall \text{ networks } f, \forall \text{ input regions } X, \forall \text{ properties } \phi: \\ M_2 \text{ certifies } \phi \Rightarrow M_1 \text{ certifies } \phi\end{split}\]

In other words, \(M_1\) certifies at least as many properties as \(M_2\) across all instances.

Deterministic Method Rankings: T1–T7

The survey establishes a tightness hierarchy for deterministic incomplete methods based on their abstract domains and relaxation techniques. Methods are ranked T1 (loosest) through T7 (tightest):

T1 (Interval / Box Domain):

  • Methods: Basic interval arithmetic [Cheng et al., 2017]

  • Precision: Loosest; loses all correlation information between neurons

  • Advantage: Extremely fast (\(O(n)\) per layer)

T2 (Zonotope Domain):

  • Methods: DeepZ [Singh et al., 2018], DiffAI

  • Precision: Better than intervals; tracks affine dependencies

  • Abstract domain: Symmetric polytopes (zonotopes)

  • Advantage: Same computational cost as intervals (\(O(n)\)) but maintains some symbolic information

T3 (Hybrid Zonotope):

  • Methods: RefineZono [Singh et al., 2019]

  • Precision: Zonotopes + MILP refinement for unstable neurons

  • Advantage: Improves zonotope precision without full polyhedra cost

T4 (Single-Neuron Polyhedra):

  • Methods: DeepPoly [Singh et al., 2019], CROWN [Zhang et al., 2018]

  • Precision: Polyhedral constraints per neuron; maximum tightness for single-neuron linear relaxation

  • Convex barrier: These methods hit the triangle relaxation optimality limit

  • Advantage: Polynomial time (\(O(n^2)\) per layer) with near-optimal single-neuron bounds

T5 (Multi-Neuron Polyhedra - Pairs):

  • Methods: k-ReLU with \(k=2\) [Singh et al., 2019], PRIMA [Müller et al., 2022]

  • Precision: Analyzes pairs of neurons jointly; breaks through convex barrier

  • Improvement: Can certify properties that single-neuron methods cannot

  • Cost: Higher computational cost (\(O(n^3)\) to \(O(n^4)\) depending on implementation)

T6 (Multi-Neuron Polyhedra - k neurons):

  • Methods: k-ReLU with \(k > 2\) [Singh et al., 2019]

  • Precision: Analyzes \(k\) neurons simultaneously

  • Cost: Exponential in \(k\): \(O(2^k \cdot n^2)\) constraints

  • Tradeoff: Precision vs. tractability

T7 (Quadratic/SDP Constraints):

  • Methods: LipSDP [Fazlyab et al., 2020], SDP-based methods [Raghunathan et al., 2018]

  • Precision: Tightest among incomplete methods; captures quadratic relations

  • Cost: Very high (\(O(n^6)\) for interior point SDP solvers)

  • Limitation: Practical only for small networks

Formal Ordering:

\[T_1 \sqsubseteq T_2 \sqsubseteq T_3 \sqsubseteq T_4 \sqsubseteq T_5 \sqsubseteq T_6 \sqsubseteq T_7\]

where \(\sqsubseteq\) denotes “at least as loose as.”

Proof Technique for Ordering

To prove \(M_1 \sqsupseteq M_2\), we show that \(M_1\)’s abstract domain includes \(M_2\)’s domain as a special case. For example:

  • DeepPoly ⊇ DeepZ: Every zonotope can be represented as a polyhedron (polyhedra are more general)

  • PRIMA ⊇ DeepPoly: PRIMA includes DeepPoly’s single-neuron constraints plus additional multi-neuron constraints

This inclusion relationship guarantees \(M_1\) is at least as tight as \(M_2\).

Incomparability Results

Not all methods can be formally ordered. Some methods are incomparable—each may be tighter on different instances.

Example: CROWN vs. Fast-Lin

  • Both use single-neuron polyhedral relaxation (both in T4 class)

  • CROWN uses backward bound propagation with optimized dual variables

  • Fast-Lin uses forward propagation with fixed relaxation parameters

  • Neither dominates the other across all networks

Result: \(\text{CROWN} \not\sqsupseteq \text{Fast-Lin}\) and \(\text{Fast-Lin} \not\sqsupseteq \text{CROWN}\). They are incomparable.

Practical Implication: Method selection depends on network structure and property. Benchmarking on specific instances is necessary.

Stochastic Method Rankings: ST1–ST4

For probabilistic certification methods (randomized smoothing variants), a separate ranking exists:

ST1: Basic randomized smoothing [Cohen et al., 2019] (Gaussian noise)

ST2: Improved smoothing with noise optimization

ST3: Smoothing with adversarial training

ST4: Hybrid deterministic-stochastic methods

Note: Stochastic and deterministic methods are incomparable (one provides probabilistic guarantees, the other deterministic).

Dependency Loss in Compositional Reasoning

Incomplete methods that propagate bounds layer-by-layer face a subtle but critical challenge: dependency loss. As bounds propagate through network layers, correlation information between neurons is progressively lost, causing bounds to inflate exponentially with depth.

The Interval Arithmetic Problem

Interval arithmetic (also known as basic bound propagation) treats each neuron independently:

For affine node \(i\) with \(J\) incoming connections:

\[x_i = b_i + \sum_{j \in [J]} w_{ij} \cdot a_j\]

Given bounds \(a_j \in [a_j^{\ell}, a_j^{u}]\), interval arithmetic computes:

\[\begin{split}\begin{aligned} x_i^{\ell} &= b_i + \sum_{j} \left( \max(0, w_{ij}) \cdot a_j^{\ell} + \min(0, w_{ij}) \cdot a_j^{u} \right) \\ x_i^{u} &= b_i + \sum_{j} \left( \max(0, w_{ij}) \cdot a_j^{u} + \min(0, w_{ij}) \cdot a_j^{\ell} \right) \end{aligned}\end{split}\]

The problem: This formula assumes \(a_j\) and \(a_k\) are independent for \(j \neq k\). But they’re not—both may depend on the same input variables \(x_0^{(m)}\).
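As a sketch, the interval formula above translates directly into code (the function name and the tiny example are mine, not from any library):

```python
def interval_affine(W, b, lo, hi):
    """Interval bounds for x = b + W a, given elementwise a_j in [lo_j, hi_j].
    Implements the max(0, w)/min(0, w) split from the text."""
    out_lo, out_hi = [], []
    for row, bi in zip(W, b):
        xl = bi + sum(max(0.0, w) * l + min(0.0, w) * h
                      for w, l, h in zip(row, lo, hi))
        xh = bi + sum(max(0.0, w) * h + min(0.0, w) * l
                      for w, l, h in zip(row, lo, hi))
        out_lo.append(xl)
        out_hi.append(xh)
    return out_lo, out_hi

# x1 = 1 + 2*a1 - 3*a2 with a1, a2 in [-1, 1]  ->  x1 in [1-2-3, 1+2+3]
lo, hi = interval_affine([[2.0, -3.0]], [1.0], [-1.0, -1.0], [1.0, 1.0])
print(lo, hi)  # [-4.0] [6.0]
```

Note how the formula prices each \(a_j\) at its own worst case; that per-coordinate independence is exactly where the problem described above enters.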

Theorem (Dependency Loss in Interval Arithmetic): Interval arithmetic loses precision exponentially with network depth when neurons have correlated dependencies on input variables.

Proof Sketch:

Consider a simple case: two parallel paths in a network both depending on input \(x_0 \in [-1, 1]\):

  • Path 1: \(y_1 = x_0\)

  • Path 2: \(y_2 = x_0\)

True combined range: Since \(y_1 = y_2\), we have \(y_1 + y_2 = 2x_0 \in [-2, 2]\) (exact).

Interval arithmetic computes:

  • \(y_1 \in [-1, 1]\)

  • \(y_2 \in [-1, 1]\)

  • \(y_1 + y_2 \in [-1-1, 1+1] = [-2, 2]\)

In this simple case, interval arithmetic is exact. But now add a ReLU and another layer:

  • \(z_1 = \text{ReLU}(y_1 + y_2)\)

  • \(z_2 = \text{ReLU}(y_1 - y_2)\)

True range: Since \(y_1 = y_2\), we have \(y_1 + y_2 = 2x_0 \in [-2, 2]\) and \(y_1 - y_2 = 0\) (always).

  • \(z_1 = \max(0, 2x_0) \in [0, 2]\)

  • \(z_2 = \max(0, 0) = 0\) (exact!)

Interval arithmetic computes:

  • \(y_1 + y_2 \in [-2, 2]\) (correct)

  • \(y_1 - y_2 \in [-1-1, 1-(-1)] = [-2, 2]\) (wrong! Should be 0)

  • \(z_1 \in [0, 2]\) (correct)

  • \(z_2 \in [0, 2]\) (wrong! Should be 0)

The error doubles because interval arithmetic can’t detect that \(y_1 - y_2 = 0\) always.

As depth increases, such errors compound multiplicatively, leading to exponential bound inflation.
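The worked example can be reproduced in a few lines of plain interval arithmetic (helper names are illustrative):

```python
def relu_iv(lo, hi):
    """Interval image of ReLU."""
    return (max(0.0, lo), max(0.0, hi))

def affine_iv(w, b, ivs):
    """Interval for b + sum_j w_j * a_j, treating each a_j in ivs[j] as independent."""
    lo = b + sum(wj * (iv[0] if wj >= 0 else iv[1]) for wj, iv in zip(w, ivs))
    hi = b + sum(wj * (iv[1] if wj >= 0 else iv[0]) for wj, iv in zip(w, ivs))
    return (lo, hi)

x0 = (-1.0, 1.0)
y1 = y2 = x0                                 # two copies of the same input
s = affine_iv([1.0, 1.0], 0.0, [y1, y2])     # y1 + y2: true range [-2, 2]
d = affine_iv([1.0, -1.0], 0.0, [y1, y2])    # y1 - y2: true value 0, always
z2 = relu_iv(*d)
print("y1 + y2:", s)    # (-2.0, 2.0)  correct
print("y1 - y2:", d)    # (-2.0, 2.0)  should be exactly 0
print("z2     :", z2)   # (0.0, 2.0)   should be exactly 0
```

The spurious width in `y1 - y2` is precisely the dependency loss: the arithmetic cannot see that both operands are the same symbol.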

Survey Quote: “Interval arithmetic does not take symbolic dependency into account so it tends to generate a larger interval for a deep neural network when the number of layers increases.”

Symbolic Methods Mitigate Dependency Loss

DeepPoly and other polyhedral methods avoid this problem by maintaining symbolic expressions:

Instead of just bounds \([a_j^{\ell}, a_j^{u}]\), DeepPoly tracks:

\[a_j = \sum_{k} \alpha_{jk} \cdot x_0^{(k)} + \beta_j\]

This symbolic representation reveals that \(a_1\) and \(a_2\) both depend on \(x_0\), allowing tight bound computation.

Backtracking: Some methods symbolically compose through layers, then backtrack to simplify expressions in terms of input variables. This removes redundant terms and tightens bounds.

Cost: Symbolic methods cost more—\(O(n^2)\) per layer vs. \(O(n)\) for interval arithmetic—but avoid exponential precision loss.
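A minimal toy affine-form class (my own illustration, not DeepPoly's actual implementation) shows how symbolic coefficients expose the cancellation that interval arithmetic misses:

```python
class Affine:
    """Symbolic affine form sum_k alpha_k * x0_k + beta over named inputs."""
    def __init__(self, coeffs, beta=0.0):
        self.coeffs, self.beta = dict(coeffs), beta

    def __add__(self, other):
        c = dict(self.coeffs)
        for k, v in other.coeffs.items():
            c[k] = c.get(k, 0.0) + v
        return Affine(c, self.beta + other.beta)

    def __sub__(self, other):
        c = dict(self.coeffs)
        for k, v in other.coeffs.items():
            c[k] = c.get(k, 0.0) - v
        return Affine(c, self.beta - other.beta)

    def bounds(self, box):
        """Concrete interval given per-input ranges {name: (lo, hi)}."""
        lo = hi = self.beta
        for k, a in self.coeffs.items():
            l, u = box[k]
            lo += a * (l if a >= 0 else u)
            hi += a * (u if a >= 0 else l)
        return (lo, hi)

y1 = Affine({"x0": 1.0})
y2 = Affine({"x0": 1.0})
box = {"x0": (-1.0, 1.0)}
print((y1 + y2).bounds(box))   # (-2.0, 2.0): exact
print((y1 - y2).bounds(box))   # (0.0, 0.0):  cancellation detected
```

Because \(y_1 - y_2\) simplifies symbolically to the zero form, the concretized bound is exact, at the cost of carrying coefficient vectors through every layer.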

Practical Impact

For shallow networks (2-5 layers), interval arithmetic may suffice.

For deep networks (20+ layers), dependency loss makes pure interval arithmetic unusable. Symbolic tracking (DeepPoly, CROWN) becomes essential.

Rule of thumb: If certified radius drops sharply with depth, suspect dependency loss. Upgrade to symbolic methods.

Fundamental Accuracy-Robustness Tradeoff

Beyond computational barriers and mathematical precision limits, inherent tradeoffs between clean accuracy and robust accuracy limit what’s achievable.

Empirical Observations

Consistent phenomenon: Robust training (adversarial or certified) reduces clean accuracy:

Worse at larger ε: Larger robustness radius → larger accuracy drop.

Not just an implementation issue: This occurs across different training methods, architectures, and datasets.

Theoretical Explanations

Sample complexity: Robust learning requires more samples than standard learning. For \(\epsilon\)-robust learning on \(d\)-dimensional data:

\[N_{\text{robust}} = \Omega\left(\frac{d}{\epsilon^2}\right)\]

compared to standard learning's \(N_{\text{standard}} = O(1/\epsilon^2)\). The dimension \(d\) appears only in the robust case: a factor-of-\(d\) increase in the data needed for high-dimensional problems.

Concentration of measure: In high dimensions, most volume of a ball is near its surface. Robust classifiers must be constant on balls, requiring simple decision boundaries. Complex boundaries (needed for high accuracy) can’t be robust.

Information-theoretic limits: Recent work suggests fundamental information-theoretic limits on accuracy-robustness tradeoffs—you can’t have both arbitrarily high accuracy and robustness simultaneously.

The Robustness-Accuracy Dilemma

Clean accuracy requires:

  • Complex decision boundaries

  • Fitting subtle patterns in data

  • Exploiting all available features

Robustness requires:

  • Simple, smooth decision boundaries

  • Ignoring fragile features

  • Only using stable patterns

These are fundamentally in tension. High accuracy exploits everything; high robustness ignores fragile signals.

Separable vs Inseparable Problems

Separable: If natural and adversarial examples come from truly different distributions, high accuracy and robustness are achievable simultaneously.

Inseparable: If adversarial perturbations merely explore the natural data manifold (moving within distribution), accuracy and robustness trade off fundamentally.

Reality: Natural images likely have complex structure; adversarial examples may or may not be “off-manifold.” Debate continues, but evidence suggests some inherent tradeoff exists.

Verification-Specific Barriers

Beyond general computational barriers, verification faces domain-specific challenges:

ReLU Non-Linearity

Piecewise linear: ReLU networks are piecewise linear—linear within each region, but exponentially many regions.

Relaxation necessity: Incomplete methods must relax ReLU constraints. Triangle relaxation, convex hull, abstract domains—all lose tightness.

Tightness vs efficiency: Tighter relaxations (SDP [Raghunathan et al., 2018], multi-neuron [Müller et al., 2022]) cost more. Polynomial-time methods must accept some looseness.

Depth Amplifies Difficulty

Error accumulation: Bound propagation methods accumulate approximation error through layers. With \(L\) layers, per-layer looseness compounds multiplicatively, so bounds can inflate exponentially in \(L\) in the worst case.

Deep networks: Modern architectures (ResNet, transformers) have 50-100+ layers. Even small per-layer looseness compounds.

Mitigation: Multi-neuron relaxations [Müller et al., 2022], tighter propagation [Singh et al., 2019] help but don’t eliminate the fundamental challenge.
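The compounding can be watched directly: push a tight input box through a stack of random ReLU layers with plain interval propagation and log the bound widths. The architecture, weight scale, and depth below are arbitrary illustrative choices:

```python
import random

def relu(z): return max(0.0, z)

def layer_iv(W, lo, hi):
    """Interval propagation through one affine layer followed by ReLU
    (no dependency tracking between neurons)."""
    new_lo, new_hi = [], []
    for row in W:
        l = sum(w * (a if w >= 0 else b) for w, a, b in zip(row, lo, hi))
        u = sum(w * (b if w >= 0 else a) for w, a, b in zip(row, lo, hi))
        new_lo.append(relu(l))
        new_hi.append(relu(u))
    return new_lo, new_hi

random.seed(0)
n = 8
lo, hi = [-0.01] * n, [0.01] * n        # tight input box
widths = []
for depth in range(1, 11):
    W = [[random.gauss(0.0, 0.5) for _ in range(n)] for _ in range(n)]
    lo, hi = layer_iv(W, lo, hi)
    widths.append(max(h - l for l, h in zip(lo, hi)))
    print(f"depth {depth:2d}: max interval width = {widths[-1]:.4g}")
```

Under these settings the widths tend to grow geometrically with depth even though each layer's relaxation is individually mild; symbolic propagation avoids much of this inflation.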

Non-Convexity

ReLU non-convexity: The ReLU function \(\max(0, z)\) is non-convex. Composition of non-convexities is highly non-convex.

Verification as non-convex optimization: Finding worst-case perturbation is solving a non-convex optimization. No guarantee of finding global optimum efficiently.

Relaxation necessity: LP [Salman et al., 2019], SDP [Raghunathan et al., 2018] relax to convex problems. Completeness requires exploring non-convex space—exponential effort.

What Barriers Mean for Practice

Realistic Expectations

Complete verification won’t scale to huge networks: ResNets with millions of parameters, transformers with billions—complete verification is intractable. Accept this and use incomplete methods.

Certified accuracy will lag clean accuracy: Don’t expect 95% certified accuracy on CIFAR-10. Current state-of-the-art (~70%) is impressive given fundamental barriers.

Approximation gaps are unavoidable: The gap between what verification certifies and true robustness exists and can’t be eliminated efficiently.

Strategic Choices

When to use complete methods:

  • Small networks (hundreds to few thousand neurons)

  • Safety-critical applications where guarantees essential

  • When computational budget allows days of solving time

When to accept incompleteness:

  • Large networks (tens of thousands of neurons and up)

  • Rapid iteration required

  • Applications where “probably robust” suffices

Design for verifiability:

  • Train networks to be easier to verify (spectral normalization, certified training [Gowal et al., 2019, Zhang et al., 2020])

  • Use architectures amenable to verification (avoid complex activations, limit depth)

  • Accept accuracy loss for certified robustness

Research Directions Despite Barriers

While fundamental barriers exist, research finds ways to push boundaries:

Better Average-Case Performance

Heuristics within exponential worst-case: Branch-and-bound [Wang et al., 2021, Zhang et al., 2022] is exponential worst-case but often polynomial average-case through smart pruning.

Instance-dependent complexity: Some networks/properties are easier than others. Identify and exploit tractable special cases.
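A toy illustration of why branching helps (my own construction, not from the cited papers): on the identically-one network \(f(x) = \text{ReLU}(x) - \text{ReLU}(x) + 1\), plain interval propagation fails, but a single case split on the ReLU's sign makes each branch exactly linear and restores a tight bound:

```python
def relu(z): return max(0.0, z)

def bound_ibp(l, u):
    """Plain IBP lower bound: the two ReLU terms are bounded independently."""
    return 1.0 + relu(l) - relu(u)

def bound_with_split(l, u):
    """Branch on the sign of the shared ReLU input. Within each branch
    ReLU(x) is linear, so the two terms cancel exactly."""
    branches = []
    if l < 0:                      # inactive branch: ReLU(x) = 0 on [l, min(u, 0)]
        branches.append(1.0)       # f = 0 - 0 + 1
    if u > 0:                      # active branch: ReLU(x) = x on [max(l, 0), u]
        branches.append(1.0)       # f = x - x + 1
    return min(branches)           # property must hold on every branch

l, u = -2.0, 2.0
print("IBP lower bound:     ", bound_ibp(l, u))         # -1.0: cannot certify
print("after one ReLU split:", bound_with_split(l, u))  # 1.0: certified
```

Each split doubles the number of subproblems in the worst case, which is the exponential cost; pruning branches whose relaxed bound already certifies the property is what makes branch-and-bound fast on typical instances.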

Tighter Incomplete Methods

Multi-neuron relaxations [Müller et al., 2022]: Capture correlations for tighter bounds without complete verification.

Optimized bound propagation: α-β-CROWN [Wang et al., 2021, Xu et al., 2020] optimizes relaxation parameters for much tighter bounds.

GPU acceleration [Xu et al., 2020]: Massively parallel bound computation enables practical verification at scale.

Probabilistic Guarantees

Randomized smoothing [Cohen et al., 2019]: Accept probabilistic rather than deterministic guarantees. Achieves better certified radius through statistical concentration.

Tradeoff: Statistical confidence instead of absolute certainty. For many applications, 99.9% confidence suffices.
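In its simplified form (runner-up class probability bounded by \(1 - \underline{p_A}\)), the Cohen et al. [2019] certified \(\ell_2\) radius is \(R = \sigma\,\Phi^{-1}(\underline{p_A})\), computable with the Python standard library alone. The \(\sigma\) and probability values below are illustrative:

```python
from statistics import NormalDist

def smoothing_radius(p_a, sigma):
    """Simplified Cohen et al. (2019) certified l2 radius: sigma * Phi^{-1}(p_A).
    p_a is a statistical lower bound on the smoothed classifier's top-class
    probability under Gaussian noise of scale sigma."""
    return sigma * NormalDist().inv_cdf(p_a)

# Higher confidence in the top class under noise -> larger certified radius.
for p in (0.6, 0.9, 0.99, 0.999):
    print(f"p_A = {p:5}: radius = {smoothing_radius(p, sigma=0.5):.3f}")
```

The radius grows only through \(\Phi^{-1}\), so pushing the confidence from 0.99 to 0.999 buys relatively little; in practice \(\underline{p_A}\) itself is estimated from samples, which is where the statistical (rather than absolute) guarantee enters.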

Co-Design of Training and Verification

Certified training [Gowal et al., 2019, Zhang et al., 2020]: Train networks to maximize verified robustness. Result: networks easier to verify, higher certified accuracy.

Architecture search for verifiability: Design architectures specifically for verification (e.g., bounded Lipschitz, fewer uncertain ReLUs).

Regularization: Lipschitz regularization [Fazlyab et al., 2019], margin maximization encourage properties that aid verification.

Fundamental vs Practical Barriers

Fundamental barriers (won’t change):

  • NP-completeness of verification

  • Approximation hardness

  • Curse of dimensionality (for complete methods)

  • Some accuracy-robustness tradeoff

Practical barriers (improving over time):

  • Tightness of incomplete methods (multi-neuron, SDP improving)

  • Speed of solvers (GPU acceleration, better heuristics)

  • Certified accuracy achieved (70%+ on CIFAR-10, up from 30% five years ago)

  • Network sizes verifiable (now tens of thousands, growing)

Research improves practice within fundamental limits. We won’t eliminate NP-completeness, but we can make verification practical for increasingly large networks through cleverer algorithms and better engineering.

Final Thoughts

Theoretical barriers—NP-completeness [Katz et al., 2017, Weng et al., 2018], approximation hardness, dimensionality curses, accuracy-robustness tradeoffs—are not obstacles to overcome but realities to understand. They define what’s possible and what’s not.

Complete verification will never be polynomial-time (assuming P≠NP). Certified accuracy will never match clean accuracy without fundamental changes to machine learning. The gap between certified and true robustness can’t be eliminated efficiently.

But within these constraints, remarkable progress continues. Tighter incomplete methods [Singh et al., 2019, Müller et al., 2022], better heuristics for complete methods [Wang et al., 2021, Zhang et al., 2022], probabilistic alternatives [Cohen et al., 2019], and co-design of training and verification [Gowal et al., 2019, Zhang et al., 2020] push the boundaries of what’s practical.

Understanding theoretical barriers clarifies where to invest research effort: not chasing polynomial-time complete verification (impossible) but developing better average-case heuristics, tighter incomplete bounds, and methods that work within fundamental limits. This realism guides productive research and sets appropriate expectations for verification’s capabilities.

Further Reading

This guide provides comprehensive coverage of theoretical barriers in neural network verification. For readers interested in diving deeper into the mathematical foundations, we recommend the following resources organized by topic:

NP-Completeness of Verification:

The foundational result [Katz et al., 2017] proved that verifying properties of ReLU neural networks is NP-complete through reduction from Boolean satisfiability. This established that complete verification cannot be polynomial-time unless P=NP. Further work [Weng et al., 2018] showed that even approximating certified robustness within any constant factor is NP-hard, demonstrating that the difficulty extends beyond exact verification.

Complexity Theory Foundations:

Understanding computational complexity theory provides context for these results. The P vs NP problem represents one of computer science’s deepest unsolved questions, and neural network verification’s NP-completeness places it among fundamentally hard problems.

Accuracy-Robustness Tradeoffs:

The tension between clean accuracy and robust accuracy has been documented empirically across many studies. PGD adversarial training [Madry et al., 2018] consistently shows accuracy drops when optimizing for robustness. Theoretical work explores information-theoretic and sample complexity explanations for these tradeoffs.

Complete Verification Methods and Complexity:

Complete methods—Marabou [Katz et al., 2019], MILP [Tjeng et al., 2019], branch-and-bound [Wang et al., 2021, Zhang et al., 2022]—all face exponential worst-case complexity due to NP-completeness. Understanding their algorithms clarifies how they navigate this complexity barrier through heuristics and pruning.

Incomplete Methods as Response:

The NP-completeness barrier motivated development of incomplete methods that sacrifice completeness for polynomial-time complexity. IBP [Gowal et al., 2019], CROWN [Weng et al., 2018, Zhang et al., 2018], and DeepPoly [Singh et al., 2019] accept “unknown” results to achieve scalability.

Tighter Incomplete Methods:

Recent advances—multi-neuron relaxations [Müller et al., 2022], SDP [Raghunathan et al., 2018], optimized bound propagation [Xu et al., 2020]—push incomplete methods toward tighter bounds without sacrificing polynomial-time complexity. This represents practical progress within theoretical constraints.

Probabilistic Certification:

Randomized smoothing [Cohen et al., 2019] offers an alternative approach, accepting probabilistic rather than deterministic guarantees to achieve better scalability and tighter certified radii. This sidesteps some deterministic verification barriers through statistical concentration.

Certified Training:

Training for verifiability [Gowal et al., 2019, Zhang et al., 2020] demonstrates that networks can be designed to be easier to verify, achieving higher certified accuracy within the same computational budget. This represents co-design of training and verification within fundamental barriers.

GPU Acceleration:

Massive parallelization [Xu et al., 2020] enables practical verification at scales previously intractable. While it doesn’t change worst-case complexity, it dramatically improves constant factors and average-case performance.

Related Topics:

For understanding verification methods within these barriers, see Verification Taxonomy: A Systematic View. For the verification problem’s mathematical formulation, see The Verification Problem: Mathematical Formulation. For soundness and completeness tradeoffs, see Soundness and Completeness. For practical implications of these barriers, see Robustness Testing Guide.

Next Phase

Congratulations on completing Phase 1: Foundations! Continue to Phase 2: Core Verification Techniques to explore core verification techniques including bound propagation, LP/SDP methods, and complete verification approaches.

[1]

Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer, 2017.

[2] (1,2,3,4)

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.

[3]

Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 269–286. Springer, 2017.

[4]

Mahyar Fazlyab, Manfred Morari, and George J Pappas. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. IEEE Transactions on Automatic Control, 2020.

[5]

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.

[6]

Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[7]

Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[8]

Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.

[9]

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. 2018.

[10]

Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.

[11]

Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.

[12]

Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.

[13]

Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[14]

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, 10802–10813. 2018.

[15]

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.

[16]

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Boosting robustness certification of neural networks. In International Conference on Learning Representations. 2019.

[17]

Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.

[18]

Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[19]

Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.

[20]

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[21]

Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.

[22]

Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.

[23]

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948. 2018.