Verification Taxonomy: A Systematic View¶
Neural network verification encompasses dozens of methods—SMT solvers [Katz et al., 2017, Katz et al., 2019], bound propagation [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018], randomized smoothing [Cohen et al., 2019], and more. Each method makes different tradeoffs between tightness, speed, and scalability. Understanding how these methods relate helps you choose the right tool for your application.
This guide presents a systematic taxonomy of neural network verification methods, organizing them along key dimensions: complete vs incomplete, deterministic vs probabilistic, and bound propagation vs optimization-based. By understanding this landscape, you can navigate verification research and select methods intelligently.
Why Taxonomy Matters¶
The verification landscape is vast: Researchers have developed numerous approaches, each with strengths and limitations. Without organization, this diversity is overwhelming.
Taxonomy provides structure: By categorizing methods along fundamental axes, we can:
Understand relationships: How does CROWN relate to DeepPoly? Both are incomplete bound propagation, but use different abstract domains.
Identify gaps: What combinations of properties are unexplored? (e.g., complete probabilistic verification)
Guide selection: Given requirements (speed, tightness, scalability), which category fits best?
Predict performance: Methods in the same category often have similar characteristics.
Taxonomy Benefits
A well-organized taxonomy helps:
Researchers: Identify open problems and unexplored method combinations
Practitioners: Choose verification methods matching their constraints
Students: Build mental models of the verification landscape
Tool builders: Position new methods relative to existing work
Primary Classification Axes¶
Axis 1: Complete vs Incomplete¶
The most fundamental distinction: does the method always give definitive answers?
Complete verification [Katz et al., 2017, Tjeng et al., 2019, Katz et al., 2019]:
Definition: Always returns VERIFIED or FALSIFIED (with counterexample)
Never returns: “Unknown”
Guarantee: If it says verified, the property holds; if falsified, here’s proof
Cost: Exponential worst-case complexity
Examples: SMT (Marabou), MILP, branch-and-bound
Incomplete verification [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018]:
Definition: May return VERIFIED, FALSIFIED, or UNKNOWN
Guarantee: If verified, property holds (sound). If unknown, property might hold or not.
Cost: Polynomial-time complexity
Examples: IBP, CROWN, DeepPoly, LP relaxation, SDP
| Property | Complete | Incomplete |
|---|---|---|
| Possible answers | Verified, Falsified | Verified, Falsified, Unknown |
| Soundness | Yes (always correct) | Yes (verified → true) |
| Completeness | Yes (always decides) | No (may return unknown) |
| Time complexity | Exponential (worst-case) | Polynomial |
| Scalability | Small networks (~1000 neurons) | Large networks (millions) |
Why incomplete methods dominate: Verification is NP-complete [Katz et al., 2017, Weng et al., 2018], so complete methods face a fundamental scalability barrier. For practical large-scale verification, incomplete methods are necessary.
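As a concrete illustration of the incomplete, polynomial-time style, interval bound propagation fits in a few lines of NumPy. The two-layer network and weights below are made up for illustration; production verifiers apply the same idea to arbitrary architectures.

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate the box [l, u] through the affine layer x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b

def ibp_relu(l, u):
    """ReLU is monotone, so it maps boxes to boxes elementwise."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# Toy 2-layer ReLU network (weights are illustrative, not from any paper).
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)

# Certify output bounds for every x with ||x - x0||_inf <= eps.
x0, eps = np.array([1.0, -1.0]), 0.1
l, u = x0 - eps, x0 + eps
l, u = ibp_relu(*ibp_affine(l, u, W1, b1))
l, u = ibp_affine(l, u, W2, b2)
print(l, u)  # sound, possibly loose, bounds on the scalar output
```

The result is sound for every input in the box, but each layer adds slack, which is exactly the looseness that tighter methods (CROWN, DeepPoly) reduce.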
Axis 2: Deterministic vs Probabilistic¶
How are guarantees provided?
Deterministic verification [Weng et al., 2018, Zhang et al., 2018, Katz et al., 2019]:
Guarantee: Definite bounds on outputs for all inputs in region
Example: “For all \(x\) in \(\epsilon\)-ball, output stays in class A”
Methods: All bound propagation, SMT, MILP, LP/SDP
Advantage: No uncertainty in guarantee
Limitation: Conservative (may be loose)
Probabilistic verification [Cohen et al., 2019]:
Guarantee: Probabilistic bound with confidence level
Example: “With 99% confidence, for all \(x\) in radius \(r\), output stays in class A”
Methods: Randomized smoothing
Advantage: Tighter bounds than deterministic for same cost
Limitation: Probabilistic (not absolute certainty)
Deterministic vs Probabilistic
Deterministic:
Absolute guarantee
May be conservative (loose bounds)
Required for safety-critical systems needing provable guarantees
Probabilistic:
Statistical guarantee with confidence level
Often tighter for same computational cost
Acceptable when probabilistic guarantees suffice
Most verification methods are deterministic. Randomized smoothing [Cohen et al., 2019] is the primary probabilistic approach, offering better scalability at the cost of statistical rather than absolute guarantees.
Axis 3: Bound Propagation vs Optimization¶
How are bounds computed?
Bound propagation [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018]:
Approach: Propagate bounds layer-by-layer through network
Mechanism: Abstract interpretation, interval arithmetic, linear relaxations
Complexity: O(network size) typically
Examples: IBP, CROWN, DeepPoly
Advantage: Fast, scales well
Limitation: May be loose (accumulates approximation errors)
Optimization-based [Raghunathan et al., 2018, Katz et al., 2019, Salman et al., 2019]:
Approach: Formulate bound computation as optimization problem
Mechanism: LP, SDP, SMT, MILP
Complexity: O(n³) for LP/SDP, exponential for SMT/MILP
Examples: LP relaxation, SDP, Marabou
Advantage: Can be tighter (explicit optimization)
Limitation: Slower (solving optimization expensive)
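A core building block of LP-based verifiers is the "triangle" relaxation of an unstable ReLU, which replaces the nonlinearity with linear constraints. A minimal sketch, with an illustrative pre-activation interval:

```python
import numpy as np

def relu_triangle_upper(l, u):
    """Tightest linear upper bound a*x + c of ReLU on [l, u]
    (the 'triangle' relaxation used by LP-based verifiers)."""
    if u <= 0:
        return 0.0, 0.0          # neuron provably inactive
    if l >= 0:
        return 1.0, 0.0          # neuron provably active
    a = u / (u - l)              # chord through (l, 0) and (u, u)
    return a, -a * l

# An unstable neuron with pre-activation in [-1, 1]:
a, c = relu_triangle_upper(-1.0, 1.0)
xs = np.linspace(-1.0, 1.0, 101)
assert np.all(a * xs + c >= np.maximum(xs, 0.0))   # valid upper bound
```

Stacking these constraints for every neuron and solving the resulting LP with a solver such as Gurobi yields the bounds discussed above; that solve is what makes these methods slower than pure propagation.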
Hybrid: Branch-and-bound [Wang et al., 2021, Zhang et al., 2022]:
Uses bound propagation for initial bounds
Branches (partitions input space) when bounds too loose
Applies bound propagation to each partition
Combines speed of propagation with refinement of partitioning
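The branch-and-bound loop above can be sketched on a toy one-dimensional problem. The function `f(x) = x^2 - x + 0.3` and its interval lower bound stand in for a network output and an incomplete verifier; both are illustrative, not from the cited systems.

```python
def lower_bound(l, u):
    """Sound interval lower bound of f(x) = x**2 - x + 0.3 on [l, u]
    (stand-in for an incomplete verifier's bound on a network output)."""
    sq_lo = 0.0 if l <= 0.0 <= u else min(l * l, u * u)
    return sq_lo - u + 0.3

def verify(l, u, depth=0, max_depth=20):
    """Certify f(x) > 0 on [l, u], branching whenever the bound is too loose."""
    if lower_bound(l, u) > 0.0:
        return True                       # certified on this piece
    if depth == max_depth:
        return False                      # budget exhausted: "unknown"
    m = 0.5 * (l + u)
    return (verify(l, m, depth + 1, max_depth)
            and verify(m, u, depth + 1, max_depth))

print(lower_bound(0.0, 1.0))  # negative: the one-shot bound is inconclusive
print(verify(0.0, 1.0))       # True: branching certifies the property
```

The one-shot bound returns "unknown", but splitting the input shrinks the slack until every piece is certified, which is precisely how α-β-CROWN turns an incomplete bound into a complete procedure.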
Verification Method Taxonomy¶
Combining these axes yields a taxonomy of verification approaches:
Complete Methods¶
1. SMT-Based [Katz et al., 2017, Katz et al., 2019]
Approach: Encode network as logical constraints, solve with SMT
Example: Marabou/Reluplex
Tightness: Exact (complete)
Speed: Slow (exponential worst-case)
Scalability: Hundreds to ~1000 neurons
Best for: Small networks requiring definitive answers
2. MILP-Based [Tjeng et al., 2019, Cheng et al., 2017]
Approach: Encode as mixed-integer linear program
Example: Gurobi/CPLEX formulations
Tightness: Exact (complete)
Speed: Slow (exponential worst-case)
Scalability: Similar to SMT
Best for: Finding minimum perturbations, optimization objectives
3. Branch-and-Bound [Wang et al., 2021, Zhang et al., 2022]
Approach: Partition input space, verify each partition with incomplete methods
Example: α-β-CROWN with branching
Tightness: Exact (complete) if given sufficient time
Speed: Moderate to slow (depends on branching)
Scalability: Larger than SMT/MILP (tens of thousands of neurons)
Best for: When incomplete methods return “unknown” but complete needed
Incomplete Methods: Bound Propagation¶
1. Interval Bound Propagation (IBP) [Gowal et al., 2019]
Approach: Propagate interval bounds layer-by-layer
Tightness: Loosest
Speed: Fastest (O(network size))
Scalability: Millions of neurons
Best for: Rapid screening, certified training
2. Linear Bound Propagation (CROWN) [Weng et al., 2018, Zhang et al., 2018]
Approach: Backward linear bound propagation with optimized coefficients
Tightness: Moderate to tight
Speed: Fast (O(network size))
Scalability: Millions of neurons
Best for: General verification, production use
3. Abstract Interpretation (DeepPoly) [Singh et al., 2019]
Approach: Abstract interpretation with polyhedral domains
Tightness: Tight
Speed: Fast to moderate
Scalability: Large networks
Best for: When tighter than CROWN needed, faster than LP
4. Multi-Neuron Relaxation (PRIMA) [Müller et al., 2022]
Approach: Consider multiple neurons jointly via product terms
Tightness: Very tight
Speed: Moderate (O(network size + k pairs))
Scalability: Thousands to tens of thousands
Best for: When single-neuron bounds too loose
Incomplete Methods: Optimization-Based¶
1. Linear Programming [Salman et al., 2019]
Approach: Encode network as LP, solve for bounds
Tightness: Moderate to tight
Speed: Moderate (O(n³) LP solving)
Scalability: Thousands of neurons
Best for: When explicit optimization formulation needed
2. Semi-Definite Programming [Raghunathan et al., 2018]
Approach: Encode as SDP with lifted quadratic variables
Tightness: Very tight
Speed: Slow (O(n^4.5))
Scalability: Hundreds to ~1000 neurons
Best for: Small networks needing very tight bounds
3. Lipschitz-Based [Fazlyab et al., 2019]
Approach: Bound outputs via network Lipschitz constant
Tightness: Loose (product bound) to tight (LipSDP)
Speed: Fast (product) to slow (SDP)
Scalability: Depends on method
Best for: Networks with controlled Lipschitz constants
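The loose product bound is easy to compute: since ReLU is 1-Lipschitz, the network's Lipschitz constant is at most the product of the layers' spectral norms. A sketch with illustrative weight matrices:

```python
import numpy as np

# Naive product bound: L <= product of layer spectral norms
# (ReLU is 1-Lipschitz, so only the affine layers contribute).
# Usually loose; LipSDP-style methods compute tighter certificates via SDP.
W1 = np.array([[1.0, -1.0], [0.5, 0.5]])   # illustrative weights
W2 = np.array([[1.0, 1.0]])

L = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)  # largest singular values

# If ||x - x0||_2 <= eps, the output can move by at most L * eps:
eps = 0.1
print("output change <=", L * eps)
```

This gives an instant certificate, but the product bound compounds across depth, which is why LipSDP-style tightening matters for deeper networks.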
Probabilistic Methods¶
Randomized Smoothing [Cohen et al., 2019]
Approach: Smooth classifier with Gaussian noise, statistical certification
Tightness: Tight (often tightest certified radius)
Speed: Slow (requires many samples)
Scalability: Excellent (works for any network size)
Best for: Large networks where deterministic verification intractable
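The certified radius from Cohen et al. has a closed form once you have a lower confidence bound on the smoothed classifier's top-class probability. A standard-library sketch (estimating that lower bound from noise samples, e.g. via a Clopper-Pearson interval, is omitted here):

```python
from statistics import NormalDist

def certified_radius(p_a_lower, sigma):
    """l2 radius certified by randomized smoothing (Cohen et al., 2019):
    sigma * Phi^{-1}(p_a_lower), where p_a_lower is a lower confidence
    bound on the smoothed classifier's top-class probability under
    N(0, sigma^2 I) input noise."""
    if p_a_lower <= 0.5:
        return 0.0                      # abstain: no certificate
    return sigma * NormalDist().inv_cdf(p_a_lower)

# With a 0.99 lower bound on the top-class probability and noise sigma = 0.5:
print(certified_radius(0.99, sigma=0.5))  # roughly 1.16
```

The cost hides in estimating `p_a_lower`: it requires many forward passes through the base classifier, which is why smoothing is slow per input yet indifferent to network size.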
Tightness-Speed-Scalability Tradeoffs¶
Every verification method navigates a three-way tradeoff:
The Tradeoff Triangle¶
                 Tightness
                     △
                    /|\
                   / | \
  Complete (SMT)  /  |  \  SDP / PRIMA
                 /   |   \
                /  Branch \
               /   & Bound \
              /      |      \
             /   CROWN /     \
            /    DeepPoly     \
           /         |         \
  Speed ◁────────────┼────────────▷ Scalability
           \         |         /
            \       IBP       /
             \       |       /
              \  Randomized /
               \  Smoothing/
                \    |    /
                 \   |   /
                  \  |  /
                   \ | /
                    \|/
                     ▽
You can optimize for at most two:
Tight + Fast: Impossible for large networks (NP-completeness barrier)
Tight + Scalable: Randomized smoothing (probabilistic)
Fast + Scalable: IBP, CROWN (loose to moderate tightness)
Middle ground methods (CROWN, DeepPoly, PRIMA): Balance all three reasonably well.
Quantitative Comparison¶
| Method | Tightness | Time Complexity | Max Network Size | Certified CIFAR-10 (ε=8/255) |
|---|---|---|---|---|
| Marabou (SMT) | Exact | Exponential | ~1K neurons | N/A (too slow) |
| MILP | Exact | Exponential | ~1K neurons | N/A (too slow) |
| Branch&Bound | Exact | Exponential (often better in practice) | ~100K neurons | ~65% |
| IBP | Loose | O(n) | Millions | ~35% |
| CROWN | Moderate | O(n) | Millions | ~55% |
| DeepPoly | Tight | O(n) | Millions | ~60% |
| PRIMA | Very tight | O(n + k) | ~100K | ~62% |
| LP | Moderate | O(n³) | ~10K | ~58% |
| SDP | Very tight | O(n^4.5) | ~1K | ~63% |
| LipSDP | Tight | O(n^3+) | ~10K | ~50% |
| Rand. Smooth | Tight (prob.) | O(samples) | Unlimited | ~70% |
Decision Guide: Choosing a Method¶
Given your requirements, which method should you use?
By Use Case¶
Safety-Critical Application (need absolute guarantees):
Try incomplete methods (CROWN, DeepPoly) first
If unknown, use branch-and-bound [Wang et al., 2021]
If still too slow, use Marabou [Katz et al., 2019] for small networks
If network too large, redesign network to be smaller/easier to verify
Production Deployment (need fast verification):
Use CROWN [Zhang et al., 2018] or DeepPoly [Singh et al., 2019]
If too loose, try PRIMA [Müller et al., 2022]
If still insufficient, use randomized smoothing [Cohen et al., 2019]
Accept “unknown” results gracefully
Research / Benchmarking (need tightest bounds):
Try multi-neuron relaxations (PRIMA) [Müller et al., 2022]
For small networks, use SDP [Raghunathan et al., 2018]
For completeness, branch-and-bound [Wang et al., 2021, Zhang et al., 2022]
Compare multiple methods to understand tradeoffs
Large-Scale Networks (millions of parameters):
Use IBP [Gowal et al., 2019] or CROWN [Zhang et al., 2018]
If certification needed, use randomized smoothing [Cohen et al., 2019]
Accept that complete verification is intractable
Focus on certified training [Gowal et al., 2019, Zhang et al., 2020] instead
By Network Size¶
Small (<1000 neurons):
Best: Marabou [Katz et al., 2019], SDP [Raghunathan et al., 2018], branch-and-bound
Rationale: Can afford complete/expensive methods
Medium (1K-100K neurons):
Best: CROWN [Zhang et al., 2018], DeepPoly [Singh et al., 2019], PRIMA [Müller et al., 2022]
Rationale: Need balance between tightness and speed
Large (>100K neurons):
Best: IBP [Gowal et al., 2019], CROWN [Zhang et al., 2018], randomized smoothing [Cohen et al., 2019]
Rationale: Only scalable methods work
By Tightness Requirement¶
Need exact answers:
Use complete methods: Marabou [Katz et al., 2019], branch-and-bound [Wang et al., 2021]
Need tight bounds, can accept “unknown”:
Use PRIMA [Müller et al., 2022], SDP [Raghunathan et al., 2018], or α-β-CROWN
Moderate bounds acceptable:
Use CROWN [Zhang et al., 2018] or DeepPoly [Singh et al., 2019]
Coarse bounds sufficient:
Use IBP [Gowal et al., 2019] for maximum speed
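As a rough summary, the guidance above can be condensed into a toy rule-of-thumb function. The thresholds are the approximate network sizes quoted in this guide, not hard limits, and real choices also depend on architecture, property, and tooling.

```python
def suggest_method(n_neurons, need_complete, accept_probabilistic=False):
    """Toy rule of thumb mirroring the decision guide; illustrative only."""
    if need_complete:
        if n_neurons <= 1_000:
            return "SMT (Marabou) or MILP"
        if n_neurons <= 100_000:
            return "branch-and-bound (e.g. alpha-beta-CROWN)"
        return "redesign: complete verification is intractable at this size"
    if n_neurons > 100_000:
        return "randomized smoothing" if accept_probabilistic else "IBP or CROWN"
    return "CROWN / DeepPoly (try PRIMA if bounds too loose)"

print(suggest_method(500, need_complete=True))
```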
Evolution of Verification Methods¶
Understanding the historical progression clarifies why current methods exist:
2017: Complete methods emerge [Katz et al., 2017]
Reluplex/Marabou: First practical SMT-based verifier
Proved NP-completeness of verification
Established fundamental complexity barriers
2018: Fast incomplete methods [Weng et al., 2018, Zhang et al., 2018, Singh et al., 2018]
CROWN, Fast-Lin: Linear bound propagation
Polynomial-time, scalable to large networks
Trade completeness for speed
2019: Certified training [Gowal et al., 2019, Zhang et al., 2019]
IBP training: First scalable certified training
TRADES: Accuracy-robustness tradeoff
Shift from verification-only to training integration
2020: Tighter incomplete methods [Xu et al., 2020, Zhang et al., 2020]
α-β-CROWN: Optimized bound parameters
GPU acceleration: Massive speedups
Improved certified accuracy on benchmarks
2021-2022: Multi-neuron and hybrid [Müller et al., 2022, Wang et al., 2021, Zhang et al., 2022]
PRIMA: Multi-neuron relaxations
β-CROWN: Branch-and-bound with tight bounds
Complete verification at larger scale
Current frontier:
Scaling to vision transformers and larger architectures
Tighter relaxations without sacrificing speed
Integration with foundation models
Remaining Open Problems¶
Despite progress, fundamental challenges remain:
Scalability-Tightness Gap: The gap between what we can verify tightly (thousands of neurons) and what we want to verify (billions of parameters in LLMs) remains enormous.
Complete Verification Barrier: NP-completeness [Katz et al., 2017, Weng et al., 2018] means complete verification will always face exponential worst-case complexity. Better heuristics help but do not eliminate this barrier.
Non-ReLU Activations: Most methods assume ReLU. Modern architectures use GeLU, SiLU, and attention; extending verification to these is challenging.
Non-\(\ell_p\) Perturbations: Most verification focuses on \(\ell_p\) norms. Semantic perturbations and physical adversarial examples require different approaches.
Certified Accuracy Gap: Even state-of-the-art methods achieve only ~70% certified accuracy on CIFAR-10, far below 95%+ clean accuracy. Closing this gap remains elusive.
Final Thoughts¶
The taxonomy of neural network verification reveals a rich landscape: complete methods [Katz et al., 2017, Tjeng et al., 2019, Katz et al., 2019] providing definitive answers at high cost, incomplete methods [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018] scaling efficiently with partial guarantees, and probabilistic methods [Cohen et al., 2019] offering alternative tradeoffs.
Understanding this taxonomy helps you navigate the verification literature, choose methods appropriately, and appreciate the fundamental tradeoffs shaping the field. There’s no single “best” verification method—only methods suited to different requirements, constraints, and applications.
As verification research advances, the taxonomy will evolve. New methods may occupy unexplored regions (e.g., probabilistic complete verification, fast tight bounds). But the fundamental axes—completeness, determinism, tightness-speed-scalability—will likely persist, providing enduring structure to organize verification approaches.
Further Reading
This guide provides a systematic taxonomy of neural network verification methods. For readers interested in diving deeper, we recommend exploring specific methods within each category:
Complete Verification Methods:
SMT-based verification with Marabou [Katz et al., 2017, Katz et al., 2019] pioneered complete verification, while MILP approaches [Tjeng et al., 2019, Cheng et al., 2017] offer alternative complete formulations. Branch-and-bound [Wang et al., 2021, Zhang et al., 2022] extends complete verification to larger networks through hybrid incomplete-complete strategies.
Incomplete Bound Propagation:
IBP [Gowal et al., 2019] provides the fastest bounds, CROWN [Weng et al., 2018, Zhang et al., 2018] balances speed and tightness, and DeepPoly [Singh et al., 2019] offers tight abstract interpretation. Multi-neuron relaxations [Müller et al., 2022] capture correlations for even tighter bounds.
Optimization-Based Incomplete Methods:
LP relaxation [Salman et al., 2019] and SDP [Raghunathan et al., 2018] provide tighter bounds through explicit optimization. Lipschitz-based methods [Fazlyab et al., 2019] offer alternative perspectives on robustness certification.
Probabilistic Certification:
Randomized smoothing [Cohen et al., 2019] represents the primary probabilistic approach, providing excellent scalability through statistical guarantees.
Theoretical Foundations:
The NP-completeness result [Katz et al., 2017, Weng et al., 2018] establishes fundamental complexity barriers. Understanding this helps set realistic expectations about verification’s capabilities and limitations.
Certified Training:
IBP training [Gowal et al., 2019], CROWN training [Zhang et al., 2020], and GPU-accelerated methods [Xu et al., 2020] show how verification integrates with training. TRADES [Zhang et al., 2019] balances natural and robust accuracy.
Related Topics:
For mathematical foundations of the verification problem, see The Verification Problem: Mathematical Formulation. For understanding soundness and completeness, see Soundness and Completeness. For complexity barriers, see Theoretical Barriers in Neural Network Verification. For practical method selection, see Robustness Testing Guide.
Next Guide
Continue to Theoretical Barriers in Neural Network Verification to understand the fundamental mathematical limits and complexity barriers in neural network verification.
Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer, 2017.
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.
Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.
Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.
Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.
Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.
Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.
Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.
Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, 10802–10813. 2018.
Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.
Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-crown: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.
Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning, 7472–7482. PMLR, 2019.
Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.