Verification Taxonomy: A Systematic View
A comprehensive taxonomy of verification methods, including scalability measures and tightness rankings.
Neural network verification encompasses dozens of methods---SMT solvers, bound propagation, randomized smoothing, and more. Each method makes different tradeoffs between tightness, speed, and scalability. Understanding how these methods relate helps you choose the right tool for your application.
This guide presents a systematic taxonomy of neural network verification methods, organizing them along key dimensions: complete vs incomplete, deterministic vs probabilistic, and bound propagation vs optimization-based. By understanding this landscape, you can navigate verification research and select methods intelligently.
Why Taxonomy Matters
The verification landscape is vast: Researchers have developed numerous approaches, each with strengths and limitations. Without organization, this diversity is overwhelming.
Taxonomy provides structure: By categorizing methods along fundamental axes, we can:
- Understand relationships: How does CROWN relate to DeepPoly? Both are incomplete bound propagation, but use different abstract domains.
- Identify gaps: What combinations of properties are unexplored? (e.g., complete probabilistic verification)
- Guide selection: Given requirements (speed, tightness, scalability), which category fits best?
- Predict performance: Methods in the same category often have similar characteristics.
Taxonomy Benefits
A well-organized taxonomy helps:
- Researchers: Identify open problems and unexplored method combinations
- Practitioners: Choose verification methods matching their constraints
- Students: Build mental models of the verification landscape
- Tool builders: Position new methods relative to existing work
Primary Classification Axes
Axis 1: Complete vs Incomplete
The most fundamental distinction: does the method always give definitive answers?
Complete verification:
- Definition: Always returns VERIFIED or FALSIFIED (with counterexample)
- Never returns: “Unknown”
- Guarantee: If it says verified, the property holds; if it says falsified, a concrete counterexample is provided
- Cost: Exponential worst-case complexity
- Examples: SMT (Marabou), MILP, branch-and-bound
Incomplete verification:
- Definition: May return VERIFIED, FALSIFIED, or UNKNOWN
- Guarantee: If verified, property holds (sound). If unknown, property might hold or not.
- Cost: Polynomial-time complexity
- Examples: IBP, CROWN, DeepPoly, LP relaxation, SDP
| Property | Complete | Incomplete |
|---|---|---|
| Possible answers | Verified, Falsified | Verified, Falsified, Unknown |
| Soundness | Yes (always correct) | Yes (verified implies true) |
| Completeness | Yes (always decides) | No (may return unknown) |
| Time complexity | Exponential (worst-case) | Polynomial |
| Scalability | Small networks (~1000 neurons) | Large networks (millions) |
Why incomplete methods dominate: The NP-completeness of verification means complete methods face fundamental scalability barriers. For practical large-scale verification, incomplete methods are necessary.
Axis 2: Deterministic vs Probabilistic
How are guarantees provided?
Deterministic verification:
- Guarantee: Definite bounds on outputs for all inputs in region
- Example: “For all inputs x′ within the ε-ball around x, the output stays in class A”
- Methods: All bound propagation, SMT, MILP, LP/SDP
- Advantage: No uncertainty in guarantee
- Limitation: Conservative (may be loose)
Probabilistic verification:
- Guarantee: Probabilistic bound with confidence level
- Example: “With 99% confidence, for all perturbations within radius R, the output stays in class A”
- Methods: Randomized smoothing
- Advantage: Often tighter guarantees than deterministic methods at comparable cost
- Limitation: Probabilistic (not absolute certainty)
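To make the two guarantee styles concrete, here is a sketch of the standard formulations (notation is ours, not from this guide: x is the input, f the classifier's logits, A the predicted class, ε the perturbation budget, σ the smoothing noise level, and p_A, p_B confidence bounds on the top-two class probabilities under noise):

Deterministic local robustness:

$$\forall x' \;\text{with}\; \|x' - x\|_p \le \epsilon: \quad \arg\max_j f_j(x') = A$$

Randomized-smoothing certificate (holds with confidence 1 − α):

$$R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right)$$

where Φ⁻¹ is the inverse standard normal CDF; the smoothed classifier's prediction cannot change within ℓ2 radius R.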
Deterministic vs Probabilistic
Deterministic:
- Absolute guarantee
- May be conservative (loose bounds)
- Required for safety-critical systems needing provable guarantees
Probabilistic:
- Statistical guarantee with confidence level
- Often tighter for same computational cost
- Acceptable when probabilistic guarantees suffice
Most verification methods are deterministic. Randomized smoothing is the primary probabilistic approach, offering better scalability at the cost of statistical rather than absolute guarantees.
Axis 3: Bound Propagation vs Optimization
How are bounds computed?
Bound propagation:
- Approach: Propagate bounds layer-by-layer through network
- Mechanism: Abstract interpretation, interval arithmetic, linear relaxations
- Complexity: O(network size) typically
- Examples: IBP, CROWN, DeepPoly
- Advantage: Fast, scales well
- Limitation: May be loose (accumulates approximation errors)
Optimization-based:
- Approach: Formulate bound computation as optimization problem
- Mechanism: LP, SDP, SMT, MILP
- Complexity: polynomial for LP/SDP (roughly O(n^3) for LP, higher for SDP), exponential for SMT/MILP
- Examples: LP relaxation, SDP, Marabou
- Advantage: Can be tighter (explicit optimization)
- Limitation: Slower (solving optimization expensive)
Hybrid: Branch-and-bound:
- Uses bound propagation for initial bounds
- Branches (partitions input space) when bounds too loose
- Applies bound propagation to each partition
- Combines speed of propagation with refinement of partitioning
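As a rough illustration of how these pieces fit together, here is a minimal branch-and-bound skeleton in Python. The `bound_fn` and `split_dim_fn` arguments are hypothetical stand-ins for an incomplete bounding routine (e.g., IBP or CROWN) and a branching heuristic; real tools such as alpha-beta-CROWN add many refinements (GPU batching, ReLU splitting, counterexample search):

```python
def branch_and_bound(lower, upper, bound_fn, split_dim_fn, max_splits=1000):
    """Try to prove the verification objective stays positive over the box [lower, upper].

    bound_fn(lo, hi) -> float: sound lower bound on the objective over a sub-box.
    split_dim_fn(lo, hi) -> int: which input dimension to bisect next.
    Returns "VERIFIED" or "UNKNOWN"; a full verifier would also search for
    concrete counterexamples in order to return "FALSIFIED".
    """
    queue = [(lower, upper)]
    splits = 0
    while queue:
        lo, hi = queue.pop()
        if bound_fn(lo, hi) > 0:      # incomplete method already proves this sub-box
            continue
        if splits >= max_splits:      # bounds too loose and budget exhausted
            return "UNKNOWN"
        d = split_dim_fn(lo, hi)      # refine by bisecting one input dimension
        mid = (lo[d] + hi[d]) / 2
        lo1, hi1 = lo.copy(), hi.copy()
        lo2, hi2 = lo.copy(), hi.copy()
        hi1[d] = mid
        lo2[d] = mid
        queue.extend([(lo1, hi1), (lo2, hi2)])
        splits += 1
    return "VERIFIED"
```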
Verification Method Taxonomy
Combining these axes yields a taxonomy of verification approaches:
Complete Methods
1. SMT-Based
- Approach: Encode network as logical constraints, solve with SMT
- Example: Marabou/Reluplex
- Tightness: Exact (complete)
- Speed: Slow (exponential worst-case)
- Scalability: Hundreds to ~1000 neurons
- Best for: Small networks requiring definitive answers
2. MILP-Based
- Approach: Encode as mixed-integer linear program
- Example: Gurobi/CPLEX formulations
- Tightness: Exact (complete)
- Speed: Slow (exponential worst-case)
- Scalability: Similar to SMT
- Best for: Finding minimum perturbations, optimization objectives
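For intuition, one standard mixed-integer encoding of a single ReLU y = max(0, x) with known pre-activation bounds l ≤ x ≤ u (l < 0 < u) introduces a binary indicator a for whether the neuron is active:

$$y \ge 0, \quad y \ge x, \quad y \le u\,a, \quad y \le x - l\,(1 - a), \quad a \in \{0, 1\}$$

One binary variable per unstable neuron is what drives the exponential worst case, and also what lets an optimal MILP solution recover the exact network behavior.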
3. Branch-and-Bound
- Approach: Partition input space, verify each partition with incomplete methods
- Example: alpha-beta-CROWN with branching
- Tightness: Exact (complete) if given sufficient time
- Speed: Moderate to slow (depends on branching)
- Scalability: Larger than SMT/MILP (tens of thousands of neurons)
- Best for: When incomplete methods return “unknown” but a definitive answer is needed
Incomplete Methods: Bound Propagation
1. Interval Bound Propagation (IBP)
- Approach: Propagate interval bounds layer-by-layer
- Tightness: Loosest
- Speed: Fastest (O(network size))
- Scalability: Millions of neurons
- Best for: Rapid screening, certified training
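A minimal NumPy sketch of interval propagation through affine and ReLU layers (illustrative only; production implementations such as auto_LiRPA handle general architectures and GPU batching):

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate an elementwise interval [l, u] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_l = W_pos @ l + W_neg @ u + b   # worst case pairs negative weights with u
    new_u = W_pos @ u + W_neg @ l + b
    return new_l, new_u

def ibp_relu(l, u):
    """ReLU is monotone, so intervals map elementwise."""
    return np.maximum(l, 0), np.maximum(u, 0)

def ibp_forward(x, eps, layers):
    """layers: list of (W, b) pairs, with ReLU between them (not after the last)."""
    l, u = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        l, u = ibp_affine(l, u, W, b)
        if i < len(layers) - 1:
            l, u = ibp_relu(l, u)
    return l, u   # interval bounds on the logits
```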
2. Linear Bound Propagation (CROWN)
- Approach: Backward linear bound propagation with optimized coefficients
- Tightness: Moderate to tight
- Speed: Fast (O(network size))
- Scalability: Millions of neurons
- Best for: General verification, production use
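The core building block here is the linear relaxation of an unstable ReLU with pre-activation bounds l < 0 < u. Below is a minimal sketch of the standard coefficients: the upper bound is the chord through (l, 0) and (u, u), while the lower-bound slope is a free parameter in [0, 1]; optimizing that slope per neuron is, roughly speaking, what alpha-CROWN does:

```python
def relu_linear_relaxation(l, u, alpha=0.0):
    """Linear bounds for y = relu(x) when l < 0 < u.

    Returns (lower_slope, lower_intercept, upper_slope, upper_intercept) such that
        lower_slope * x + lower_intercept <= relu(x) <= upper_slope * x + upper_intercept
    for all x in [l, u].
    """
    upper_slope = u / (u - l)            # chord from (l, 0) to (u, u)
    upper_intercept = -l * u / (u - l)
    lower_slope = alpha                  # any slope in [0, 1] gives a valid lower bound
    lower_intercept = 0.0
    return lower_slope, lower_intercept, upper_slope, upper_intercept
```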
3. Abstract Interpretation (DeepPoly)
- Approach: Abstract interpretation with polyhedral domains
- Tightness: Tight
- Speed: Fast to moderate
- Scalability: Large networks
- Best for: When bounds tighter than CROWN are needed at lower cost than LP
4. Multi-Neuron Relaxation (PRIMA)
- Approach: Consider multiple neurons jointly via product terms
- Tightness: Very tight
- Speed: Moderate (O(network size + k pairs))
- Scalability: Thousands to tens of thousands
- Best for: When single-neuron bounds too loose
Incomplete Methods: Optimization-Based
1. Linear Programming
- Approach: Encode network as LP, solve for bounds
- Tightness: Moderate to tight
- Speed: Moderate (O(n^3) LP solving)
- Scalability: Thousands of neurons
- Best for: When explicit optimization formulation needed
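For intuition, the per-neuron constraints behind the LP relaxation are the so-called triangle relaxation of an unstable ReLU y = max(0, x) with pre-activation bounds l < 0 < u (a sketch; the LP then optimizes over all neurons' constraints jointly, which is what can make it tighter than propagating a single linear bound per neuron):

$$y \ge 0, \qquad y \ge x, \qquad y \le \frac{u\,(x - l)}{u - l}$$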
2. Semi-Definite Programming
- Approach: Encode as SDP with lifted quadratic variables
- Tightness: Very tight
- Speed: Slow (O(n^4.5))
- Scalability: Hundreds to ~1000 neurons
- Best for: Small networks needing very tight bounds
3. Lipschitz-Based
- Approach: Bound outputs via network Lipschitz constant
- Tightness: Loose (product bound) to tight (LipSDP)
- Speed: Fast (product) to slow (SDP)
- Scalability: Depends on method
- Best for: Networks with controlled Lipschitz constants
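A rough sketch of the simplest (product) Lipschitz bound for a ReLU network and the margin-based certificate it yields; note this product bound is usually very loose compared to LipSDP-style estimates:

```python
import numpy as np

def product_lipschitz_bound(weight_matrices):
    """Upper bound on the global L2 Lipschitz constant of a ReLU network:
    the product of spectral norms (ReLU is 1-Lipschitz, so it contributes a factor of 1)."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weight_matrices]))

def certified_radius(logits, lipschitz_const):
    """L2 radius within which the predicted class cannot change.

    Each logit moves by at most L * ||delta||, so the gap between the top logit
    and the runner-up shrinks by at most 2 * L * ||delta||.
    """
    sorted_logits = np.sort(logits)
    margin = sorted_logits[-1] - sorted_logits[-2]
    return margin / (2.0 * lipschitz_const)
```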
Probabilistic Methods
Randomized Smoothing
- Approach: Smooth classifier with Gaussian noise, statistical certification
- Tightness: Tight (often tightest certified radius)
- Speed: Slow (requires many samples)
- Scalability: Excellent (works for any network size)
- Best for: Large networks where deterministic verification intractable
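A simplified certification sketch in the spirit of Cohen et al. (2019): estimate the probability of the top class under Gaussian noise, lower-bound it with a confidence interval, and convert it into an ℓ2 radius. The `base_classifier` interface is hypothetical, and a careful implementation would separate class selection from certification and use exact binomial (Clopper-Pearson) bounds rather than the normal approximation used here:

```python
import numpy as np
from scipy.stats import norm

def certify_smoothed(base_classifier, x, sigma, n_samples=1000, alpha=0.001):
    """Return (predicted_class, certified_l2_radius), or (None, 0.0) when abstaining.

    base_classifier(batch) -> array of integer class labels (hypothetical interface).
    """
    noise = np.random.randn(n_samples, *x.shape) * sigma
    labels = base_classifier(x[None] + noise)
    counts = np.bincount(labels)
    top_class = int(np.argmax(counts))
    p_hat = counts[top_class] / n_samples
    # Lower confidence bound on p_A (normal approximation for brevity)
    p_lower = p_hat - norm.ppf(1 - alpha) * np.sqrt(p_hat * (1 - p_hat) / n_samples)
    p_lower = min(p_lower, 1.0 - 1e-9)        # numerical safety
    if p_lower <= 0.5:
        return None, 0.0                      # abstain: cannot certify
    radius = sigma * norm.ppf(p_lower)        # certified L2 radius (uses p_B <= 1 - p_A)
    return top_class, radius
```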
Tightness-Speed-Scalability Tradeoffs
Every verification method navigates a three-way tradeoff:
The Tradeoff Triangle
The three corners of the triangle are Tightness, Speed, and Scalability; each method sits closest to the properties it favors:
- Tightness corner: complete methods (SMT, MILP), SDP, PRIMA
- Between tightness and the speed/scalability edge: branch-and-bound, then CROWN/DeepPoly
- Speed/scalability edge: IBP (fast and scalable, but loose)
- Scalability with statistical rather than absolute guarantees: randomized smoothing
You can optimize for at most two:
- Tight + Fast: Impossible for large networks (NP-completeness barrier)
- Tight + Scalable: Randomized smoothing (probabilistic)
- Fast + Scalable: IBP, CROWN (loose to moderate tightness)
Middle ground methods (CROWN, DeepPoly, PRIMA): Balance all three reasonably well.
Quantitative Comparison
| Method | Tightness | Time Complexity | Max Network Size | Certified CIFAR-10 (epsilon=8/255) |
|---|---|---|---|---|
| Marabou (SMT) | Exact | Exponential | ~1K neurons | N/A (too slow) |
| MILP | Exact | Exponential | ~1K neurons | N/A (too slow) |
| Branch&Bound | Exact | Exponential (often faster in practice) | ~100K neurons | ~65% |
| IBP | Loose | O(n) | Millions | ~35% |
| CROWN | Moderate | O(n) | Millions | ~55% |
| DeepPoly | Tight | O(n) | Millions | ~60% |
| PRIMA | Very Tight | O(n + k) | ~100K | ~62% |
| LP | Moderate | O(n^3) | ~10K | ~58% |
| SDP | Very Tight | O(n^4.5) | ~1K | ~63% |
| LipSDP | Tight | O(n^3+) | ~10K | ~50% |
| Rand. Smooth | Tight (prob) | O(samples) | Unlimited | ~70% |
Decision Guide: Choosing a Method
Given your requirements, which method should you use?
By Use Case
Safety-Critical Application (need absolute guarantees):
- Try incomplete methods (CROWN, DeepPoly) first
- If unknown, use branch-and-bound
- If still too slow, use Marabou for small networks
- If network too large, redesign network to be smaller/easier to verify
Production Deployment (need fast verification):
- Use CROWN or DeepPoly
- If too loose, try PRIMA
- If still insufficient, use randomized smoothing
- Accept “unknown” results gracefully
Research / Benchmarking (need tightest bounds):
- Try multi-neuron relaxations (PRIMA)
- For small networks, use SDP
- For completeness, branch-and-bound
- Compare multiple methods to understand tradeoffs
Large-Scale Networks (millions of parameters):
- Use IBP or CROWN
- If certification needed, use randomized smoothing
- Accept that complete verification is intractable
- Focus on certified training instead
By Network Size
Small (fewer than 1000 neurons):
- Best: Marabou, SDP, branch-and-bound
- Rationale: Can afford complete/expensive methods
Medium (1K-100K neurons):
- Best: CROWN, DeepPoly, PRIMA
- Rationale: Need balance between tightness and speed
Large (>100K neurons):
- Best: IBP, CROWN, randomized smoothing
- Rationale: Only scalable methods work
By Tightness Requirement
Need exact answers:
- Use complete methods: Marabou, branch-and-bound
Need tight bounds, can accept “unknown”:
- Use PRIMA, SDP, or alpha-beta-CROWN
Moderate bounds acceptable:
- Use CROWN or DeepPoly
Coarse bounds sufficient:
- Use IBP for maximum speed
Evolution of Verification Methods
Understanding the historical progression clarifies why current methods exist:
2017: Complete methods emerge
- Reluplex/Marabou: First practical SMT-based verifier
- Proved NP-completeness of verification
- Established fundamental complexity barriers
2018: Fast incomplete methods
- CROWN, Fast-Lin: Linear bound propagation
- Polynomial-time, scalable to large networks
- Trade completeness for speed
2019: Certified training
- IBP training: First scalable certified training
- TRADES: Accuracy-robustness tradeoff
- Shift from verification-only to training integration
2020: Tighter incomplete methods
- alpha-beta-CROWN: Optimized bound parameters
- GPU acceleration: Massive speedups
- Improved certified accuracy on benchmarks
2021-2022: Multi-neuron and hybrid
- PRIMA: Multi-neuron relaxations
- beta-CROWN: Branch-and-bound with tight bounds
- Complete verification at larger scale
Current frontier:
- Scaling to vision transformers and larger architectures
- Tighter relaxations without sacrificing speed
- Integration with foundation models
Remaining Open Problems
Despite progress, fundamental challenges remain:
Scalability-Tightness Gap --- The gap between what we can verify tightly (thousands of neurons) and what we want to verify (billions of parameters in LLMs) remains enormous.
Complete Verification Barrier --- NP-completeness means complete verification will always face exponential worst-case. Better heuristics help but don’t eliminate this barrier.
Non-ReLU Activations --- Most methods assume ReLU. Modern architectures use GeLU, SiLU, attention---extending verification to these is challenging.
Non-ℓp Perturbations --- Most verification focuses on ℓp-norm balls. Semantic perturbations and physical adversarial examples require different approaches.
Certified Accuracy Gap --- Even state-of-the-art achieves ~70% certified accuracy on CIFAR-10, far below 95%+ clean accuracy. Closing this gap remains elusive.
Final Thoughts
The taxonomy of neural network verification reveals a rich landscape: complete methods providing definitive answers at high cost, incomplete methods scaling efficiently with partial guarantees, and probabilistic methods offering alternative tradeoffs.
Understanding this taxonomy helps you navigate the verification literature, choose methods appropriately, and appreciate the fundamental tradeoffs shaping the field. There’s no single “best” verification method---only methods suited to different requirements, constraints, and applications.
As verification research advances, the taxonomy will evolve. New methods may occupy unexplored regions (e.g., probabilistic complete verification, fast tight bounds). But the fundamental axes---completeness, determinism, tightness-speed-scalability---will likely persist, providing enduring structure to organize verification approaches.
Further Reading
This guide provides a systematic taxonomy of neural network verification methods. For readers interested in diving deeper, we recommend exploring specific methods within each category:
Complete Verification Methods:
SMT-based verification with Marabou pioneered complete verification, while MILP approaches offer alternative complete formulations. Branch-and-bound extends complete verification to larger networks through hybrid incomplete-complete strategies.
Incomplete Bound Propagation:
IBP provides the fastest bounds, CROWN balances speed and tightness, and DeepPoly offers tight abstract interpretation. Multi-neuron relaxations (PRIMA) capture correlations for even tighter bounds.
Optimization-Based Incomplete Methods:
LP relaxation and SDP provide tighter bounds through explicit optimization. Lipschitz-based methods offer alternative perspectives on robustness certification.
Probabilistic Certification:
Randomized smoothing represents the primary probabilistic approach, providing excellent scalability through statistical guarantees.
Theoretical Foundations:
The NP-completeness result establishes fundamental complexity barriers. Understanding this helps set realistic expectations about verification’s capabilities and limitations.
Certified Training:
IBP training, CROWN training, and GPU-accelerated methods show how verification integrates with training. TRADES balances natural and robust accuracy.
Related Topics:
For mathematical foundations of the verification problem, see The Verification Problem. For understanding soundness and completeness, see Soundness and Completeness. For complexity barriers, see Theoretical Barriers. For practical method selection, see the robustness testing guide.
Next Guide
Continue to Theoretical Barriers to understand the fundamental mathematical limits and complexity barriers in neural network verification.