Verification Taxonomy: A Systematic View
A comprehensive taxonomy of verification methods, including scalability measures and tightness rankings.
Neural network verification encompasses dozens of methods---SMT solvers, bound propagation, randomized smoothing, and more. Each method makes different tradeoffs between tightness, speed, and scalability. Understanding how these methods relate helps you choose the right tool for your application.
This guide presents a systematic taxonomy of neural network verification methods, organizing them along key dimensions: complete vs incomplete, deterministic vs probabilistic, and bound propagation vs optimization-based. By understanding this landscape, you can navigate verification research and select methods intelligently.
Why Taxonomy Matters
The verification landscape is vast: Researchers have developed numerous approaches, each with strengths and limitations. Without organization, this diversity is overwhelming.
Taxonomy provides structure: By categorizing methods along fundamental axes, we can:
- Understand relationships: How does CROWN relate to DeepPoly? Both are incomplete bound propagation, but use different abstract domains.
- Identify gaps: What combinations of properties are unexplored? (e.g., complete probabilistic verification)
- Guide selection: Given requirements (speed, tightness, scalability), which category fits best?
- Predict performance: Methods in the same category often have similar characteristics.
Taxonomy Benefits
A well-organized taxonomy helps:
- Researchers: Identify open problems and unexplored method combinations
- Practitioners: Choose verification methods matching their constraints
- Students: Build mental models of the verification landscape
- Tool builders: Position new methods relative to existing work
Primary Classification Axes
Axis 1: Complete vs Incomplete
The most fundamental distinction: does the method always give definitive answers?
Complete verification:
- Definition: Always returns VERIFIED or FALSIFIED (with counterexample)
- Never returns: “Unknown”
- Guarantee: If it says verified, the property holds; if it says falsified, a concrete counterexample is provided
- Cost: Exponential worst-case complexity
- Examples: SMT (Marabou), MILP, branch-and-bound
Incomplete verification:
- Definition: May return VERIFIED, FALSIFIED, or UNKNOWN
- Guarantee: If verified, property holds (sound). If unknown, property might hold or not.
- Cost: Polynomial-time complexity
- Examples: IBP, CROWN, DeepPoly, LP relaxation, SDP
| Property | Complete | Incomplete |
|---|---|---|
| Possible answers | Verified, Falsified | Verified, Falsified, Unknown |
| Soundness | Yes (always correct) | Yes (verified implies true) |
| Completeness | Yes (always decides) | No (may return unknown) |
| Time complexity | Exponential (worst-case) | Polynomial |
| Scalability | Small networks (~1000 neurons) | Large networks (millions) |
Why incomplete methods dominate: The NP-completeness of verification means complete methods face fundamental scalability barriers. For practical large-scale verification, incomplete methods are necessary.
Axis 2: Deterministic vs Probabilistic
How are guarantees provided?
Deterministic verification:
- Guarantee: Definite bounds on outputs for all inputs in region
- Example: “For all inputs x′ within the ε-ball around x, the output stays in class A”
- Methods: All bound propagation, SMT, MILP, LP/SDP
- Advantage: No uncertainty in guarantee
- Limitation: Conservative (may be loose)
Probabilistic verification:
- Guarantee: Probabilistic bound with confidence level
- Example: “With 99% confidence, for all perturbations within radius R, the output stays in class A”
- Methods: Randomized smoothing
- Advantage: Often tighter guarantees than deterministic methods at comparable cost
- Limitation: Probabilistic (not absolute certainty)
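To make the two guarantee styles concrete, here is a sketch of the standard formulations (notation is ours, not from this guide: x is the input, f the classifier's logits, A the predicted class, ε the perturbation budget, σ the smoothing noise level, and p_A, p_B confidence bounds on the top-two class probabilities under noise):

Deterministic local robustness:

$$\forall x' \;\text{with}\; \|x' - x\|_p \le \epsilon: \quad \arg\max_j f_j(x') = A$$

Randomized-smoothing certificate (holds with confidence 1 − α):

$$R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right)$$

where Φ⁻¹ is the inverse standard normal CDF; the smoothed classifier's prediction cannot change within ℓ2 radius R.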
Deterministic vs Probabilistic
Deterministic:
- Absolute guarantee
- May be conservative (loose bounds)
- Required for safety-critical systems needing provable guarantees
Probabilistic:
- Statistical guarantee with confidence level
- Often tighter for same computational cost
- Acceptable when probabilistic guarantees suffice
Most verification methods are deterministic. Randomized smoothing is the primary probabilistic approach, offering better scalability at the cost of statistical rather than absolute guarantees.
Axis 3: Bound Propagation vs Optimization
How are bounds computed?
Bound propagation:
- Approach: Propagate bounds layer-by-layer through network
- Mechanism: Abstract interpretation, interval arithmetic, linear relaxations
- Complexity: O(network size) typically
- Examples: IBP, CROWN, DeepPoly
- Advantage: Fast, scales well
- Limitation: May be loose (accumulates approximation errors)
Optimization-based:
- Approach: Formulate bound computation as optimization problem
- Mechanism: LP, SDP, SMT, MILP
- Complexity: polynomial for LP/SDP (roughly O(n^3) for LP, higher for SDP), exponential for SMT/MILP
- Examples: LP relaxation, SDP, Marabou
- Advantage: Can be tighter (explicit optimization)
- Limitation: Slower (solving optimization expensive)
Hybrid: Branch-and-bound:
- Uses bound propagation for initial bounds
- Branches (partitions input space) when bounds too loose
- Applies bound propagation to each partition
- Combines speed of propagation with refinement of partitioning
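As a rough illustration of how these pieces fit together, here is a minimal branch-and-bound skeleton in Python. The `bound_fn` and `split_dim_fn` arguments are hypothetical stand-ins for an incomplete bounding routine (e.g., IBP or CROWN) and a branching heuristic; real tools such as alpha-beta-CROWN add many refinements (GPU batching, ReLU splitting, counterexample search):

```python
def branch_and_bound(lower, upper, bound_fn, split_dim_fn, max_splits=1000):
    """Try to prove the verification objective stays positive over the box [lower, upper].

    bound_fn(lo, hi) -> float: sound lower bound on the objective over a sub-box.
    split_dim_fn(lo, hi) -> int: which input dimension to bisect next.
    Returns "VERIFIED" or "UNKNOWN"; a full verifier would also search for
    concrete counterexamples in order to return "FALSIFIED".
    """
    queue = [(lower, upper)]
    splits = 0
    while queue:
        lo, hi = queue.pop()
        if bound_fn(lo, hi) > 0:      # incomplete method already proves this sub-box
            continue
        if splits >= max_splits:      # bounds too loose and budget exhausted
            return "UNKNOWN"
        d = split_dim_fn(lo, hi)      # refine by bisecting one input dimension
        mid = (lo[d] + hi[d]) / 2
        lo1, hi1 = lo.copy(), hi.copy()
        lo2, hi2 = lo.copy(), hi.copy()
        hi1[d] = mid
        lo2[d] = mid
        queue.extend([(lo1, hi1), (lo2, hi2)])
        splits += 1
    return "VERIFIED"
```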
Verification Method Taxonomy
Combining these axes yields a taxonomy of verification approaches:
Complete Methods
1. SMT-Based
- Approach: Encode network as logical constraints, solve with SMT
- Example: Marabou/Reluplex
- Tightness: Exact (complete)
- Speed: Slow (exponential worst-case)
- Scalability: Hundreds to ~1000 neurons
- Best for: Small networks requiring definitive answers
2. MILP-Based
- Approach: Encode as mixed-integer linear program
- Example: Gurobi/CPLEX formulations
- Tightness: Exact (complete)
- Speed: Slow (exponential worst-case)
- Scalability: Similar to SMT
- Best for: Finding minimum perturbations, optimization objectives
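For intuition, one standard mixed-integer encoding of a single ReLU y = max(0, x) with known pre-activation bounds l ≤ x ≤ u (l < 0 < u) introduces a binary indicator a for whether the neuron is active:

$$y \ge 0, \quad y \ge x, \quad y \le u\,a, \quad y \le x - l\,(1 - a), \quad a \in \{0, 1\}$$

One binary variable per unstable neuron is what drives the exponential worst case, and also what lets an optimal MILP solution recover the exact network behavior.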
3. Branch-and-Bound
- Approach: Partition input space, verify each partition with incomplete methods
- Example: alpha-beta-CROWN with branching
- Tightness: Exact (complete) if given sufficient time
- Speed: Moderate to slow (depends on branching)
- Scalability: Larger than SMT/MILP (tens of thousands of neurons)
- Best for: When incomplete methods return “unknown” but a definitive answer is needed
Incomplete Methods: Bound Propagation
1. Interval Bound Propagation (IBP)
- Approach: Propagate interval bounds layer-by-layer
- Tightness: Loosest
- Speed: Fastest (O(network size))
- Scalability: Millions of neurons
- Best for: Rapid screening, certified training
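A minimal NumPy sketch of interval propagation through affine and ReLU layers (illustrative only; production implementations such as auto_LiRPA handle general architectures and GPU batching):

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate an elementwise interval [l, u] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_l = W_pos @ l + W_neg @ u + b   # worst case pairs negative weights with u
    new_u = W_pos @ u + W_neg @ l + b
    return new_l, new_u

def ibp_relu(l, u):
    """ReLU is monotone, so intervals map elementwise."""
    return np.maximum(l, 0), np.maximum(u, 0)

def ibp_forward(x, eps, layers):
    """layers: list of (W, b) pairs, with ReLU between them (not after the last)."""
    l, u = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        l, u = ibp_affine(l, u, W, b)
        if i < len(layers) - 1:
            l, u = ibp_relu(l, u)
    return l, u   # interval bounds on the logits
```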
2. Linear Bound Propagation (CROWN)
- Approach: Backward linear bound propagation with optimized coefficients
- Tightness: Moderate to tight
- Speed: Fast (O(network size))
- Scalability: Millions of neurons
- Best for: General verification, production use
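The core building block here is the linear relaxation of an unstable ReLU with pre-activation bounds l < 0 < u. Below is a minimal sketch of the standard coefficients: the upper bound is the chord through (l, 0) and (u, u), while the lower-bound slope is a free parameter in [0, 1]; optimizing that slope per neuron is, roughly speaking, what alpha-CROWN does:

```python
def relu_linear_relaxation(l, u, alpha=0.0):
    """Linear bounds for y = relu(x) when l < 0 < u.

    Returns (lower_slope, lower_intercept, upper_slope, upper_intercept) such that
        lower_slope * x + lower_intercept <= relu(x) <= upper_slope * x + upper_intercept
    for all x in [l, u].
    """
    upper_slope = u / (u - l)            # chord from (l, 0) to (u, u)
    upper_intercept = -l * u / (u - l)
    lower_slope = alpha                  # any slope in [0, 1] gives a valid lower bound
    lower_intercept = 0.0
    return lower_slope, lower_intercept, upper_slope, upper_intercept
```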
3. Abstract Interpretation (DeepPoly)
- Approach: Abstract interpretation with polyhedral domains
- Tightness: Tight
- Speed: Fast to moderate
- Scalability: Large networks
- Best for: When bounds tighter than CROWN are needed at lower cost than LP
4. Multi-Neuron Relaxation (PRIMA)
- Approach: Consider multiple neurons jointly via product terms
- Tightness: Very tight
- Speed: Moderate (O(network size + k pairs))
- Scalability: Thousands to tens of thousands
- Best for: When single-neuron bounds too loose
Incomplete Methods: Optimization-Based
1. Linear Programming
- Approach: Encode network as LP, solve for bounds
- Tightness: Moderate to tight
- Speed: Moderate (O(n^3) LP solving)
- Scalability: Thousands of neurons
- Best for: When explicit optimization formulation needed
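For intuition, the per-neuron constraints behind the LP relaxation are the so-called triangle relaxation of an unstable ReLU y = max(0, x) with pre-activation bounds l < 0 < u (a sketch; the LP then optimizes over all neurons' constraints jointly, which is what can make it tighter than propagating a single linear bound per neuron):

$$y \ge 0, \qquad y \ge x, \qquad y \le \frac{u\,(x - l)}{u - l}$$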
2. Semi-Definite Programming
- Approach: Encode as SDP with lifted quadratic variables
- Tightness: Very tight
- Speed: Slow (O(n^4.5))
- Scalability: Hundreds to ~1000 neurons
- Best for: Small networks needing very tight bounds
3. Lipschitz-Based
- Approach: Bound outputs via network Lipschitz constant
- Tightness: Loose (product bound) to tight (LipSDP)
- Speed: Fast (product) to slow (SDP)
- Scalability: Depends on method
- Best for: Networks with controlled Lipschitz constants
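A rough sketch of the simplest (product) Lipschitz bound for a ReLU network and the margin-based certificate it yields; note this product bound is usually very loose compared to LipSDP-style estimates:

```python
import numpy as np

def product_lipschitz_bound(weight_matrices):
    """Upper bound on the global L2 Lipschitz constant of a ReLU network:
    the product of spectral norms (ReLU is 1-Lipschitz, so it contributes a factor of 1)."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weight_matrices]))

def certified_radius(logits, lipschitz_const):
    """L2 radius within which the predicted class cannot change.

    Each logit moves by at most L * ||delta||, so the gap between the top logit
    and the runner-up shrinks by at most 2 * L * ||delta||.
    """
    sorted_logits = np.sort(logits)
    margin = sorted_logits[-1] - sorted_logits[-2]
    return margin / (2.0 * lipschitz_const)
```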
Probabilistic Methods
Randomized Smoothing
- Approach: Smooth classifier with Gaussian noise, statistical certification
- Tightness: Tight (often tightest certified radius)
- Speed: Slow (requires many samples)
- Scalability: Excellent (works for any network size)
- Best for: Large networks where deterministic verification intractable
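A simplified certification sketch in the spirit of Cohen et al. (2019): estimate the probability of the top class under Gaussian noise, lower-bound it with a confidence interval, and convert it into an ℓ2 radius. The `base_classifier` interface is hypothetical, and a careful implementation would separate class selection from certification and use exact binomial (Clopper-Pearson) bounds rather than the normal approximation used here:

```python
import numpy as np
from scipy.stats import norm

def certify_smoothed(base_classifier, x, sigma, n_samples=1000, alpha=0.001):
    """Return (predicted_class, certified_l2_radius), or (None, 0.0) when abstaining.

    base_classifier(batch) -> array of integer class labels (hypothetical interface).
    """
    noise = np.random.randn(n_samples, *x.shape) * sigma
    labels = base_classifier(x[None] + noise)
    counts = np.bincount(labels)
    top_class = int(np.argmax(counts))
    p_hat = counts[top_class] / n_samples
    # Lower confidence bound on p_A (normal approximation for brevity)
    p_lower = p_hat - norm.ppf(1 - alpha) * np.sqrt(p_hat * (1 - p_hat) / n_samples)
    p_lower = min(p_lower, 1.0 - 1e-9)        # numerical safety
    if p_lower <= 0.5:
        return None, 0.0                      # abstain: cannot certify
    radius = sigma * norm.ppf(p_lower)        # certified L2 radius (uses p_B <= 1 - p_A)
    return top_class, radius
```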
Tightness-Speed-Scalability Tradeoffs
Every verification method navigates a three-way tradeoff:
The Tradeoff Triangle
The three corners of the triangle are Tightness, Speed, and Scalability; each method sits closest to the properties it favors:
- Tightness corner: complete methods (SMT, MILP), SDP, PRIMA
- Between tightness and the speed/scalability edge: branch-and-bound, then CROWN/DeepPoly
- Speed/scalability edge: IBP (fast and scalable, but loose)
- Scalability with statistical rather than absolute guarantees: randomized smoothing
You can optimize for at most two:
- Tight + Fast: Impossible for large networks (NP-completeness barrier)
- Tight + Scalable: Randomized smoothing (probabilistic)
- Fast + Scalable: IBP, CROWN (loose to moderate tightness)
Middle ground methods (CROWN, DeepPoly, PRIMA): Balance all three reasonably well.
Quantitative Comparison
| Method | Tightness | Time Complexity | Max Network Size | Certified CIFAR-10 (epsilon=8/255) |
|---|---|---|---|---|
| Marabou (SMT) | Exact | Exponential | ~1K neurons | N/A (too slow) |
| MILP | Exact | Exponential | ~1K neurons | N/A (too slow) |
| Branch&Bound | Exact | Exponential (often faster in practice) | ~100K neurons | ~65% |
| IBP | Loose | O(n) | Millions | ~35% |
| CROWN | Moderate | O(n) | Millions | ~55% |
| DeepPoly | Tight | O(n) | Millions | ~60% |
| PRIMA | Very Tight | O(n + k) | ~100K | ~62% |
| LP | Moderate | O(n^3) | ~10K | ~58% |
| SDP | Very Tight | O(n^4.5) | ~1K | ~63% |
| LipSDP | Tight | O(n^3+) | ~10K | ~50% |
| Rand. Smooth | Tight (prob) | O(samples) | Unlimited | ~70% |
Decision Guide: Choosing a Method
Given your requirements, which method should you use?
By Use Case
Safety-Critical Application (need absolute guarantees):
- Try incomplete methods (CROWN, DeepPoly) first
- If unknown, use branch-and-bound
- If still too slow, use Marabou for small networks
- If network too large, redesign network to be smaller/easier to verify
Production Deployment (need fast verification):
- Use CROWN or DeepPoly
- If too loose, try PRIMA
- If still insufficient, use randomized smoothing
- Accept “unknown” results gracefully
Research / Benchmarking (need tightest bounds):
- Try multi-neuron relaxations (PRIMA)
- For small networks, use SDP
- For completeness, branch-and-bound
- Compare multiple methods to understand tradeoffs
Large-Scale Networks (millions of parameters):
- Use IBP or CROWN
- If certification needed, use randomized smoothing
- Accept that complete verification is intractable
- Focus on certified training instead
By Network Size
Small (fewer than 1000 neurons):
- Best: Marabou, SDP, branch-and-bound
- Rationale: Can afford complete/expensive methods
Medium (1K-100K neurons):
- Best: CROWN, DeepPoly, PRIMA
- Rationale: Need balance between tightness and speed
Large (>100K neurons):
- Best: IBP, CROWN, randomized smoothing
- Rationale: Only scalable methods work
By Tightness Requirement
Need exact answers:
- Use complete methods: Marabou, branch-and-bound
Need tight bounds, can accept “unknown”:
- Use PRIMA, SDP, or alpha-beta-CROWN
Moderate bounds acceptable:
- Use CROWN or DeepPoly
Coarse bounds sufficient:
- Use IBP for maximum speed
Evolution of Verification Methods
Understanding the historical progression clarifies why current methods exist:
2017: Complete methods emerge
- Reluplex/Marabou: First practical SMT-based verifier
- Proved NP-completeness of verification
- Established fundamental complexity barriers
2018: Fast incomplete methods
- CROWN, Fast-Lin: Linear bound propagation
- Polynomial-time, scalable to large networks
- Trade completeness for speed
2019: Certified training
- IBP training: First scalable certified training
- TRADES: Accuracy-robustness tradeoff
- Shift from verification-only to training integration
2020: Tighter incomplete methods
- alpha-beta-CROWN: Optimized bound parameters
- GPU acceleration: Massive speedups
- Improved certified accuracy on benchmarks
2021-2022: Multi-neuron and hybrid
- PRIMA: Multi-neuron relaxations
- beta-CROWN: Branch-and-bound with tight bounds
- Complete verification at larger scale
Current frontier:
- Scaling to vision transformers and larger architectures
- Tighter relaxations without sacrificing speed
- Integration with foundation models
Remaining Open Problems
Despite progress, fundamental challenges remain:
Scalability-Tightness Gap --- The gap between what we can verify tightly (thousands of neurons) and what we want to verify (billions of parameters in LLMs) remains enormous.
Complete Verification Barrier --- NP-completeness means complete verification will always face exponential worst-case. Better heuristics help but don’t eliminate this barrier.
Non-ReLU Activations --- Most methods assume ReLU. Modern architectures use GeLU, SiLU, attention---extending verification to these is challenging.
Non-ℓp Perturbations --- Most verification focuses on ℓp-norm balls. Semantic perturbations and physical adversarial examples require different approaches.
Certified Accuracy Gap --- Even state-of-the-art achieves ~70% certified accuracy on CIFAR-10, far below 95%+ clean accuracy. Closing this gap remains elusive.
Final Thoughts
The taxonomy of neural network verification reveals a rich landscape: complete methods providing definitive answers at high cost, incomplete methods scaling efficiently with partial guarantees, and probabilistic methods offering alternative tradeoffs.
Understanding this taxonomy helps you navigate the verification literature, choose methods appropriately, and appreciate the fundamental tradeoffs shaping the field. There’s no single “best” verification method---only methods suited to different requirements, constraints, and applications.
As verification research advances, the taxonomy will evolve. New methods may occupy unexplored regions (e.g., probabilistic complete verification, fast tight bounds). But the fundamental axes---completeness, determinism, tightness-speed-scalability---will likely persist, providing enduring structure to organize verification approaches.
Further Reading
This guide provides a systematic taxonomy of neural network verification methods. For readers interested in diving deeper, we recommend exploring specific methods within each category:
Complete Verification Methods:
SMT-based verification with Marabou pioneered complete verification, while MILP approaches offer alternative complete formulations. Branch-and-bound extends complete verification to larger networks through hybrid incomplete-complete strategies.
Incomplete Bound Propagation:
IBP provides the fastest bounds, CROWN balances speed and tightness, and DeepPoly offers tight abstract interpretation. Multi-neuron relaxations (PRIMA) capture correlations for even tighter bounds.
Optimization-Based Incomplete Methods:
LP relaxation and SDP provide tighter bounds through explicit optimization. Lipschitz-based methods offer alternative perspectives on robustness certification.
Probabilistic Certification:
Randomized smoothing represents the primary probabilistic approach, providing excellent scalability through statistical guarantees.
Theoretical Foundations:
The NP-completeness result establishes fundamental complexity barriers. Understanding this helps set realistic expectations about verification’s capabilities and limitations.
Certified Training:
IBP training, CROWN training, and GPU-accelerated methods show how verification integrates with training. TRADES balances natural and robust accuracy.
Related Topics:
For mathematical foundations of the verification problem, see The Verification Problem. For understanding soundness and completeness, see Soundness and Completeness. For complexity barriers, see Theoretical Barriers. For practical method selection, see the robustness testing guide.
Next Guide
Continue to Theoretical Barriers to understand the fundamental mathematical limits and complexity barriers in neural network verification.