Verification Taxonomy: A Systematic View¶
Neural network verification encompasses dozens of methods—SMT solvers [Katz et al., 2017, Katz et al., 2019], bound propagation [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018], randomized smoothing [Cohen et al., 2019], and more. Each method makes different tradeoffs between tightness, speed, and scalability. Understanding how these methods relate helps you choose the right tool for your application.
This guide presents a systematic taxonomy of neural network verification methods, organizing them along key dimensions: complete vs incomplete, deterministic vs probabilistic, and bound propagation vs optimization-based. By understanding this landscape, you can navigate verification research and select methods intelligently.
Why Taxonomy Matters¶
The verification landscape is vast: Researchers have developed numerous approaches, each with strengths and limitations. Without organization, this diversity is overwhelming.
Taxonomy provides structure: By categorizing methods along fundamental axes, we can:
Understand relationships: How does CROWN relate to DeepPoly? Both are incomplete bound propagation, but use different abstract domains.
Identify gaps: What combinations of properties are unexplored? (e.g., complete probabilistic verification)
Guide selection: Given requirements (speed, tightness, scalability), which category fits best?
Predict performance: Methods in the same category often have similar characteristics.
Taxonomy Benefits
A well-organized taxonomy helps:
Researchers: Identify open problems and unexplored method combinations
Practitioners: Choose verification methods matching their constraints
Students: Build mental models of the verification landscape
Tool builders: Position new methods relative to existing work
Primary Classification Axes¶
Axis 1: Complete vs Incomplete¶
The most fundamental distinction: does the method always give definitive answers?
Complete verification [Katz et al., 2017, Tjeng et al., 2019, Katz et al., 2019]:
Definition: Always returns VERIFIED or FALSIFIED (with counterexample)
Never returns: “Unknown”
Guarantee: If it says verified, the property holds; if falsified, here’s proof
Cost: Exponential worst-case complexity
Examples: SMT (Marabou), MILP, branch-and-bound
Incomplete verification [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018]:
Definition: May return VERIFIED, FALSIFIED, or UNKNOWN
Guarantee: If verified, property holds (sound). If unknown, property might hold or not.
Cost: Polynomial-time complexity
Examples: IBP, CROWN, DeepPoly, LP relaxation, SDP
| Property | Complete | Incomplete |
|---|---|---|
| Possible answers | Verified, Falsified | Verified, Falsified, Unknown |
| Soundness | Yes (always correct) | Yes (verified → true) |
| Completeness | Yes (always decides) | No (may return unknown) |
| Time complexity | Exponential (worst-case) | Polynomial |
| Scalability | Small networks (~1000 neurons) | Large networks (millions) |
Why incomplete methods dominate: Verification is NP-complete [Katz et al., 2017, Weng et al., 2018], so complete methods face a fundamental scalability barrier. For practical large-scale verification, incomplete methods are necessary.
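As a concrete illustration of the incomplete, polynomial-time style, interval bound propagation fits in a few lines of NumPy. The two-layer network and weights below are made up for illustration; production verifiers apply the same idea to arbitrary architectures.

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Propagate the box [l, u] through the affine layer x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b

def ibp_relu(l, u):
    """ReLU is monotone, so it maps boxes to boxes elementwise."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# Toy 2-layer ReLU network (weights are illustrative, not from any paper).
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)

# Certify output bounds for every x with ||x - x0||_inf <= eps.
x0, eps = np.array([1.0, -1.0]), 0.1
l, u = x0 - eps, x0 + eps
l, u = ibp_relu(*ibp_affine(l, u, W1, b1))
l, u = ibp_affine(l, u, W2, b2)
print(l, u)  # sound, possibly loose, bounds on the scalar output
```

The result is sound for every input in the box, but each layer adds slack, which is exactly the looseness that tighter methods (CROWN, DeepPoly) reduce.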
Axis 2: Deterministic vs Probabilistic¶
How are guarantees provided?
Deterministic verification [Weng et al., 2018, Zhang et al., 2018, Katz et al., 2019]:
Guarantee: Definite bounds on outputs for all inputs in region
Example: “For all \(x\) in \(\epsilon\)-ball, output stays in class A”
Methods: All bound propagation, SMT, MILP, LP/SDP
Advantage: No uncertainty in guarantee
Limitation: Conservative (may be loose)
Probabilistic verification [Cohen et al., 2019]:
Guarantee: Probabilistic bound with confidence level
Example: “With 99% confidence, for all \(x\) in radius \(r\), output stays in class A”
Methods: Randomized smoothing
Advantage: Tighter bounds than deterministic for same cost
Limitation: Probabilistic (not absolute certainty)
Deterministic vs Probabilistic
Deterministic:
Absolute guarantee
May be conservative (loose bounds)
Required for safety-critical systems needing provable guarantees
Probabilistic:
Statistical guarantee with confidence level
Often tighter for same computational cost
Acceptable when probabilistic guarantees suffice
Most verification methods are deterministic. Randomized smoothing [Cohen et al., 2019] is the primary probabilistic approach, offering better scalability at the cost of statistical rather than absolute guarantees.
Axis 3: Bound Propagation vs Optimization¶
How are bounds computed?
Bound propagation [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018]:
Approach: Propagate bounds layer-by-layer through network
Mechanism: Abstract interpretation, interval arithmetic, linear relaxations
Complexity: O(network size) typically
Examples: IBP, CROWN, DeepPoly
Advantage: Fast, scales well
Limitation: May be loose (accumulates approximation errors)
Optimization-based [Raghunathan et al., 2018, Katz et al., 2019, Salman et al., 2019]:
Approach: Formulate bound computation as optimization problem
Mechanism: LP, SDP, SMT, MILP
Complexity: O(n³) for LP/SDP, exponential for SMT/MILP
Examples: LP relaxation, SDP, Marabou
Advantage: Can be tighter (explicit optimization)
Limitation: Slower (solving optimization expensive)
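A core building block of LP-based verifiers is the "triangle" relaxation of an unstable ReLU, which replaces the nonlinearity with linear constraints. A minimal sketch, with an illustrative pre-activation interval:

```python
import numpy as np

def relu_triangle_upper(l, u):
    """Tightest linear upper bound a*x + c of ReLU on [l, u]
    (the 'triangle' relaxation used by LP-based verifiers)."""
    if u <= 0:
        return 0.0, 0.0          # neuron provably inactive
    if l >= 0:
        return 1.0, 0.0          # neuron provably active
    a = u / (u - l)              # chord through (l, 0) and (u, u)
    return a, -a * l

# An unstable neuron with pre-activation in [-1, 1]:
a, c = relu_triangle_upper(-1.0, 1.0)
xs = np.linspace(-1.0, 1.0, 101)
assert np.all(a * xs + c >= np.maximum(xs, 0.0))   # valid upper bound
```

Stacking these constraints for every neuron and solving the resulting LP with a solver such as Gurobi yields the bounds discussed above; that solve is what makes these methods slower than pure propagation.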
Hybrid: Branch-and-bound [Wang et al., 2021, Zhang et al., 2022]:
Uses bound propagation for initial bounds
Branches (partitions input space) when bounds too loose
Applies bound propagation to each partition
Combines speed of propagation with refinement of partitioning
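The branch-and-bound loop above can be sketched on a toy one-dimensional problem. The function `f(x) = x^2 - x + 0.3` and its interval lower bound stand in for a network output and an incomplete verifier; both are illustrative, not from the cited systems.

```python
def lower_bound(l, u):
    """Sound interval lower bound of f(x) = x**2 - x + 0.3 on [l, u]
    (stand-in for an incomplete verifier's bound on a network output)."""
    sq_lo = 0.0 if l <= 0.0 <= u else min(l * l, u * u)
    return sq_lo - u + 0.3

def verify(l, u, depth=0, max_depth=20):
    """Certify f(x) > 0 on [l, u], branching whenever the bound is too loose."""
    if lower_bound(l, u) > 0.0:
        return True                       # certified on this piece
    if depth == max_depth:
        return False                      # budget exhausted: "unknown"
    m = 0.5 * (l + u)
    return (verify(l, m, depth + 1, max_depth)
            and verify(m, u, depth + 1, max_depth))

print(lower_bound(0.0, 1.0))  # negative: the one-shot bound is inconclusive
print(verify(0.0, 1.0))       # True: branching certifies the property
```

The one-shot bound returns "unknown", but splitting the input shrinks the slack until every piece is certified, which is precisely how α-β-CROWN turns an incomplete bound into a complete procedure.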
Verification Method Taxonomy¶
Combining these axes yields a taxonomy of verification approaches:
Complete Methods¶
1. SMT-Based [Katz et al., 2017, Katz et al., 2019]
Approach: Encode network as logical constraints, solve with SMT
Example: Marabou/Reluplex
Tightness: Exact (complete)
Speed: Slow (exponential worst-case)
Scalability: Hundreds to ~1000 neurons
Best for: Small networks requiring definitive answers
2. MILP-Based [Tjeng et al., 2019, Cheng et al., 2017]
Approach: Encode as mixed-integer linear program
Example: Gurobi/CPLEX formulations
Tightness: Exact (complete)
Speed: Slow (exponential worst-case)
Scalability: Similar to SMT
Best for: Finding minimum perturbations, optimization objectives
3. Branch-and-Bound [Wang et al., 2021, Zhang et al., 2022]
Approach: Partition input space, verify each partition with incomplete methods
Example: α-β-CROWN with branching
Tightness: Exact (complete) if given sufficient time
Speed: Moderate to slow (depends on branching)
Scalability: Larger than SMT/MILP (tens of thousands of neurons)
Best for: When incomplete methods return “unknown” but complete needed
Incomplete Methods: Bound Propagation¶
1. Interval Bound Propagation (IBP) [Gowal et al., 2019]
Approach: Propagate interval bounds layer-by-layer
Tightness: Loosest
Speed: Fastest (O(network size))
Scalability: Millions of neurons
Best for: Rapid screening, certified training
2. Linear Bound Propagation (CROWN) [Weng et al., 2018, Zhang et al., 2018]
Approach: Backward linear bound propagation with optimized coefficients
Tightness: Moderate to tight
Speed: Fast (O(network size))
Scalability: Millions of neurons
Best for: General verification, production use
3. Abstract Interpretation (DeepPoly) [Singh et al., 2019]
Approach: Abstract interpretation with polyhedral domains
Tightness: Tight
Speed: Fast to moderate
Scalability: Large networks
Best for: When tighter than CROWN needed, faster than LP
4. Multi-Neuron Relaxation (PRIMA) [Müller et al., 2022]
Approach: Consider multiple neurons jointly via product terms
Tightness: Very tight
Speed: Moderate (O(network size + k pairs))
Scalability: Thousands to tens of thousands
Best for: When single-neuron bounds too loose
Incomplete Methods: Optimization-Based¶
1. Linear Programming [Salman et al., 2019]
Approach: Encode network as LP, solve for bounds
Tightness: Moderate to tight
Speed: Moderate (O(n³) LP solving)
Scalability: Thousands of neurons
Best for: When explicit optimization formulation needed
2. Semi-Definite Programming [Raghunathan et al., 2018]
Approach: Encode as SDP with lifted quadratic variables
Tightness: Very tight
Speed: Slow (O(n^4.5))
Scalability: Hundreds to ~1000 neurons
Best for: Small networks needing very tight bounds
3. Lipschitz-Based [Fazlyab et al., 2019]
Approach: Bound outputs via network Lipschitz constant
Tightness: Loose (product bound) to tight (LipSDP)
Speed: Fast (product) to slow (SDP)
Scalability: Depends on method
Best for: Networks with controlled Lipschitz constants
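The loose product bound is easy to compute: since ReLU is 1-Lipschitz, the network's Lipschitz constant is at most the product of the layers' spectral norms. A sketch with illustrative weight matrices:

```python
import numpy as np

# Naive product bound: L <= product of layer spectral norms
# (ReLU is 1-Lipschitz, so only the affine layers contribute).
# Usually loose; LipSDP-style methods compute tighter certificates via SDP.
W1 = np.array([[1.0, -1.0], [0.5, 0.5]])   # illustrative weights
W2 = np.array([[1.0, 1.0]])

L = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)  # largest singular values

# If ||x - x0||_2 <= eps, the output can move by at most L * eps:
eps = 0.1
print("output change <=", L * eps)
```

This gives an instant certificate, but the product bound compounds across depth, which is why LipSDP-style tightening matters for deeper networks.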
Probabilistic Methods¶
Randomized Smoothing [Cohen et al., 2019]
Approach: Smooth classifier with Gaussian noise, statistical certification
Tightness: Tight (often tightest certified radius)
Speed: Slow (requires many samples)
Scalability: Excellent (works for any network size)
Best for: Large networks where deterministic verification intractable
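The certified radius from Cohen et al. has a closed form once you have a lower confidence bound on the smoothed classifier's top-class probability. A standard-library sketch (estimating that lower bound from noise samples, e.g. via a Clopper-Pearson interval, is omitted here):

```python
from statistics import NormalDist

def certified_radius(p_a_lower, sigma):
    """l2 radius certified by randomized smoothing (Cohen et al., 2019):
    sigma * Phi^{-1}(p_a_lower), where p_a_lower is a lower confidence
    bound on the smoothed classifier's top-class probability under
    N(0, sigma^2 I) input noise."""
    if p_a_lower <= 0.5:
        return 0.0                      # abstain: no certificate
    return sigma * NormalDist().inv_cdf(p_a_lower)

# With a 0.99 lower bound on the top-class probability and noise sigma = 0.5:
print(certified_radius(0.99, sigma=0.5))  # roughly 1.16
```

The cost hides in estimating `p_a_lower`: it requires many forward passes through the base classifier, which is why smoothing is slow per input yet indifferent to network size.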
Tightness-Speed-Scalability Tradeoffs¶
Every verification method navigates a three-way tradeoff:
The Tradeoff Triangle¶
                 Tightness
                     △
                    /|\
                   / | \
  Complete (SMT)  /  |  \  SDP / PRIMA
                 /   |   \
                /  Branch \
               /   & Bound \
              /      |      \
             /   CROWN /     \
            /    DeepPoly     \
           /         |         \
  Speed ◁────────────┼────────────▷ Scalability
           \         |         /
            \       IBP       /
             \       |       /
              \  Randomized /
               \  Smoothing/
                \    |    /
                 \   |   /
                  \  |  /
                   \ | /
                    \|/
                     ▽
You can optimize for at most two:
Tight + Fast: Impossible for large networks (NP-completeness barrier)
Tight + Scalable: Randomized smoothing (probabilistic)
Fast + Scalable: IBP, CROWN (loose to moderate tightness)
Middle ground methods (CROWN, DeepPoly, PRIMA): Balance all three reasonably well.
Quantitative Comparison¶
| Method | Tightness | Time Complexity | Max Network Size | Certified CIFAR-10 (ε=8/255) |
|---|---|---|---|---|
| Marabou (SMT) | Exact | Exponential | ~1K neurons | N/A (too slow) |
| MILP | Exact | Exponential | ~1K neurons | N/A (too slow) |
| Branch&Bound | Exact | Exponential (often better in practice) | ~100K neurons | ~65% |
| IBP | Loose | O(n) | Millions | ~35% |
| CROWN | Moderate | O(n) | Millions | ~55% |
| DeepPoly | Tight | O(n) | Millions | ~60% |
| PRIMA | Very tight | O(n + k) | ~100K | ~62% |
| LP | Moderate | O(n³) | ~10K | ~58% |
| SDP | Very tight | O(n^4.5) | ~1K | ~63% |
| LipSDP | Tight | O(n^3+) | ~10K | ~50% |
| Rand. Smooth | Tight (prob.) | O(samples) | Unlimited | ~70% |
Decision Guide: Choosing a Method¶
Given your requirements, which method should you use?
By Use Case¶
Safety-Critical Application (need absolute guarantees):
Try incomplete methods (CROWN, DeepPoly) first
If unknown, use branch-and-bound [Wang et al., 2021]
If still too slow, use Marabou [Katz et al., 2019] for small networks
If network too large, redesign network to be smaller/easier to verify
Production Deployment (need fast verification):
Use CROWN [Zhang et al., 2018] or DeepPoly [Singh et al., 2019]
If too loose, try PRIMA [Müller et al., 2022]
If still insufficient, use randomized smoothing [Cohen et al., 2019]
Accept “unknown” results gracefully
Research / Benchmarking (need tightest bounds):
Try multi-neuron relaxations (PRIMA) [Müller et al., 2022]
For small networks, use SDP [Raghunathan et al., 2018]
For completeness, branch-and-bound [Wang et al., 2021, Zhang et al., 2022]
Compare multiple methods to understand tradeoffs
Large-Scale Networks (millions of parameters):
Use IBP [Gowal et al., 2019] or CROWN [Zhang et al., 2018]
If certification needed, use randomized smoothing [Cohen et al., 2019]
Accept that complete verification is intractable
Focus on certified training [Gowal et al., 2019, Zhang et al., 2020] instead
By Network Size¶
Small (<1000 neurons):
Best: Marabou [Katz et al., 2019], SDP [Raghunathan et al., 2018], branch-and-bound
Rationale: Can afford complete/expensive methods
Medium (1K-100K neurons):
Best: CROWN [Zhang et al., 2018], DeepPoly [Singh et al., 2019], PRIMA [Müller et al., 2022]
Rationale: Need balance between tightness and speed
Large (>100K neurons):
Best: IBP [Gowal et al., 2019], CROWN [Zhang et al., 2018], randomized smoothing [Cohen et al., 2019]
Rationale: Only scalable methods work
By Tightness Requirement¶
Need exact answers:
Use complete methods: Marabou [Katz et al., 2019], branch-and-bound [Wang et al., 2021]
Need tight bounds, can accept “unknown”:
Use PRIMA [Müller et al., 2022], SDP [Raghunathan et al., 2018], or α-β-CROWN
Moderate bounds acceptable:
Use CROWN [Zhang et al., 2018] or DeepPoly [Singh et al., 2019]
Coarse bounds sufficient:
Use IBP [Gowal et al., 2019] for maximum speed
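As a rough summary, the guidance above can be condensed into a toy rule-of-thumb function. The thresholds are the approximate network sizes quoted in this guide, not hard limits, and real choices also depend on architecture, property, and tooling.

```python
def suggest_method(n_neurons, need_complete, accept_probabilistic=False):
    """Toy rule of thumb mirroring the decision guide; illustrative only."""
    if need_complete:
        if n_neurons <= 1_000:
            return "SMT (Marabou) or MILP"
        if n_neurons <= 100_000:
            return "branch-and-bound (e.g. alpha-beta-CROWN)"
        return "redesign: complete verification is intractable at this size"
    if n_neurons > 100_000:
        return "randomized smoothing" if accept_probabilistic else "IBP or CROWN"
    return "CROWN / DeepPoly (try PRIMA if bounds too loose)"

print(suggest_method(500, need_complete=True))
```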
Evolution of Verification Methods¶
Understanding the historical progression clarifies why current methods exist:
2017: Complete methods emerge [Katz et al., 2017]
Reluplex/Marabou: First practical SMT-based verifier
Proved NP-completeness of verification
Established fundamental complexity barriers
2018: Fast incomplete methods [Weng et al., 2018, Zhang et al., 2018, Singh et al., 2018]
CROWN, Fast-Lin: Linear bound propagation
Polynomial-time, scalable to large networks
Trade completeness for speed
2019: Certified training [Gowal et al., 2019, Zhang et al., 2019]
IBP training: First scalable certified training
TRADES: Accuracy-robustness tradeoff
Shift from verification-only to training integration
2020: Tighter incomplete methods [Xu et al., 2020, Zhang et al., 2020]
α-β-CROWN: Optimized bound parameters
GPU acceleration: Massive speedups
Improved certified accuracy on benchmarks
2021-2022: Multi-neuron and hybrid [Müller et al., 2022, Wang et al., 2021, Zhang et al., 2022]
PRIMA: Multi-neuron relaxations
β-CROWN: Branch-and-bound with tight bounds
Complete verification at larger scale
Current frontier:
Scaling to vision transformers and larger architectures
Tighter relaxations without sacrificing speed
Integration with foundation models
Remaining Open Problems¶
Despite progress, fundamental challenges remain:
Scalability-Tightness Gap: The gap between what we can verify tightly (thousands of neurons) and what we want to verify (billions of parameters in LLMs) remains enormous.
Complete Verification Barrier: NP-completeness [Katz et al., 2017, Weng et al., 2018] means complete verification will always face exponential worst-case complexity. Better heuristics help but do not eliminate this barrier.
Non-ReLU Activations: Most methods assume ReLU. Modern architectures use GeLU, SiLU, and attention; extending verification to these is challenging.
Non-\(\ell_p\) Perturbations: Most verification focuses on \(\ell_p\) norms. Semantic perturbations and physical adversarial examples require different approaches.
Certified Accuracy Gap: Even state-of-the-art methods achieve only ~70% certified accuracy on CIFAR-10, far below 95%+ clean accuracy. Closing this gap remains elusive.
Final Thoughts¶
The taxonomy of neural network verification reveals a rich landscape: complete methods [Katz et al., 2017, Tjeng et al., 2019, Katz et al., 2019] providing definitive answers at high cost, incomplete methods [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018] scaling efficiently with partial guarantees, and probabilistic methods [Cohen et al., 2019] offering alternative tradeoffs.
Understanding this taxonomy helps you navigate the verification literature, choose methods appropriately, and appreciate the fundamental tradeoffs shaping the field. There’s no single “best” verification method—only methods suited to different requirements, constraints, and applications.
As verification research advances, the taxonomy will evolve. New methods may occupy unexplored regions (e.g., probabilistic complete verification, fast tight bounds). But the fundamental axes—completeness, determinism, tightness-speed-scalability—will likely persist, providing enduring structure to organize verification approaches.
Further Reading
This guide provides a systematic taxonomy of neural network verification methods. For readers interested in diving deeper, we recommend exploring specific methods within each category:
Complete Verification Methods:
SMT-based verification with Marabou [Katz et al., 2017, Katz et al., 2019] pioneered complete verification, while MILP approaches [Tjeng et al., 2019, Cheng et al., 2017] offer alternative complete formulations. Branch-and-bound [Wang et al., 2021, Zhang et al., 2022] extends complete verification to larger networks through hybrid incomplete-complete strategies.
Incomplete Bound Propagation:
IBP [Gowal et al., 2019] provides the fastest bounds, CROWN [Weng et al., 2018, Zhang et al., 2018] balances speed and tightness, and DeepPoly [Singh et al., 2019] offers tight abstract interpretation. Multi-neuron relaxations [Müller et al., 2022] capture correlations for even tighter bounds.
Optimization-Based Incomplete Methods:
LP relaxation [Salman et al., 2019] and SDP [Raghunathan et al., 2018] provide tighter bounds through explicit optimization. Lipschitz-based methods [Fazlyab et al., 2019] offer alternative perspectives on robustness certification.
Probabilistic Certification:
Randomized smoothing [Cohen et al., 2019] represents the primary probabilistic approach, providing excellent scalability through statistical guarantees.
Theoretical Foundations:
The NP-completeness result [Katz et al., 2017, Weng et al., 2018] establishes fundamental complexity barriers. Understanding this helps set realistic expectations about verification’s capabilities and limitations.
Certified Training:
IBP training [Gowal et al., 2019], CROWN training [Zhang et al., 2020], and GPU-accelerated methods [Xu et al., 2020] show how verification integrates with training. TRADES [Zhang et al., 2019] balances natural and robust accuracy.
Related Topics:
For mathematical foundations of the verification problem, see The Verification Problem: Mathematical Formulation. For understanding soundness and completeness, see Soundness and Completeness. For complexity barriers, see Theoretical Barriers in Neural Network Verification. For practical method selection, see Robustness Testing Guide.
Next Guide
Continue to Theoretical Barriers in Neural Network Verification to understand the fundamental mathematical limits and complexity barriers in neural network verification.
Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer, 2017.
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.
Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.
Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.
Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.
Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.
Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.
Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.
Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, 10802–10813. 2018.
Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.
Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-crown: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.
Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning, 7472–7482. PMLR, 2019.
Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.