Verification Taxonomy: A Systematic View

Neural network verification encompasses dozens of methods—SMT solvers [Katz et al., 2017, Katz et al., 2019], bound propagation [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018], randomized smoothing [Cohen et al., 2019], and more. Each method makes different tradeoffs between tightness, speed, and scalability. Understanding how these methods relate helps you choose the right tool for your application.

This guide presents a systematic taxonomy of neural network verification methods, organizing them along key dimensions: complete vs incomplete, deterministic vs probabilistic, and bound propagation vs optimization-based. By understanding this landscape, you can navigate verification research and select methods intelligently.

Why Taxonomy Matters

The verification landscape is vast: Researchers have developed numerous approaches, each with strengths and limitations. Without organization, this diversity is overwhelming.

Taxonomy provides structure: By categorizing methods along fundamental axes, we can:

  • Understand relationships: How does CROWN relate to DeepPoly? Both are incomplete bound propagation, but use different abstract domains.

  • Identify gaps: What combinations of properties are unexplored? (e.g., complete probabilistic verification)

  • Guide selection: Given requirements (speed, tightness, scalability), which category fits best?

  • Predict performance: Methods in the same category often have similar characteristics.

Taxonomy Benefits

A well-organized taxonomy helps:

  • Researchers: Identify open problems and unexplored method combinations

  • Practitioners: Choose verification methods matching their constraints

  • Students: Build mental models of the verification landscape

  • Tool builders: Position new methods relative to existing work

Primary Classification Axes

Axis 1: Complete vs Incomplete

The most fundamental distinction: does the method always give definitive answers?

Complete verification [Katz et al., 2017, Tjeng et al., 2019, Katz et al., 2019]:

  • Definition: Always returns VERIFIED or FALSIFIED (with counterexample)

  • Never returns: “Unknown”

  • Guarantee: If it says verified, the property holds; if falsified, here’s proof

  • Cost: Exponential worst-case complexity

  • Examples: SMT (Marabou), MILP, branch-and-bound

Incomplete verification [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018]:

  • Definition: May return VERIFIED, FALSIFIED, or UNKNOWN

  • Guarantee: If verified, property holds (sound). If unknown, property might hold or not.

  • Cost: Polynomial-time complexity

  • Examples: IBP, CROWN, DeepPoly, LP relaxation, SDP

Table 15 Complete vs Incomplete

| Property | Complete | Incomplete |
|---|---|---|
| Possible answers | Verified, Falsified | Verified, Falsified, Unknown |
| Soundness | Yes (always correct) | Yes (verified → true) |
| Completeness | Yes (always decides) | No (may return unknown) |
| Time complexity | Exponential (worst-case) | Polynomial |
| Scalability | Small networks (~1000 neurons) | Large networks (millions) |

Why incomplete methods dominate: The NP-completeness [Katz et al., 2017, Weng et al., 2018] of verification means complete methods face fundamental scalability barriers. For practical large-scale verification, incomplete methods are necessary.

Axis 2: Deterministic vs Probabilistic

How are guarantees provided?

Deterministic verification [Weng et al., 2018, Zhang et al., 2018, Katz et al., 2019]:

  • Guarantee: Definite bounds on outputs for all inputs in region

  • Example: “For all \(x\) in \(\epsilon\)-ball, output stays in class A”

  • Methods: All bound propagation, SMT, MILP, LP/SDP

  • Advantage: No uncertainty in guarantee

  • Limitation: Conservative (may be loose)

Probabilistic verification [Cohen et al., 2019]:

  • Guarantee: Probabilistic bound with confidence level

  • Example: “With 99% confidence, for all \(x\) in radius \(r\), output stays in class A”

  • Methods: Randomized smoothing

  • Advantage: Tighter bounds than deterministic for same cost

  • Limitation: Probabilistic (not absolute certainty)

Deterministic vs Probabilistic

Deterministic:

  • Absolute guarantee

  • May be conservative (loose bounds)

  • Required for safety-critical systems needing provable guarantees

Probabilistic:

  • Statistical guarantee with confidence level

  • Often tighter for same computational cost

  • Acceptable when probabilistic guarantees suffice

Most verification methods are deterministic. Randomized smoothing [Cohen et al., 2019] is the primary probabilistic approach, offering better scalability at the cost of statistical rather than absolute guarantees.

Axis 3: Bound Propagation vs Optimization

How are bounds computed?

Bound propagation [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018]:

  • Approach: Propagate bounds layer-by-layer through network

  • Mechanism: Abstract interpretation, interval arithmetic, linear relaxations

  • Complexity: O(network size) typically

  • Examples: IBP, CROWN, DeepPoly

  • Advantage: Fast, scales well

  • Limitation: May be loose (accumulates approximation errors)
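
A minimal pure-Python sketch of the idea, propagating interval bounds through an assumed toy two-layer ReLU network (the weights and input box are made up for illustration):

```python
def affine_bounds(lo, hi, W, b):
    """Propagate interval bounds through y = Wx + b: a positive weight pairs the
    input's upper bound with the output's upper bound; a negative weight swaps them."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row)))
        out_hi.append(bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row)))
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints directly."""
    return [max(0.0, l) for l in lo], [max(0.0, u) for u in hi]

# Toy network and input region: x in the l_inf ball of radius eps around x0.
W1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]
W2, b2 = [[1.0, 1.0]], [0.0]
x0, eps = [0.5, 0.5], 0.1
lo, hi = [x - eps for x in x0], [x + eps for x in x0]
lo, hi = affine_bounds(lo, hi, W1, b1)
lo, hi = relu_bounds(lo, hi)
lo, hi = affine_bounds(lo, hi, W2, b2)
print(lo, hi)  # a sound (but possibly loose) box containing every reachable output
```

Each step is sound, but the box can only grow: dependencies between neurons are discarded at every layer, which is exactly the looseness that CROWN and DeepPoly reduce.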

Optimization-based [Raghunathan et al., 2018, Katz et al., 2019, Salman et al., 2019]:

  • Approach: Formulate bound computation as optimization problem

  • Mechanism: LP, SDP, SMT, MILP

  • Complexity: Polynomial for LP (≈O(n³)) and higher-order polynomial for SDP; exponential for SMT/MILP

  • Examples: LP relaxation, SDP, Marabou

  • Advantage: Can be tighter (explicit optimization)

  • Limitation: Slower (solving optimization expensive)

Hybrid: Branch-and-bound [Wang et al., 2021, Zhang et al., 2022]:

  • Uses bound propagation for initial bounds

  • Branches (partitions input space) when bounds too loose

  • Applies bound propagation to each partition

  • Combines speed of propagation with refinement of partitioning

Verification Method Taxonomy

Combining these axes yields a taxonomy of verification approaches:

Complete Methods

1. SMT-Based [Katz et al., 2017, Katz et al., 2019]

  • Approach: Encode network as logical constraints, solve with SMT

  • Example: Marabou/Reluplex

  • Tightness: Exact (complete)

  • Speed: Slow (exponential worst-case)

  • Scalability: Hundreds to ~1000 neurons

  • Best for: Small networks requiring definitive answers

2. MILP-Based [Tjeng et al., 2019, Cheng et al., 2017]

  • Approach: Encode as mixed-integer linear program

  • Example: Gurobi/CPLEX formulations

  • Tightness: Exact (complete)

  • Speed: Slow (exponential worst-case)

  • Scalability: Similar to SMT

  • Best for: Finding minimum perturbations, optimization objectives
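
A sketch of the core encoding: a single ReLU \(y = \max(0, x)\) with known pre-activation bounds \(l \le x \le u\) (and \(l < 0 < u\)) becomes linear constraints plus one binary indicator \(a\):

\[
y \ge 0, \qquad y \ge x, \qquad y \le u\,a, \qquad y \le x - l\,(1 - a), \qquad a \in \{0, 1\}.
\]

Setting \(a = 1\) forces \(y = x\) (active phase) and \(a = 0\) forces \(y = 0\) (inactive phase); one binary variable per unstable neuron is what drives the exponential worst case.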

3. Branch-and-Bound [Wang et al., 2021, Zhang et al., 2022]

  • Approach: Partition input space, verify each partition with incomplete methods

  • Example: α-β-CROWN with branching

  • Tightness: Exact (complete) if given sufficient time

  • Speed: Moderate to slow (depends on branching)

  • Scalability: Larger than SMT/MILP (tens of thousands of neurons)

  • Best for: When incomplete methods return “unknown” but complete needed
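
The branch-and-bound loop can be sketched on a toy 1-D problem. Here the "incomplete verifier" is a deliberately loose interval bound on the assumed function \(f(x) = x(2 - x)\), and branching recovers completeness up to a depth budget:

```python
def interval_upper(lo, hi):
    """Sound but loose upper bound on f(x) = x * (2 - x) over [lo, hi]:
    interval arithmetic treats the two occurrences of x as independent."""
    a, b = 2.0 - hi, 2.0 - lo                    # interval for the factor (2 - x)
    return max(lo * a, lo * b, hi * a, hi * b)   # upper end of the interval product

def verify(lo, hi, threshold, depth=20):
    """Prove f(x) <= threshold on [lo, hi], or find a counterexample."""
    if interval_upper(lo, hi) <= threshold:
        return "verified"
    mid = (lo + hi) / 2.0
    if mid * (2.0 - mid) > threshold:            # concrete violating input found
        return "falsified"
    if depth == 0:
        return "unknown"                         # branching budget exhausted
    left = verify(lo, mid, threshold, depth - 1)
    return left if left != "verified" else verify(mid, hi, threshold, depth - 1)

print(verify(0.0, 2.0, 1.1))   # loose bound says 4.0, but branching proves f <= 1.1
print(verify(0.0, 2.0, 0.9))   # the true maximum is 1.0 at x = 1, so this fails
```

Real tools branch on unstable ReLU neurons rather than input dimensions, but the structure (bound, check a concrete point, split, recurse) is the same.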

Incomplete Methods: Bound Propagation

1. Interval Bound Propagation (IBP) [Gowal et al., 2019]

  • Approach: Propagate interval bounds layer-by-layer

  • Tightness: Loosest

  • Speed: Fastest (O(network size))

  • Scalability: Millions of neurons

  • Best for: Rapid screening, certified training

2. Linear Bound Propagation (CROWN) [Weng et al., 2018, Zhang et al., 2018]

  • Approach: Backward linear bound propagation with optimized coefficients

  • Tightness: Moderate to tight

  • Speed: Fast (O(network size))

  • Scalability: Millions of neurons

  • Best for: General verification, production use
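
The key ingredient is a linear relaxation of each unstable ReLU. For pre-activation bounds \(l < 0 < u\), CROWN sandwiches the activation between two lines:

\[
\alpha\,x \;\le\; \max(0, x) \;\le\; \frac{u}{u - l}\,(x - l), \qquad \alpha \in [0, 1],
\]

and propagates such bounds backward through the network; optimizing the slope \(\alpha\) per neuron is the "α" in α-CROWN.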

3. Abstract Interpretation (DeepPoly) [Singh et al., 2019]

  • Approach: Abstract interpretation with polyhedral domains

  • Tightness: Tight

  • Speed: Fast to moderate

  • Scalability: Large networks

  • Best for: When tighter than CROWN needed, faster than LP

4. Multi-Neuron Relaxation (PRIMA) [Müller et al., 2022]

  • Approach: Consider multiple neurons jointly via product terms

  • Tightness: Very tight

  • Speed: Moderate (scales with network size plus the number of neuron groups considered jointly)

  • Scalability: Thousands to tens of thousands

  • Best for: When single-neuron bounds too loose

Incomplete Methods: Optimization-Based

1. Linear Programming [Salman et al., 2019]

  • Approach: Encode network as LP, solve for bounds

  • Tightness: Moderate to tight

  • Speed: Moderate (O(n³) LP solving)

  • Scalability: Thousands of neurons

  • Best for: When explicit optimization formulation needed

2. Semi-Definite Programming [Raghunathan et al., 2018]

  • Approach: Encode as SDP with lifted quadratic variables

  • Tightness: Very tight

  • Speed: Slow (O(n^4.5))

  • Scalability: Hundreds to ~1000 neurons

  • Best for: Small networks needing very tight bounds

3. Lipschitz-Based [Fazlyab et al., 2019]

  • Approach: Bound outputs via network Lipschitz constant

  • Tightness: Loose (product bound) to tight (LipSDP)

  • Speed: Fast (product) to slow (SDP)

  • Scalability: Depends on method

  • Best for: Networks with controlled Lipschitz constants
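
The loose end of this spectrum, the product bound, fits in a few lines of pure Python. As an upper bound on each layer's spectral norm we use the Frobenius norm (always valid, often loose); the weights are made up for illustration:

```python
import math

def frobenius(W):
    """Frobenius norm of a weight matrix given as a list of rows."""
    return math.sqrt(sum(w * w for row in W for w in row))

def lipschitz_product_bound(layers):
    """Global Lipschitz bound for a ReLU network as the product of per-layer
    norm bounds; ReLU itself is 1-Lipschitz, so it contributes a factor of 1."""
    bound = 1.0
    for W in layers:
        bound *= frobenius(W)
    return bound

W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[1.0, 1.0]]
L = lipschitz_product_bound([W1, W2])
# Any inputs x, x' then satisfy |f(x) - f(x')| <= L * ||x - x'||, so a logit
# margin of m certifies a robustness radius of at least m / L.
print(L)
```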

Probabilistic Methods

Randomized Smoothing [Cohen et al., 2019]

  • Approach: Smooth classifier with Gaussian noise, statistical certification

  • Tightness: Tight (often tightest certified radius)

  • Speed: Slow (requires many samples)

  • Scalability: Excellent (works for any network size)

  • Best for: Large networks where deterministic verification intractable
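
The certification loop can be sketched with Python's standard library. The base classifier below is a hypothetical 1-D stand-in for a neural network, and the confidence bound is a crude normal approximation rather than the exact Clopper-Pearson interval used in the paper:

```python
import random
import statistics

def base_classifier(x):
    """Hypothetical 1-D base classifier standing in for a network."""
    return 0 if x < 1.0 else 1

def smoothed_certify(x0, sigma, n=10_000, seed=0):
    """Estimate the top-class probability pA under Gaussian noise, lower-bound
    it, and return the certified l2 radius R = sigma * Phi^{-1}(pA_lower)."""
    rng = random.Random(seed)
    votes = sum(base_classifier(x0 + rng.gauss(0.0, sigma)) == 0 for _ in range(n))
    p_hat = votes / n
    p_lower = p_hat - 2.0 * (p_hat * (1.0 - p_hat) / n) ** 0.5  # ~2-sigma lower bound
    if p_lower <= 0.5:
        return None        # abstain: the smoothed top class is not clear enough
    return sigma * statistics.NormalDist().inv_cdf(p_lower)

radius = smoothed_certify(x0=0.0, sigma=0.5)
print(radius)  # roughly sigma * Phi^{-1}(0.977), i.e. close to 1.0 for this toy
```

Note that the guarantee holds only with the chosen confidence level, and the cost is dominated by the n forward passes through the base classifier.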

Tightness-Speed-Scalability Tradeoffs

Every verification method navigates a three-way tradeoff:

The Tradeoff Triangle

                             Tightness
                                 △
                                / \
                               /   \
                  Complete    /     \    SDP / PRIMA
                   (SMT)     /       \
                            /         \
                           /  Branch-  \
                          /  and-Bound  \
                         /               \
                        / CROWN/DeepPoly  \
                       /                   \
                      /  IBP, Randomized    \
                     /      Smoothing        \
           Speed  ◁───────────────────────────▷  Scalability

You can optimize for at most two:

  • Tight + Fast: Impossible for large networks (NP-completeness barrier)

  • Tight + Scalable: Randomized smoothing (probabilistic)

  • Fast + Scalable: IBP, CROWN (loose to moderate tightness)

Middle-ground methods (CROWN, DeepPoly, PRIMA): balance all three reasonably well.

Quantitative Comparison

Table 16 Method Characteristics

| Method | Tightness | Time Complexity | Max Network Size | Certified CIFAR-10 (ε=8/255) |
|---|---|---|---|---|
| Marabou (SMT) | Exact | Exponential | ~1K neurons | N/A (too slow) |
| MILP | Exact | Exponential | ~1K neurons | N/A (too slow) |
| Branch&Bound | Exact | Exponential (better on average) | ~100K neurons | ~65% |
| IBP | Loose | O(n) | Millions | ~35% |
| CROWN | Moderate | O(n) | Millions | ~55% |
| DeepPoly | Tight | O(n) | Millions | ~60% |
| PRIMA | Very tight | O(n + k) | ~100K | ~62% |
| LP | Moderate | O(n³) | ~10K | ~58% |
| SDP | Very tight | O(n^4.5) | ~1K | ~63% |
| LipSDP | Tight | O(n³+) | ~10K | ~50% |
| Rand. Smooth | Tight (prob.) | O(samples) | Unlimited | ~70% |

Decision Guide: Choosing a Method

Given your requirements, which method should you use?

By Use Case

Safety-Critical Application (need absolute guarantees):

  1. Try incomplete methods (CROWN, DeepPoly) first

  2. If unknown, use branch-and-bound [Wang et al., 2021]

  3. If still too slow, use Marabou [Katz et al., 2019] for small networks

  4. If network too large, redesign network to be smaller/easier to verify

Production Deployment (need fast verification):

  1. Use CROWN [Zhang et al., 2018] or DeepPoly [Singh et al., 2019]

  2. If too loose, try PRIMA [Müller et al., 2022]

  3. If still insufficient, use randomized smoothing [Cohen et al., 2019]

  4. Accept “unknown” results gracefully

Research / Benchmarking (need tightest bounds):

  1. Try multi-neuron relaxations (PRIMA) [Müller et al., 2022]

  2. For small networks, use SDP [Raghunathan et al., 2018]

  3. For completeness, branch-and-bound [Wang et al., 2021, Zhang et al., 2022]

  4. Compare multiple methods to understand tradeoffs

Large-Scale Networks (millions of parameters):

  1. Use IBP [Gowal et al., 2019] or CROWN [Zhang et al., 2018]

  2. If certification needed, use randomized smoothing [Cohen et al., 2019]

  3. Accept that complete verification is intractable

  4. Focus on certified training [Gowal et al., 2019, Zhang et al., 2020] instead
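
The use-case guide above can be condensed into a rough heuristic. The thresholds are the approximate scalability limits from Table 16, not hard rules, and the returned strings are suggestions rather than any tool's API:

```python
def suggest_method(n_neurons, need_complete=False, probabilistic_ok=True):
    """Map rough requirements to a verification-method suggestion."""
    if need_complete:
        if n_neurons <= 1_000:
            return "SMT (Marabou) or MILP"
        if n_neurons <= 100_000:
            return "branch-and-bound (e.g. alpha-beta-CROWN)"
        return "intractable: shrink the network or relax the completeness requirement"
    if n_neurons <= 1_000:
        return "SDP (tightest deterministic incomplete bounds at this scale)"
    if n_neurons <= 100_000:
        return "PRIMA or DeepPoly"
    return "randomized smoothing" if probabilistic_ok else "CROWN or IBP"

print(suggest_method(500, need_complete=True))   # small network, exact answer needed
print(suggest_method(5_000_000))                 # large network, statistical guarantee ok
```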

By Network Size

Small (<1000 neurons): every method applies. Use SMT [Katz et al., 2019] or MILP [Tjeng et al., 2019] for exact answers, or SDP [Raghunathan et al., 2018] for the tightest incomplete bounds.

Medium (1K-100K neurons): branch-and-bound [Wang et al., 2021] for completeness; PRIMA [Müller et al., 2022] or DeepPoly [Singh et al., 2019] for tight incomplete bounds.

Large (>100K neurons): IBP or CROWN for fast deterministic bounds; randomized smoothing [Cohen et al., 2019] for tight probabilistic certificates.

By Tightness Requirement

Need exact answers: complete methods only (SMT, MILP, or branch-and-bound), subject to network size.

Need tight bounds, can accept “unknown”: PRIMA, DeepPoly, or (on small networks) SDP.

Moderate bounds acceptable: CROWN or LP relaxation.

Coarse bounds sufficient: IBP, the fastest option and well suited to rapid screening and certified training.

Evolution of Verification Methods

Understanding the historical progression clarifies why current methods exist:

2017: Complete methods emerge [Katz et al., 2017]

  • Reluplex/Marabou: First practical SMT-based verifier

  • Proved NP-completeness of verification

  • Established fundamental complexity barriers

2018: Fast incomplete methods [Weng et al., 2018, Zhang et al., 2018, Singh et al., 2018]

  • CROWN, Fast-Lin: Linear bound propagation

  • Polynomial-time, scalable to large networks

  • Trade completeness for speed

2019: Certified training [Gowal et al., 2019, Zhang et al., 2019]

  • IBP training: First scalable certified training

  • TRADES: Accuracy-robustness tradeoff

  • Shift from verification-only to training integration

2020: Tighter incomplete methods [Xu et al., 2020, Zhang et al., 2020]

  • α-β-CROWN: Optimized bound parameters

  • GPU acceleration: Massive speedups

  • Improved certified accuracy on benchmarks

2021-2022: Multi-neuron and hybrid [Müller et al., 2022, Wang et al., 2021, Zhang et al., 2022]

  • PRIMA: Multi-neuron relaxations

  • β-CROWN: Branch-and-bound with tight bounds

  • Complete verification at larger scale

Current frontier:

  • Scaling to vision transformers and larger architectures

  • Tighter relaxations without sacrificing speed

  • Integration with foundation models

Remaining Open Problems

Despite progress, fundamental challenges remain:

Scalability-Tightness Gap

The gap between what we can verify tightly (thousands of neurons) and what we want to verify (billions of parameters in LLMs) remains enormous.

Complete Verification Barrier

NP-completeness [Katz et al., 2017, Weng et al., 2018] means complete verification will always face exponential worst-case cost. Better heuristics help but don’t eliminate this barrier.

Non-ReLU Activations

Most methods assume ReLU. Modern architectures use GeLU, SiLU, and attention; extending verification to these is challenging.

Non-\(\ell_p\) Perturbations

Most verification focuses on \(\ell_p\) norms. Semantic perturbations and physical adversarial examples require different approaches.

Certified Accuracy Gap

Even state-of-the-art methods achieve only ~70% certified accuracy on CIFAR-10, far below the 95%+ clean accuracy of standard models. Closing this gap remains elusive.

Final Thoughts

The taxonomy of neural network verification reveals a rich landscape: complete methods [Katz et al., 2017, Tjeng et al., 2019, Katz et al., 2019] providing definitive answers at high cost, incomplete methods [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018] scaling efficiently with partial guarantees, and probabilistic methods [Cohen et al., 2019] offering alternative tradeoffs.

Understanding this taxonomy helps you navigate the verification literature, choose methods appropriately, and appreciate the fundamental tradeoffs shaping the field. There’s no single “best” verification method—only methods suited to different requirements, constraints, and applications.

As verification research advances, the taxonomy will evolve. New methods may occupy unexplored regions (e.g., probabilistic complete verification, fast tight bounds). But the fundamental axes—completeness, determinism, tightness-speed-scalability—will likely persist, providing enduring structure to organize verification approaches.

Further Reading

This guide provides a systematic taxonomy of neural network verification methods. For readers interested in diving deeper, we recommend exploring specific methods within each category:

Complete Verification Methods:

SMT-based verification with Marabou [Katz et al., 2017, Katz et al., 2019] pioneered complete verification, while MILP approaches [Tjeng et al., 2019, Cheng et al., 2017] offer alternative complete formulations. Branch-and-bound [Wang et al., 2021, Zhang et al., 2022] extends complete verification to larger networks through hybrid incomplete-complete strategies.

Incomplete Bound Propagation:

IBP [Gowal et al., 2019] provides the fastest bounds, CROWN [Weng et al., 2018, Zhang et al., 2018] balances speed and tightness, and DeepPoly [Singh et al., 2019] offers tight abstract interpretation. Multi-neuron relaxations [Müller et al., 2022] capture correlations for even tighter bounds.

Optimization-Based Incomplete Methods:

LP relaxation [Salman et al., 2019] and SDP [Raghunathan et al., 2018] provide tighter bounds through explicit optimization. Lipschitz-based methods [Fazlyab et al., 2019] offer alternative perspectives on robustness certification.

Probabilistic Certification:

Randomized smoothing [Cohen et al., 2019] represents the primary probabilistic approach, providing excellent scalability through statistical guarantees.

Theoretical Foundations:

The NP-completeness result [Katz et al., 2017, Weng et al., 2018] establishes fundamental complexity barriers. Understanding this helps set realistic expectations about verification’s capabilities and limitations.

Certified Training:

IBP training [Gowal et al., 2019], CROWN training [Zhang et al., 2020], and GPU-accelerated methods [Xu et al., 2020] show how verification integrates with training. TRADES [Zhang et al., 2019] balances natural and robust accuracy.

Related Topics:

For mathematical foundations of the verification problem, see The Verification Problem: Mathematical Formulation. For understanding soundness and completeness, see Soundness and Completeness. For complexity barriers, see Theoretical Barriers in Neural Network Verification. For practical method selection, see Robustness Testing Guide.

Next Guide

Continue to Theoretical Barriers in Neural Network Verification to understand the fundamental mathematical limits and complexity barriers in neural network verification.

[1] Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer, 2017.

[2] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.

[3] Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, 11423–11434. 2019.

[4] Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[5] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[6] Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.

[7] Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.

[8] Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.

[9] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.

[10] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[11] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, 10802–10813. 2018.

[12] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.

[13] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[14] Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.

[15] Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[16] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, 7472–7482. PMLR, 2019.

[17] Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.

[18] Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.

[19] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948. 2018.