Verification Scalability

Your verifier works great on small networks. Then you try a ResNet and it times out. Scale breaks verification tools. As networks grow deeper, wider, and more complex, verification becomes increasingly difficult. Understanding these barriers and knowing which techniques can help is essential for applying verification to real-world models.

Why Verification is Hard at Scale

Verification involves analyzing the space of all possible inputs and computing bounds on outputs. For small networks, this is tedious but tractable. For large networks, several factors make it intractable—both fundamentally and practically.

Fundamental Complexity Barriers

NP-Completeness: The verification problem for ReLU networks is NP-complete [Katz et al., 2017, Weng et al., 2018]. This means that unless P=NP, no polynomial-time algorithm can solve all verification instances. Worse, Weng et al. [2018] proved that even approximating the certified robustness radius within any constant factor is NP-hard. This establishes a fundamental theoretical barrier—the scalability-tightness tradeoff is not just an engineering challenge but a mathematical certainty.

Theoretical Barrier

The NP-completeness of neural network verification [Katz et al., 2017, Weng et al., 2018] means that complete verification has worst-case exponential time complexity. All complete verification approaches—SMT [Katz et al., 2017], MILP [Tjeng et al., 2019], branch-and-bound [Wang et al., 2021]—face this fundamental barrier.
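To make the exponential blow-up concrete: in branch-and-bound, each unstable ReLU (one whose pre-activation bounds straddle zero) may need to be split into its active and inactive cases, so \(n\) unstable neurons yield up to \(2^n\) branches. A quick back-of-the-envelope check (the neuron counts here are arbitrary illustrations):

```python
# Worst-case number of activation-pattern branches a complete
# verifier may need to explore: 2 ** (number of unstable ReLUs).
for unstable in [10, 30, 100, 1000]:
    print(f"{unstable:>4} unstable ReLUs -> up to {2 ** unstable:.2e} branches")
```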

Depth compounds approximation error. In bound propagation, each layer propagates uncertainty forward. As you pass through many layers, bounds accumulate and grow looser. By the time you reach the output, bounds might be so loose they’re useless. A network with 50 layers faces much worse approximation than a network with 5 layers analyzing the same input region.

\[\text{Approximation error} \approx C \cdot \text{depth}\]

where \(C\) depends on layer-specific factors. The relationship isn’t always linear, but it’s consistently unfavorable for deep networks.

Width creates memory explosion. Wide layers mean many neurons to track. A single convolutional layer in a ResNet might have millions of activations. Tracking bounds for all of them consumes memory and computation. Complete methods that maintain a search tree over neuron activation states blow up exponentially in the number of unstable neurons.

Non-linearities accumulate conservatism. Each ReLU, sigmoid, or other non-linearity requires approximation in bound propagation. Conservative approximations preserve soundness but grow loose. With hundreds of non-linear layers, approximations compound dramatically.
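A minimal sketch of this compounding, using plain interval bound propagation (IBP) through randomly initialized linear + ReLU layers. The layer width, weight scale, and input box are arbitrary illustrations; the point is how quickly the mean bound width grows with depth:

```python
import numpy as np

rng = np.random.default_rng(0)

def ibp_linear(lb, ub, W, b):
    """Sound interval bounds on y = W @ x + b given x in [lb, ub]."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lb + Wn @ ub + b, Wp @ ub + Wn @ lb + b

width = 64
lb = np.full(width, -0.01)   # input box: each coordinate in [-0.01, 0.01]
ub = np.full(width, +0.01)

for depth in range(1, 21):
    W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
    lb, ub = ibp_linear(lb, ub, W, np.zeros(width))
    lb, ub = np.maximum(lb, 0.0), np.maximum(ub, 0.0)  # ReLU is exact on intervals
    if depth % 5 == 0:
        print(f"layer {depth:2d}: mean bound width = {(ub - lb).mean():.3g}")
```

Even though the ReLU step here is exact on intervals, the linear layers mix coordinates and the interval abstraction discards their correlations, so the widths grow multiplicatively with depth.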

Fundamental vs Engineering Challenges

Some scalability challenges are fundamental (inherent to the verification problem). Others are engineering challenges (amenable to better implementations and algorithms). Knowing which is which helps you set realistic goals.

The Curse of Dimensionality

High-dimensional input spaces create combinatorial explosion. An image of size 224×224×3 has roughly 150,000 dimensions. Complete verification methods that partition the input space face exponential growth in partitions needed.

Even with sophisticated refinement strategies like branch-and-bound, the number of regions to analyze can become astronomical. You might be exploring a tree with millions or billions of nodes. For incomplete methods using approximations, high dimensions mean more possible input changes to account for, making approximations looser.

This isn’t just about network size; it’s about input dimensionality. Small networks on high-dimensional inputs can be as hard to verify as large networks on low-dimensional inputs.
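The arithmetic is stark. Even a single binary split per input dimension yields \(2^d\) regions; for the 224×224×3 image above:

```python
import math

d = 224 * 224 * 3                 # ImageNet-sized input: 150,528 dimensions
print(d)                          # 150528
# One binary split per dimension already gives 2**d regions:
print(f"2^{d} ~ 10^{int(d * math.log10(2))} regions")
```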

| Input Dimension | Network Size | Difficulty |
|---|---|---|
| Low (< 100) | Small (< 1,000 neurons) | Moderate; complete verification feasible |
| Low (< 100) | Large (millions of neurons) | Hard; incomplete methods preferred |
| High (> 10,000) | Small | Hard; curse of dimensionality |
| High (> 10,000) | Large | Very hard; only approximate methods practical |

Modern Architecture Challenges

Modern neural networks use architectures beyond simple feedforward networks. These introduce new verification challenges.

Transformers and attention mechanisms create quadratic complexity in sequence length. Computing bounds on attention involves non-standard operations: softmax, and matrix products between computed activations rather than fixed weights. Verifiers don’t handle these well yet. A transformer with moderate sequence length can be as hard to verify as a much larger feedforward network.

ResNets and skip connections complicate verification in subtle ways. Skip connections mean information bypasses many layers. This can actually help in some cases—bounds don’t accumulate through all layers—but creates complex dataflow that complicates bound propagation. The interaction between main path and skip connections introduces additional approximation points.

EfficientNets and complex scaling use non-standard layer arrangements and normalization schemes. Each design choice (compound scaling of width and depth, squeeze-and-excitation blocks) requires special handling in verification. Generic verifiers struggle because they assume standard architectures.

State-of-the-art architectures were designed for accuracy and speed, not verifiability. Verification tools are catching up, but often months to years behind architectural innovations. Expect modern architectures to be much harder to verify than simple CNNs.

Techniques for Scaling Verification

Several techniques help verification scale to larger networks.

Input space partitioning: Divide the input space into smaller regions and verify each separately. This is the basis of branch-and-bound methods [Wang et al., 2021, Zhang et al., 2022]. Verifying smaller regions individually tightens the approximation, but the number of regions can grow exponentially (a minimal sketch follows this list).

Layer-wise refinement: Focus computational effort on the most uncertain layers. If bounds at layer 5 are loose, refine them further. If bounds at layer 25 are already tight, skip refinement. Adaptive allocation of verification effort improves efficiency.

GPU acceleration [Xu et al., 2020]: Parallelize bound propagation across many inputs or internal computations. Modern verifiers use GPU-based implementations for significant speedups. The auto_LiRPA library provides automatic GPU-accelerated bound propagation (see the example under Incomplete Methods at Scale below). But algorithmic improvements matter as much as hardware acceleration.

Sparse verification: Only verify critical neurons or layers. For large networks, not all neurons matter equally for the property being verified. Identify important neurons and focus verification effort there. Skip verification of neurons that don’t affect the final decision for your property.

Compositional verification: Verify components separately and compose results. Break a large network into smaller sub-networks, verify each, then combine guarantees. This requires careful handling of interfaces, but can reduce overall complexity.
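As promised above, here is a minimal sketch of input-space partitioning in the branch-and-bound style: try to certify a box with cheap interval bounds, and if the bounds are too loose, split along the widest input dimension and recurse, up to a budget. The two-layer toy network and the property (output stays positive) are hypothetical stand-ins; a real implementation would also search for concrete counterexamples.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # toy 2-input ReLU net
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def ibp(lb, ub, W, b):
    """Sound interval bounds on W @ x + b given x in [lb, ub]."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lb + Wn @ ub + b, Wp @ ub + Wn @ lb + b

def output_lower_bound(lb, ub):
    lb, ub = ibp(lb, ub, W1, b1)
    lb, ub = np.maximum(lb, 0.0), np.maximum(ub, 0.0)   # ReLU
    return ibp(lb, ub, W2, b2)[0].item()

def verify_positive(lb, ub, budget=1000):
    """Try to prove output > 0 on [lb, ub], splitting loose regions."""
    queue = [(lb, ub)]
    while queue:
        if budget == 0:
            return "unknown"            # incomplete: ran out of budget
        budget -= 1
        lo, hi = queue.pop()
        if output_lower_bound(lo, hi) > 0:
            continue                    # this region is certified
        i = int(np.argmax(hi - lo))     # split the widest dimension
        mid = 0.5 * (lo[i] + hi[i])
        left_hi, right_lo = hi.copy(), lo.copy()
        left_hi[i], right_lo[i] = mid, mid
        queue += [(lo, left_hi), (right_lo, hi)]
    return "verified"

print(verify_positive(np.array([0.4, 0.4]), np.array([0.6, 0.6])))
```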

| Technique | Complexity Reduction | Applicability | Implementation Difficulty |
|---|---|---|---|
| Input space partitioning | Tighter bounds per region; exponential blowup in region count | Most methods | Moderate |
| Layer-wise refinement | Focus effort on loose layers | Incomplete methods | Moderate |
| GPU acceleration | Parallel computation | All methods | Easy |
| Sparse verification | Skip irrelevant neurons | Property-dependent | Hard |
| Compositional verification | Modular decomposition | Specific architectures | Very hard |

Incomplete Methods at Scale

Complete verification methods don’t scale due to NP-completeness [Katz et al., 2017, Weng et al., 2018]. They guarantee reaching a definitive answer, but reaching it can take exponential time. For practical large networks, incomplete methods, which use approximations and return “unknown” rather than exploring exhaustively, are necessary.

Methods like CROWN [Weng et al., 2018, Zhang et al., 2018], DeepPoly [Singh et al., 2019], and IBP [Gowal et al., 2019] are incomplete but fast. They scale to large networks because they avoid exhaustive exploration. The tradeoff: some properties return “unknown” rather than verified or unsafe.
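For instance, computing CROWN bounds on a small PyTorch model with auto_LiRPA looks roughly like the following (usage modeled on the library’s published examples; the model, input, and \(\epsilon\) are placeholders):

```python
import torch
import torch.nn as nn
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

# Placeholder model: a small MNIST-sized classifier.
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 10))
x = torch.rand(1, 1, 28, 28)                        # one example input

bounded = BoundedModule(model, torch.empty_like(x))  # trace the network
ptb = PerturbationLpNorm(norm=float("inf"), eps=1 / 255)
x_b = BoundedTensor(x, ptb)                          # input with l_inf ball

# Incomplete verification: sound lower/upper bounds on every logit.
lb, ub = bounded.compute_bounds(x=(x_b,), method="CROWN")
print(lb.shape, ub.shape)                            # torch.Size([1, 10]) each
```

Swapping `method="CROWN"` for `method="IBP"` trades tightness for speed, the same tradeoff summarized in the table above.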

Hybrid approaches [Wang et al., 2021] combine both: use incomplete methods for most of the input space, then refine and use complete methods (branch-and-bound) on smaller regions of uncertainty. This balances speed with higher confidence, as demonstrated by α-β-CROWN’s success in VNN-COMP.

Hint

For networks beyond moderate size, accepting “unknown” results is often the most practical approach. Rather than waiting for complete verification that may never finish, get bounds from incomplete methods and interpret “unknown” as “requires more work.”

Case Study: Verifying ImageNet Models

Practical limits become clear when attempting to verify large vision models. A ResNet-50 trained on ImageNet has millions of parameters. Current verifiers can certify only small perturbation budgets and typically on small image crops or downsampled inputs.

What works: Verifying small networks on MNIST or CIFAR-10. Certifying robustness to small perturbations (\(\epsilon = 1/255\) in normalized coordinates). Incomplete methods on modest-sized networks (thousands to tens of thousands of parameters).

What doesn’t work: Complete verification of large models. Verification of large perturbation budgets on high-resolution images. General-purpose verification of modern architectures (transformers, efficient networks) at production scale.

What’s emerging: Randomized smoothing for large networks (scales well). Learning-based verification assistance. Hybrid human-machine verification combining formal methods with testing. Combined falsification and verification workflows.

Realistic expectations matter. Current tools handle small to medium networks, or large networks with limited perturbation budgets. Production-scale verification over all inputs is not yet feasible. The state of the art is improving, but incrementally.

Hint

When planning verification, set realistic goals. Certify small models completely. Certify large models’ critical components. Use verification to guide training even if complete verification is infeasible. Combine with adversarial testing for comprehensive confidence.

Future Directions

The field is actively addressing scalability. Learning-based verification uses neural networks to assist verification—training networks to predict the output of verifiers, or to suggest which regions need refinement. Symbolic execution and abstract interpretation research continues improving bound tightness. Specialized verifiers for specific architectures (transformers, RNNs) are emerging. Approximate but guaranteed methods trade completeness for better scalability.

The trend is toward better approximate methods rather than complete methods scaling. Instead of verifying everything completely, we’ll likely see frameworks that certify what’s certifiable and test what isn’t.

Practical Recommendations

Choose architectures with verification in mind. Simple feedforward and convolutional networks are more verifiable than complex attention-based or residual architectures. If verification matters, influence architecture choices early.

Start small and scale up gradually. Verify toy networks, then gradually increase size. Understand which properties can be verified before attempting production-scale verification.

Accept incomplete verification. Don’t demand complete certainty on large networks. Incomplete sound methods are a practical alternative.

Combine approaches. Use falsification (adversarial attacks) and verification together. Use incomplete verification to certify what’s certifiable, and test the rest.

Invest in relevant benchmarks. Test your verification pipeline on networks and properties you care about. Don’t rely on toy benchmarks alone.

Final Thoughts

Verification scalability remains an open challenge. The fundamental tradeoffs between soundness, completeness, and speed create real barriers at scale. But practical verification of reasonable-sized networks is achievable with appropriate methods and expectations.

Understanding these barriers helps you set realistic goals, choose appropriate tools, and design systems that are actually verifiable. As the field matures, verification will likely become practical for larger networks, but it will always involve tradeoffs. Choose tools and methods aligned with your requirements and constraints.

Further Reading

This guide provides comprehensive coverage of verification scalability challenges and solutions. For readers interested in diving deeper, we recommend the following resources organized by topic:

Fundamental Complexity Barriers:

The NP-completeness of neural network verification [Katz et al., 2017, Weng et al., 2018] establishes fundamental theoretical barriers to scalability. Complete verification approaches—whether based on SMT [Katz et al., 2017], MILP [Tjeng et al., 2019], or branch-and-bound [Wang et al., 2021]—face worst-case exponential time complexity. Understanding these theoretical limits is essential for setting realistic expectations about what verification can achieve at scale.

Incomplete Verification Methods:

Incomplete but scalable methods provide practical alternatives to complete verification. CROWN [Weng et al., 2018, Zhang et al., 2018] uses linear bound propagation with backward analysis to efficiently compute output bounds. DeepPoly [Singh et al., 2019] employs abstract interpretation with carefully designed abstract domains. Interval Bound Propagation (IBP) [Gowal et al., 2019] trades tightness for extreme speed by propagating interval bounds through the network. These methods scale to large networks by accepting “unknown” results rather than exploring exponentially.

Scaling Techniques:

Branch-and-bound methods [Wang et al., 2021, Zhang et al., 2022] combine incomplete verification with systematic input space partitioning to reduce approximation error while controlling computational cost. GPU acceleration [Xu et al., 2020] enables parallelization of bound propagation across inputs and internal computations, with libraries like auto_LiRPA providing automatic GPU-accelerated implementation. These techniques push the practical limits of verification to larger networks.

Hybrid Approaches:

State-of-the-art verifiers like α-β-CROWN [Wang et al., 2021, Zhang et al., 2022] combine incomplete and complete methods: using fast incomplete verification for most of the input space, then applying branch-and-bound refinement only on smaller regions of uncertainty. This hybrid strategy balances speed with higher confidence, demonstrating success in verification competitions.

Related Topics:

For foundational bound propagation techniques analyzed at scale, see Bound Propagation Approaches. For understanding theoretical guarantees that complete and incomplete methods provide, see Soundness and Completeness. For practical approaches when complete verification is infeasible, see Robustness Testing Guide.

Next Guide

Continue to Beyond \(\ell_p\): Alternative Threat Models to explore threat models beyond \(\ell_p\) perturbations, including semantic transformations and physical adversarial examples.

[1] Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[2] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[3] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[4] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.

[5] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[6] Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.

[7] Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[8] Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.

[9] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948. 2018.