The Verification Problem: Mathematical Formulation
The mathematical formulation of the verification problem, NP-completeness results, and fundamental complexity analysis.
Verification asks a simple question: Does the network always behave correctly?
More precisely: given an input region (e.g., ±0.03 pixel change), does the network always satisfy some property (e.g., correct classification) for every input in that region?
This is hard because:
- Input region is infinite (uncountably many points)
- Can’t test them all
- Need mathematical proof instead
The verification problem has three possible answers:
- VERIFIED: Yes, property holds everywhere in the region.
- FALSIFIED: No, I found a counterexample that violates it.
- UNKNOWN: I don’t know (some tools).
The core challenge: determining which answer is true, efficiently.
Robustness Verification as Search
Robustness verification is really an optimization problem in disguise.
Concrete example: Input is a stop sign. True label: “stop.” Perturbation budget: ±3% brightness.
The verifier must check: “For all perturbed versions of this stop sign, does the network still output ‘stop’ with confidence > all other classes?”
Equivalently: “What’s the minimum confidence margin (stop confidence - other-class confidence) over all perturbed images?”
If minimum margin > 0: verified robust. If minimum margin < 0: found a counterexample. If we can’t compute the minimum exactly: we get “uncertain” or bounds on it.
The challenge: this minimum is hard to find. The search space is huge. Network output is non-convex (no nice structure to exploit).
Why Verification Is Hard (Fundamentally)
It’s not that we haven’t found good algorithms. The problem itself is hard.
ReLU networks partition input space into regions (exponentially many). Each region is linear, so behavior is predictable there. But with k ReLU activations, you get 2^k regions. Verification must check all regions. That’s exponential complexity.
What this means:
- No algorithm can solve all verification instances quickly (provably)
- You have to choose: exact answers (slow) or approximate answers (fast)
- This isn’t engineering limitation; it’s mathematical fact
Two Strategies: Exact vs Fast
The computational hardness forces a choice.
Exact verification (small networks): Search exhaustively. Partition input space, check each region. Eventually, you either prove robustness or find a counterexample. Takes exponential time.
Fast verification (large networks): Don’t try to be exact. Compute bounds: “Output is in range [a, b].” If bounds guarantee safety, great. If not, say “unknown.” Runs in polynomial time but misses some verifiable networks.
The tradeoff is fundamental, not an engineering limitation. You can have:
- Exact + slow (small networks only)
- Fast + approximate (large networks, but sometimes “unknown”)
Can’t have both exact and fast simultaneously.
Sound vs Unsound
If a verification method says “verified,” the guarantee must be real.
Sound: Approximations can be conservative (“I don’t know” when it’s actually provable), but never wrong. If sound method says “verified,” you can trust it.
Unsound: Says “verified” but it’s actually false. Unacceptable. You’d deploy a vulnerable network thinking it’s safe.
All formal verification must be sound. Test-based methods and adversarial attacks are not sound verification (finding no counterexamples doesn’t prove robustness). But they’re sound falsifiers (finding a counterexample proves vulnerability).
Bound Computation as Core Primitive
Instead of finding exact outputs, we track ranges.
The idea: Input is in range [a, b]. What output range is possible?
- Layer 1 maps [a,b] to [c, d]
- Layer 2 maps [c, d] to [e, f]
- … continue to output
- If final range [e, f] proves the property (e.g., correct class dominates), robustness is proven.
Why ranges instead of exact values? Because computing exact output for all inputs is impossible. But computing achievable ranges is feasible—and if ranges are tight enough, they’re sufficient.
The tradeoff: Fast algorithms produce loose ranges (maybe “unknown”). Slow algorithms produce tight ranges (stronger proofs). Tighter bounds require more computation. No free lunch.
Certified Radius
Instead of “verify robustness with budget 0.03,” ask: what’s the maximum budget for which robustness holds?
Search for the breaking point: start with a small budget (verified). Increase it until verification fails. That’s your certified radius.
Larger radius = stronger robustness proof.
Probabilistic Certification
Alternative approach: instead of proving exact robustness, prove probabilistic robustness.
Idea: Add random noise to inputs, test many times, estimate: “With 99% confidence, robustness radius is at least R.”
Trades exactness for speed. Works well at scale.
Scaling to Large Networks
Why does verification slow down with network size?
- Dimension explosion: For a 224×224×3 image, the input space is enormous. Partitioning it requires exponentially many subspaces.
- Error accumulation: As bounds propagate layer-by-layer, errors compound. Deep networks accumulate more error.
- Nonlinearity: ReLU networks split space into regions. More ReLUs = more regions = exponential combinations.
Practical solutions: Use incomplete methods (accept “unknown”). Use GPUs (parallelize). Use hybrid: fast incomplete filter, then exact refinement on promising regions.
Core Takeaway
Verification is mathematically well-defined but fundamentally hard. Complete verification takes exponential time. Fast verification sometimes says “unknown.” You have to choose. Understand the tradeoff, pick appropriate methods.
Next Guide
Continue to Soundness and Completeness to understand the fundamental properties of soundness and completeness in verification methods.