Soundness and Completeness
The fundamental concepts of soundness and completeness in verification, including complete vs. incomplete methods and deterministic vs. probabilistic approaches.
Your verification tool returns “unknown.” You run it again with a different method and get “verified.” What does this mean for your network’s actual robustness? And more importantly, why can’t verification tools just give you a yes-or-no answer like a test suite?
The answer lies in understanding two fundamental properties that govern all verification methods: soundness and completeness. These aren’t just academic concepts---they determine whether you can trust your verification results, how much computational effort you need to invest, and which tools are appropriate for your use case. This guide explores these formal properties and their practical implications.
Two Dimensions of Verification Quality
Soundness: If the verifier says “safe,” is it actually safe?
Completeness: Does the verifier always give a yes/no answer, or sometimes “I don’t know”?
These are independent. You can be sound without being complete, complete without being sound (which is useless), or both. Understanding this clarifies why different tools make different tradeoffs.
Soundness: The Guarantee You Can Trust
Imagine your verification tool says “this network is robust.” Can you trust that answer? Not always---it depends on whether the tool is sound.
What Soundness Means
A verification method is sound if, whenever it says “the property holds,” the property actually holds for the real network. More formally:
Sound methods never produce false positives. If the verifier proves your network is safe, you can bank on it.
Why Soundness Matters
Safety-critical applications---autonomous vehicles, medical devices, aviation systems---cannot tolerate false positives. A verification tool that occasionally lies about safety is worse than useless; it’s dangerous. Sound methods provide the mathematical guarantee that “if we say it’s safe, it’s actually safe.”
Attacks vs Verification
Adversarial attacks find bugs (sound for bug-finding). When they succeed, the bug is real. When they fail, you’ve learned nothing---another attack might succeed.
Verification tools prove properties (sound for proof). When they say “safe,” you can trust it. When they say “unknown,” the property might still hold.
Different tools, different jobs. Don’t confuse them.
Completeness: Finding All the Answers
Even if a verification method is sound, it might not always give you a definite answer. Some verifiers can return “unknown”---meaning they couldn’t determine whether the property holds or not.
What Completeness Means
A verification method is complete if it can always determine whether a property holds or not. No “unknown” results---just “safe” or “unsafe.”
Complete methods always reach a conclusion, though they might take arbitrarily long to do so.
Complete methods (exact solvers): Always give YES or NO. But take exponential time. Work only for small networks.
Incomplete methods (fast solvers): Fast, scale to large networks. But sometimes say “unknown.” That’s fine---it just means “my approximations weren’t tight enough to prove it.”
“Unknown” is not “maybe unsafe.” It’s “can’t prove it from where I’m standing.”
The Four Quadrants
Sound + Complete (exact solvers): YES or NO answer, always correct. Exponential time. For small networks only. Gold standard if you can wait.
Sound + Incomplete (fast solvers): Fast, large networks, but sometimes “unknown.” Most practical verification tools live here.
Unsound verification: Never acceptable. If a tool might lie about safety, its guarantee is worthless.
Sound falsification (attacks): Find real bugs fast, but can’t prove safety. Valuable for development, not certification.
Deterministic vs. Probabilistic Verification
Beyond the sound/complete dichotomy, verification methods also differ in whether they provide deterministic or probabilistic guarantees.
Deterministic Verification:
Traditional verification methods (SMT solvers, MILP, bound propagation) are deterministic: given the same input, they always produce the same result. When a deterministic verifier says “verified,” the property holds with certainty, not just high probability.
Probabilistic Verification:
Randomized smoothing represents a fundamentally different paradigm. Instead of verifying the original network , it verifies a smoothed version where noise is sampled from a distribution (typically Gaussian). The verification guarantee holds with high probability (e.g., 99.9%) where the randomness is independent of the input.
Probabilistic Certification Guarantee
For randomized smoothing, the certification states: “With probability at least (e.g., ), the smoothed classifier is robust within radius at input .”
This is different from deterministic methods which state: “The network is provably robust within radius at .”
Trade-offs:
- Deterministic methods: Stronger guarantees (certainty vs. high probability), but typically worse scalability or looser bounds
- Probabilistic methods: Better scalability to large networks and larger perturbations, but guarantees hold only with high probability and apply to smoothed classifiers rather than the original network
Importantly, all existing probabilistic verification approaches are incomplete. The sound/complete and deterministic/probabilistic dimensions are orthogonal, creating a richer taxonomy of verification approaches.
Soundness in Practice: Over-Approximation
How do sound methods achieve their guarantees? They rely on over-approximation.
Imagine computing the set of all possible outputs your network can produce given an input region. This “reachable set” is what we want to analyze. But computing it exactly is intractable. Sound methods instead compute an approximation that contains the true set:
By containing the true set, any property that holds for the approximation holds for the true outputs. This is the foundation of soundness.
Interval Arithmetic as an Example
The simplest sound method is interval arithmetic: track intervals at each neuron. For input interval [l, u], compute output intervals layer by layer. Each neuron’s output is bounded by the minimum and maximum over its possible inputs.
This is sound because the computed intervals always contain the true outputs. But it’s conservative---the intervals grow as you propagate through layers, becoming looser and looser. Eventually, the intervals become so large that they provide no useful information.
Tightness vs Soundness
Sound methods must over-approximate, so all sound methods are conservative to some degree. But they vary in how conservative they are. Tight methods produce approximations close to the true set (good for proving robustness). Loose methods are very conservative (might say “unknown” when safety actually holds).
Importantly, tightness and soundness are orthogonal properties. A method can be:
- Sound and tight (CROWN on favorable networks)
- Sound and loose (interval arithmetic)
- Sound falsification and effective (adversarial attacks that cleverly find real weaknesses)
Tip: When choosing between sound methods, prioritize tightness. A sound but loose verifier may waste time returning “unknown” for actually-safe regions. Tighter methods (CROWN) are preferable to looser methods (interval arithmetic) when computational budget allows.
Completeness in Practice: Refinement and Branching
Complete methods guarantee they’ll reach a definitive answer, but how? Through refinement strategies.
Branch and Bound
The classic approach is branch-and-bound: recursively split the input domain into smaller regions. For each region, apply an incomplete method (like CROWN) to get an upper bound on the loss or output value. If the bound proves the property holds, that region is done. If not, split it further and recurse.
Eventually, either all regions are verified (property holds globally) or you find a region where the bound suggests checking for counterexamples (property fails). This exhaustive exploration guarantees completeness.
Abstraction Refinement
SMT solvers use a different strategy: abstract interpretation with refinement. They start with coarse abstractions of the network, verify against those, and if verification fails, refine the abstraction to be more precise. They iterate until they either prove the property or find a real counterexample.
Why Completeness is Expensive
The problem is exponential blow-up. With branch-and-bound, the recursion tree can grow exponentially in depth and branching factor. For a modestly sized network and input region, the number of regions to analyze can become astronomical.
Practical Completeness: Timeouts
In practice, “complete” methods hit computational limits. They might have a timeout or memory limit. When the limit is reached, they might return “unknown” (making them incomplete in practice) or keep the best answer found so far. True completeness is theoretically guaranteed but practically limited by computational resources.
Hint: Only invest in complete verification when necessary: small networks, small input regions, or when certification is mandatory. For most development and testing, incomplete sound methods suffice and are far more practical.
Real-World Implications: Choosing Your Guarantees
How do these properties affect your actual verification workflow?
Safety-Critical Systems
Autonomous vehicles, medical devices, aviation systems---these require sound methods. The consequences of false positives (reporting something as safe when it’s not) are unacceptable. Whether you need completeness depends on regulations and liability. Often, sound + incomplete is sufficient if the certified regions are large enough.
Development and Research
During model development, adversarial attacks are perfectly appropriate. They quickly find real bugs and guide improvements. The lack of verification guarantees doesn’t matter because you’re not relying on attacks to certify the final product---you’re using them to find and fix problems. Any bug found by an attack is real and actionable.
Regulatory Compliance and Certification
Some regulations (aviation, medical) may require formal guarantees. Sound verification is necessary. Completeness may be required depending on the standard, though practical incomplete methods increasingly meet regulatory requirements when the verified regions are comprehensive enough.
Performance Tradeoffs
Your choice of soundness and completeness is really a choice about tradeoff priorities:
- Need fast bug-finding? Use adversarial attacks (fast, finds real bugs, but can’t prove safety)
- Need practical robustness without absolute guarantees? Use sound, incomplete (CROWN, DeepPoly)
- Need certification for critical systems? Use sound, complete (SMT solvers, though slower)
| Scenario | Soundness | Completeness | Reasoning |
|---|---|---|---|
| Research/development | Not required | Not required | Speed and iteration matter more |
| Production robustness | Required | Not required | Need guarantees but “unknown” is acceptable |
| Safety-critical certification | Required | Often required | Regulations may demand completeness |
| Regulatory compliance | Required | Case-by-case | Depends on specific standards |
Common Misconceptions
Misconception 1: “Unknown means unsafe”
False. “Unknown” from an incomplete verifier means the verifier couldn’t prove safety with its approximations. The network might be perfectly safe. It just means you need tighter bounds, a smaller input region, or a complete method to determine the answer.
Misconception 2: “Sound methods are always better”
Not always. Sound verification methods are necessary when you need safety proofs, but adversarial attacks are faster for finding bugs. For development iteration, attacks quickly find real issues that verification methods would take hours to process---and any bug found is actionable.
Misconception 3: “Complete methods always terminate quickly”
No. “Complete” means they’ll eventually reach a conclusion, but that conclusion might take unbounded time. In practice, timeouts make them incomplete, or they return the best answer found so far.
Misconception 4: “Tight bounds are unsound”
These properties are independent. A method can be both sound and tight (good!), or sound and loose (conservative). Tightness measures approximation quality; soundness measures correctness.
Misconception 5: “All verifiers are sound”
Many tools make unsound approximations or use unsound-by-default methods. Some papers claim soundness while using heuristics that violate it. Always check whether the tool’s guarantees are proven or just claimed.
Critical Thinking About Tool Claims
When evaluating verification tools, ask: Is soundness proven mathematically, or just claimed? Does completeness mean theoretical completeness (eventually, with infinite time) or practical completeness (under stated assumptions)? Does “unknown” mean the method is incomplete, or does the method not return “unknown” at all? The answers determine whether you can trust the tool for critical applications.
Core Takeaway
Soundness: Can you trust the tool when it says “safe”? Completeness: Does it always give YES/NO, or sometimes “unknown”?
Pick two priorities: you can’t have all three (soundness + completeness + speed). Safety-critical? Prioritize soundness. Fast iteration? Use falsification (attacks). Production? Sound + incomplete suffices.
Next Guide
Continue to Verification Taxonomy to explore the different categories and approaches in neural network verification methods.