Soundness and Completeness

Your verification tool returns “unknown.” You run it again with a different method and get “verified.” What does this mean for your network’s actual robustness? And more importantly, why can’t verification tools just give you a yes-or-no answer like a test suite?

The answer lies in understanding two fundamental properties that govern all verification methods: soundness and completeness. These aren’t just academic concepts---they determine whether you can trust your verification results, how much computational effort you need to invest, and which tools are appropriate for your use case. This guide explores these formal properties and their practical implications.

Two Dimensions of Verification Quality

Soundness: If the verifier says “safe,” is it actually safe?

Completeness: Does the verifier always give a yes/no answer, or sometimes “I don’t know”?

These are independent. You can be sound without being complete, complete without being sound (which is useless), or both. Understanding this clarifies why different tools make different tradeoffs.

Soundness: The Guarantee You Can Trust

Imagine your verification tool says “this network is robust.” Can you trust that answer? Not always---it depends on whether the tool is sound.

What Soundness Means

A verification method is sound if, whenever it says “the property holds,” the property actually holds for the real network. More formally:

$\text{Sound method: If verifier says SAFE, then the network is truly SAFE}$

Sound methods never produce false positives. If the verifier proves your network is safe, you can bank on it.

Why Soundness Matters

Safety-critical applications---autonomous vehicles, medical devices, aviation systems---cannot tolerate false positives. A verification tool that occasionally lies about safety is worse than useless; it’s dangerous. Sound methods provide the mathematical guarantee that “if we say it’s safe, it’s actually safe.”

Attacks vs Verification

Adversarial attacks find bugs (sound for bug-finding). When they succeed, the bug is real. When they fail, you’ve learned nothing---another attack might succeed.

Verification tools prove properties (sound for proof). When they say “safe,” you can trust it. When they say “unknown,” the property might still hold.

Different tools, different jobs. Don’t confuse them.

Completeness: Finding All the Answers

Even if a verification method is sound, it might not always give you a definite answer. Some verifiers can return “unknown”---meaning they couldn’t determine whether the property holds or not.

What Completeness Means

A verification method is complete if it can always determine whether a property holds or not. No “unknown” results---just “safe” or “unsafe.”

$\text{Complete method: } \forall x \in \text{Input Domain}, \text{ decides SAFE or UNSAFE}$

Complete methods always reach a conclusion, though they might take arbitrarily long to do so.

Complete methods (exact solvers): Always give YES or NO. But take exponential time. Work only for small networks.

Incomplete methods (fast solvers): Fast, scale to large networks. But sometimes say “unknown.” That’s fine---it just means “my approximations weren’t tight enough to prove it.”

“Unknown” is not “maybe unsafe.” It’s “can’t prove it from where I’m standing.”

The Four Quadrants

Sound + Complete (exact solvers): YES or NO answer, always correct. Exponential time. For small networks only. Gold standard if you can wait.

Sound + Incomplete (fast solvers): Fast, large networks, but sometimes “unknown.” Most practical verification tools live here.

Unsound verification: Never acceptable. If a tool might lie about safety, its guarantee is worthless.

Sound falsification (attacks): Find real bugs fast, but can’t prove safety. Valuable for development, not certification.

Deterministic vs. Probabilistic Verification

Beyond the sound/complete dichotomy, verification methods also differ in whether they provide deterministic or probabilistic guarantees.

Deterministic Verification:

Traditional verification methods (SMT solvers, MILP, bound propagation) are deterministic: given the same input, they always produce the same result. When a deterministic verifier says “verified,” the property holds with certainty, not just high probability.

Probabilistic Verification:

Randomized smoothing represents a fundamentally different paradigm. Instead of verifying the original network $f_\theta$ , it verifies a smoothed version $g(x) = \mathbb{E}_{\text{noise}}[f_\theta(x + \text{noise})]$ where noise is sampled from a distribution (typically Gaussian). The verification guarantee holds with high probability (e.g., 99.9%) where the randomness is independent of the input.

Probabilistic Certification Guarantee

For randomized smoothing, the certification states: “With probability at least $1 - \alpha$ (e.g., $\alpha = 0.001$ ), the smoothed classifier $g$ is robust within radius $\epsilon$ at input $x_0$ .”

This is different from deterministic methods which state: “The network $f_\theta$ is provably robust within radius $\epsilon$ at $x_0$ .”

Trade-offs:

Deterministic methods: Stronger guarantees (certainty vs. high probability), but typically worse scalability or looser bounds
Probabilistic methods: Better scalability to large networks and larger perturbations, but guarantees hold only with high probability and apply to smoothed classifiers rather than the original network

Importantly, all existing probabilistic verification approaches are incomplete. The sound/complete and deterministic/probabilistic dimensions are orthogonal, creating a richer taxonomy of verification approaches.

Soundness in Practice: Over-Approximation

How do sound methods achieve their guarantees? They rely on over-approximation.

Imagine computing the set of all possible outputs your network can produce given an input region. This “reachable set” is what we want to analyze. But computing it exactly is intractable. Sound methods instead compute an approximation that contains the true set:

$\text{Approximated Set} \supseteq \text{True Reachable Set}$

By containing the true set, any property that holds for the approximation holds for the true outputs. This is the foundation of soundness.

Interval Arithmetic as an Example

The simplest sound method is interval arithmetic: track intervals at each neuron. For input interval [l, u], compute output intervals layer by layer. Each neuron’s output is bounded by the minimum and maximum over its possible inputs.

This is sound because the computed intervals always contain the true outputs. But it’s conservative---the intervals grow as you propagate through layers, becoming looser and looser. Eventually, the intervals become so large that they provide no useful information.

Tightness vs Soundness

Sound methods must over-approximate, so all sound methods are conservative to some degree. But they vary in how conservative they are. Tight methods produce approximations close to the true set (good for proving robustness). Loose methods are very conservative (might say “unknown” when safety actually holds).

Importantly, tightness and soundness are orthogonal properties. A method can be:

Sound and tight (CROWN on favorable networks)
Sound and loose (interval arithmetic)
Sound falsification and effective (adversarial attacks that cleverly find real weaknesses)

Tip: When choosing between sound methods, prioritize tightness. A sound but loose verifier may waste time returning “unknown” for actually-safe regions. Tighter methods (CROWN) are preferable to looser methods (interval arithmetic) when computational budget allows.

Complete methods guarantee they’ll reach a definitive answer, but how? Through refinement strategies.

Branch and Bound

The classic approach is branch-and-bound: recursively split the input domain into smaller regions. For each region, apply an incomplete method (like CROWN) to get an upper bound on the loss or output value. If the bound proves the property holds, that region is done. If not, split it further and recurse.

Eventually, either all regions are verified (property holds globally) or you find a region where the bound suggests checking for counterexamples (property fails). This exhaustive exploration guarantees completeness.

Abstraction Refinement

SMT solvers use a different strategy: abstract interpretation with refinement. They start with coarse abstractions of the network, verify against those, and if verification fails, refine the abstraction to be more precise. They iterate until they either prove the property or find a real counterexample.

Why Completeness is Expensive

The problem is exponential blow-up. With branch-and-bound, the recursion tree can grow exponentially in depth and branching factor. For a modestly sized network and input region, the number of regions to analyze can become astronomical.

Practical Completeness: Timeouts

In practice, “complete” methods hit computational limits. They might have a timeout or memory limit. When the limit is reached, they might return “unknown” (making them incomplete in practice) or keep the best answer found so far. True completeness is theoretically guaranteed but practically limited by computational resources.

Hint: Only invest in complete verification when necessary: small networks, small input regions, or when certification is mandatory. For most development and testing, incomplete sound methods suffice and are far more practical.

Real-World Implications: Choosing Your Guarantees

How do these properties affect your actual verification workflow?

Safety-Critical Systems

Autonomous vehicles, medical devices, aviation systems---these require sound methods. The consequences of false positives (reporting something as safe when it’s not) are unacceptable. Whether you need completeness depends on regulations and liability. Often, sound + incomplete is sufficient if the certified regions are large enough.

Development and Research

During model development, adversarial attacks are perfectly appropriate. They quickly find real bugs and guide improvements. The lack of verification guarantees doesn’t matter because you’re not relying on attacks to certify the final product---you’re using them to find and fix problems. Any bug found by an attack is real and actionable.

Regulatory Compliance and Certification

Some regulations (aviation, medical) may require formal guarantees. Sound verification is necessary. Completeness may be required depending on the standard, though practical incomplete methods increasingly meet regulatory requirements when the verified regions are comprehensive enough.

Performance Tradeoffs

Your choice of soundness and completeness is really a choice about tradeoff priorities:

Need fast bug-finding? Use adversarial attacks (fast, finds real bugs, but can’t prove safety)
Need practical robustness without absolute guarantees? Use sound, incomplete (CROWN, DeepPoly)
Need certification for critical systems? Use sound, complete (SMT solvers, though slower)

Scenario	Soundness	Completeness	Reasoning
Research/development	Not required	Not required	Speed and iteration matter more
Production robustness	Required	Not required	Need guarantees but “unknown” is acceptable
Safety-critical certification	Required	Often required	Regulations may demand completeness
Regulatory compliance	Required	Case-by-case	Depends on specific standards

Common Misconceptions

Misconception 1: “Unknown means unsafe”

False. “Unknown” from an incomplete verifier means the verifier couldn’t prove safety with its approximations. The network might be perfectly safe. It just means you need tighter bounds, a smaller input region, or a complete method to determine the answer.

Misconception 2: “Sound methods are always better”

Not always. Sound verification methods are necessary when you need safety proofs, but adversarial attacks are faster for finding bugs. For development iteration, attacks quickly find real issues that verification methods would take hours to process---and any bug found is actionable.

Misconception 3: “Complete methods always terminate quickly”

No. “Complete” means they’ll eventually reach a conclusion, but that conclusion might take unbounded time. In practice, timeouts make them incomplete, or they return the best answer found so far.

Misconception 4: “Tight bounds are unsound”

These properties are independent. A method can be both sound and tight (good!), or sound and loose (conservative). Tightness measures approximation quality; soundness measures correctness.

Misconception 5: “All verifiers are sound”

Many tools make unsound approximations or use unsound-by-default methods. Some papers claim soundness while using heuristics that violate it. Always check whether the tool’s guarantees are proven or just claimed.

Critical Thinking About Tool Claims

When evaluating verification tools, ask: Is soundness proven mathematically, or just claimed? Does completeness mean theoretical completeness (eventually, with infinite time) or practical completeness (under stated assumptions)? Does “unknown” mean the method is incomplete, or does the method not return “unknown” at all? The answers determine whether you can trust the tool for critical applications.

Core Takeaway

Soundness: Can you trust the tool when it says “safe”? Completeness: Does it always give YES/NO, or sometimes “unknown”?

Pick two priorities: you can’t have all three (soundness + completeness + speed). Safety-critical? Prioritize soundness. Fast iteration? Use falsification (attacks). Production? Sound + incomplete suffices.

Next Guide

Continue to Verification Taxonomy to explore the different categories and approaches in neural network verification methods.