Formal Specifications¶
You know you want to verify robustness, but how do you write that formally? The gap between intuition (“my network should be robust”) and mathematical precision is surprisingly large. Specifications are how you bridge that gap. A poorly written specification might verify something you didn’t intend, or miss important requirements entirely. This guide teaches you to think formally about properties and write specifications that verifiers can actually check.
Why Specifications Matter¶
Natural language is ambiguous. “The network should be robust” could mean different things: robust to small pixel perturbations? Robust to semantic transformations? Robust only on correctly classified inputs? These differences matter enormously for verification.
Formal specifications remove ambiguity. They’re mathematical statements about what the network must do. A verifier can either prove the specification holds or find a counterexample. But first, you must express your requirement as a specification.
This is harder than it sounds. Common mistakes include:
Specifications that are vacuous: So weak they’re trivially true, proving nothing meaningful.
Specifications that are unsatisfiable: So strict no network could ever satisfy them.
Specifications that are too narrow: Verifying only special cases while missing the real requirement.
Specifications that don’t match your intent: Formally correct but not what you actually wanted to check.
Specification as Contract
A specification is a contract between you and the verifier. You’re saying: “If you prove this property, I’ll trust the result.” Write specifications carefully, as they determine whether verification confirms what you care about.
The Epsilon-Ball Specification¶
The simplest and most common specification is local robustness [Singh et al., 2019, Weng et al., 2018]. For an input \(x_0\), the network is locally robust if all inputs within some distance \(\epsilon\) produce the same classification.
Definition: Local Robustness Specification
Given a neural network \(f_\theta: \mathcal{X} \to \mathbb{R}^C\), input \(x_0\), ground-truth label \(y_0\), and perturbation radius \(\epsilon > 0\), the local robustness specification is:

\[\forall x \in B_{p,\epsilon}(x_0): \quad \arg\max_{c} f_\theta(x)_c = y_0,\]

where \(B_{p,\epsilon}(x_0) = \{x : \|x - x_0\|_p \leq \epsilon\}\) is the perturbation region.
Equivalently, this can be expressed as a margin condition [Weng et al., 2018]:

\[\forall x \in B_{p,\epsilon}(x_0): \quad f_\theta(x)_{y_0} - \max_{c \neq y_0} f_\theta(x)_c > 0.\]
This says: for any input \(x\) within distance \(\epsilon\) of \(x_0\), the network must still assign the correct class \(y_0\).
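The margin condition suggests a quick empirical sanity check. Below is a minimal sketch (assuming a NumPy-callable network and an \(\ell_\infty\) ball): random sampling can falsify the specification by finding a counterexample, but it can never prove robustness — that is what a verifier is for.

```python
import numpy as np

def margin(f, x, y0):
    """Margin f(x)[y0] - max_{c != y0} f(x)[c]; positive means class y0 wins."""
    logits = f(x)
    others = np.delete(logits, y0)
    return logits[y0] - others.max()

def sample_check_robustness(f, x0, y0, eps, n_samples=1000, seed=0):
    """Randomly sample the l_inf ball around x0; return a counterexample
    input if one is found, else None. Sampling can only FALSIFY the
    specification -- returning None is not a proof of robustness."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = x0 + rng.uniform(-eps, eps, size=x0.shape)
        if margin(f, x, y0) <= 0:
            return x  # counterexample: specification violated
    return None
```

On a toy identity "network" `f = lambda x: x`, a small ball around `x0 = [1.0, 0.0]` with `y0 = 0` passes the sampled check, while a huge radius quickly surfaces a counterexample.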
The norm matters. Common choices are:
\(\ell_\infty\) norm: \(\|x' - x\|_\infty = \max_i |x'_i - x_i|\). Used for bounded pixel perturbations.
\(\ell_2\) norm: \(\|x' - x\|_2 = \sqrt{\sum_i (x'_i - x_i)^2}\). Used for Euclidean distance constraints.
\(\ell_1\) norm: \(\|x' - x\|_1 = \sum_i |x'_i - x_i|\). Used for sparse perturbations.
| Norm | Use Case | Intuition |
|---|---|---|
| \(\ell_\infty\) | Pixel perturbations, bounded changes per dimension | Maximum change in any single dimension |
| \(\ell_2\) | Overall perturbation budget | Total Euclidean distance allowed |
| \(\ell_1\) | Sparse perturbations, changing few features | Total absolute change across dimensions |
Choosing the right norm is essential. For image adversarial robustness, \(\ell_\infty\) or \(\ell_2\) are standard. For other domains, the choice depends on what perturbations are meaningful. Don’t default to \(\ell_\infty\) without thinking about your application.
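The three norms are easy to compute and compare directly. A short sketch with made-up input values, showing how the same perturbation looks under each norm:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.8])
x_adv = np.array([0.25, 0.5, 0.72])   # a hypothetical perturbed input
delta = x_adv - x

linf = np.abs(delta).max()            # worst change in any single dimension
l2 = np.sqrt((delta ** 2).sum())      # total Euclidean distance
l1 = np.abs(delta).sum()              # total absolute change (sparsity-aware)
```

Here the \(\ell_\infty\) distance is 0.08 (the largest single-pixel change), while the \(\ell_1\) distance is 0.13 (the sum of all changes) — the same perturbation can fall inside one ball and outside another.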
Output Constraint Specifications¶
Beyond requiring a fixed output class, you might need specifications about output values or ranges. For example, a confidence score should stay above a threshold, or a regression output should stay within safe bounds.
\[\forall x \in B_{p,\epsilon}(x_0): \quad L \leq f_\theta(x) \leq U.\]

This says: for all perturbed inputs, the output stays within bounds \([L, U]\).

Multi-output constraints can be specified independently:

\[\forall x \in B_{p,\epsilon}(x_0): \quad L_i \leq f_\theta(x)_i \leq U_i \quad \text{for each output } i.\]
Combine constraints using logical operators: AND (both hold), OR (at least one holds), IMPLIES (conditional constraints).
Example: For autonomous vehicles, you might specify that the network’s steering angle should stay within safe bounds and that the confidence in lane detection should exceed a minimum threshold, even under sensor noise.
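The autonomous-vehicle example can be phrased as a concrete output check. A hedged sketch with made-up bounds and a hypothetical two-output network (steering angle, lane-detection confidence), combining constraints with plain Boolean operators:

```python
import numpy as np

def within_bounds(y, lower, upper):
    """Elementwise check L_i <= y_i <= U_i for a multi-output network."""
    return bool(np.all((lower <= y) & (y <= upper)))

# Hypothetical network output: [steering_angle, lane_confidence]
y = np.array([0.12, 0.91])

steer_ok = -0.5 <= y[0] <= 0.5   # steering stays in the safe range
conf_ok = y[1] >= 0.8            # confidence exceeds the minimum threshold
both = steer_ok and conf_ok      # AND: the specification requires both
```

A verifier would prove these inequalities hold for *every* input in the perturbation region; this sketch only evaluates them on one concrete output.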
Hint
Output constraints are underutilized. Don’t just verify classification robustness—specify and verify that safety-critical outputs stay in safe ranges even under perturbation.
Relational Properties¶
Some properties involve relationships between outputs on different inputs, not just robustness of a single input.
Monotonicity: If input A is “more severe” than input B, the network should rank them accordingly. For example, in risk assessment, a higher risk input should score higher.
Fairness properties: The network should treat similar inputs similarly. If two inputs differ only in a protected attribute, they should have similar outputs.
Consistency: The network should make consistent decisions for semantically equivalent inputs. For vision, rotated or slightly scaled images should get the same classification.
These properties are harder to specify precisely than simple robustness, but they’re critical for real-world deployment.
Temporal and Sequential Properties¶
For networks that process sequences (RNNs, transformers), properties might involve behavior over time.
Eventually property: The network eventually enters a safe state.
Always property: The network maintains a safety invariant throughout sequence processing.
Example: In medical monitoring, a recurrent network should eventually raise an alarm if measurements exceed safe bounds, and should maintain memory of critical events.
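On finite traces, these two property shapes reduce to `any` and `all` over the time steps. A simplified sketch (real temporal logics such as LTL are defined over unbounded traces; this is only a testing aid, with hypothetical trace values):

```python
def eventually(trace, pred):
    """'Eventually' over a finite trace: pred holds at some step."""
    return any(pred(state) for state in trace)

def always(trace, pred):
    """'Always' over a finite trace: pred holds at every step."""
    return all(pred(state) for state in trace)

# Hypothetical monitoring run: the alarm fires by step 3 ...
alarm_trace = [False, False, True, True]
raised = eventually(alarm_trace, lambda a: a)

# ... and the readings never leave the safe range
readings = [0.2, 0.4, 0.3]
safe = always(readings, lambda r: r < 0.5)
```

Verifying such properties over all possible input sequences, rather than one recorded trace, is what makes temporal verification hard.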
These properties are complex to specify and verify. Many tools don’t yet support them well. When working with sequential models, keep specifications simpler and fall back to testing for complex temporal properties.
Specification Languages and Formats¶
Different tools use different specification syntax. Understanding common formats helps you write specifications that tools can parse.
VNNLIB format: Used by most modern verifiers [Wang et al., 2021, Zhang et al., 2022]. Specifies linear constraints on inputs and outputs using first-order logic over intervals. Standard format for VNN-COMP verification competition.
SMT-LIB format: The standard input language for SMT solvers [De Moura and Bjørner, 2008]. Expressive and precise but verbose. Used by Marabou [Katz et al., 2019], Z3, and other solvers.
ACAS Xu property format [Katz et al., 2017]: A domain-specific format for collision avoidance verification. Precise property definitions for the ACAS Xu benchmark, which became a foundational verification benchmark.
Tool-specific formats: Some verifiers have their own syntax. α-β-CROWN [Wang et al., 2021] and auto_LiRPA [Xu et al., 2020] use their own property definition formats.
| Format | Expressiveness | Ease of Use | Tools |
|---|---|---|---|
| VNNLIB | Linear constraints, Boolean logic | Moderate (readable) | Most verifiers |
| SMT-LIB | Arbitrary first-order logic | Hard (verbose) | SMT solvers, Marabou |
| ACAS Xu format | Domain-specific collision properties | Easy (specialized) | ACAS Xu benchmarks |
| Tool-specific | Varies | Varies | Individual tools |
When starting with a new verifier, check its documentation for specification examples. Copy and modify existing properties rather than writing from scratch.
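For concreteness, here is a hedged VNNLIB-style sketch of a local robustness property for a toy two-input, two-output network. The bounds are illustrative; the `X_i`/`Y_j` naming follows the convention used in VNN-COMP benchmarks. Note that VNNLIB files conventionally assert the *violation* condition: the verifier proves the property by showing these constraints are unsatisfiable.

```
; Sketch: l_inf ball of radius 0.05 around x0 = (0.30, 0.70),
; requiring output Y_0 to stay larger than its rival Y_1.
(declare-const X_0 Real)
(declare-const X_1 Real)
(declare-const Y_0 Real)
(declare-const Y_1 Real)

; input region
(assert (>= X_0 0.25))
(assert (<= X_0 0.35))
(assert (>= X_1 0.65))
(assert (<= X_1 0.75))

; negated property: satisfiable iff the rival can match or beat Y_0
(assert (<= Y_0 Y_1))
```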
From Domain Requirements to Formal Specifications¶
The real skill is translating domain requirements into formal specifications. This requires understanding both your domain and verification concepts.
Start with a requirement: “The network should recognize faces under lighting variations.”
Break it into verifiable components: 1. It should correctly classify normal lighting. 2. It should correctly classify darkened images (simulate with bounded pixel reduction). 3. It should correctly classify brightened images (simulate with bounded pixel increase).
For each component, write a formal specification using an epsilon-ball robustness property with appropriate epsilon value. Verify each separately.
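The darkening and brightening components can be encoded as per-dimension input boxes, which epsilon-ball verifiers that accept separate lower/upper input bounds can already check. A minimal sketch (the helper name and the clipping range \([0, 1]\) are assumptions):

```python
import numpy as np

def lighting_box(x0, delta, direction):
    """Input region for a bounded lighting change around image x0:
    'darken' lets every pixel drop by up to delta, 'brighten' lets it
    rise, 'both' allows either. Bounds are clipped to the valid [0, 1]
    pixel range, giving a per-dimension box a verifier can consume."""
    if direction == "darken":
        lower, upper = x0 - delta, x0
    elif direction == "brighten":
        lower, upper = x0, x0 + delta
    else:  # "both": an ordinary l_inf ball
        lower, upper = x0 - delta, x0 + delta
    return np.clip(lower, 0.0, 1.0), np.clip(upper, 0.0, 1.0)
```

For example, `lighting_box(x0, 0.1, "darken")` produces the box \([x_0 - 0.1,\, x_0]\), which captures "darkened by at most 0.1" without also admitting brightened images.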
Tip
Iterative refinement works better than trying to write the perfect specification upfront. Write a simple specification, verify it, and refine based on results. If verification succeeds but you don’t trust the result, the specification was too weak. If it times out, the specification was too broad.
Common Pitfalls¶
Vacuous specifications: A specification that’s trivially true provides no confidence. Example: “the network’s output is a vector of numbers.” Technically true, but meaningless. All specifications should be constraining.
Unsatisfiable specifications: Specifications that are impossible to satisfy. Example: “the network must classify all MNIST images correctly with adversarial perturbations of epsilon=255 (full input range).” No network can achieve this. Unsatisfiable specs waste verification effort.
Over-specification: Requiring more than necessary. Example: specifying the exact output values instead of just the classification. The verifier then needs to prove tighter properties than required.
Mismatch with intent: A formally correct specification that doesn’t match your actual requirement. Example: verifying robustness for one image when you meant to verify a representative sample. Always double-check that the specification captures what you care about.
Hint
Test your specifications. For simple ones, manually verify that they capture your intent on a few examples. For complex ones, run your network on test inputs and check whether the specification would have been satisfied. This sanity-checks the specification before verification.
Common Specification Patterns¶
Certain patterns appear frequently. Building a library of patterns helps you write specifications faster.
Local robustness: Epsilon-ball around specific points.
Regional properties: Constraints on outputs for entire input regions. Example: if pixel values are in a certain range, output classification should be X.
Safety thresholds: Output confidence must exceed minimum or be within bounds.
Fairness constraints: Outputs should be similar for inputs differing only in protected attributes.
Stability constraints: Similar inputs should produce similar outputs (Lipschitz-style properties).
Once you recognize a pattern, you can adapt previous specifications rather than starting from scratch.
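One way to make patterns reusable is a small record pairing an input region with an output requirement. A sketch (the `BoxSpec` name and its fields are hypothetical, not a real tool's API): the same shape covers local robustness, regional properties, and safety thresholds.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class BoxSpec:
    """A specification pattern: a per-dimension input box plus a
    predicate the output must satisfy everywhere inside it."""
    in_lower: np.ndarray          # input lower bounds
    in_upper: np.ndarray          # input upper bounds
    out_check: Callable           # predicate on the network output

def eps_ball_spec(x0, eps, y0):
    """Local-robustness instance: l_inf ball, 'predicted class stays y0'."""
    return BoxSpec(x0 - eps, x0 + eps,
                   lambda logits: int(np.argmax(logits)) == y0)
```

Safety thresholds, for instance, reuse the same record with a different predicate, such as `lambda y: y[1] >= 0.8`.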
Final Thoughts¶
Formal specifications are the interface between your requirements and the verifier. Good specifications are precise, meaningful, and achievable. Writing them requires both domain knowledge (what matters in your application) and verification knowledge (what verifiers can actually check).
Start with simple specifications. Verify them. Refine based on what you learn. Over time, you’ll develop intuition for which specifications are realistic and which are too ambitious.
Remember: a verified specification is only as valuable as the specification itself. Verify something meaningful, and verification becomes a powerful tool for confidence. Verify something vacuous, and you’ve wasted effort on a meaningless guarantee.
Further Reading
This guide provides comprehensive coverage of formal specifications for neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:
Formal Robustness Definitions:
The mathematical foundations of robustness specifications are established in the verification literature [Singh et al., 2019, Weng et al., 2018]. The local robustness specification with \((\ell_p, \epsilon)\)-adversary provides the standard framework for expressing perturbation-bounded properties.
Specification Formats and Standards:
The VNNLIB format has emerged as the standard for neural network verification specifications, used in VNN-COMP competitions [Wang et al., 2021, Zhang et al., 2022]. SMT-LIB provides a more expressive but verbose alternative [Katz et al., 2019, De Moura and Bjørner, 2008]. The ACAS Xu benchmark [Katz et al., 2017] introduced domain-specific property specifications that became widely used in verification research.
Verification Tools and Specifications:
Modern verification tools like α-β-CROWN [Wang et al., 2021, Zhang et al., 2022], auto_LiRPA [Xu et al., 2020], and Marabou [Katz et al., 2019] each support different specification formats. Understanding tool-specific requirements helps write specifications that verifiers can parse efficiently.
Theoretical Foundations:
The connection between specifications and verification guarantees is formalized through soundness and completeness properties [Katz et al., 2017, Weng et al., 2018]. Understanding these theoretical foundations helps write specifications that leverage verifier strengths while avoiding common pitfalls.
Related Topics:
For understanding how verifiers check specifications, see Falsifiers and Verifiers. For the theoretical guarantees verification provides, see Soundness and Completeness. For practical verification workflows, see Robustness Testing Guide.
Next Guide
Continue to Falsifiers and Verifiers to understand the spectrum between adversarial attacks (falsifiers) and formal verification (verifiers).
Leonardo De Moura and Nikolaj Bjørner. Z3: an efficient SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 337–340. Springer, 2008.
Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: an efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.
Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. Beta-CROWN: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. 2018.