Beyond \(\ell_p\): Alternative Threat Models¶
Most neural network verification [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018, Katz et al., 2019] focuses on \(\ell_p\) norm-bounded perturbations—small pixel changes invisible to humans. But real-world adversaries aren’t constrained to \(\ell_p\) balls. They can modify semantics (change object pose, lighting), add physical patches, or exploit distribution shift.
These alternative threat models require different verification approaches. Semantic perturbations need invariance to meaningful transformations. Physical adversarial examples demand robustness under real-world constraints. Distributional robustness requires guarantees across data distributions. Each threat model brings unique challenges and opportunities for verification.
This guide explores threat models beyond \(\ell_p\), how they differ from traditional adversarial perturbations, and what verification methods exist for each.
Limitations of \(\ell_p\) Threat Models¶
Why \(\ell_p\) Norms Dominate¶
Standard formulation: Adversarial robustness typically considers \(\ell_p\) balls: the classifier \(f\) must satisfy \(f(x) = f(x_0)\) for every \(x\) with \(\|x - x_0\|_p \le \epsilon\).
Advantages:
Simple mathematical formulation
Well-studied optimization theory
Efficient verification methods [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018]
Natural interpretability (maximum per-pixel change for \(\ell_\infty\))
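To make the \(\ell_\infty\) ball concrete, here is a minimal NumPy sketch (helper names are illustrative, not from any verification tool) that checks membership in the ball and projects a perturbed input back into it:

```python
import numpy as np

def in_linf_ball(x, x0, eps):
    """True iff every pixel of x differs from x0 by at most eps."""
    return float(np.max(np.abs(x - x0))) <= eps

def project_linf(x_adv, x0, eps):
    """Project a perturbed input back onto the l_inf ball of radius eps
    around x0, then clip to the valid pixel range [0, 1]."""
    x_proj = np.clip(x_adv, x0 - eps, x0 + eps)
    return np.clip(x_proj, 0.0, 1.0)
```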
Widespread adoption: FGSM [Goodfellow et al., 2015], PGD [Madry et al., 2018], C&W attacks [Carlini and Wagner, 2017] all use \(\ell_p\) norms. Most verification benchmarks (VNN-COMP, certified accuracy metrics) use \(\ell_p\).
What \(\ell_p\) Misses¶
Semantic meaningfulness: \(\ell_p\) perturbations often aren’t semantically meaningful. Small \(\ell_2\) changes might move “airplane” to “nonsense image” rather than to another valid airplane.
Perceptual realism: Humans don’t perceive \(\ell_p\) distance. Two images with small \(\ell_2\) distance might look very different; two images with large \(\ell_2\) distance might look similar.
Physical realizability: \(\ell_p\) perturbations assume arbitrary pixel control. Physical attackers face constraints (lighting, viewpoint, printability).
Distribution shift: Real-world deployment faces natural distribution shift (different demographics, weather, hardware) not captured by \(\ell_p\) perturbations.
The \(\ell_p\) Paradox
\(\ell_p\) balls are:
Mathematically convenient: Easy to formulate, verify, optimize
Practically limited: Don’t capture many real-world threats
Result: Standard verification provides guarantees against threats that may not occur, while missing threats that do occur.
Semantic Perturbations¶
Semantic perturbations modify meaningful attributes (pose, lighting, color) rather than arbitrary pixels.
What Are Semantic Perturbations?¶
Definition: Perturbations that change semantic properties while maintaining object identity.
Examples:
Rotation: Rotate object by angle \(\theta\)
Translation: Shift object position
Scaling: Resize object
Illumination: Change brightness, contrast, color temperature
3D pose: Change viewpoint in 3D space
Weather: Add rain, snow, fog effects
Key property: Semantic perturbations produce valid, realistic images. Unlike \(\ell_p\) noise, they correspond to real-world variations.
Formulation: the adversary searches for transformation parameters \(\theta \in \Theta\) such that \(f(T(x, \theta)) \neq f(x)\), where \(T\) is a semantic transformation parameterized by \(\theta\) (e.g., rotation angle, brightness factor).
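As a minimal illustration, the sketch below implements one such transformation, a brightness/contrast change, as a parameterized function \(T(x, \theta)\); the helper name and parameter ranges are illustrative assumptions:

```python
import numpy as np

def brightness_contrast(x, alpha=1.0, beta=0.0):
    """Semantic transformation T(x, theta) with theta = (alpha, beta):
    alpha scales contrast, beta shifts brightness; output stays a valid image."""
    return np.clip(alpha * x + beta, 0.0, 1.0)

# A semantic threat model then asks whether f(brightness_contrast(x, a, b))
# equals f(x) for all a in [0.8, 1.2] and b in [-0.1, 0.1], say.
```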
Verification Approaches¶
Data augmentation bounds [Cohen et al., 2019]:
Define distribution over transformations \(T \sim \mathcal{D}\)
Use randomized smoothing with transformation distribution
Certified radius in transformation space (e.g., rotation up to \(\pm 30°\))
Geometric verification:
For specific transformations (rotation, translation), use geometric reasoning
Example: Verify rotation invariance by checking all rotations in a range
Abstraction refinement:
Abstract semantic transformations into over-approximations
Use standard verification [Singh et al., 2019] on abstracted space
Refine if bounds too loose
Challenge: Semantic transformations are non-linear (rotation involves trigonometry, scaling changes image structure). Standard linear verification [Weng et al., 2018, Zhang et al., 2018] doesn’t directly apply.
Current state: Verification for semantic perturbations is less mature than \(\ell_p\). Randomized smoothing [Cohen et al., 2019] with transformation distributions is the most practical approach.
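The sketch below shows the Monte Carlo core of that idea for rotations, assuming a generic classify function: sample rotation angles, classify each rotated copy, and take a majority vote. Turning the vote counts into a certified rotation radius requires the confidence-interval machinery of Cohen et al. [2019], which is omitted here.

```python
import numpy as np
from scipy.ndimage import rotate

def smoothed_rotation_predict(classify, x, max_angle=30.0, n_samples=1000, rng=None):
    """Monte Carlo core of rotation smoothing.

    classify(image) -> class index; x is an (H, W) or (H, W, C) array.
    Samples angles uniformly in [-max_angle, +max_angle] degrees and
    returns the majority-vote class together with the per-class counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = {}
    for _ in range(n_samples):
        theta = rng.uniform(-max_angle, max_angle)
        x_rot = rotate(x, theta, axes=(0, 1), reshape=False, order=1, mode="nearest")
        c = classify(x_rot)
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get), votes
```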
Physical Adversarial Examples¶
Physical adversarial examples exist in the physical world—patches, stickers, objects that fool cameras/sensors.
Physical Realizability Constraints¶
Printability: Perturbations must be printable with available printers (discrete color gamut, limited resolution).
Viewpoint robustness: Must fool the network from multiple viewing angles, distances.
Lighting robustness: Must work under various lighting conditions.
Environmental factors: Shadows, reflections, weather affect appearance.
Physical vs Digital Attacks
Digital (\(\ell_p\)):
Arbitrary pixel control
Perfect precision
Works in digital space only
Physical:
Limited to printable/fabricable perturbations
Subject to environmental variations
Works in real world (cameras, sensors)
Physical attacks are constrained but more dangerous: They work against deployed systems in the real world.
Adversarial Patches¶
Adversarial patch [Brown et al., 2017]: A localized region (sticker, poster) designed to fool a network when placed in the scene.
Formulation: Optimize a patch \(p\) to cause misclassification when applied to an image:
\[\max_{p} \; \mathbb{E}_{x, t, l}\left[\mathcal{L}\big(f(A(p, x, t, l)),\, y\big)\right]\]
where:
\(p\): Patch content
\(x\): Base image
\(t\): Patch location/transformation
\(l\): Lighting/environmental conditions
\(A\): Applies the patch to the image under \(t\) and \(l\)
\(\mathcal{L}, y\): Classification loss and true label
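A toy stand-in for the application function \(A\) might look like the following (a real attack uses differentiable placement and an expectation over transformations; names and the scalar brightness factor are illustrative):

```python
import numpy as np

def apply_patch(image, patch, top, left, brightness=1.0):
    """Toy A(p, x, t, l): paste `patch` into `image` at (top, left), with a
    scalar brightness factor standing in for lighting conditions l.
    Both arrays are float images in [0, 1]; image has shape (H, W, C)."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[top:top + ph, left:left + pw] = np.clip(brightness * patch, 0.0, 1.0)
    return out
```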
Verification: Proving robustness to patches is hard—must show no patch of size \(k \times k\) anywhere in the image can fool the network.
Approaches:
Certified patch defense: For small patches (single pixel), certify using masking and aggregation
Randomized ablation: Randomly mask portions of input, aggregate predictions
Segmentation-based: Use object segmentation to identify and ignore suspicious regions
Current state: Verification for general patch attacks is largely open. Most work focuses on specific patch sizes or defensive strategies rather than complete verification.
Physical World Attacks¶
Stop sign attacks: Stickers on stop signs causing misclassification [Eykholt et al., 2018].
3D-printed objects: Adversarial 3D objects (turtles, cars) designed to fool classifiers from multiple angles.
Wearable adversarial examples: Clothing, glasses designed to fool facial recognition.
Verification challenge: Must account for:
Camera optics (distortion, blur)
Lighting variations
Viewing angle changes
Environmental noise
No standard verification framework yet. Research explores:
Simulation-based testing (render scenes under various conditions)
Differentiable rendering for optimization
Statistical guarantees via sampling (probabilistic certification)
Sparse Perturbations (\(\ell_0\) Attacks)¶
\(\ell_0\) attacks modify a limited number of pixels, potentially by large amounts.
\(\ell_0\) Threat Model¶
Definition: An adversarial example \(x\) with at most \(k\) pixels changed: \(\|x - x_0\|_0 \le k\) and \(f(x) \neq f(x_0)\),
where \(\|x - x_0\|_0\) counts non-zero entries (number of changed pixels).
Difference from \(\ell_p\):
\(\ell_p\): Small changes to many pixels
\(\ell_0\): Large changes to few pixels
Physical intuition: Corresponds to localized damage (dead pixels, sensor noise, occlusions).
Verification for \(\ell_0\)¶
Combinatorial challenge: With a \(d\)-dimensional input and budget \(k\), there are \(\binom{d}{k}\) possible pixel sets to perturb. For images (\(d \sim 3000\)), even \(k=10\) gives on the order of \(10^{28}\) combinations.
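A quick back-of-the-envelope check makes the blow-up concrete for a CIFAR-10-sized input:

```python
import math

d, k = 32 * 32 * 3, 10       # 3072 input dimensions, budget of 10 changed pixels
print(math.comb(d, k))       # ~2.0e28 candidate pixel subsets -- hopeless to enumerate
```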
Complete verification: Enumerate all \(\binom{d}{k}\) combinations, verify each. Intractable for realistic \(d, k\).
Incomplete approaches:
Greedy search: Iteratively select most influential pixels to perturb
Heuristic bounds: Use pixel importance scores to focus verification
Randomized certification: Sample pixel subsets, aggregate results probabilistically
Specialized defenses:
Median filtering: Replacing pixels with neighborhood medians mitigates sparse attacks
Randomized masking: Randomly mask pixels during inference, reducing attack success
Current state: Limited verification methods exist for \(\ell_0\). Most work focuses on attack generation and specific defenses rather than general certification.
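As a sketch of the randomized-masking idea above, the following Monte Carlo routine keeps only a small random fraction of pixels per sample, so that any \(k\) adversarially changed pixels rarely survive the mask. Deriving an actual \(\ell_0\) certificate from the vote counts requires the statistical analysis from the randomized-ablation literature and is omitted; function and parameter names are illustrative.

```python
import numpy as np

def ablated_predict(classify, x, keep_frac=0.02, n_samples=500, rng=None):
    """Monte Carlo randomized masking against sparse (l_0) perturbations.

    Each sample keeps a random keep_frac of pixel locations and zeroes the
    rest, so any k adversarially changed pixels rarely survive the mask.
    classify(image) -> class index; x has shape (H, W, C).
    Returns the majority-vote class and per-class counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = x.shape[:2]
    votes = {}
    for _ in range(n_samples):
        mask = (rng.random((h, w, 1)) < keep_frac).astype(x.dtype)
        c = classify(x * mask)
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get), votes
```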
Distributional Robustness¶
Distributional robustness ensures performance across related but distinct data distributions.
Distribution Shift Problem¶
Training distribution \(\mathcal{D}_{\text{train}}\) vs test distribution \(\mathcal{D}_{\text{test}}\):
Different demographics, ages, skin tones (facial recognition)
Different weather, lighting (autonomous driving)
Different hardware, cameras (medical imaging)
Standard ML assumption: \(\mathcal{D}_{\text{train}} = \mathcal{D}_{\text{test}}\). In practice, this often fails.
Robustness goal: Ensure good performance on \(\mathcal{D}_{\text{test}}\) even when different from \(\mathcal{D}_{\text{train}}\).
Formulations¶
Worst-case distribution in a neighborhood of the training distribution:
\[\sup_{\mathcal{D} \,:\, d(\mathcal{D}, \mathcal{D}_{\text{train}}) \le \rho} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathcal{L}(f(x), y)\right]\]
where \(d\) is a distribution distance (e.g., Wasserstein, KL divergence) and \(\rho\) bounds how far the test distribution may drift.
Distributionally robust optimization (DRO):
\[\min_{\theta} \; \sup_{\mathcal{D} \in \mathcal{U}} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathcal{L}(f_\theta(x), y)\right]\]
where \(\mathcal{U}\) is an uncertainty set around the training distribution.
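For one concrete special case, when \(\mathcal{U}\) consists of mixtures of a finite set of known groups (demographics, weather conditions, scanners), the inner supremum reduces to the worst per-group average loss, as in the following sketch (names are illustrative):

```python
import numpy as np

def worst_group_loss(per_example_loss, group_ids):
    """Finite-sample stand-in for the DRO inner supremum when the uncertainty
    set is "any mixture of known groups": return the worst per-group mean loss."""
    groups = np.unique(group_ids)
    return max(per_example_loss[group_ids == g].mean() for g in groups)

# A group-DRO-style trainer would minimize worst_group_loss(losses, groups)
# instead of losses.mean().
```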
Verification and Certification¶
Challenge: Verifying distributional robustness requires reasoning about sets of distributions, not individual inputs.
Approaches:
DRO training: Train networks to be distributionally robust. Verification comes from training objective guarantees.
Statistical testing: Sample from candidate test distributions, check performance empirically.
Causal reasoning: Use causal models to bound performance under interventions/shifts.
Connection to \(\ell_p\): Individual \(\ell_p\) robustness can be viewed as distributional robustness under uniform perturbations. Distributional robustness generalizes this to structured shifts.
Current state: Distributional robustness is an active research area. Practical verification methods are limited; most work focuses on training for distributional robustness rather than verifying it post-hoc.
Domain-Specific Threat Models¶
Different application domains have unique threat models:
Autonomous Driving¶
Threats:
Weather conditions (rain, fog, snow)
Lighting changes (day/night, shadows)
Sensor noise (camera, LiDAR)
Adversarial objects (misleading signs, fake lane markings)
Verification needs:
Safety properties (don’t hit pedestrians, stay in lane)
Continuous safety (properties hold over time, not just single frames)
Multi-modal robustness (camera + LiDAR consistency)
Approaches: Simulation-based testing, scenario coverage, formal methods for control systems.
Medical Imaging¶
Threats:
Scanner variations (different manufacturers, models)
Imaging artifacts (noise, contrast variations)
Patient variations (age, anatomy)
Adversarial manipulations (malicious image modification)
Verification needs:
Diagnostic accuracy across populations
Robustness to imaging protocols
Detection of manipulated images
Approaches: Cross-scanner validation, adversarial training for artifacts, uncertainty quantification.
Facial Recognition¶
Threats:
Demographic variations (age, race, gender)
Expression changes (smiling, frowning)
Accessories (glasses, hats, makeup)
Adversarial glasses/makeup
Verification needs:
Fairness across demographics
Robustness to accessories
Detection of adversarial examples
Approaches: Fairness-constrained training, adversarial robustness for accessories, liveness detection.
Composing Threat Models¶
Real-world robustness requires defending against multiple threat models simultaneously.
Multi-Perturbation Robustness¶
Union of threats: the network should be robust to \(\ell_p\), semantic, and physical perturbations simultaneously, i.e., \(f(x') = f(x)\) for every \(x'\) in the union of the individual perturbation sets.
Challenge: Verifying union is harder than verifying each individually.
Approaches:
Modular verification: Verify each threat separately, compose results
Joint training: Train for multiple threats simultaneously
Hierarchical robustness: Ensure robustness to most general threat (implies robustness to specific ones)
Tradeoffs: Robustness to multiple threats may come with larger accuracy drop than single-threat robustness.
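A minimal sketch of the modular-verification idea mentioned above: run one verifier per threat model and report the conjunction. The verifier names and signatures below are hypothetical placeholders.

```python
def modular_verify(x, verifiers):
    """Certify a union of threat models by certifying each one separately.

    `verifiers` maps a threat name to a function that returns True only when
    robustness is *proved* for that threat at input x. The conjunction is sound
    (the union of perturbation sets is covered), but a single threat that
    cannot be proved makes the combined check fail.
    """
    results = {name: verify(x) for name, verify in verifiers.items()}
    return all(results.values()), results

# Hypothetical usage:
# ok, per_threat = modular_verify(x, {
#     "linf_0.03": lambda x: linf_verifier(x, eps=0.03),
#     "rotation_30deg": lambda x: rotation_verifier(x, max_angle=30.0),
# })
```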
Verification Challenges for Alternative Threats¶
Why Alternative Threats Are Harder¶
Non-convexity: Semantic transformations (rotation, scaling) are non-linear. Standard linear verification [Weng et al., 2018, Zhang et al., 2018] doesn’t apply.
High dimensionality: Physical attacks involve many parameters (lighting, viewpoint, patch content). Verification must reason about this high-dimensional space.
Lack of structure: \(\ell_p\) balls have nice mathematical structure (convex, symmetric). Alternative threats often lack this, making verification harder.
Limited tooling: Most verification tools [Katz et al., 2019, Xu et al., 2020] assume \(\ell_p\). Adapting them to alternative threats requires significant modification.
Current State and Future Directions¶
Semantic perturbations: Randomized smoothing [Cohen et al., 2019] with transformation distributions is the most mature approach. Geometric verification for specific transformations (e.g., rotation) is emerging.
Physical adversarial examples: Certified patch defenses exist for small patches. General physical-world verification remains a largely open problem.
Sparse perturbations: Verification methods are limited; most work relies on heuristic defenses and empirical evaluation.
Distributional robustness: DRO training provides some guarantees. Post-hoc verification is largely unsolved.
Multi-threat robustness: Very early stage. Composing verification across threat models is an open challenge.
Practical Recommendations¶
For Researchers¶
Define threat model carefully: What perturbations can adversaries actually perform? \(\ell_p\) may not capture real threats.
Develop domain-specific verification: Generic \(\ell_p\) verification isn’t enough. Build verification methods for your domain’s threats.
Combine approaches: Use \(\ell_p\) verification as baseline, augment with domain-specific testing/verification.
For Practitioners¶
Understand deployment threats: What attacks will your system face in practice? Physical patches? Distribution shift?
Multi-layered defense:
\(\ell_p\) adversarial training (baseline robustness)
Data augmentation for semantic/physical robustness
Runtime monitoring for anomalies
Regular re-training for distribution shift
Don’t rely solely on \(\ell_p\) certification: use it as one component, not a complete solution.
For Tool Builders¶
Extend beyond \(\ell_p\): Add support for semantic perturbations, sparse attacks, custom threat models.
Modular design: Allow users to define custom threat models, plug in verification methods.
Simulation support: Integrate with renderers, physics engines for physical adversarial example verification.
Final Thoughts¶
\(\ell_p\) threat models dominate neural network verification [Gowal et al., 2019, Weng et al., 2018, Zhang et al., 2018, Katz et al., 2019] for good reason: mathematical convenience, well-studied theory, and practical tools. But real-world threats extend far beyond \(\ell_p\).
Semantic perturbations, physical adversarial examples, sparse attacks, and distributional robustness each require different verification approaches. Some, like randomized smoothing for semantic perturbations [Cohen et al., 2019], are maturing. Others, like general physical verification, remain largely open.
The future of neural network verification lies in domain-aware methods: understanding the specific threats each application faces and building verification tailored to those threats. \(\ell_p\) verification provides a foundation, but comprehensive robustness demands going beyond it.
Understanding alternative threat models clarifies verification’s scope and limitations. \(\ell_p\) certification is valuable but incomplete. Building truly robust systems requires addressing the full spectrum of threats, each with its own verification challenges and opportunities.
Further Reading
This guide provides comprehensive coverage of threat models beyond \(\ell_p\) norms. For readers interested in diving deeper, we recommend the following resources organized by topic:
Semantic Perturbations:
Randomized smoothing [Cohen et al., 2019] extends naturally to semantic transformations by using transformation distributions instead of additive noise. This provides certified robustness to rotations, translations, and other meaningful perturbations with theoretical guarantees.
Physical Adversarial Examples:
Adversarial patch attacks [Brown et al., 2017] demonstrated that localized, printable perturbations can fool networks in the physical world. Real-world attacks on stop signs [Eykholt et al., 2018] showed practical deployment of these threats. Certified defenses remain limited to specific scenarios like small patch sizes.
Standard \(\ell_p\) Attacks for Context:
Understanding classical \(\ell_p\) attacks—FGSM [Goodfellow et al., 2015], PGD [Madry et al., 2018], C&W [Carlini and Wagner, 2017]—provides essential context for alternative threats. These represent the baseline against which alternative threat models are compared.
Verification Methods for \(\ell_p\):
Standard verification methods—CROWN [Weng et al., 2018, Zhang et al., 2018], DeepPoly [Singh et al., 2019], IBP [Gowal et al., 2019], Marabou [Katz et al., 2019]—primarily target \(\ell_p\) threats. Understanding their design helps clarify what modifications are needed for alternative threats.
Certified Training:
Training for robustness—whether adversarial [Madry et al., 2018], certified [Gowal et al., 2019, Zhang et al., 2020], or randomized smoothing-based [Cohen et al., 2019]—typically focuses on \(\ell_p\). Extending certified training to alternative threats is active research.
GPU Acceleration:
GPU-accelerated verification [Xu et al., 2020] has primarily benefited \(\ell_p\) verification. Adapting these techniques to alternative threat models could enable practical verification at scale for non-\(\ell_p\) threats.
Theoretical Barriers:
The NP-completeness of verification [Katz et al., 2017, Weng et al., 2018] applies regardless of threat model. Alternative threats don’t escape computational barriers but may present different practical challenges.
Related Topics:
For understanding \(\ell_p\) threat models that alternative threats extend, see Threat Models in Neural Network Verification. For verification methods primarily designed for \(\ell_p\), see Bound Propagation Approaches, Marabou and Reluplex: Extended Simplex for Verification, and Branch-and-Bound Verification. For certified defenses including randomized smoothing, see Certified Defenses and Randomized Smoothing. For practical robustness testing across threat models, see Robustness Testing Guide.
Next Guide
Continue to Verifying Diverse Architectures to explore unique verification challenges for Transformers, RNNs, Graph Neural Networks, and other modern architectures.
Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.
Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE, 2017.
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1625–1634. 2018.
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations. 2015.
Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.
Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: an efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.
Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. 2018.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.
Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.