Training Robust and Verifiable Neural Networks#
You’ve just verified your neural network and discovered it’s vulnerable. A small adversarial perturbation can fool your model, despite impressive accuracy on clean data. Now the critical question arises: how do you fix this? The answer isn’t to simply test more—it’s to train differently from the start.
Standard training optimizes for one thing: minimizing loss on clean data. It says nothing about how the network behaves under adversarial perturbation. This is the training-verification gap. You can verify a network after it’s trained, but the real power comes from building robustness into the training process itself. This guide explores how to do exactly that.
The Challenge: Why Standard Training Fails#
Neural networks trained with standard cross-entropy loss learn decision boundaries that are surprisingly fragile. Small perturbations can flip predictions entirely. This isn’t a flaw in the architecture—it’s a consequence of the training objective.
Standard training minimizes the expected loss on clean data:

\[
\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \mathcal{L}\big(f_\theta(x), y\big) \big]
\]
This objective only cares about performance on unperturbed inputs. The network has zero incentive to remain robust to adversarial examples because the training data contains no adversarial examples. The brittleness isn’t a surprise; it’s inevitable given the objective.
There’s also a persistent myth that robustness and accuracy are completely at odds. In reality there is a tradeoff, but it’s manageable with the right approach: you can achieve both reasonable clean accuracy and meaningful robustness.
Key Insight
Training objective determines model properties. Want a robust network? Your loss function must reflect robustness.
Adversarial Training: Learning from Attacks#
The most straightforward approach to robustness is intuitive: train on adversarial examples. If the network encounters adversarial perturbations during training, it might learn to resist them.
How it works: During training, you generate adversarial examples using an attack method (typically PGD—Projected Gradient Descent). You then train the network to minimize loss on these perturbed inputs, not the original ones.
The adversarial training objective is:

\[
\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta), y\big) \Big]
\]

This is a minimax formulation: for each input, you find the strongest adversarial perturbation within the epsilon ball, then train to minimize loss on that worst-case input. In practice, you approximate the inner maximization with PGD (several projected gradient ascent steps to find a strong attack).
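Below is a minimal PyTorch sketch of this minimax loop, assuming inputs scaled to [0, 1] and an \(L_\infty\) threat model; the function names and the default values for eps, alpha, and steps are illustrative, not prescribed by the text above.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the loss within an L-infinity ball of radius eps."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Step up the loss, then project back into the eps-ball and the valid input range.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    """One minimax step: generate the attack first, then descend on the worst-case loss."""
    model.eval()                          # attack against fixed BatchNorm statistics
    x_adv = pgd_attack(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```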
Strengths:
- Simple to understand and implement
- Empirically effective: networks trained this way show meaningful robustness
- Works with any architecture
- Widely used and benchmarked
Limitations:
- No formal guarantees: you’ve only tested a finite number of attacks
- An attack failing to find adversarial examples doesn’t prove robustness
- Computationally expensive compared to standard training
- May not generalize to attacks different from those seen during training
- Can overfit to the specific epsilon value used
When Adversarial Training Works Well
Use PGD-based adversarial training when you need practical robustness without formal guarantees, can afford the computational cost, and want a quick start to robustness exploration.
Tip
Epsilon scheduling improves results: start with small epsilon and gradually increase during training. This allows the network to learn basic robust features first, then refine them under stronger attacks.
Certified Training: Provable Robustness by Design#
If you need formal guarantees, adversarial training isn’t enough. Certified training uses verification bounds during training to ensure provable robustness. Rather than hoping your network is robust, you prove it.
The key insight: Include verified bounds in your loss function. When the network’s prediction is provably correct within an epsilon ball, the training can focus on harder regions.
TRADES: Balancing Accuracy and Robustness
The TRADES method (TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization) formalizes the accuracy-robustness tradeoff explicitly:

\[
\mathcal{L}_{\text{TRADES}} = \mathcal{L}_{\text{CE}}\big(f_\theta(x), y\big) + \beta \cdot \max_{x' \in \mathbb{B}(x, \epsilon)} \mathrm{KL}\big(f_\theta(x) \,\|\, f_\theta(x')\big)
\]

where \(x'\) is an adversarially perturbed version of \(x\), and \(\beta\) controls the weight of the robustness term. The first term maintains clean accuracy; the second encourages consistent predictions under perturbation.
The beta parameter is crucial: larger values prioritize robustness over clean accuracy. Finding the right beta requires experimentation for your specific domain.
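A sketch of what the TRADES objective can look like in PyTorch, loosely following the structure of the reference implementation; the defaults for beta, eps, the step size, and the random start magnitude are placeholders to tune for your setup.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, beta=6.0, eps=8/255, alpha=2/255, steps=10):
    """Clean cross-entropy plus a beta-weighted KL consistency term (TRADES-style)."""
    # Inner maximization: find x' in the eps-ball that maximizes KL(f(x) || f(x')).
    p_clean = F.softmax(model(x), dim=1).detach()
    x_adv = (x + 0.001 * torch.randn_like(x)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="sum")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    # Outer minimization: clean accuracy term + robustness (consistency) term.
    logits_clean = model(x)
    loss_natural = F.cross_entropy(logits_clean, y)
    loss_robust = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction="batchmean")
    return loss_natural + beta * loss_robust
```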
IBP Training: Interval Bound Propagation
IBP-based training is more direct: propagate interval bounds through your network and use worst-case loss. For each layer, compute lower and upper bounds on neuron outputs given the input perturbation region. Use the worst-case (most pessimistic) bounds to define a verified loss.
This approach guarantees that if training succeeds, the network is verifiably robust. The tradeoff: IBP bounds are often conservative, so the certified robustness might be weaker than what empirical testing could achieve. However, what you get is a formal proof.
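To make the mechanics concrete, here is a minimal sketch of interval propagation through a plain Linear/ReLU stack and a simplified worst-case cross-entropy built from the resulting bounds; real implementations also handle convolutions and typically mix the verified loss with the standard clean loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ibp_bounds(layers, x, eps):
    """Propagate elementwise lower/upper bounds through Linear and ReLU layers."""
    lb, ub = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            center, radius = (ub + lb) / 2, (ub - lb) / 2
            new_center = center @ layer.weight.t() + layer.bias
            new_radius = radius @ layer.weight.abs().t()  # worst case over the interval
            lb, ub = new_center - new_radius, new_center + new_radius
        elif isinstance(layer, nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)     # ReLU is monotone
        else:
            raise NotImplementedError(f"IBP rule not implemented for {type(layer)}")
    return lb, ub

def ibp_worst_case_loss(layers, x, y, eps, num_classes):
    """Verified loss: cross-entropy on worst-case logits (true class at its lower
    bound, every other class at its upper bound)."""
    lb, ub = ibp_bounds(layers, x, eps)
    true_class = F.one_hot(y, num_classes).bool()
    worst_logits = torch.where(true_class, lb, ub)
    return F.cross_entropy(worst_logits, y)
```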
CROWN-based Training: Tighter Bounds
CROWN uses backward (linear relaxation based) bound propagation to compute tighter bounds than IBP. Tighter bounds mean training can achieve better accuracy-robustness tradeoffs. Tools like auto_LiRPA make this practical for larger networks. CROWN-based training is more computationally intensive but produces networks with higher certified accuracy at the same epsilon.
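The sketch below shows how CROWN-style backward bounds are typically requested through auto_LiRPA; the toy model, dummy batch, and epsilon are placeholders, and exact signatures may differ slightly between library versions, so treat this as an outline rather than a drop-in recipe.

```python
import torch
import torch.nn as nn
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

# Toy classifier and dummy batch, just to show the bound computation.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.rand(8, 1, 28, 28)
eps = 2 / 255

# Wrap the model and declare the L-infinity input region around each sample.
lirpa_model = BoundedModule(model, torch.empty_like(x))
bounded_x = BoundedTensor(x, PerturbationLpNorm(norm=float("inf"), eps=eps))

# Backward (CROWN) bounds on every logit over the entire eps-ball.
lb, ub = lirpa_model.compute_bounds(x=(bounded_x,), method="backward")

# A verified training loss can then be built from these worst-case logits,
# exactly as in the IBP sketch above.
```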
Comparison of Certified Methods
| Method | Bound Type | Tightness | Computational Cost | Typical Use |
|---|---|---|---|---|
| TRADES | Implicit KL | Variable | Moderate | Fast certified training, practical choice |
| IBP Training | Interval | Conservative | Reasonable | Large networks, when speed matters |
| CROWN-based | Backward propagation | Tight | Expensive | Critical applications, smaller networks |
| Randomized Smoothing | Probabilistic | Statistical | Very expensive | Highest guarantees, offline certification |
Certified vs Empirical
Certified training trades computational resources for formal guarantees. A network trained with IBP is provably robust to within computed bounds. A network trained with PGD-adversarial is empirically robust to the attacks it survived, but nothing more.
Hint
For production systems, start with TRADES or IBP training. Only switch to CROWN-based training if you’ve exhausted other options and still need tighter bounds.
Note
Further reading:
TRADES: Theoretically Principled Trade-off between Robustness and Accuracy (ICML 2019) - Zhang et al.
Scalable Verified Training for Provably Robust Image Classification (ICCV 2019) - Gowal et al.
A Convex Relaxation Barrier to Tight Robustness Verification - Discussion of IBP limitations
Certified Defenses for Data Poisoning Attacks (NeurIPS 2017) - Using verification in adversarial contexts
Architecture Choices for Trainability#
Not all architectures are equally amenable to robust training. Some designs make robust learning easier; others fight against it.
Activation Functions
ReLU networks are easiest to train robustly because they’re piecewise linear and amenable to tight bound propagation. However, ReLU can suffer from dead neurons. Smooth activations (Sigmoid, Tanh, GeLU) provide better gradient flow but create verification challenges—bounds become looser. If using certified training, ReLU remains the safest default, though modern tools can handle contemporary activation functions reasonably well. See the activation functions blog for deeper technical details on verification tradeoffs.
Depth vs Width
Deeper networks are harder to verify (bounds accumulate error through layers) but often learn more efficiently. Wider networks are easier to verify but require more parameters. For robust training, consider your computational budget: deeper networks need tighter verification methods, while wider networks can use faster methods.
Skip Connections and Residual Networks
Residual connections (x → x + f(x)) help training convergence and work well with certified methods because bounds can exploit the skip connection structure. Modern architectures like ResNets are acceptable for certified training—the benefits of better training dynamics outweigh the verification complications.
Normalization Layers
Batch normalization during training affects verification at deployment (you typically use running statistics, not batch statistics). This mismatch can impact certified robustness. Consider this during architecture design: if you need formal guarantees at test time, document how normalization interacts with verification.
| Design Choice | Impact on Training | Impact on Verification | Recommendation |
|---|---|---|---|
| ReLU activation | Good gradients, some dead neurons | Tight bounds, piecewise linear | Best for certified training |
| Smooth activations | Excellent gradients | Looser bounds | Acceptable with modern tools |
| Deeper networks | Better feature learning | Harder verification | Use tight methods (CROWN) for certification |
| Wider networks | More parameters needed | Easier verification | Use faster methods (IBP) for scale |
| Skip connections | Faster convergence | Exploitable by bounds | Recommended |
| Batch normalization | Essential for training | Train-test mismatch | Document carefully |
Tip
For verifiable architectures: prefer ReLU or modern activations with adequate tools, use reasonable depth, add width where needed, include skip connections, and carefully document normalization behavior.
Training vs Deployment
The architecture you train with might differ from what you deploy. Training-time architectures can use BatchNorm, dropout, and layer-specific optimizations. Deployment architectures should match what you actually verified.
The Training-Verification Loop#
The most effective approach combines training and verification in an iterative cycle. Train, verify, analyze failures, retrain focusing on problems—repeat until targets are met.
Step 1: Train with Robustness Objective
Start with either adversarial training (for quick iteration) or certified training (for guarantees). Train on your full training set, tracking both clean and robust accuracy.
Step 2: Verify on Representative Samples
Run verification on a representative subset of your test set. Use a fast method first to get initial estimates.
Step 3: Analyze Failures
For each sample that couldn’t be verified, ask: Is it a real vulnerability or a verification artifact? Generate adversarial examples to check. If you can find an actual adversarial example, it’s a real problem. If you can’t despite trying hard, the verifier is likely being conservative.
Step 4: Retrain with Focus
Include the problematic samples in your next training run. Use refined hyperparameters to fine-tune robustness in these regions.
Step 5: Iterate
Repeat steps 2-4 until you hit your robustness targets.
This loop typically takes several iterations to see substantial improvements. Early iterations fix obvious vulnerabilities; later iterations refine robustness on edge cases.
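A structural sketch of the loop, with the training routine, verifier, and attack supplied by the caller; `train_fn`, `verify_fn`, and `attack_fn` are hypothetical interfaces, not part of any specific library.

```python
def training_verification_loop(model, train_fn, verify_fn, attack_fn,
                               train_set, probe_set, eps,
                               target_verified_acc=0.6, max_iters=5):
    """Iterate: train -> verify -> triage failures -> retrain on the hard samples.

    train_fn(model, dataset, eps)  -- trains the model in place with a robust objective
    verify_fn(model, x, y, eps)    -- True if the sample is certified at radius eps
    attack_fn(model, x, y, eps)    -- True if an adversarial example is actually found
    """
    hard_examples = []
    for _ in range(max_iters):
        # Step 1: train, oversampling the samples that failed verification last round.
        train_fn(model, list(train_set) + hard_examples, eps)

        # Step 2: verify a representative probe subset (use a fast method first).
        verified = [verify_fn(model, x, y, eps) for x, y in probe_set]
        if sum(verified) / len(probe_set) >= target_verified_acc:
            break  # Step 5: stop once the robustness target is met.

        # Steps 3-4: unverified samples where an attack succeeds are real
        # vulnerabilities; feed them back into the next training run.
        hard_examples = [(x, y) for (x, y), ok in zip(probe_set, verified)
                         if not ok and attack_fn(model, x, y, eps)]
    return model
```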
Iteration is Key
The training-verification loop compounds improvements. Each iteration focuses on real weaknesses found by verification, making training more targeted than adversarial training alone.
Hint
Know when to stop iterating. If certified accuracy plateaus, it’s time to either accept current robustness, try a different verification method, or reconsider your threat model.
Practical Considerations and Resource Management#
Robust training is computationally expensive. A single step of PGD-based adversarial training requires multiple forward-backward passes. Certified training adds bound computation overhead. Understanding the costs helps you budget appropriately.
Computational Costs
Standard training: Baseline reference point
PGD-adversarial: Significantly more expensive than baseline
TRADES: Moderate overhead compared to full adversarial training
IBP training: Moderate to significant overhead
CROWN-based training: Most expensive approach
Certified methods aren’t just slower; they often struggle on large networks. If you have substantial computational resources and flexibility, CROWN-based training is powerful. If resources are limited, TRADES or IBP are more practical.
Hyperparameter Selection
The epsilon value is most critical. Choose it based on:
- Perceptual plausibility: Is the perturbation human-imperceptible?
- Domain knowledge: What perturbations are realistic in your application?
- Benchmark standards: Existing literature provides guidance for your domain
Your learning rate likely needs adjustment. Robust training has a different loss landscape—gradients behave differently. Start with standard learning rates, then adjust if training becomes unstable.
Batch size affects robustness. Larger batches tend to produce more robust models than small batches. If your budget allows, increase batch size for robust training.
Data Augmentation and Curriculum Learning
Standard augmentation (flips, crops, color jittering) complements robust training. You can also use curriculum learning: start with small epsilon and gradually increase during training. This allows the network to learn basic robust features before facing harder perturbations.
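A minimal sketch of such a linear epsilon ramp; the warmup and ramp lengths and the target value are illustrative and should be tuned to your training budget.

```python
def epsilon_schedule(step, warmup_steps, ramp_steps, eps_target):
    """Zero perturbation during warmup, then a linear ramp up to the target epsilon."""
    if step < warmup_steps:
        return 0.0
    progress = min(1.0, (step - warmup_steps) / ramp_steps)
    return progress * eps_target

# Example: clean training for 2,000 steps, then ramp to 8/255 over the next 10,000.
eps_now = epsilon_schedule(step=5_000, warmup_steps=2_000,
                           ramp_steps=10_000, eps_target=8 / 255)
```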
Early Stopping
Track both clean and robust validation accuracy throughout training. Stop when robust accuracy plateaus, not when clean accuracy maxes out. A network balancing clean accuracy with meaningful robustness is often more valuable than maximizing clean accuracy with poor robustness.
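One possible way to express this stopping rule in code, assuming you log robust validation accuracy once per epoch; the patience and threshold values are placeholders.

```python
def should_stop(robust_acc_history, patience=10, min_delta=1e-3):
    """Stop when robust validation accuracy has not improved by at least min_delta
    over the best earlier value for `patience` consecutive epochs."""
    if len(robust_acc_history) <= patience:
        return False
    best_recent = max(robust_acc_history[-patience:])
    best_before = max(robust_acc_history[:-patience])
    return best_recent < best_before + min_delta
```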
| Training Method | Computational Cost | Memory Overhead | Typical Scale | When to Use |
|---|---|---|---|---|
| Standard | Baseline | Minimal | Large datasets | Baseline only |
| PGD-Adversarial | High | Moderate | Medium datasets | Practical robustness |
| TRADES | Moderate | Minimal | Large datasets | Production systems |
| IBP Training | Moderate | Moderate | Medium to large | Certified robustness |
| CROWN-based | Very high | High | Small to medium | Safety-critical systems |
Tip
Start with TRADES: it offers a good balance of cost and robustness. Only move to more expensive methods if TRADES isn’t achieving your targets.
Tip
Epsilon scheduling: Begin training with small epsilon, then increase gradually toward your target. This allows progressive learning and often improves final robustness.
Hint
If training is slow, profile where time is spent. Different methods have different bottlenecks: attack generation, bound computation, or gradient calculation. Identifying the bottleneck guides optimization.
Comparing Training Approaches#
Each method has different tradeoffs. The right choice depends on your requirements, computational budget, and how much guarantee you need.
| Method | Robustness Guarantee | Accuracy Impact | Cost vs Standard | When to Use |
|---|---|---|---|---|
| Standard | None | Baseline | Baseline | Prototyping only |
| PGD-Adversarial | Empirical | Moderate reduction | High | Need robustness, limited guarantees |
| TRADES | Implicit | Modest reduction | Moderate | Production, moderate guarantees |
| IBP Training | Certified | Moderate reduction | Moderate | Certified robustness needed |
| CROWN-based | Certified (tight) | Moderate reduction | High | Safety-critical, budget available |
| Hybrid Approach | Empirical + Certified | Moderate reduction | High | Maximum confidence, flexible budget |
Hybrid Approaches
You don’t have to choose just one. Train with adversarial training for robustness, then apply certified training for final polish. Or train with TRADES, then verify and retrain weak samples. These hybrid approaches cost more but can yield the best results by combining the strengths of each method.
Note
Further reading:
RobustBench: A Standardized Adversarial Robustness Benchmark - Comprehensive benchmark comparing training methods
The Tradeoff Between Accuracy and Robustness - Theoretical analysis of the accuracy-robustness relationship
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples - Fundamental limitations
Common Pitfalls and Best Practices#
Learn from those who’ve walked this path before.
Pitfalls to Avoid
Epsilon without justification: Choosing epsilon arbitrarily leads to meaningless robustness claims. Ground it in your threat model and domain characteristics.
Not verifying the final model: You verified checkpoints, but is the final deployed model actually robust? Always verify the exact model you deploy.
Overfitting to adversarial examples: Sometimes networks “memorize” adversarial features rather than learning robust representations. This fails to generalize to new perturbations.
Ignoring the accuracy-robustness tradeoff: The tradeoff is real, and understanding it matters. Trading some clean accuracy for meaningful robustness is usually worthwhile.
Wrong perturbation norm: Verifying against one norm when your actual threats use a different norm wastes effort and provides false confidence.
Verification artifact confusion: Not all unverified samples are vulnerabilities. Some are just loose bounds. Empirical validation distinguishes them.
Best Practices
Start with adversarial training: Get a quick sense of the robustness landscape before investing in certified methods.
Use certified training for critical applications: If your system affects safety or security, the computational cost of certified training is justified.
Track both clean and robust accuracy: Don’t just look at clean accuracy—it provides false security.
Verify checkpoints throughout training: Identify when robustness improvement plateaus. Training longer doesn’t always help.
Document everything: Epsilon value, threat model, architectural choices, hyperparameters, final metrics. Anyone revisiting the system later will depend on this documentation.
Test on multiple epsilon values: Robustness at one epsilon value doesn’t predict robustness at others. Test broadly.
Combine with interpretability: Understanding where your network is weak (via verification) can guide architecture changes.
Hint
Debugging training failures: If robust accuracy is stuck near random performance, either epsilon is too large for your network capacity, or your training method isn’t converging properly. Try smaller epsilon, adjusted learning rates, or extended training.
Verification Doesn’t Replace Testing
Formal verification is powerful, but it’s not magic. Always combine it with adversarial attack evaluation, interpretability analysis, and domain expertise.
Final Thoughts#
The marriage of training and verification represents a fundamental shift in thinking about robustness. Rather than hoping networks are robust and testing after the fact, we can build robustness in from the start. Rather than seeing verification as an inspection process, we use it as a training objective.
The field is evolving rapidly. Recent advances include:
More efficient certified training: New methods make advanced verification practical for larger networks
Learning-based bounds: Techniques that optimize bound parameters during training, tightening certificates automatically
Architecture search for verifiability: Automated methods to find architectures balancing accuracy, robustness, and verifiability
Hybrid defenses: Combining verified training with other robustness techniques
The right training method depends on your specific requirements. For research and prototyping, adversarial training offers speed and flexibility. For production systems, TRADES provides practical robustness with reasonable computational cost. For safety-critical systems, certified training (IBP or CROWN-based) provides formal guarantees.
No single approach is perfect. Tradeoffs exist between accuracy, robustness, computational cost, and verification tightness. But these tradeoffs are manageable with informed decisions based on your specific context.
The key insight: training objective determines model properties. Choose your loss function to reflect what matters to your application. If robustness matters, make it part of your training objective.
The training-verification loop is more powerful than either alone. Let verification findings guide training improvements. Let training innovations expand what verification can prove. This iterative, complementary relationship is where the real progress happens.
Start verifying your current network. Identify its weaknesses. Then retrain with the methods described here. Iterate. The networks you build this way—robust by design—represent the future of trustworthy machine learning.
Looking Forward
The convergence of better training algorithms, tighter verification methods, and neural architecture search will likely produce a future where robust and verifiable networks become standard practice rather than special-case engineering.
Note
Further reading:
SoK: Certified Robustness for Deep Neural Networks (IEEE S&P 2023) - Comprehensive survey of training and verification approaches
TRADES: Theoretically Principled Trade-off between Robustness and Accuracy (ICML 2019) - Influential adversarial training method
Certified Adversarial Robustness via Randomized Smoothing (ICML 2019) - Probabilistic certification foundation
auto_LiRPA: Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond (NeurIPS 2020) - State-of-the-art certified training approaches
Related Topics in This Series
Robustness Testing Guide - Testing networks before training for robustness
Bound Propagation Approaches - Verification techniques used during training
Certified Defenses and Randomized Smoothing - Advanced certified training methods
Activation Functions and Verification - How activation functions impact verification