Training Robust Networks

You’ve just verified your neural network and discovered it’s vulnerable. A small adversarial perturbation can fool your model, despite impressive accuracy on clean data. Now the critical question arises: how do you fix this? The answer isn’t to simply test more—it’s to train differently from the start.

Standard training optimizes for one thing: minimizing loss on clean data. It says nothing about how the network behaves under adversarial perturbation. This is the training-verification gap. You can verify a network after it’s trained, but the real power comes from building robustness into the training process itself. This guide explores how to do exactly that.

The Challenge: Why Standard Training Fails

Neural networks trained with standard cross-entropy loss learn decision boundaries that are surprisingly fragile. Small perturbations can flip predictions entirely. This isn’t a flaw in the architecture—it’s a consequence of the training objective.

Standard training minimizes:

\[L_{\text{standard}} = \mathbb{E}_{(x,y)}[L(f_\theta(x), y)]\]

This objective only cares about performance on unperturbed inputs. The network has zero incentive to remain robust to adversarial examples because the training data contains no adversarial examples. The brittleness isn’t a surprise; it’s inevitable given the objective.
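For contrast with the robust objectives below, here is what a standard training step actually optimizes. This is a minimal PyTorch sketch; the model and optimizer are assumed to exist, and the function name is illustrative.

```python
import torch.nn.functional as F

def standard_training_step(model, optimizer, x, y):
    """One standard step: minimize cross-entropy on the clean batch only."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)  # no perturbation appears anywhere
    loss.backward()
    optimizer.step()
    return loss.item()
```

Nothing in this step ever shows the network a perturbed input, which is why the resulting decision boundaries end up fragile.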

There’s also a persistent myth about robustness and accuracy: that they’re completely at odds. In reality, while there is a tradeoff, it’s manageable with the right approach. You can achieve both reasonable clean accuracy and meaningful robustness.

Key Insight

Training objective determines model properties. Want a robust network? Your loss function must reflect robustness.

Adversarial Training: Learning from Attacks

The most straightforward approach to robustness is intuitive: train on adversarial examples. If the network encounters adversarial perturbations during training, it might learn to resist them.

How it works: During training, you generate adversarial examples using an attack method (typically PGD—Projected Gradient Descent [Madry et al., 2018]). You then train the network to minimize loss on these perturbed inputs, not the original ones.

The adversarial training objective is:

\[L_{\text{adv}} = \mathbb{E}_{(x,y)}\left[\max_{\delta: \|\delta\|_\infty \leq \epsilon} L(f_\theta(x+\delta), y)\right]\]

This is a minimax formulation: for each input, you find the strongest adversarial perturbation within the epsilon ball, then train to minimize loss on that worst-case input. In practice, you approximate the inner maximization with PGD: several projected gradient steps that increase the loss while staying inside the epsilon ball.
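A minimal PyTorch sketch of one PGD-adversarial training step follows. The step size, iteration count, and epsilon defaults are illustrative values, not prescriptions from the cited papers.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate the inner max: find a strong perturbation in the eps-ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Take a signed step that increases the loss, then project back.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    delta = pgd_attack(model, x, y, eps=eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)  # loss on worst-case inputs
    loss.backward()
    optimizer.step()
    return loss.item()
```

For image data you would typically also clamp x + delta back into the valid pixel range; that detail is omitted here for brevity.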

Strengths:

  • Simple to understand and implement

  • Empirically effective: networks trained this way show meaningful robustness

  • Works with any architecture

  • Widely used and benchmarked

Limitations:

  • No formal guarantees: you’ve only tested a finite number of attacks

  • An attack failing to find adversarial examples doesn’t prove robustness

  • Computationally expensive compared to standard training

  • May not generalize to attacks different from those seen during training

  • Can overfit to the specific epsilon value used

When Adversarial Training Works Well

Use PGD-based adversarial training when you need practical robustness without formal guarantees, can afford the computational cost, and want a quick start to robustness exploration.

Tip

Epsilon scheduling improves results: start with small epsilon and gradually increase during training. This allows the network to learn basic robust features first, then refine them under stronger attacks.
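A minimal sketch of such a schedule; the warm-up length, ramp duration, and target epsilon below are illustrative hyperparameters.

```python
def epsilon_schedule(epoch, warmup_epochs=10, ramp_epochs=50, eps_target=8/255):
    """Linearly ramp epsilon from 0 to the target after a clean warm-up."""
    if epoch < warmup_epochs:
        return 0.0  # learn basic features on clean data first
    progress = min(1.0, (epoch - warmup_epochs) / ramp_epochs)
    return progress * eps_target
```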

Certified Training: Provable Robustness by Design

If you need formal guarantees, adversarial training isn’t enough. Certified training uses verification bounds during training to ensure provable robustness. Rather than hoping your network is robust, you prove it.

The key insight: Include verified bounds in your loss function. When the network’s prediction is provably correct within an epsilon ball, the training can focus on harder regions.

TRADES: Balancing Accuracy and Robustness [Zhang et al., 2019]

The TRADES method (TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization) formalizes the accuracy-robustness tradeoff explicitly:

\[L_{\text{TRADES}} = L(f_\theta(x), y) + \beta \cdot D_{\text{KL}}(f_\theta(x) \| f_\theta(x'))\]

where \(x'\) is an adversarially perturbed version of \(x\), and \(\beta\) controls the weight of the robustness term. The first term maintains clean accuracy; the second encourages consistent predictions under perturbation.

The beta parameter is crucial: larger values prioritize robustness over clean accuracy. Finding the right beta requires experimentation for your specific domain.
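A minimal PyTorch sketch of the loss, assuming x_adv was produced by a PGD-style attack (in the reference TRADES implementation, the attack maximizes the KL term itself rather than cross-entropy):

```python
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, beta=6.0):
    """Clean cross-entropy plus a KL consistency term, as in the formula above."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    natural = F.cross_entropy(logits_clean, y)
    # KL(f(x) || f(x')): penalize prediction drift under perturbation.
    robust = F.kl_div(F.log_softmax(logits_adv, dim=1),
                      F.softmax(logits_clean, dim=1),
                      reduction="batchmean")
    return natural + beta * robust
```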

IBP Training: Interval Bound Propagation [Gowal et al., 2019]

IBP-based training is more direct: propagate interval bounds through your network and use worst-case loss. For each layer, compute lower and upper bounds on neuron outputs given the input perturbation region. Use the worst-case (most pessimistic) bounds to define a verified loss.

This approach guarantees that if training succeeds, the network is verifiably robust. The tradeoff: IBP bounds are often conservative, so the certified robustness might be weaker than what empirical testing could achieve. However, what you get is a formal proof.
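A minimal sketch of how the bounds move through one affine layer and one ReLU; chaining these from the input interval [x - eps, x + eps] to the logits gives the pessimistic quantities that define the verified loss.

```python
import torch
import torch.nn.functional as F

def ibp_linear(lb, ub, weight, bias):
    """Propagate the interval [lb, ub] through y = x @ W.T + b."""
    center, radius = (ub + lb) / 2, (ub - lb) / 2
    new_center = F.linear(center, weight, bias)
    new_radius = F.linear(radius, weight.abs())  # |W| spreads the radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lb, ub):
    """ReLU is monotone, so bounds pass through elementwise."""
    return lb.clamp(min=0), ub.clamp(min=0)

def worst_case_logits(logit_lb, logit_ub, y):
    """Lower-bound the true class, upper-bound every other class."""
    one_hot = F.one_hot(y, logit_lb.size(1)).bool()
    return torch.where(one_hot, logit_lb, logit_ub)
```

Cross-entropy evaluated on worst_case_logits is the most pessimistic loss over the whole perturbation region, which is exactly what IBP training minimizes.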

CROWN-based Training: Tighter Bounds [Zhang et al., 2018, Xu et al., 2020]

CROWN [Zhang et al., 2018] uses backward (linear) bound propagation to compute tighter bounds than IBP. The tighter bounds mean training can achieve better accuracy-robustness tradeoffs. Tools like auto_LiRPA [Xu et al., 2020] make this practical for larger networks. CROWN-based training is more computationally intensive but produces networks with higher certified accuracy at the same epsilon.
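A minimal sketch of computing CROWN bounds with auto_LiRPA, following the library's documented usage pattern (check the current auto_LiRPA docs for the exact API; net and x stand in for your model and input batch):

```python
import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

eps = 8 / 255
bounded_net = BoundedModule(net, torch.empty_like(x))    # wrap the model
ptb = PerturbationLpNorm(norm=float("inf"), eps=eps)     # L-inf eps-ball
x_bounded = BoundedTensor(x, ptb)
lb, ub = bounded_net.compute_bounds(x=(x_bounded,), method="CROWN")
# lb/ub bound each logit over the whole eps-ball; a verified loss can be
# built from them exactly as in the IBP sketch above.
```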

Comparison of Certified Methods

| Method | Bound Type | Tightness | Computational Cost | Typical Use |
|---|---|---|---|---|
| TRADES | Implicit KL | Variable | Moderate | Fast certified training, practical choice |
| IBP Training | Interval | Conservative | Reasonable | Large networks, when speed matters |
| CROWN-based | Backward propagation | Tight | Expensive | Critical applications, smaller networks |
| Randomized Smoothing [Cohen et al., 2019] | Probabilistic | Statistical | Very expensive | Highest guarantees, offline certification |

Certified vs Empirical

Certified training trades computational resources for formal guarantees. A network trained with IBP is provably robust to within computed bounds. A network trained with PGD-adversarial is empirically robust to the attacks it survived, but nothing more.

Hint

For production systems, start with TRADES or IBP training. Only switch to CROWN-based training if you’ve exhausted other options and still need tighter bounds.

Key Papers on Certified Training

TRADES [Zhang et al., 2019] provides theoretically principled trade-off between accuracy and robustness. Scalable certified training with IBP [Gowal et al., 2019, Gowal et al., 2021] enables provable robustness for large networks. CROWN-based training [Zhang et al., 2018, Xu et al., 2020] achieves tighter bounds through backward propagation. Randomized smoothing [Cohen et al., 2019] offers an alternative probabilistic certification approach.

Architecture Choices for Trainability

Not all architectures are equally amenable to robust training. Some designs make robust learning easier; others fight against it.

Activation Functions

ReLU networks are easiest to train robustly because they’re piecewise linear and amenable to tight bound propagation. However, ReLU can suffer from dead neurons. Smooth activations (Sigmoid, Tanh, GELU) provide better gradient flow but create verification challenges: bounds become looser. If using certified training, ReLU remains the safest default, though modern tools handle contemporary activation functions reasonably well. See the guide Beyond ReLU: Modern Activation Functions for deeper technical details on verification tradeoffs.

Depth vs Width

Deeper networks are harder to verify (bounds accumulate error through layers) but often learn more efficiently. Wider networks are easier to verify but require more parameters. For robust training, consider your computational budget: deeper networks need tighter verification methods, while wider networks can use faster methods.

Skip Connections and Residual Networks

Residual connections (x → x + f(x)) help training convergence and work well with certified methods because bounds can exploit the skip connection structure. Modern architectures like ResNets are acceptable for certified training—the benefits of better training dynamics outweigh the verification complications.

Normalization Layers

Batch normalization during training affects verification at deployment (you typically use running statistics, not batch statistics). This mismatch can impact certified robustness. Consider this during architecture design: if you need formal guarantees at test time, document how normalization interacts with verification.
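A small sketch of the usual precaution: switch to eval mode so BatchNorm uses its running statistics before verifying, matching what deployment will see. The verifier argument is a hypothetical stand-in for your bound-computation tool.

```python
import torch

def verify_as_deployed(model, samples, eps, verifier):
    """Verify with deployment-time statistics: eval() freezes BatchNorm."""
    model.eval()  # running mean/var now, not per-batch statistics
    with torch.no_grad():
        return verifier(model, samples, eps)
```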

| Design Choice | Impact on Training | Impact on Verification | Recommendation |
|---|---|---|---|
| ReLU activation | Good gradients, some dead neurons | Tight bounds, piecewise linear | Best for certified training |
| Smooth activations | Excellent gradients | Looser bounds | Acceptable with modern tools |
| Deeper networks | Better feature learning | Harder verification | Use tight methods (CROWN) for certification |
| Wider networks | More parameters needed | Easier verification | Use faster methods (IBP) for scale |
| Skip connections | Faster convergence | Exploitable by bounds | Recommended |
| Batch normalization | Essential for training | Train-test mismatch | Document carefully |

Tip

For verifiable architectures: prefer ReLU or modern activations with adequate tools, use reasonable depth, add width where needed, include skip connections, and carefully document normalization behavior.

Training vs Deployment

The architecture you train with might differ from what you deploy. Training-time architectures can use BatchNorm, dropout, and layer-specific optimizations. Deployment architectures should match what you actually verified.

The Training-Verification Loop

The most effective approach combines training and verification in an iterative cycle. Train, verify, analyze failures, retrain focusing on problems—repeat until targets are met.

Step 1: Train with Robustness Objective

Start with either adversarial training (for quick iteration) or certified training (for guarantees). Train on your full training set, tracking both clean and robust accuracy.

Step 2: Verify on Representative Samples

Run verification on a representative subset of your test set. Use a fast method first to get initial estimates.

Step 3: Analyze Failures

For each sample that couldn’t be verified, ask: Is it a real vulnerability or a verification artifact? Generate adversarial examples to check. If you can find an actual adversarial example, it’s a real problem. If you can’t despite trying hard, the verifier is likely being conservative.

Step 4: Retrain with Focus

Include the problematic samples in your next training run. Use refined hyperparameters to fine-tune robustness in these regions.

Step 5: Iterate

Repeat steps 2-4 until you hit your robustness targets.

This loop typically takes several iterations to see substantial improvements. Early iterations fix obvious vulnerabilities; later iterations refine robustness on edge cases.
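The loop is straightforward to express in code. The following is a schematic sketch; train_robust, verify_subset, and find_adversarial are hypothetical helpers standing in for your training method, verifier, and attack.

```python
def training_verification_loop(model, train_data, test_subset, eps,
                               train_robust, verify_subset, find_adversarial,
                               max_iterations=5):
    data = list(train_data)
    for _ in range(max_iterations):
        model = train_robust(model, data, eps)               # Step 1: train
        unverified = verify_subset(model, test_subset, eps)  # Step 2: verify
        hard_cases = [(x, y) for x, y in unverified          # Step 3: analyze
                      if find_adversarial(model, x, y, eps) is not None]
        if not hard_cases:
            break                                            # targets met
        data += hard_cases                                   # Step 4: refocus
    return model
```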

Iteration is Key

The training-verification loop compounds improvements. Each iteration focuses on real weaknesses found by verification, making training more targeted than adversarial training alone.

Hint

Know when to stop iterating. If certified accuracy plateaus, it’s time to either accept current robustness, try a different verification method, or reconsider your threat model.

Practical Considerations and Resource Management

Robust training is computationally expensive. A single step of PGD-based adversarial training requires multiple forward-backward passes. Certified training adds bound computation overhead. Understanding the costs helps you budget appropriately.

Computational Costs

  • Standard training: Baseline reference point

  • PGD-adversarial: Significantly more expensive than baseline

  • TRADES: Moderate overhead compared to full adversarial training

  • IBP training: Moderate to significant overhead

  • CROWN-based training: Most expensive approach

Certified methods aren’t just slower; they often struggle on large networks. If you have substantial computational resources and flexibility, CROWN-based training is powerful. If resources are limited, TRADES or IBP are more practical.

Hyperparameter Selection

The epsilon value is most critical. Choose it based on:

  • Perceptual plausibility: Is the perturbation human-imperceptible?

  • Domain knowledge: What perturbations are realistic in your application?

  • Benchmark standards: Existing literature provides guidance for your domain

Your learning rate likely needs adjustment. Robust training has a different loss landscape—gradients behave differently. Start with standard learning rates, then adjust if training becomes unstable.

Batch size affects robustness. Larger batches tend to produce more robust models than small batches. If your budget allows, increase batch size for robust training.

Data Augmentation and Curriculum Learning

Standard augmentation (flips, crops, color jittering) complements robust training. You can also use curriculum learning: start with small epsilon and gradually increase during training. This allows the network to learn basic robust features before facing harder perturbations.

Early Stopping

Track both clean and robust validation accuracy throughout training. Stop when robust accuracy plateaus, not when clean accuracy maxes out. A network balancing clean accuracy with meaningful robustness is often more valuable than maximizing clean accuracy with poor robustness.
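A hedged sketch of early stopping keyed to robust accuracy; train_one_epoch, evaluate, and evaluate_robust are hypothetical helpers, and the patience value is illustrative.

```python
import torch

def train_with_robust_early_stopping(model, loaders, eps,
                                     train_one_epoch, evaluate, evaluate_robust,
                                     max_epochs=200, patience=10):
    best_robust, bad_epochs = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, loaders["train"])
        clean_acc = evaluate(model, loaders["val"])
        robust_acc = evaluate_robust(model, loaders["val"], eps)  # PGD or certified
        print(f"epoch {epoch}: clean {clean_acc:.3f}, robust {robust_acc:.3f}")
        if robust_acc > best_robust:
            best_robust, bad_epochs = robust_acc, 0
            torch.save(model.state_dict(), "best_robust.pt")
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            break  # robust accuracy plateaued; stop, even if clean is rising
    return best_robust
```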

| Training Method | Computational Cost | Memory Overhead | Typical Scale | When to Use |
|---|---|---|---|---|
| Standard | Baseline | Minimal | Large datasets | Baseline only |
| PGD-Adversarial | High | Moderate | Medium datasets | Practical robustness |
| TRADES | Moderate | Minimal | Large datasets | Production systems |
| IBP Training | Moderate | Moderate | Medium to large | Certified robustness |
| CROWN-based | Very high | High | Small to medium | Safety-critical systems |

Tip

Start with TRADES: it offers a good balance of cost and robustness. Only move to more expensive methods if TRADES isn’t achieving your targets.

Tip

Epsilon scheduling: Begin training with small epsilon, then increase gradually toward your target (see the schedule sketch in the adversarial training section). This allows progressive learning and often improves final robustness.

Hint

If training is slow, profile where time is spent. Different methods have different bottlenecks: attack generation, bound computation, or gradient calculation. Identifying the bottleneck guides optimization.

Comparing Training Approaches

Each method has different tradeoffs. The right choice depends on your requirements, computational budget, and how much guarantee you need.

| Method | Robustness Guarantee | Accuracy Impact | Cost vs Standard | When to Use |
|---|---|---|---|---|
| Standard | None | Baseline | Baseline | Prototyping only |
| PGD-Adversarial | Empirical | Moderate reduction | High | Need robustness, limited guarantees |
| TRADES | Implicit | Modest reduction | Moderate | Production, moderate guarantees |
| IBP Training | Certified | Moderate reduction | Moderate | Certified robustness needed |
| CROWN-based | Certified (tight) | Moderate reduction | High | Safety-critical, budget available |
| Hybrid Approach | Empirical + Certified | Moderate reduction | High | Maximum confidence, flexible budget |

Hybrid Approaches

You don’t have to choose just one. Train with adversarial training for robustness, then apply certified training for final polish. Or train with TRADES, then verify and retrain weak samples. These hybrid approaches cost more but can yield the best results by combining the strengths of each method.
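Schematically, a two-phase hybrid might warm up empirically and then fine-tune with certified bounds; train_pgd, train_ibp, and verify are hypothetical helpers, and the epoch split is illustrative.

```python
def hybrid_training(model, data, test_data, eps, train_pgd, train_ibp, verify):
    model = train_pgd(model, data, eps, epochs=80)   # empirical warm-up
    model = train_ibp(model, data, eps, epochs=40)   # certified fine-tuning
    return model, verify(model, test_data, eps)      # final certificates
```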

Key References for Training Approaches

Understanding the tradeoffs between different training approaches requires careful evaluation. Adversarial training [Madry et al., 2018] provides empirical robustness without formal guarantees. Certified training methods—TRADES [Zhang et al., 2019], IBP [Gowal et al., 2019], and CROWN [Zhang et al., 2018, Xu et al., 2020]—offer provable robustness with varying computational costs and tightness. Randomized smoothing [Cohen et al., 2019] provides probabilistic guarantees that scale to large networks.

Common Pitfalls and Best Practices

Learn from those who’ve walked this path before.

Pitfalls to Avoid

  1. Epsilon without justification: Choosing epsilon arbitrarily leads to meaningless robustness claims. Ground it in your threat model and domain characteristics.

  2. Not verifying the final model: You verified checkpoints, but is the final deployed model actually robust? Always verify the exact model you deploy.

  3. Overfitting to adversarial examples: Sometimes networks “memorize” adversarial features rather than learning robust representations. This fails to generalize to new perturbations.

  4. Ignoring the accuracy-robustness tradeoff: The tradeoff is real, and pretending it doesn’t exist leads to poor decisions. Trading some clean accuracy for meaningful robustness is usually worthwhile.

  5. Wrong perturbation norm: Verifying against one norm when your actual threats use a different norm wastes effort and provides false confidence.

  6. Verification artifact confusion: Not all unverified samples are vulnerabilities. Some are just loose bounds. Empirical validation distinguishes them.

Best Practices

  1. Start with adversarial training: Get a quick sense of the robustness landscape before investing in certified methods.

  2. Use certified training for critical applications: If your system affects safety or security, the computational cost of certified training is justified.

  3. Track both clean and robust accuracy: Don’t just look at clean accuracy—it provides false security.

  4. Verify checkpoints throughout training: Identify when robustness improvement plateaus. Training longer doesn’t always help.

  5. Document everything: Epsilon value, threat model, architectural choices, hyperparameters, final metrics. Future maintainers (including you) will depend on this documentation.

  6. Test on multiple epsilon values: Robustness at one epsilon value doesn’t predict robustness at others. Test broadly.

  7. Combine with interpretability: Understanding where your network is weak (via verification) can guide architecture changes.

Hint

Debugging training failures: If robust accuracy is stuck near random performance, either epsilon is too large for your network capacity, or your training method isn’t converging properly. Try smaller epsilon, adjusted learning rates, or extended training.

Verification Doesn’t Replace Testing

Formal verification is powerful, but it’s not magic. Always combine it with adversarial attack evaluation, interpretability analysis, and domain expertise.

Final Thoughts

The marriage of training and verification represents a fundamental shift in thinking about robustness. Rather than hoping networks are robust and testing after the fact, we can build robustness in from the start. Rather than seeing verification as an inspection process, we use it as a training objective.

The field is evolving rapidly. Recent advances include:

  • More efficient certified training: New methods make advanced verification practical for larger networks

  • Learning-based bounds: Techniques that optimize bound parameters during training, tightening certificates automatically

  • Architecture search for verifiability: Automated methods to find architectures balancing accuracy, robustness, and verifiability

  • Hybrid defenses: Combining verified training with other robustness techniques

The right training method depends on your specific requirements. For research and prototyping, adversarial training offers speed and flexibility. For production systems, TRADES provides practical robustness with reasonable computational cost. For safety-critical systems, certified training (IBP or CROWN-based) provides formal guarantees.

No single approach is perfect. Tradeoffs exist between accuracy, robustness, computational cost, and verification tightness. But these tradeoffs are manageable with informed decisions based on your specific context.

The key insight: training objective determines model properties. Choose your loss function to reflect what matters to your application. If robustness matters, make it part of your training objective.

The training-verification loop is more powerful than either alone. Let verification findings guide training improvements. Let training innovations expand what verification can prove. This iterative, complementary relationship is where the real progress happens.

Start verifying your current network. Identify its weaknesses. Then retrain with the methods described here. Iterate. The networks you build this way—robust by design—represent the future of trustworthy machine learning.

Looking Forward

The convergence of better training algorithms, tighter verification methods, and neural architecture search will likely produce a future where robust and verifiable networks become standard practice rather than special-case engineering.

Further Reading

This guide provides comprehensive coverage of training robust and verifiable neural networks. For readers interested in diving deeper, we recommend the following resources organized by topic:

Adversarial Training:

PGD-based adversarial training [Madry et al., 2018] established the foundation for empirical robustness through training on strong adversarial examples. This minimax optimization approach produces networks with practical robustness, though without formal guarantees. The method remains widely used for its simplicity and effectiveness.

Certified Training Methods:

TRADES [Zhang et al., 2019] provides a theoretically principled framework for balancing clean accuracy and robustness through a KL-divergence regularization term. IBP-based training [Gowal et al., 2019, Gowal et al., 2021] enables scalable certified training by propagating interval bounds through the network and optimizing worst-case loss. CROWN-based training [Zhang et al., 2018, Xu et al., 2020] achieves tighter certified bounds through backward bound propagation, with auto_LiRPA [Xu et al., 2020] providing an efficient GPU-accelerated implementation for practical use.

Probabilistic Certification:

Randomized smoothing [Cohen et al., 2019] offers an alternative to deterministic certification by providing probabilistic robustness guarantees that scale to large networks. This approach samples from noise distributions to certify robustness, trading deterministic guarantees for better scalability.

Verification Methods for Training:

The training-verification loop relies on sound verification methods used as training objectives. Bound propagation approaches [Gowal et al., 2019, Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018] provide the foundation for certified training by computing output bounds given input perturbations. Complete verification methods [Katz et al., 2017, Tjeng et al., 2019] can validate training results, though they’re typically too expensive for use within the training loop itself.

Related Topics:

For testing networks before and after robust training, see Robustness Testing Guide. For understanding the verification techniques used as training objectives, see Bound Propagation Approaches. For advanced certified defense strategies beyond training, see Certified Defenses and Randomized Smoothing. For how activation function choices impact verifiability, see Beyond ReLU: Modern Activation Functions.

Next Guide

Continue to Certified Defenses and Randomized Smoothing to explore probabilistic certified defense mechanisms that scale to large networks.

References

[1] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, 2019.

[2] Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851, 2019.

[3] Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, and Timothy A Mann. Improving robustness using generated data. In Advances in Neural Information Processing Systems, 2021.

[4] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer, 2017.

[5] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

[6] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083, 2019.

[7] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019.

[8] Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285, 2018.

[9] Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. In Advances in Neural Information Processing Systems, 2020.

[10] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, 7472–7482. PMLR, 2019.

[11] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, 4939–4948, 2018.