Training Robust and Verifiable Neural Networks#
You’ve just verified your neural network and discovered it’s vulnerable. A small adversarial perturbation can fool your model, despite impressive accuracy on clean data. Now the critical question arises: how do you fix this? The answer isn’t to simply test more—it’s to train differently from the start.
Standard training optimizes for one thing: minimizing loss on clean data. It says nothing about how the network behaves under adversarial perturbation. This is the training-verification gap. You can verify a network after it’s trained, but the real power comes from building robustness into the training process itself. This guide explores how to do exactly that.
The Challenge: Why Standard Training Fails#
Neural networks trained with standard cross-entropy loss learn decision boundaries that are surprisingly fragile. Small perturbations can flip predictions entirely. This isn’t a flaw in the architecture—it’s a consequence of the training objective.
Standard training minimizes the expected loss on clean data:

\[
\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \mathcal{L}\big(f_\theta(x), y\big) \big]
\]
This objective only cares about performance on unperturbed inputs. The network has zero incentive to remain robust to adversarial examples because the training data contains no adversarial examples. The brittleness isn’t a surprise; it’s inevitable given the objective.
There’s also a persistent myth that robustness and accuracy are completely at odds. In reality there is a tradeoff, but it’s manageable with the right approach: you can achieve both reasonable clean accuracy and meaningful robustness.
Key Insight
Training objective determines model properties. Want a robust network? Your loss function must reflect robustness.
Adversarial Training: Learning from Attacks#
The most straightforward approach to robustness is intuitive: train on adversarial examples. If the network encounters adversarial perturbations during training, it might learn to resist them.
How it works: During training, you generate adversarial examples using an attack method (typically PGD—Projected Gradient Descent). You then train the network to minimize loss on these perturbed inputs, not the original ones.
The adversarial training objective is:

\[
\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta), y\big) \Big]
\]

This is a minimax formulation: for each input, you find the strongest adversarial perturbation within the epsilon ball, then train to minimize loss on that worst-case input. In practice, you approximate the inner maximization with PGD (several projected gradient ascent steps to find a strong attack).
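Below is a minimal PyTorch sketch of this minimax loop, assuming inputs scaled to [0, 1] and an \(L_\infty\) threat model; the function names and the default values for eps, alpha, and steps are illustrative, not prescribed by the text above.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the loss within an L-infinity ball of radius eps."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Step up the loss, then project back into the eps-ball and the valid input range.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    """One minimax step: generate the attack first, then descend on the worst-case loss."""
    model.eval()                          # attack against fixed BatchNorm statistics
    x_adv = pgd_attack(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```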
Strengths:
- Simple to understand and implement
- Empirically effective: networks trained this way show meaningful robustness
- Works with any architecture
- Widely used and benchmarked
Limitations:
- No formal guarantees: you’ve only tested a finite number of attacks
- An attack failing to find adversarial examples doesn’t prove robustness
- Computationally expensive compared to standard training
- May not generalize to attacks different from those seen during training
- Can overfit to the specific epsilon value used
When Adversarial Training Works Well
Use PGD-based adversarial training when you need practical robustness without formal guarantees, can afford the computational cost, and want a quick start to robustness exploration.
Tip
Epsilon scheduling improves results: start with small epsilon and gradually increase during training. This allows the network to learn basic robust features first, then refine them under stronger attacks.
Certified Training: Provable Robustness by Design#
If you need formal guarantees, adversarial training isn’t enough. Certified training uses verification bounds during training to ensure provable robustness. Rather than hoping your network is robust, you prove it.
The key insight: Include verified bounds in your loss function. When the network’s prediction is provably correct within an epsilon ball, the training can focus on harder regions.
TRADES: Balancing Accuracy and Robustness
The TRADES method (TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization) formalizes the accuracy-robustness tradeoff explicitly:

\[
\mathcal{L}_{\text{TRADES}} = \mathcal{L}_{\text{CE}}\big(f_\theta(x), y\big) + \beta \cdot \max_{x' \in \mathbb{B}(x, \epsilon)} \mathrm{KL}\big(f_\theta(x) \,\|\, f_\theta(x')\big)
\]

where \(x'\) is an adversarially perturbed version of \(x\), and \(\beta\) controls the weight of the robustness term. The first term maintains clean accuracy; the second encourages consistent predictions under perturbation.
The beta parameter is crucial: larger values prioritize robustness over clean accuracy. Finding the right beta requires experimentation for your specific domain.
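A sketch of what the TRADES objective can look like in PyTorch, loosely following the structure of the reference implementation; the defaults for beta, eps, the step size, and the random start magnitude are placeholders to tune for your setup.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, beta=6.0, eps=8/255, alpha=2/255, steps=10):
    """Clean cross-entropy plus a beta-weighted KL consistency term (TRADES-style)."""
    # Inner maximization: find x' in the eps-ball that maximizes KL(f(x) || f(x')).
    p_clean = F.softmax(model(x), dim=1).detach()
    x_adv = (x + 0.001 * torch.randn_like(x)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="sum")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    # Outer minimization: clean accuracy term + robustness (consistency) term.
    logits_clean = model(x)
    loss_natural = F.cross_entropy(logits_clean, y)
    loss_robust = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction="batchmean")
    return loss_natural + beta * loss_robust
```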
IBP Training: Interval Bound Propagation
IBP-based training is more direct: propagate interval bounds through your network and use worst-case loss. For each layer, compute lower and upper bounds on neuron outputs given the input perturbation region. Use the worst-case (most pessimistic) bounds to define a verified loss.
This approach guarantees that if training succeeds, the network is verifiably robust. The tradeoff: IBP bounds are often conservative, so the certified robustness might be weaker than what empirical testing could achieve. However, what you get is a formal proof.
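To make the mechanics concrete, here is a minimal sketch of interval propagation through a plain Linear/ReLU stack and a simplified worst-case cross-entropy built from the resulting bounds; real implementations also handle convolutions and typically mix the verified loss with the standard clean loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ibp_bounds(layers, x, eps):
    """Propagate elementwise lower/upper bounds through Linear and ReLU layers."""
    lb, ub = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            center, radius = (ub + lb) / 2, (ub - lb) / 2
            new_center = center @ layer.weight.t() + layer.bias
            new_radius = radius @ layer.weight.abs().t()  # worst case over the interval
            lb, ub = new_center - new_radius, new_center + new_radius
        elif isinstance(layer, nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)     # ReLU is monotone
        else:
            raise NotImplementedError(f"IBP rule not implemented for {type(layer)}")
    return lb, ub

def ibp_worst_case_loss(layers, x, y, eps, num_classes):
    """Verified loss: cross-entropy on worst-case logits (true class at its lower
    bound, every other class at its upper bound)."""
    lb, ub = ibp_bounds(layers, x, eps)
    true_class = F.one_hot(y, num_classes).bool()
    worst_logits = torch.where(true_class, lb, ub)
    return F.cross_entropy(worst_logits, y)
```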
CROWN-based Training: Tighter Bounds
CROWN uses backward (linear relaxation based) bound propagation to compute tighter bounds than IBP. Tighter bounds mean training can achieve better accuracy-robustness tradeoffs. Tools like auto_LiRPA make this practical for larger networks. CROWN-based training is more computationally intensive but produces networks with higher certified accuracy at the same epsilon.
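The sketch below shows how CROWN-style backward bounds are typically requested through auto_LiRPA; the toy model, dummy batch, and epsilon are placeholders, and exact signatures may differ slightly between library versions, so treat this as an outline rather than a drop-in recipe.

```python
import torch
import torch.nn as nn
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

# Toy classifier and dummy batch, just to show the bound computation.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.rand(8, 1, 28, 28)
eps = 2 / 255

# Wrap the model and declare the L-infinity input region around each sample.
lirpa_model = BoundedModule(model, torch.empty_like(x))
bounded_x = BoundedTensor(x, PerturbationLpNorm(norm=float("inf"), eps=eps))

# Backward (CROWN) bounds on every logit over the entire eps-ball.
lb, ub = lirpa_model.compute_bounds(x=(bounded_x,), method="backward")

# A verified training loss can then be built from these worst-case logits,
# exactly as in the IBP sketch above.
```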
Comparison of Certified Methods
| Method | Bound Type | Tightness | Computational Cost | Typical Use |
|---|---|---|---|---|
| TRADES | Implicit KL | Variable | Moderate | Fast certified training, practical choice |
| IBP Training | Interval | Conservative | Reasonable | Large networks, when speed matters |
| CROWN-based | Backward propagation | Tight | Expensive | Critical applications, smaller networks |
| Randomized Smoothing | Probabilistic | Statistical | Very expensive | Highest guarantees, offline certification |
Certified vs Empirical
Certified training trades computational resources for formal guarantees. A network trained with IBP is provably robust to within computed bounds. A network trained with PGD-adversarial is empirically robust to the attacks it survived, but nothing more.
Hint
For production systems, start with TRADES or IBP training. Only switch to CROWN-based training if you’ve exhausted other options and still need tighter bounds.
Note
Further reading:
TRADES: Theoretically Principled Trade-off between Robustness and Accuracy (ICML 2019) - Zhang et al.
Scalable Verified Training for Provably Robust Image Classification (ICCV 2019) - Gowal et al.
A Convex Relaxation Barrier to Tight Robustness Verification - Discussion of IBP limitations
Certified Defenses for Data Poisoning Attacks (NeurIPS 2017) - Using verification in adversarial contexts
Architecture Choices for Trainability#
Not all architectures are equally amenable to robust training. Some designs make robust learning easier; others fight against it.
Activation Functions
ReLU networks are easiest to train robustly because they’re piecewise linear and amenable to tight bound propagation. However, ReLU can suffer from dead neurons. Smooth activations (Sigmoid, Tanh, GeLU) provide better gradient flow but create verification challenges—bounds become looser. If using certified training, ReLU remains the safest default, though modern tools can handle contemporary activation functions reasonably well. See the activation functions blog for deeper technical details on verification tradeoffs.
Depth vs Width
Deeper networks are harder to verify (bounds accumulate error through layers) but often learn more efficiently. Wider networks are easier to verify but require more parameters. For robust training, consider your computational budget: deeper networks need tighter verification methods, while wider networks can use faster methods.
Skip Connections and Residual Networks
Residual connections (x → x + f(x)) help training convergence and work well with certified methods because bounds can exploit the skip connection structure. Modern architectures like ResNets are acceptable for certified training—the benefits of better training dynamics outweigh the verification complications.
Normalization Layers
Batch normalization during training affects verification at deployment (you typically use running statistics, not batch statistics). This mismatch can impact certified robustness. Consider this during architecture design: if you need formal guarantees at test time, document how normalization interacts with verification.
| Design Choice | Impact on Training | Impact on Verification | Recommendation |
|---|---|---|---|
| ReLU activation | Good gradients, some dead neurons | Tight bounds, piecewise linear | Best for certified training |
| Smooth activations | Excellent gradients | Looser bounds | Acceptable with modern tools |
| Deeper networks | Better feature learning | Harder verification | Use tight methods (CROWN) for certification |
| Wider networks | More parameters needed | Easier verification | Use faster methods (IBP) for scale |
| Skip connections | Faster convergence | Exploitable by bounds | Recommended |
| Batch normalization | Essential for training | Train-test mismatch | Document carefully |
Tip
For verifiable architectures: prefer ReLU or modern activations with adequate tools, use reasonable depth, add width where needed, include skip connections, and carefully document normalization behavior.
Training vs Deployment
The architecture you train with might differ from what you deploy. Training-time architectures can use BatchNorm, dropout, and layer-specific optimizations. Deployment architectures should match what you actually verified.
The Training-Verification Loop#
The most effective approach combines training and verification in an iterative cycle. Train, verify, analyze failures, retrain focusing on problems—repeat until targets are met.
Step 1: Train with Robustness Objective
Start with either adversarial training (for quick iteration) or certified training (for guarantees). Train on your full training set, tracking both clean and robust accuracy.
Step 2: Verify on Representative Samples
Run verification on a representative subset of your test set. Use a fast method first to get initial estimates.
Step 3: Analyze Failures
For each sample that couldn’t be verified, ask: Is it a real vulnerability or a verification artifact? Generate adversarial examples to check. If you can find an actual adversarial example, it’s a real problem. If you can’t despite trying hard, the verifier is likely being conservative.
Step 4: Retrain with Focus
Include the problematic samples in your next training run. Use refined hyperparameters to fine-tune robustness in these regions.
Step 5: Iterate
Repeat steps 2-4 until you hit your robustness targets.
This loop typically takes several iterations to see substantial improvements. Early iterations fix obvious vulnerabilities; later iterations refine robustness on edge cases.
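A structural sketch of the loop, with the training routine, verifier, and attack supplied by the caller; `train_fn`, `verify_fn`, and `attack_fn` are hypothetical interfaces, not part of any specific library.

```python
def training_verification_loop(model, train_fn, verify_fn, attack_fn,
                               train_set, probe_set, eps,
                               target_verified_acc=0.6, max_iters=5):
    """Iterate: train -> verify -> triage failures -> retrain on the hard samples.

    train_fn(model, dataset, eps)  -- trains the model in place with a robust objective
    verify_fn(model, x, y, eps)    -- True if the sample is certified at radius eps
    attack_fn(model, x, y, eps)    -- True if an adversarial example is actually found
    """
    hard_examples = []
    for _ in range(max_iters):
        # Step 1: train, oversampling the samples that failed verification last round.
        train_fn(model, list(train_set) + hard_examples, eps)

        # Step 2: verify a representative probe subset (use a fast method first).
        verified = [verify_fn(model, x, y, eps) for x, y in probe_set]
        if sum(verified) / len(probe_set) >= target_verified_acc:
            break  # Step 5: stop once the robustness target is met.

        # Steps 3-4: unverified samples where an attack succeeds are real
        # vulnerabilities; feed them back into the next training run.
        hard_examples = [(x, y) for (x, y), ok in zip(probe_set, verified)
                         if not ok and attack_fn(model, x, y, eps)]
    return model
```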
Iteration is Key
The training-verification loop compounds improvements. Each iteration focuses on real weaknesses found by verification, making training more targeted than adversarial training alone.
Hint
Know when to stop iterating. If certified accuracy plateaus, it’s time to either accept current robustness, try a different verification method, or reconsider your threat model.
Practical Considerations and Resource Management#
Robust training is computationally expensive. A single step of PGD-based adversarial training requires multiple forward-backward passes. Certified training adds bound computation overhead. Understanding the costs helps you budget appropriately.
Computational Costs
Standard training: Baseline reference point
PGD-adversarial: Significantly more expensive than baseline
TRADES: Moderate overhead compared to full adversarial training
IBP training: Moderate to significant overhead
CROWN-based training: Most expensive approach
Certified methods aren’t just slower; they often struggle on large networks. If you have substantial computational resources and flexibility, CROWN-based training is powerful. If resources are limited, TRADES or IBP are more practical.
Hyperparameter Selection
The epsilon value is most critical. Choose it based on:
- Perceptual plausibility: Is the perturbation human-imperceptible?
- Domain knowledge: What perturbations are realistic in your application?
- Benchmark standards: Existing literature provides guidance for your domain
Your learning rate likely needs adjustment. Robust training has a different loss landscape—gradients behave differently. Start with standard learning rates, then adjust if training becomes unstable.
Batch size affects robustness. Larger batches tend to produce more robust models than small batches. If your budget allows, increase batch size for robust training.
Data Augmentation and Curriculum Learning
Standard augmentation (flips, crops, color jittering) complements robust training. You can also use curriculum learning: start with small epsilon and gradually increase during training. This allows the network to learn basic robust features before facing harder perturbations.
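A minimal sketch of such a linear epsilon ramp; the warmup and ramp lengths and the target value are illustrative and should be tuned to your training budget.

```python
def epsilon_schedule(step, warmup_steps, ramp_steps, eps_target):
    """Zero perturbation during warmup, then a linear ramp up to the target epsilon."""
    if step < warmup_steps:
        return 0.0
    progress = min(1.0, (step - warmup_steps) / ramp_steps)
    return progress * eps_target

# Example: clean training for 2,000 steps, then ramp to 8/255 over the next 10,000.
eps_now = epsilon_schedule(step=5_000, warmup_steps=2_000,
                           ramp_steps=10_000, eps_target=8 / 255)
```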
Early Stopping
Track both clean and robust validation accuracy throughout training. Stop when robust accuracy plateaus, not when clean accuracy maxes out. A network balancing clean accuracy with meaningful robustness is often more valuable than maximizing clean accuracy with poor robustness.
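One possible way to express this stopping rule in code, assuming you log robust validation accuracy once per epoch; the patience and threshold values are placeholders.

```python
def should_stop(robust_acc_history, patience=10, min_delta=1e-3):
    """Stop when robust validation accuracy has not improved by at least min_delta
    over the best earlier value for `patience` consecutive epochs."""
    if len(robust_acc_history) <= patience:
        return False
    best_recent = max(robust_acc_history[-patience:])
    best_before = max(robust_acc_history[:-patience])
    return best_recent < best_before + min_delta
```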
| Training Method | Computational Cost | Memory Overhead | Typical Scale | When to Use |
|---|---|---|---|---|
| Standard | Baseline | Minimal | Large datasets | Baseline only |
| PGD-Adversarial | High | Moderate | Medium datasets | Practical robustness |
| TRADES | Moderate | Minimal | Large datasets | Production systems |
| IBP Training | Moderate | Moderate | Medium to large | Certified robustness |
| CROWN-based | Very high | High | Small to medium | Safety-critical systems |
Tip
Start with TRADES: it offers a good balance of cost and robustness. Only move to more expensive methods if TRADES isn’t achieving your targets.
Tip
Epsilon scheduling: Begin training with small epsilon, then increase gradually toward your target. This allows progressive learning and often improves final robustness.
Hint
If training is slow, profile where time is spent. Different methods have different bottlenecks: attack generation, bound computation, or gradient calculation. Identifying the bottleneck guides optimization.
Comparing Training Approaches#
Each method has different tradeoffs. The right choice depends on your requirements, computational budget, and how much guarantee you need.
| Method | Robustness Guarantee | Accuracy Impact | Cost vs Standard | When to Use |
|---|---|---|---|---|
| Standard | None | Baseline | Baseline | Prototyping only |
| PGD-Adversarial | Empirical | Moderate reduction | High | Need robustness, limited guarantees |
| TRADES | Implicit | Modest reduction | Moderate | Production, moderate guarantees |
| IBP Training | Certified | Moderate reduction | Moderate | Certified robustness needed |
| CROWN-based | Certified (tight) | Moderate reduction | High | Safety-critical, budget available |
| Hybrid Approach | Empirical + Certified | Moderate reduction | High | Maximum confidence, flexible budget |
Hybrid Approaches
You don’t have to choose just one. Train with adversarial training for robustness, then apply certified training for final polish. Or train with TRADES, then verify and retrain weak samples. These hybrid approaches cost more but can yield the best results by combining the strengths of each method.
Note
Further reading:
RobustBench: A Standardized Adversarial Robustness Benchmark - Comprehensive benchmark comparing training methods
The Tradeoff Between Accuracy and Robustness - Theoretical analysis of the accuracy-robustness relationship
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples - Fundamental limitations
Common Pitfalls and Best Practices#
Learn from those who’ve walked this path before.
Pitfalls to Avoid
Epsilon without justification: Choosing epsilon arbitrarily leads to meaningless robustness claims. Ground it in your threat model and domain characteristics.
Not verifying the final model: You verified checkpoints, but is the final deployed model actually robust? Always verify the exact model you deploy.
Overfitting to adversarial examples: Sometimes networks “memorize” adversarial features rather than learning robust representations. This fails to generalize to new perturbations.
Ignoring the accuracy-robustness tradeoff: The tradeoff is real, and understanding it matters. Trading some clean accuracy for meaningful robustness is usually worthwhile.
Wrong perturbation norm: Verifying against one norm when your actual threats use a different norm wastes effort and provides false confidence.
Verification artifact confusion: Not all unverified samples are vulnerabilities. Some are just loose bounds. Empirical validation distinguishes them.
Best Practices
Start with adversarial training: Get a quick sense of the robustness landscape before investing in certified methods.
Use certified training for critical applications: If your system affects safety or security, the computational cost of certified training is justified.
Track both clean and robust accuracy: Don’t just look at clean accuracy—it provides false security.
Verify checkpoints throughout training: Identify when robustness improvement plateaus. Training longer doesn’t always help.
Document everything: Epsilon value, threat model, architectural choices, hyperparameters, final metrics. Anyone revisiting the system later will depend on this documentation.
Test on multiple epsilon values: Robustness at one epsilon value doesn’t predict robustness at others. Test broadly.
Combine with interpretability: Understanding where your network is weak (via verification) can guide architecture changes.
Hint
Debugging training failures: If robust accuracy is stuck near random performance, either epsilon is too large for your network capacity, or your training method isn’t converging properly. Try smaller epsilon, adjusted learning rates, or extended training.
Verification Doesn’t Replace Testing
Formal verification is powerful, but it’s not magic. Always combine it with adversarial attack evaluation, interpretability analysis, and domain expertise.
Final Thoughts#
The marriage of training and verification represents a fundamental shift in thinking about robustness. Rather than hoping networks are robust and testing after the fact, we can build robustness in from the start. Rather than seeing verification as an inspection process, we use it as a training objective.
The field is evolving rapidly. Recent advances include:
More efficient certified training: New methods make advanced verification practical for larger networks
Learning-based bounds: Techniques that optimize bound parameters during training, tightening certificates automatically
Architecture search for verifiability: Automated methods to find architectures balancing accuracy, robustness, and verifiability
Hybrid defenses: Combining verified training with other robustness techniques
The right training method depends on your specific requirements. For research and prototyping, adversarial training offers speed and flexibility. For production systems, TRADES provides practical robustness with reasonable computational cost. For safety-critical systems, certified training (IBP or CROWN-based) provides formal guarantees.
No single approach is perfect. Tradeoffs exist between accuracy, robustness, computational cost, and verification tightness. But these tradeoffs are manageable with informed decisions based on your specific context.
The key insight: training objective determines model properties. Choose your loss function to reflect what matters to your application. If robustness matters, make it part of your training objective.
The training-verification loop is more powerful than either alone. Let verification findings guide training improvements. Let training innovations expand what verification can prove. This iterative, complementary relationship is where the real progress happens.
Start verifying your current network. Identify its weaknesses. Then retrain with the methods described here. Iterate. The networks you build this way—robust by design—represent the future of trustworthy machine learning.
Looking Forward
The convergence of better training algorithms, tighter verification methods, and neural architecture search will likely produce a future where robust and verifiable networks become standard practice rather than special-case engineering.
Note
Further reading:
SoK: Certified Robustness for Deep Neural Networks (IEEE S&P 2023) - Comprehensive survey of training and verification approaches
TRADES: Theoretically Principled Trade-off between Robustness and Accuracy (ICML 2019) - Influential adversarial training method
Certified Adversarial Robustness via Randomized Smoothing (ICML 2019) - Probabilistic certification foundation
auto_LiRPA: Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond (NeurIPS 2020) - State-of-the-art certified training approaches
Related Topics in This Series
Robustness Testing Guide - Testing networks before training for robustness
Bound Propagation Approaches - Verification techniques used during training
Certified Defenses and Randomized Smoothing - Advanced certified training methods
Activation Functions and Verification - How activation functions impact verification