Certified Adversarial Training
Certified adversarial training methods that provide provable robustness guarantees during training, including IBP, CROWN, and TRADES variants
Standard adversarial training improves empirical robustness—networks perform better against known attacks. But it provides no guarantees: a network trained on PGD attacks might still be vulnerable to stronger attacks or different perturbation types. What if you could train networks with provable robustness certificates?
Certified adversarial training combines adversarial training with formal verification: instead of training on empirically-generated adversarial examples, train to maximize verified robustness bounds. The result: networks with certified guarantees—definitive proof that no perturbation within a specified radius can fool the network.
This guide explores certified adversarial training: how to integrate verification into training, what methods exist (IBP, CROWN-based, TRADES variants), and when certified training is worth the computational cost.
The Core Idea: Train on Verified Bounds
Standard Adversarial Training
PGD adversarial training minimizes worst-case loss over empirically-generated adversarial examples:

$$\min_\theta \; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta),\, y\big)\Big]$$
The inner maximization is approximated with PGD (projected gradient descent), which iterates:

$$\delta^{t+1} = \Pi_{\|\delta\|_\infty \le \epsilon}\Big(\delta^t + \alpha \cdot \mathrm{sign}\big(\nabla_\delta \mathcal{L}(f_\theta(x+\delta^t), y)\big)\Big)$$
Limitation: PGD only finds approximate worst-case examples (local maxima). True worst-case might be worse, so there are no certified guarantees.
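The inner PGD maximization can be sketched in a few lines (an illustrative implementation; step size and random restarts vary across codebases):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha=None, steps=10):
    """Approximate the inner maximization max_{||delta||_inf <= epsilon}
    L(f(x + delta), y) by projected gradient ascent."""
    if alpha is None:
        alpha = 2.5 * epsilon / steps  # common step-size heuristic
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)  # random start
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = delta + alpha * grad.sign()     # ascent step
            delta = delta.clamp(-epsilon, epsilon)  # project onto the ball
    return (x + delta).detach()
```

Note that the loop only climbs to a local maximum of the loss, which is exactly why PGD training yields no certificate.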
Certified Adversarial Training
Certified training replaces the empirical worst case with a verified worst-case bound:

$$\min_\theta \; \mathbb{E}_{(x,y)}\big[\overline{\mathcal{L}}(x, y, \epsilon)\big], \qquad \overline{\mathcal{L}}(x, y, \epsilon) \;\ge\; \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta),\, y\big)$$

where the inner maximization is replaced by a bound $\overline{\mathcal{L}}$ computed via formal verification (a sound upper bound on the worst-case loss).
Key difference: Instead of PGD-generated examples (which might miss the true worst-case), use verified bounds that provably bound all possible perturbations.
Certified vs Empirical Training
Standard Adversarial Training:
- Inner loop: PGD finds adversarial examples
- Guarantees: None (empirical robustness only)
- Robustness: Good against known attacks, may fail on stronger attacks
- Speed: Fast (PGD is efficient)
Certified Adversarial Training:
- Inner loop: Verification computes worst-case bounds
- Guarantees: Certified robustness (provable)
- Robustness: Guaranteed against all perturbations in the $\epsilon$-ball
- Speed: Slower (verification more expensive than PGD)
Result: Certified training produces networks with provable robustness certificates, at the cost of longer training time.
IBP Training: Interval Bound Propagation
IBP (Interval Bound Propagation) training uses interval arithmetic to compute certified bounds during training.
How IBP Training Works
Bound propagation: For the input region $[x - \epsilon,\, x + \epsilon]$, propagate interval bounds through the layers. For a linear layer $z = Wx + b$:

$$\underline{z} = W_{+}\underline{x} + W_{-}\overline{x} + b, \qquad \overline{z} = W_{+}\overline{x} + W_{-}\underline{x} + b$$

where $W_{+} = \max(W, 0)$ and $W_{-} = \min(W, 0)$; a ReLU maps $[\underline{z}, \overline{z}]$ to $[\max(\underline{z}, 0),\, \max(\overline{z}, 0)]$.
Certified loss: Use the worst-case margin from the IBP output bounds $[\underline{z}, \overline{z}]$: the certified margin of the true class $y$ against class $j$ is $\underline{z}_y - \overline{z}_j$.

Training objective:

$$\mathcal{L}_{\text{cert}} = \mathbb{E}_{(x,y)}\Big[\max\big(0,\; \max_{j \ne y} \overline{z}_j - \underline{z}_y + \kappa\big)\Big]$$
Why it works: Maximizing the certified margin directly encourages robustness that verification can prove.
Practical IBP Training
Warmup schedule: Start with $\epsilon = 0$ (standard training), gradually increase to the target $\epsilon$:
def train_ibp(model, dataloader, optimizer, epochs=100, epsilon_target=0.3,
              kappa=0.0, lambda_std=0.5):
    """IBP certified training with epsilon warmup."""
    for epoch in range(epochs):
        # Warmup: linearly increase epsilon over the first half of training
        if epoch < epochs // 2:
            epsilon = epsilon_target * (epoch / (epochs // 2))
        else:
            epsilon = epsilon_target

        for x, y in dataloader:
            # Compute IBP bounds for every layer
            lower_bounds, upper_bounds = ibp_forward(model, x, epsilon)

            logits_lower = lower_bounds[-1]  # output-layer lower bounds
            logits_upper = upper_bounds[-1]  # output-layer upper bounds

            # Certified margin: lower bound of the true class vs. the
            # largest upper bound among all other classes
            margin_y = logits_lower[torch.arange(len(y)), y]
            margin_others = logits_upper.scatter(
                1, y.unsqueeze(1), -float('inf')
            ).max(dim=1).values

            # Certified margin loss (hinge with margin buffer kappa)
            loss_certified = F.relu(margin_others - margin_y + kappa).mean()

            # Optional: combine with standard loss to preserve clean accuracy
            loss_standard = F.cross_entropy(model(x), y)
            loss_total = loss_certified + lambda_std * loss_standard

            optimizer.zero_grad()
            loss_total.backward()
            optimizer.step()
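The `ibp_forward` helper used above is not spelled out in this guide; a minimal sketch for an `nn.Sequential` of Linear and ReLU layers (an assumption about the model structure) could look like:

```python
import torch
import torch.nn as nn

def ibp_forward(model, x, epsilon):
    """Propagate the interval [x - epsilon, x + epsilon] through the model,
    returning per-layer lists of lower and upper bounds."""
    lower, upper = x - epsilon, x + epsilon
    lowers, uppers = [lower], [upper]
    for layer in model:
        if isinstance(layer, nn.Linear):
            center = (upper + lower) / 2
            radius = (upper - lower) / 2
            new_center = center @ layer.weight.t() + layer.bias
            new_radius = radius @ layer.weight.abs().t()  # |W| absorbs sign flips
            lower, upper = new_center - new_radius, new_center + new_radius
        elif isinstance(layer, nn.ReLU):
            # ReLU is monotone: apply it to both interval endpoints
            lower, upper = lower.clamp(min=0), upper.clamp(min=0)
        lowers.append(lower)
        uppers.append(upper)
    return lowers, uppers
```

At `epsilon = 0` the bounds collapse onto the ordinary forward pass, which is a useful sanity check when implementing this.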
Key hyperparameters:
- Warmup schedule: linear or exponential increase of $\epsilon$
- Kappa ($\kappa$): margin buffer added to the certified hinge loss
- Lambda ($\lambda$): weight balancing certified and standard loss
Advantages:
- Fast: IBP is O(network size), very efficient
- Scalable: Works for large networks (CNNs, ResNets)
- Simple: Straightforward implementation
Disadvantages:
- Loose bounds: IBP bounds can be very conservative
- Requires small $\epsilon$: Only works well for small perturbations
- Warmup critical: Training directly at large $\epsilon$ often fails
CROWN-Based Training
CROWN provides tighter bounds than IBP through backward linear relaxation.
CROWN Bound Propagation
Backward propagation: Instead of propagating interval bounds forward, CROWN propagates linear bounds backward, bounding the network output as

$$\underline{A}x + \underline{b} \;\le\; f(x) \;\le\; \overline{A}x + \overline{b}$$

where the coefficients $\underline{A}, \underline{b}, \overline{A}, \overline{b}$ are computed via a backward pass through the network, replacing each ReLU with linear upper and lower relaxations.
Tightness: CROWN bounds are always at least as tight as IBP bounds, and optimizing the ReLU relaxation slopes reduces conservativeness further.
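To see why plain interval bounds can be loose, consider a hypothetical two-layer linear network whose output is identically zero; IBP cannot exploit the correlation between the two hidden units:

```python
import numpy as np

# Hypothetical two-layer network: h = W1 @ x, out = w2 @ h,
# with W1 = [[1], [1]] and w2 = [1, -1], so out = x - x = 0 for every input.
W1 = np.array([[1.0], [1.0]])
w2 = np.array([[1.0, -1.0]])

def interval_linear(W, lo, hi):
    """Interval arithmetic through a linear map (the IBP primitive)."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c, r = W @ center, np.abs(W) @ radius
    return c - r, c + r

eps = 0.5
lo1, hi1 = interval_linear(W1, np.array([-eps]), np.array([eps]))
lo2, hi2 = interval_linear(w2, lo1, hi1)
# The true output is exactly 0 for all x, but IBP reports [-2*eps, 2*eps]:
# intervals forget that both hidden units carry the *same* input value.
print(lo2, hi2)  # [-1.] [1.]
```

CROWN avoids part of this looseness by tracking linear functions of the input rather than plain intervals, which is why its bounds are tighter.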
CROWN Training Objective
Certified loss with CROWN:

$$\mathcal{L}_{\text{cert}} = \mathbb{E}_{(x,y)}\Big[\max\big(0,\; \max_{j \ne y} \overline{z}_j - \underline{z}_y + \kappa\big)\Big]$$

where the certified margin is computed from CROWN lower/upper bounds $\underline{z}, \overline{z}$ on the class logits.
Training workflow:
def train_crown(model, dataloader, optimizer, epsilon, kappa=0.0):
    """CROWN-based certified training."""
    for x, y in dataloader:
        # Compute CROWN bounds on the output logits
        lower_bounds, upper_bounds = crown_bounds(model, x, epsilon)

        # Certified margin (same as IBP, but with tighter bounds)
        margin_y = lower_bounds[torch.arange(len(y)), y]
        margin_others = upper_bounds.scatter(
            1, y.unsqueeze(1), -float('inf')
        ).max(dim=1).values

        loss_certified = F.relu(margin_others - margin_y + kappa).mean()

        # Backward pass
        optimizer.zero_grad()
        loss_certified.backward()
        optimizer.step()
Advantages over IBP:
- Tighter bounds lead to better certified accuracy
- Works at larger $\epsilon$ than IBP
- Still polynomial-time (though slower than IBP)
Challenges:
- More complex implementation
- Slower than IBP (backward propagation overhead)
- Still incomplete (provides upper bound on worst-case, not exact)
alpha-CROWN and beta-CROWN
alpha-CROWN: Optimizes the linear relaxation slopes $\alpha$ of each unstable ReLU to minimize the output bounds. This tightens CROWN bounds significantly.
beta-CROWN: Extends alpha-CROWN with split constraints (encoded by $\beta$ parameters), providing even tighter bounds.
Training with alpha-beta-CROWN: at each training step, first optimize the bound parameters to tighten the certified loss, then take a gradient step on the network weights.
This nested optimization (outer loop over the network weights $\theta$, inner loop over the bound parameters $\alpha$ and $\beta$) is expensive but provides the tightest certified training bounds.
TRADES with Certified Bounds
TRADES originally uses empirical adversarial examples:

$$\mathcal{L}_{\text{TRADES}} = \mathcal{L}\big(f_\theta(x), y\big) + \beta \max_{\|x' - x\|_\infty \le \epsilon} \mathrm{KL}\big(f_\theta(x) \,\|\, f_\theta(x')\big)$$

Combining TRADES with certified bounds yields certified-TRADES:

$$\mathcal{L} = \mathcal{L}\big(f_\theta(x), y\big) + \beta\, \overline{D}(x, \epsilon)$$

where $\overline{D}$ is a certified divergence measure (e.g., KL divergence computed from verified bounds).
Certified KL divergence: Use CROWN/IBP to compute an upper bound on the worst-case KL divergence:

$$\overline{D}(x, \epsilon) \;\ge\; \max_{\|x' - x\|_\infty \le \epsilon} \mathrm{KL}\big(f_\theta(x) \,\|\, f_\theta(x')\big)$$

approximated via verified bounds on the logits of $f_\theta(x')$.
Benefit: Combines TRADES’ accuracy-robustness balance with certified guarantees.
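As a rough illustration (a hedged sketch, not the published certified-TRADES objective), one can assemble a pessimistic logit vector from verified logit bounds and penalize its divergence from the clean prediction:

```python
import torch
import torch.nn.functional as F

def certified_trades_loss(logits, lower, upper, y, beta=6.0):
    """Clean cross-entropy plus a KL term against worst-case logits
    built from verified bounds: the true class takes its lower bound,
    every other class its upper bound. `lower`/`upper` are assumed to
    be certified bounds on the logits (from IBP or CROWN)."""
    worst = upper.clone()
    worst.scatter_(1, y.unsqueeze(1), lower.gather(1, y.unsqueeze(1)))
    kl = F.kl_div(F.log_softmax(worst, dim=1),
                  F.softmax(logits, dim=1), reduction='batchmean')
    return F.cross_entropy(logits, y) + beta * kl
```

When the bounds collapse to the clean logits (i.e., $\epsilon = 0$), the KL term vanishes and the loss reduces to plain cross-entropy.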
Training-Verification Loop
Certified training creates a virtuous cycle between training and verification:
Training phase:
- Compute certified bounds for current network
- Maximize worst-case certified margin
- Update network parameters
Verification phase:
- Use trained network for deployment
- Verify properties using the same verification method (IBP, CROWN)
- Certified accuracy should match or exceed training-time estimates
Why the Loop Works
Networks trained to maximize verified bounds are easier to verify:
- Activations tend to be clearly active or inactive (fewer unstable ReLUs)
- Certified margins are larger (so verification succeeds more easily)
- Effective Lipschitz constants are smaller (so verification bounds are tighter)
This creates a positive feedback loop: better training → easier verification → tighter bounds → better training.
Practical Considerations
Epsilon Scheduling
Challenge: Training at large $\epsilon$ from scratch often fails (the network cannot learn anything robust).
Solution: Epsilon warmup schedule:
- Linear warmup: $\epsilon_t = \epsilon_{\text{target}} \cdot t / T_{\text{warmup}}$
- Exponential warmup: $\epsilon_t$ grows slowly at first, then ramps up toward $\epsilon_{\text{target}}$
- Step schedule: Increase $\epsilon$ in discrete steps every N epochs
Typical schedule: Start at $\epsilon = 0$, ramp to the target over 50-100 epochs.
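The three schedules can be sketched as a single helper (the exponential curve shape and the four-step increment are illustrative choices):

```python
import math

def epsilon_schedule(epoch, eps_target, warmup_epochs, mode='linear'):
    """Epsilon value at a given epoch under a warmup schedule."""
    if epoch >= warmup_epochs:
        return eps_target
    t = epoch / warmup_epochs  # warmup progress in [0, 1)
    if mode == 'linear':
        return eps_target * t
    if mode == 'exp':
        # Slow start, accelerating ramp toward eps_target.
        return eps_target * (math.exp(4 * t) - 1) / (math.exp(4) - 1)
    if mode == 'step':
        # Four discrete increments across the warmup window.
        return eps_target * (int(4 * t) / 4)
    raise ValueError(f"unknown mode: {mode}")
```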
Combining Certified and Standard Loss
Pure certified training can hurt clean accuracy. Combining the two losses helps:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cert}} + \lambda\, \mathcal{L}_{\text{standard}}$$

Typical choice: anneal $\lambda$ from 1 to 0 during training, shifting emphasis from clean accuracy to certified robustness.
Computational Cost
Training time multiplier:
- IBP training: 1.5-3x standard training (depends on network depth)
- CROWN training: 2-5x standard training
- alpha-beta-CROWN training: 5-10x standard training (due to inner optimization)
Memory: Verification requires storing intermediate bounds, leading to 2-3x memory usage.
Mitigation: Use mixed-precision training, gradient checkpointing, or train on GPUs with large memory.
Results and State-of-the-Art
Certified Accuracy Benchmarks
MNIST with $\ell_\infty$ perturbations:
- Standard training: ~0% certified accuracy
- PGD adversarial training: ~0% (no certification)
- IBP training: ~85% certified accuracy
- CROWN training: ~92% certified accuracy
CIFAR-10 with $\ell_\infty$ perturbations:
- Standard training: ~0%
- PGD-AT: ~0% (empirical robustness only)
- IBP training: ~35%
- CROWN training: ~55%
- alpha-beta-CROWN: ~60%
Trend: Tighter verification during training leads to higher certified accuracy. The gap between IBP and CROWN shows that bound tightness matters significantly.
Comparison Table
| Method | Bound Type | Training Cost | Certified Acc (CIFAR-10) | Best For |
|---|---|---|---|---|
| Standard | None | 1x | 0% | Clean accuracy only |
| PGD-AT | Empirical | 7-10x | 0% (no cert) | Empirical robustness |
| IBP | Interval bounds | 1.5-3x | ~35% | Fast certified training |
| CROWN | Linear bounds | 2-5x | ~55% | Tight certified training |
| alpha-beta-CROWN | Optimized bounds | 5-10x | ~60% | State-of-the-art certified |
When to Use Certified Training
Use certified adversarial training when:
- Need provable robustness guarantees (safety-critical applications)
- Deploying in adversarial environments where empirical defenses are insufficient
- Willing to accept accuracy-robustness tradeoff
- Have computational budget for verification during training
- Target perturbation radius is known (can train at a specific $\epsilon$)
Use standard adversarial training when:
- Only need empirical robustness (defense against known attacks)
- Certification not required or too expensive
- Maximizing adversarial accuracy more important than certified accuracy
- Very large networks where verification doesn’t scale
Hybrid approach:
- Pretrain with standard/adversarial training for good initialization
- Fine-tune with certified training for certification guarantees
- Use IBP for initial training, switch to CROWN for final epochs (balance cost and tightness)
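The hybrid schedule above can be made concrete with a simple phase selector (the phase boundaries here are illustrative assumptions, not recommended values):

```python
def training_phase(epoch, total_epochs):
    """Which loss to use at each epoch of a hybrid schedule:
    adversarial pretraining, then IBP, then CROWN for the final epochs."""
    if epoch < total_epochs // 2:
        return 'adversarial'  # cheap pretraining for a good initialization
    if epoch < int(0.9 * total_epochs):
        return 'ibp'          # fast certified bounds
    return 'crown'            # tighter bounds to finish
```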
Sweet Spot
Certified training works best for:
- Medium-sized networks (CNNs with 10K-1M parameters)
- Small to moderate $\epsilon$ (0.1-0.3 for inputs normalized to [0, 1])
- Applications where certified accuracy in 50-70% range is acceptable
- Domains requiring provable guarantees (medical, autonomous systems)
Current Research and Future Directions
Tighter training bounds: Developing even tighter verification methods for training (SDP-based, multi-neuron relaxations).
Scalability: Extending certified training to large networks (ResNet-50, Vision Transformers) through efficient approximations.
Better epsilon scheduling: Adaptive schedules that optimize certified accuracy rather than following fixed warmup curves.
Architecture co-design: Designing network architectures specifically for certified training (e.g., architectures easier to verify).
Multi-perturbation certification: Training with certified bounds for multiple perturbation types simultaneously ($\ell_\infty$, $\ell_2$, semantic).
Limitations
Lower certified accuracy: Even state-of-the-art certified training achieves lower certified accuracy (~60% on CIFAR-10) than standard accuracy (~95%).
Computational cost: Verification during training is expensive, especially for tighter bounds (CROWN, alpha-beta-CROWN).
Epsilon limitations: Certified training works well at small $\epsilon$ but struggles at large perturbations.
Clean accuracy drop: Certified training often hurts clean accuracy (5-15% drop typical).
Scalability barriers: Current methods scale to networks with millions of parameters but struggle with billions (large vision models, LLMs).
Final Thoughts
Certified adversarial training represents a fundamental shift in robust training: from empirical robustness (training on attacks we can generate) to provable robustness (training on bounds we can verify). This shift brings the rigor of formal verification into the training loop itself.
While certified training doesn’t yet match the clean accuracy or empirical robustness of standard or adversarially-trained networks, it provides something neither can: provable guarantees. For applications where “probably robust” isn’t good enough—safety-critical systems, adversarial environments with sophisticated attackers—certified training is essential.
The progression from IBP to CROWN to alpha-beta-CROWN demonstrates steady progress: each generation achieves higher certified accuracy through tighter verification bounds. As verification methods improve, so will certified training, gradually closing the gap between provable and empirical robustness.
Understanding certified training illuminates the deep connection between training and verification: they’re not separate phases but intertwined processes. Better verification enables better training; better training yields networks easier to verify. This symbiotic relationship drives progress toward the ultimate goal: networks that are provably robust, verifiably correct, and practically deployable.
Further Reading
This guide provides comprehensive coverage of certified adversarial training for neural networks. For readers interested in diving deeper, we recommend the following resources organized by topic:
IBP Training - Fast Certified Training:
IBP training pioneered scalable certified training using interval bound propagation. Despite loose bounds, IBP’s efficiency enables training large networks with certified guarantees. The key innovation is the epsilon warmup schedule, allowing networks to gradually learn robust features.
CROWN-Based Training - Tighter Bounds:
CROWN training improves upon IBP through tighter linear bound propagation. By optimizing linear relaxation parameters, CROWN achieves significantly higher certified accuracy with moderate computational overhead. This represents the current practical sweet spot for certified training.
alpha-beta-CROWN - State-of-the-Art:
alpha-beta-CROWN provides the tightest training bounds through optimized bound propagation and split constraints. GPU acceleration makes the computational cost manageable, enabling state-of-the-art certified accuracy on standard benchmarks.
TRADES - Accuracy-Robustness Balance:
TRADES introduced explicit tradeoff control between natural and robust accuracy. Variants combining TRADES with certified bounds provide both empirical and provable robustness, representing best-practice for practical deployment.
Comparison with Standard Adversarial Training:
PGD adversarial training remains the gold standard for empirical robustness, providing important context for certified training’s tradeoffs. Understanding why adversarial training doesn’t provide certificates motivates the need for verified bounds during training.
Verification Methods for Training:
The bound propagation methods used in certified training—IBP, CROWN, DeepPoly—are covered in depth in verification literature. Understanding these methods’ tightness-speed tradeoffs explains certified training’s performance characteristics.
Advanced Verification for Tighter Training:
SDP-based verification and multi-neuron relaxations provide even tighter bounds but at higher computational cost. Future certified training may leverage these for improved certified accuracy.
Probabilistic Certification Alternative:
Randomized smoothing offers an alternative path to provable robustness through probabilistic guarantees. While certified training provides deterministic guarantees, randomized smoothing scales better to very large networks.
Related Topics:
For comparison with regularization-based training, see Regularization-Based Robust Training. For adversarial training that certified training extends, see Training Robust Networks. For verification of trained networks, see Robustness Testing Guide. For randomized smoothing as an alternative, see Certified Defenses.
Next Phase
Congratulations on completing Phase 3: Robust Training & Practical Implementation! Continue to Phase 4: Advanced Topics to explore state-of-the-art defenses, scalability challenges, and real-world deployment.