Certified Adversarial Training

Standard adversarial training [Madry et al., 2018] improves empirical robustness—networks perform better against known attacks. But it provides no guarantees: a network trained on PGD attacks might still be vulnerable to stronger attacks or different perturbation types. What if you could train networks with provable robustness certificates?

Certified adversarial training combines adversarial training with formal verification [Gowal et al., 2019, Xu et al., 2020, Zhang et al., 2020]: instead of training on empirically generated adversarial examples, train to optimize verified robustness bounds. The result: networks with certified guarantees—for each certified input, a proof that no perturbation within a specified radius can change the prediction.

This guide explores certified adversarial training: how to integrate verification into training, what methods exist (IBP, CROWN-based, TRADES variants), and when certified training is worth the computational cost.

The Core Idea: Train on Verified Bounds

Standard Adversarial Training

PGD adversarial training [Madry et al., 2018] minimizes worst-case loss over empirically-generated adversarial examples:

\[\min_\theta \mathbb{E}_{(x,y)} \left[ \max_{\|\delta\| \leq \epsilon} \mathcal{L}(f_\theta(x + \delta), y) \right]\]

The inner maximization finds adversarial perturbations via PGD (projected gradient descent):

\[\delta^{(t+1)} = \Pi_{\|\delta\| \leq \epsilon} \left( \delta^{(t)} + \alpha \cdot \text{sign}(\nabla_\delta \mathcal{L}(f_\theta(x + \delta^{(t)}), y)) \right)\]

Limitation: PGD only finds approximate worst-case examples (local maxima). True worst-case might be worse → no certified guarantees.
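
For concreteness, a minimal PyTorch sketch of this PGD inner loop (assuming an \(\ell_\infty\) ball and inputs normalized to [0, 1]; the model and hyperparameters are illustrative) might look like:

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha=0.01, steps=10):
    """Minimal L-infinity PGD: ascend the loss, then project back into the epsilon-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-sign ascent step, followed by projection onto the epsilon-ball
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep x + delta a valid image
        delta.grad.zero_()
    # In a training loop, remember to zero the model's gradients after the attack
    return (x + delta).detach()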

Certified Adversarial Training

Certified training replaces empirical worst-case with verified worst-case bounds:

\[\min_\theta \mathbb{E}_{(x,y)} \left[ \overline{\mathcal{L}}_\epsilon(f_\theta, x, y) \right], \qquad \overline{\mathcal{L}}_\epsilon(f_\theta, x, y) \;\geq\; \max_{\|\delta\| \leq \epsilon} \mathcal{L}(f_\theta(x + \delta), y)\]

where the verified upper bound \(\overline{\mathcal{L}}_\epsilon\) on the worst-case loss is computed via formal verification, so the training objective provably dominates the true worst-case loss.

Key difference: Instead of PGD-generated examples (which might miss the true worst-case), use verified bounds that provably bound all possible perturbations.

Certified vs Empirical Training

Standard Adversarial Training:

  • Inner loop: PGD finds adversarial examples

  • Guarantees: None (empirical robustness only)

  • Robustness: Good against known attacks, may fail on stronger attacks

  • Speed: Fast (PGD is efficient)

Certified Adversarial Training:

  • Inner loop: Verification computes worst-case bounds

  • Guarantees: Certified robustness (provable)

  • Robustness: Guaranteed against all perturbations in the \(\epsilon\)-ball, for inputs that the verifier certifies

  • Speed: Slower (verification more expensive than PGD)

Result: Certified training produces networks with provable robustness certificates, at the cost of longer training time.

IBP Training: Interval Bound Propagation

IBP (Interval Bound Propagation) [Gowal et al., 2019] training uses interval arithmetic to compute certified bounds during training.

How IBP Training Works

Bound propagation: For input region \(x \in [\underline{x}, \overline{x}]\), propagate bounds through layers:

\[\begin{split}\begin{aligned} \underline{z}^{(\ell)}, \overline{z}^{(\ell)} &= \text{IntervalLinear}(W^{(\ell)}, b^{(\ell)}, \underline{h}^{(\ell-1)}, \overline{h}^{(\ell-1)}) \\ \underline{h}^{(\ell)}, \overline{h}^{(\ell)} &= \text{IntervalReLU}(\underline{z}^{(\ell)}, \overline{z}^{(\ell)}) \end{aligned}\end{split}\]
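
As a concrete illustration, a minimal sketch of interval propagation through one linear layer and a ReLU is shown below (a simplified stand-in for the ibp_forward helper used in the training loop later in this guide; a full implementation loops this over all layers):

import torch

def interval_linear(W, b, h_lower, h_upper):
    """Propagate interval bounds through z = h @ W.T + b."""
    W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
    z_lower = h_lower @ W_pos.T + h_upper @ W_neg.T + b
    z_upper = h_upper @ W_pos.T + h_lower @ W_neg.T + b
    return z_lower, z_upper

def interval_relu(z_lower, z_upper):
    """ReLU is monotone, so interval bounds pass through elementwise."""
    return z_lower.clamp(min=0), z_upper.clamp(min=0)

# For an L-infinity ball of radius epsilon around x, the input interval is:
#   h_lower, h_upper = x - epsilon, x + epsilon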

Certified margin: From the output bounds \([\underline{f}(x), \overline{f}(x)]\), the worst-case margin of the true class against the other classes is:

\[m_\epsilon(x, y) = \underline{f}_y(x) - \max_{c \neq y} \overline{f}_c(x)\]

If \(m_\epsilon(x, y) > 0\), no perturbation in the \(\epsilon\)-ball can change the prediction.

Training objective: Minimize a loss that penalizes small or negative certified margins, for example a hinge on the margin:

\[\min_\theta \mathbb{E}_{(x,y)} \left[ \mathcal{L}_{\text{IBP}}(f_\theta, x, y, \epsilon) \right], \qquad \mathcal{L}_{\text{IBP}} = \max\!\left(0,\; \max_{c \neq y} \overline{f}_c(x) - \underline{f}_y(x) + \kappa \right)\]

Why it works: Minimizing this loss maximizes the certified margin, directly encouraging robustness that verification can prove.

Practical IBP Training

Warmup schedule: Start with \(\epsilon = 0\) (standard training), gradually increase to target \(\epsilon\):

import torch.nn.functional as F

def train_ibp(model, dataloader, optimizer, epochs=100, epsilon_target=0.3,
              kappa=0.0, lambda_std=0.5):
    """IBP certified training with epsilon warmup."""
    for epoch in range(epochs):
        # Warmup: linearly increase epsilon over the first half of training
        if epoch < epochs // 2:
            epsilon = epsilon_target * (epoch / (epochs // 2))
        else:
            epsilon = epsilon_target

        for x, y in dataloader:
            # Compute IBP bounds for the epsilon-ball around x
            lower_bounds, upper_bounds = ibp_forward(model, x, epsilon)

            logits_lower = lower_bounds[-1]  # Output-layer lower bounds
            logits_upper = upper_bounds[-1]  # Output-layer upper bounds

            # Certified margin: lower bound of the true class vs. the
            # largest upper bound over all other classes
            margin_y = logits_lower.gather(1, y.unsqueeze(1)).squeeze(1)
            margin_others = logits_upper.scatter(
                1, y.unsqueeze(1), float('-inf')
            ).max(dim=1).values

            # Certified hinge loss on the worst-case margin
            loss_certified = F.relu(margin_others - margin_y + kappa).mean()

            # Optional: combine with standard cross-entropy on clean inputs
            loss_standard = F.cross_entropy(model(x), y)
            loss_total = loss_certified + lambda_std * loss_standard

            optimizer.zero_grad()
            loss_total.backward()
            optimizer.step()

Key hyperparameters:

  • Warmup schedule: Linear or exponential increase of \(\epsilon\)

  • Kappa: Margin buffer (e.g., \(\kappa = 0\))

  • Lambda: Weight between certified and standard loss

Advantages:

  • Fast: IBP is O(network size), very efficient

  • Scalable: Works for large networks (CNNs, ResNets)

  • Simple: Straightforward implementation

Disadvantages:

  • Loose bounds: IBP bounds can be very conservative

  • Requires small \(\epsilon\): Only works well for small perturbations

  • Warmup critical: Direct training at large \(\epsilon\) often fails

CROWN-Based Training

CROWN [Zhang et al., 2018, Zhang et al., 2020] provides tighter bounds than IBP through backward linear relaxation.

CROWN Bound Propagation

Backward propagation: Instead of propagating interval bounds forward, CROWN propagates linear bounds backward through the network, yielding linear functions that sandwich the output:

\[\underline{A}\,x + \underline{b} \;\leq\; f(x) \;\leq\; \overline{A}\,x + \overline{b} \quad \text{for all } x \in [\underline{x}, \overline{x}]\]

where the coefficients \(\underline{A}, \underline{b}, \overline{A}, \overline{b}\) are computed via a backward pass through the network, using a linear relaxation of each ReLU.

Tightness: CROWN bounds are at least as tight as IBP bounds. The adaptive, per-neuron choice of ReLU relaxation slopes reduces conservativeness, and optimizing those slopes (α-CROWN, below) tightens the bounds further.
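
The heart of the backward pass is the per-neuron linear relaxation of each ReLU. A minimal sketch of the standard relaxation, given pre-activation interval bounds, is shown below (the lower-bound slope chosen here is the usual heuristic; α-CROWN, discussed later, treats it as a parameter to optimize):

import torch

def relu_relaxation(z_lower, z_upper):
    """Per-neuron linear bounds: lower_slope * z <= relu(z) <= upper_slope * z + upper_bias."""
    inactive = z_upper <= 0   # always zero
    active = z_lower >= 0     # always the identity
    unstable = ~(inactive | active)

    # Stable neurons: exact (slope 0 or 1, no offset)
    upper_slope = active.float()
    upper_bias = torch.zeros_like(z_upper)
    lower_slope = active.float()

    # Unstable neurons: chord from (z_lower, 0) to (z_upper, z_upper) as upper bound,
    # and a slope in [0, 1] as lower bound (heuristic: 1 if z_upper > -z_lower, else 0)
    chord = z_upper / (z_upper - z_lower).clamp(min=1e-12)
    upper_slope = torch.where(unstable, chord, upper_slope)
    upper_bias = torch.where(unstable, -chord * z_lower, upper_bias)
    lower_slope = torch.where(unstable, (z_upper > -z_lower).float(), lower_slope)

    return lower_slope, upper_slope, upper_bias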

CROWN Training Objective

Certified loss with CROWN:

\[\mathcal{L}_{\text{CROWN}}(f_\theta, x, y, \epsilon) = \max\!\left(0,\; \max_{c \neq y} \overline{f}_c(x) - \underline{f}_y(x) + \kappa \right)\]

i.e., the same hinge on the certified margin as in IBP training, but computed from CROWN's tighter lower and upper bounds on the class logits.

Training workflow:

import torch.nn.functional as F

def train_crown(model, dataloader, optimizer, epsilon, kappa=0.0):
    """CROWN-based certified training (one pass over the data)."""
    for x, y in dataloader:
        # Compute CROWN lower/upper bounds on the output logits
        lower_bounds, upper_bounds = crown_bounds(model, x, epsilon)

        # Certified margin (same as IBP, but with tighter bounds)
        margin_y = lower_bounds.gather(1, y.unsqueeze(1)).squeeze(1)
        margin_others = upper_bounds.scatter(
            1, y.unsqueeze(1), float('-inf')
        ).max(dim=1).values

        # Certified hinge loss on the worst-case margin
        loss_certified = F.relu(margin_others - margin_y + kappa).mean()

        optimizer.zero_grad()
        loss_certified.backward()
        optimizer.step()

Advantages over IBP:

  • Tighter bounds → better certified accuracy

  • Works at larger \(\epsilon\) than IBP

  • Still polynomial-time (though slower than IBP)

Challenges:

  • More complex implementation

  • Slower than IBP (backward propagation overhead)

  • Still incomplete (provides upper bound on worst-case, not exact)

α-CROWN and β-CROWN

α-CROWN [Xu et al., 2020]: Treats the ReLU relaxation slopes \(\alpha\) as free parameters and optimizes them, per neuron, to tighten the output bounds. This tightens CROWN bounds significantly.

β-CROWN [Wang et al., 2021]: Extends α-CROWN with split constraints (β parameters), providing even tighter bounds.

Training with α-β-CROWN:

\[\min_\theta \mathbb{E}_{(x,y)} \left[ \min_{\alpha, \beta} \mathcal{L}^{\text{cert}}_{\alpha\beta\text{-CROWN}}(f_\theta, x, y, \epsilon; \alpha, \beta) \right]\]

This nested optimization (outer loop over \(\theta\), inner loop over \(\alpha, \beta\)) is expensive but provides the tightest certified training bounds.
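
In code, the nested loop can be sketched roughly as follows (assuming a hypothetical certified_loss helper that is differentiable in the relaxation parameters alpha_beta; libraries such as auto_LiRPA manage this parameterization internally, so this is only a structural sketch):

import torch

def certified_train_step(model, x, y, epsilon, theta_optimizer, alpha_beta,
                         inner_steps=5, inner_lr=0.1):
    """One alpha-beta-CROWN step: tighten bounds over (alpha, beta), then update theta."""
    inner_optimizer = torch.optim.Adam(alpha_beta, lr=inner_lr)

    # Inner loop: optimize relaxation parameters to tighten the certified loss
    for _ in range(inner_steps):
        loss = certified_loss(model, x, y, epsilon, alpha_beta)  # hypothetical helper
        inner_optimizer.zero_grad()
        loss.backward()
        inner_optimizer.step()

    # Outer step: update network weights against the tightened bound
    theta_optimizer.zero_grad()
    loss = certified_loss(model, x, y, epsilon, alpha_beta)
    loss.backward()
    theta_optimizer.step()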

TRADES with Certified Bounds

TRADES [Zhang et al., 2019] originally uses empirical adversarial examples. Combining TRADES with certified bounds yields certified-TRADES:

\[\min_\theta \mathbb{E}_{(x,y)} \left[ \mathcal{L}(f_\theta(x), y) + \beta \cdot D_{\text{certified}}(f_\theta, x, y, \epsilon) \right]\]

where \(D_{\text{certified}}\) is a certified divergence measure (e.g., KL divergence using verified bounds).

Certified KL divergence: Use CROWN/IBP to compute worst-case KL divergence:

\[D_{\text{KL}}^{\text{cert}}(f_\theta, x, \epsilon) = \max_{\|\delta\| \leq \epsilon} D_{\text{KL}}\!\left(f_\theta(x) \,\|\, f_\theta(x + \delta)\right)\]

upper-bounded in practice via verified bounds on \(f_\theta(x + \delta)\).

Benefit: Combines TRADES’ accuracy-robustness balance with certified guarantees.
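
A rough sketch of such an objective (reusing the crown_bounds helper assumed above; the pessimistic logits built here are a heuristic surrogate for the exact worst-case distribution, not a sound maximizer of the KL term) might look like:

import torch
import torch.nn.functional as F

def certified_trades_loss(model, x, y, epsilon, beta=6.0):
    """Clean cross-entropy plus a KL term against bound-derived pessimistic logits."""
    logits_clean = model(x)
    loss_natural = F.cross_entropy(logits_clean, y)

    lower, upper = crown_bounds(model, x, epsilon)  # assumed helper, as above
    # Pessimistic logits: true class at its lower bound, other classes at their upper bounds
    worst_logits = upper.clone()
    worst_logits.scatter_(1, y.unsqueeze(1), lower.gather(1, y.unsqueeze(1)))

    # KL between the clean prediction and the pessimistic prediction
    loss_robust = F.kl_div(F.log_softmax(worst_logits, dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction='batchmean')
    return loss_natural + beta * loss_robust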

Training-Verification Loop

Certified training creates a virtuous cycle between training and verification:

Training phase:

  1. Compute certified bounds for current network

  2. Maximize worst-case certified margin

  3. Update network parameters

Verification phase:

  1. Use trained network for deployment

  2. Verify properties using the same verification method (IBP, CROWN)

  3. Certified accuracy should match or exceed training-time estimates
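
In practice, the verification phase can reuse the exact bound computation from training. A minimal sketch of measuring certified accuracy with the ibp_forward helper assumed earlier:

import torch

@torch.no_grad()
def certified_accuracy(model, dataloader, epsilon):
    """Fraction of points whose true-class lower bound beats every other class's upper bound."""
    certified, total = 0, 0
    for x, y in dataloader:
        lower_bounds, upper_bounds = ibp_forward(model, x, epsilon)
        logits_lower, logits_upper = lower_bounds[-1], upper_bounds[-1]

        margin_y = logits_lower.gather(1, y.unsqueeze(1)).squeeze(1)
        margin_others = logits_upper.scatter(
            1, y.unsqueeze(1), float('-inf')
        ).max(dim=1).values

        certified += (margin_y > margin_others).sum().item()
        total += len(y)
    return certified / total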

Why the Loop Works

Networks trained to maximize verified bounds are easier to verify:

  • Activations tend to be clearly active or inactive (fewer unstable ReLUs)

  • Larger certified margins, so verification succeeds more often

  • Smaller effective Lipschitz constants, which keep verification bounds tight

This creates a positive feedback loop: better training → easier verification → tighter bounds → better training.

Practical Considerations

Epsilon Scheduling

Challenge: Training at large \(\epsilon\) from scratch often fails (network cannot learn anything robust).

Solution: Epsilon warmup schedule:

  • Linear warmup: \(\epsilon(t) = \epsilon_{\max} \cdot \min(t / T_{\text{warmup}}, 1)\)

  • Exponential warmup: \(\epsilon(t) = \epsilon_{\max} \cdot (1 - e^{-t/\tau})\)

  • Step schedule: Increase \(\epsilon\) in discrete steps every N epochs

Typical schedule: Start at \(\epsilon = 0\), ramp to target over 50-100 epochs.
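
A minimal sketch of a linear warmup schedule (the start epoch and warmup length are illustrative and should be tuned per dataset):

def epsilon_schedule(epoch, epsilon_max=0.3, start_epoch=10, warmup_epochs=80):
    """Hold epsilon at 0, then ramp linearly to epsilon_max over warmup_epochs."""
    if epoch < start_epoch:
        return 0.0
    progress = (epoch - start_epoch) / warmup_epochs
    return epsilon_max * min(progress, 1.0)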

Combining Certified and Standard Loss

Pure certified training can hurt clean accuracy. Combining helps:

\[\mathcal{L}_{\text{total}} = \lambda_{\text{std}} \mathcal{L}_{\text{std}}(f_\theta, x, y) + \lambda_{\text{cert}} \mathcal{L}_{\text{cert}}(f_\theta, x, y, \epsilon)\]

Typical weights: \(\lambda_{\text{std}} = 0.5, \lambda_{\text{cert}} = 0.5\) or anneal \(\lambda_{\text{std}}\) from 1 to 0 during training.
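
For the annealing variant, one simple sketch (assuming \(\lambda_{\text{cert}} = 1 - \lambda_{\text{std}}\), which is a common but not mandatory choice):

def loss_weights(epoch, total_epochs):
    """Anneal the standard-loss weight linearly from 1 to 0; the certified weight takes up the slack."""
    lambda_std = max(0.0, 1.0 - epoch / total_epochs)
    return lambda_std, 1.0 - lambda_std

# loss_total = lambda_std * loss_standard + lambda_cert * loss_certified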

Computational Cost

Training time multiplier:

  • IBP training: 1.5-3× standard training (depends on network depth)

  • CROWN training: 2-5× standard training

  • α-β-CROWN training: 5-10× standard training (due to inner optimization)

Memory: Verification requires storing intermediate bounds → 2-3× memory usage.

Mitigation: Use mixed-precision training, gradient checkpointing, or train on GPUs with large memory.

Results and State-of-the-Art

Certified Accuracy Benchmarks

MNIST \(\ell_\infty\) with \(\epsilon = 0.3\):

  • Standard training: ~0% certified accuracy

  • PGD adversarial training: ~0% (no certification)

  • IBP training [Gowal et al., 2019]: ~85% certified accuracy

  • CROWN training [Zhang et al., 2020]: ~92% certified accuracy

CIFAR-10 \(\ell_\infty\) with \(\epsilon = 8/255\):

  • Standard training: ~0%

  • PGD-AT: ~0% (empirical robustness only)

  • IBP training: ~35%

  • CROWN training: ~55%

  • α-β-CROWN: ~60%

Trend: Tighter verification during training → higher certified accuracy. The gap between IBP and CROWN shows that bound tightness matters significantly.

Comparison Table

Table 30 Certified Training Methods

| Method    | Bound Type       | Training Cost | Certified Acc (CIFAR-10) | Best For                   |
|-----------|------------------|---------------|--------------------------|----------------------------|
| Standard  | None             | 1× (baseline) | 0%                       | Clean accuracy only        |
| PGD-AT    | Empirical        | 7-10×         | 0% (no cert)             | Empirical robustness       |
| IBP       | Interval bounds  | 1.5-3×        | ~35%                     | Fast certified training    |
| CROWN     | Linear bounds    | 2-5×          | ~55%                     | Tight certified training   |
| α-β-CROWN | Optimized bounds | 5-10×         | ~60%                     | State-of-the-art certified |

When to Use Certified Training

Use certified adversarial training when:

  • Need provable robustness guarantees (safety-critical applications)

  • Deploying in adversarial environments where empirical defense insufficient

  • Willing to accept accuracy-robustness tradeoff

  • Have computational budget for verification during training

  • Target perturbation radius is known (can train at specific \(\epsilon\))

Use standard adversarial training when:

  • Only need empirical robustness (defense against known attacks)

  • Certification not required or too expensive

  • Maximizing adversarial accuracy more important than certified accuracy

  • Very large networks where verification doesn’t scale

Hybrid approach:

  • Pretrain with standard/adversarial training for good initialization

  • Fine-tune with certified training for certification guarantees

  • Use IBP for initial training, switch to CROWN for final epochs (balance cost and tightness)

Sweet Spot

Certified training works best for:

  • Medium-sized networks (CNNs with 10K-1M parameters)

  • Small to moderate \(\epsilon\) (0.1-0.3 for \(\ell_\infty\) normalized to [0,1])

  • Applications where certified accuracy in 50-70% range is acceptable

  • Domains requiring provable guarantees (medical, autonomous systems)

Current Research and Future Directions

Tighter training bounds: Developing even tighter verification methods for training (SDP-based [Raghunathan et al., 2018], multi-neuron [Müller et al., 2022]).

Scalability: Extending certified training to large networks (ResNet-50, Vision Transformers) through efficient approximations.

Better epsilon scheduling: Adaptive schedules that optimize certified accuracy rather than following fixed warmup curves.

Architecture co-design: Designing network architectures specifically for certified training (e.g., architectures easier to verify).

Multi-perturbation certification: Training with certified bounds for multiple perturbation types simultaneously (\(\ell_\infty\), \(\ell_2\), semantic).

Limitations

Lower certified accuracy: Even state-of-the-art certified training achieves lower certified accuracy (~60% on CIFAR-10) than standard accuracy (~95%).

Computational cost: Verification during training is expensive, especially for tighter bounds (CROWN, α-β-CROWN).

Epsilon limitations: Certified training works well for small \(\epsilon\) but struggles at large perturbations.

Clean accuracy drop: Certified training often hurts clean accuracy (5-15% drop typical).

Scalability barriers: Current methods scale to networks with millions of parameters but struggle with billions (large vision models, LLMs).

Final Thoughts

Certified adversarial training [Gowal et al., 2019, Xu et al., 2020, Zhang et al., 2020] represents a fundamental shift in robust training: from empirical robustness (training on attacks we can generate) to provable robustness (training on bounds we can verify). This shift brings the rigor of formal verification [Singh et al., 2019, Weng et al., 2018, Zhang et al., 2018] into the training loop itself.

While certified training doesn’t yet match the clean accuracy or empirical robustness of standard or adversarially-trained networks, it provides something neither can: provable guarantees. For applications where “probably robust” isn’t good enough—safety-critical systems, adversarial environments with sophisticated attackers—certified training is essential.

The progression from IBP [Gowal et al., 2019] to CROWN [Zhang et al., 2020] to α-β-CROWN [Wang et al., 2021] demonstrates steady progress: each generation achieves higher certified accuracy through tighter verification bounds. As verification methods improve, so will certified training, gradually closing the gap between provable and empirical robustness.

Understanding certified training illuminates the deep connection between training and verification: they’re not separate phases but intertwined processes. Better verification enables better training; better training yields networks easier to verify. This symbiotic relationship drives progress toward the ultimate goal: networks that are provably robust, verifiably correct, and practically deployable.

Further Reading

This guide provides comprehensive coverage of certified adversarial training for neural networks. For readers interested in diving deeper, we recommend the following resources organized by topic:

IBP Training - Fast Certified Training:

IBP training [Gowal et al., 2019] pioneered scalable certified training using interval bound propagation. Despite loose bounds, IBP’s efficiency enables training large networks with certified guarantees. The key innovation is the epsilon warmup schedule, allowing networks to gradually learn robust features.

CROWN-Based Training - Tighter Bounds:

CROWN training [Zhang et al., 2018, Zhang et al., 2020] improves upon IBP through tighter linear bound propagation. By optimizing linear relaxation parameters, CROWN achieves significantly higher certified accuracy with moderate computational overhead. This represents the current practical sweet spot for certified training.

α-β-CROWN - State-of-the-Art:

α-β-CROWN [Wang et al., 2021, Xu et al., 2020] provides the tightest training bounds through optimized bound propagation and split constraints. GPU acceleration [Xu et al., 2020] makes the computational cost manageable, enabling state-of-the-art certified accuracy on standard benchmarks.

TRADES - Accuracy-Robustness Balance:

TRADES [Zhang et al., 2019] introduced explicit tradeoff control between natural and robust accuracy. Variants combining TRADES with certified bounds provide both empirical and provable robustness, representing best-practice for practical deployment.

Comparison with Standard Adversarial Training:

PGD adversarial training [Madry et al., 2018] remains the gold standard for empirical robustness, providing important context for certified training’s tradeoffs. Understanding why adversarial training doesn’t provide certificates motivates the need for verified bounds during training.

Verification Methods for Training:

The bound propagation methods used in certified training—IBP [Gowal et al., 2019], CROWN [Weng et al., 2018, Zhang et al., 2018], DeepPoly [Singh et al., 2019]—are covered in depth in verification literature. Understanding these methods’ tightness-speed tradeoffs explains certified training’s performance characteristics.

Advanced Verification for Tighter Training:

SDP-based verification [Raghunathan et al., 2018] and multi-neuron relaxations [Müller et al., 2022] provide even tighter bounds but at higher computational cost. Future certified training may leverage these for improved certified accuracy.

Probabilistic Certification Alternative:

Randomized smoothing [Cohen et al., 2019] offers an alternative path to provable robustness through probabilistic guarantees. While certified training provides deterministic guarantees, randomized smoothing scales better to very large networks.

Related Topics:

For understanding the verification methods that certified training uses, see Bound Propagation Approaches. For comparison with regularization-based training, see Regularization-Based Robust Training. For adversarial training that certified training extends, see Training Robust Networks. For verification of trained networks, see Robustness Testing Guide. For randomized smoothing as an alternative, see Certified Defenses and Randomized Smoothing.

Next Phase

Congratulations on completing Phase 3: Robust Training & Practical Implementation! Continue to Phase 4: Advanced Topics & Frontiers to explore advanced topics including state-of-the-art defenses, scalability challenges, and real-world deployment.

[1]

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.

[2] (1,2,3,4,5,6,7)

Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.

[3] (1,2,3)

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. 2018.

[4] (1,2)

Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.

[5] (1,2)

Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.

[6] (1,2)

Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.

[7] (1,2,3)

Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-crown: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.

[8] (1,2)

Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.

[9] (1,2,3,4,5)

Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.

[10] (1,2)

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning, 7472–7482. PMLR, 2019.

[11] (1,2,3,4,5,6)

Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In International Conference on Learning Representations. 2020.

[12] (1,2,3,4)

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in neural information processing systems, 4939–4948. 2018.