SDP-Based Verification: Semi-Definite Programming for Tighter Bounds¶
Linear programming (LP) relaxations [Salman et al., 2019] provide polynomial-time verification but sometimes yield loose bounds. Complete methods [Tjeng et al., 2019, Katz et al., 2019] provide exact answers but don’t scale. What if you want tighter bounds than LP without the exponential cost of complete methods?
Semi-definite programming (SDP) offers exactly this: relaxations tighter than LP while remaining tractable. SDP-based verification [Raghunathan et al., 2018] formulates bound computation as an SDP, exploiting quadratic relationships that LP misses. The tradeoff is computational cost—SDP solvers are slower than LP solvers—but for critical properties on moderately-sized networks, the tightness gain often justifies the expense.
This guide explores how SDP-based verification works, why it provides tighter bounds, and when the extra computational cost is worthwhile.
Why Semi-Definite Programming?¶
Linear programs optimize over polytopes (linear constraints). Semi-definite programs optimize over spectrahedra (positive semi-definite matrix constraints). This richer geometry captures relationships that linear constraints cannot.
Key advantage: SDP can represent quadratic constraints and products of variables naturally through positive semi-definite (PSD) matrix constraints. For neural networks, this captures neuron correlations and non-linear relationships more tightly than linear relaxations.
Semi-Definite Program (SDP)
A semi-definite program has the form:
\[\min_{X} \; \langle C, X \rangle \quad \text{subject to} \quad \langle A_i, X \rangle = b_i \;\; (i = 1, \dots, m), \qquad X \succeq 0\]
where:
\(X\) is a symmetric matrix variable
\(X \succeq 0\) means \(X\) is positive semi-definite (all eigenvalues non-negative)
\(\langle A, B \rangle = \text{trace}(A^T B)\) is the matrix inner product
Complexity: SDP can be solved in polynomial time using interior-point methods. Practical solvers (CVXPY, SCS, Mosek) handle SDPs with thousands of variables.
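To make the definition concrete, here is a minimal CVXPY sketch of an SDP in this form; the cost matrix and the unit-trace normalization are arbitrary illustrative choices, and the optimal value should equal the smallest eigenvalue of \(C\), which gives an easy correctness check:
import cvxpy as cp
import numpy as np
# Toy SDP in the standard form above: minimize <C, X> over PSD matrices with
# unit trace. The optimum equals the smallest eigenvalue of C.
n = 3
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n))
C = (C + C.T) / 2                         # symmetric cost matrix
X = cp.Variable((n, n), symmetric=True)   # symmetric matrix variable
constraints = [X >> 0,                    # X is positive semi-definite
               cp.trace(X) == 1]          # <I, X> = 1
problem = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
problem.solve(solver=cp.SCS)
print("SDP optimum:   ", problem.value)
print("lambda_min(C): ", np.linalg.eigvalsh(C).min())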
SDP vs LP for Verification¶
LP relaxations [Salman et al., 2019] use linear inequalities to over-approximate the neural network’s behavior. They’re fast (polynomial time) but conservative.
SDP relaxations [Raghunathan et al., 2018] use positive semi-definite constraints to over-approximate behavior. They’re slower (higher-degree polynomial) but tighter.
Tightness hierarchy: For the same network, SDP relaxations provably provide tighter or equal bounds compared to LP relaxations. This theoretical guarantee comes from the fact that SDP can encode LP constraints (every LP is an SDP) plus additional quadratic constraints.
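To make the parenthetical concrete, here is the standard textbook embedding of a standard-form LP into an SDP (general convex-optimization background, not specific to any verification paper):
\[\min_{x} \; c^T x \;\; \text{s.t.} \;\; Ax = b,\; x \geq 0 \qquad\text{has the same optimal value as}\qquad \min_{X \succeq 0} \; \langle \mathrm{Diag}(c), X \rangle \;\; \text{s.t.} \;\; \langle \mathrm{Diag}(a_i), X \rangle = b_i \;\; \forall i,\]
where \(a_i\) is the \(i\)-th row of \(A\) and \(\mathrm{Diag}(\cdot)\) places a vector on the diagonal of a matrix. Only the diagonal of \(X\) enters the objective and constraints, and the diagonal of a PSD matrix is non-negative, so \(x = \mathrm{diag}(X)\) and \(X = \mathrm{Diag}(x)\) map feasible points back and forth without changing the objective value.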
| Aspect | LP-Based Verification | SDP-Based Verification |
|---|---|---|
| Constraints | Linear inequalities | PSD matrix constraints |
| Time Complexity | O(n³) interior-point | O(n^4.5) interior-point |
| Tightness | Moderate (single-neuron) | Tight (captures quadratic) |
| Scalability | Thousands of neurons | Hundreds to ~1000 neurons |
| Best Use | General verification | Critical properties, small networks |
SDP Formulation for Neural Networks¶
The SDP formulation [Raghunathan et al., 2018] for neural network verification extends the LP approach by introducing a matrix variable that captures second-order relationships between neurons.
Lifting to Quadratic Form¶
Standard approach: For a network with variables \(z\) (all neuron activations), verify bounds on outputs.
SDP approach: Introduce a lifted variable \(X = zz^T\), a matrix encoding all pairwise products of neuron activations.
Why this helps: ReLU and other non-linearities create quadratic relationships. By explicitly representing these products in \(X\), the SDP can encode tighter constraints.
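A quick NumPy illustration of the lifting (the activation vector here is arbitrary toy data): the lifted matrix is exactly rank one when it is a true outer product, and the block matrix used in the SDP formulation below is then positive semi-definite:
import numpy as np
# Illustrative activations for a tiny network (arbitrary toy values).
z = np.array([0.5, -1.2, 2.0])
# Lifted matrix: all pairwise products of activations.
X = np.outer(z, z)                               # X[i, j] = z[i] * z[j]
print("rank of X:", np.linalg.matrix_rank(X))    # 1 for a true outer product
# The block matrix [[X, z], [z^T, 1]] used in the SDP formulation below is
# positive semi-definite when X = z z^T:
M = np.block([[X, z[:, None]],
              [z[None, :], np.ones((1, 1))]])
print("eigenvalues:", np.linalg.eigvalsh(M))     # all >= 0 (up to round-off)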
Basic SDP Formulation¶
Given: Neural network \(f_\theta\), input region \(\mathcal{X}\), output neuron \(k\) to bound.
Variables:
\(z \in \mathbb{R}^n\): All neuron pre/post-activations
\(X \in \mathbb{R}^{n \times n}\): Lifted matrix with \(X_{ij} = z_i z_j\)
Objective: Minimize \(z_k\) (output neuron \(k\))
Constraints:
Linear layer constraints: For each affine layer, \(z_j = \sum_i W_{ji} z_i + b_j\), i.e.
\[z_j = \langle W_j, z \rangle + b_j\]
Input bounds: \(\underline{x} \leq x \leq \overline{x}\) for input \(x\)
ReLU constraints: For \(y = \text{ReLU}(z)\) with bounds \(\underline{z} \leq z \leq \overline{z}\):
If \(\overline{z} \leq 0\): \(y = 0\) (inactive)
If \(\underline{z} \geq 0\): \(y = z\) (active)
If \(\underline{z} < 0 < \overline{z}\): Use SDP relaxation
Lifted consistency: \(X_{ij} = z_i z_j\) (the exact, non-convex relationship)
PSD constraint: \(\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0\)
Key insight: The PSD constraint \(\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0\) is the convex relaxation of the lifted-consistency requirement: it forces \(X\) to behave like an outer product, so \(X_{ij} \approx z_i z_j\), without imposing the exact equality \(X = zz^T\), which would be non-convex.
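To spell this out (a standard Schur-complement argument, stated here for completeness rather than taken from the cited paper): since the lower-right block of the lifted matrix equals \(1 > 0\),
\[\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0 \quad\Longleftrightarrow\quad X - zz^T \succeq 0 \quad\Longleftrightarrow\quad X \succeq zz^T.\]
The non-convex equality \(X = zz^T\) is thus relaxed to the convex one-sided condition \(X \succeq zz^T\); the original equality is recovered exactly when the \((n+1)\times(n+1)\) block matrix has rank one.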
ReLU Relaxation in SDP¶
For uncertain ReLUs (\(\underline{z} < 0 < \overline{z}\)), the SDP formulation adds tighter constraints than LP [Raghunathan et al., 2018]:
LP triangle relaxation:
\[y \geq 0, \qquad y \geq z, \qquad y \leq \frac{\overline{z}\,(z - \underline{z})}{\overline{z} - \underline{z}}\]
SDP quadratic relaxation (additional constraint beyond LP):
\[y \cdot z \geq 0\]
In the lifted formulation this product becomes a linear constraint on the corresponding entry of \(X\), i.e. \(X_{ij} \geq 0\), where \(i\) indexes the pre-activation and \(j\) the post-activation neuron.
This constraint captures the fact that when the ReLU is active (\(y = z\)), both \(y\) and \(z\) are non-negative, so their product is non-negative; when it is inactive (\(y = 0\)), the product is zero.
Why it’s tighter: The quadratic constraint \(y \cdot z \geq 0\) is non-linear, which LP cannot express. SDP encodes it via the lifted matrix \(X\) with the PSD constraint.
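The sketch below writes both relaxations for a single uncertain ReLU in CVXPY; the bounds l and u, the 2×2 lifted block P, and the toy objective are illustrative choices, not part of any library API or of the cited formulation:
import cvxpy as cp
import numpy as np
l, u = -1.0, 2.0                     # pre-activation bounds with l < 0 < u
z = cp.Variable()                    # pre-activation
y = cp.Variable()                    # post-activation, y = ReLU(z)
# LP triangle relaxation (plus the box on z)
triangle = [y >= 0, y >= z, y <= u * (z - l) / (u - l), z >= l, z <= u]
# SDP relaxation: lift (z, y) into a 2x2 block P ~ [z, y][z, y]^T and add the
# quadratic constraint y * z >= 0 as a linear constraint on the lifted entry.
P = cp.Variable((2, 2), symmetric=True)   # P[0,0]~z^2, P[0,1]~z*y, P[1,1]~y^2
v = cp.vstack([cp.reshape(z, (1, 1)), cp.reshape(y, (1, 1))])
sdp = triangle + [cp.bmat([[P, v], [v.T, np.ones((1, 1))]]) >> 0,  # P >= v v^T
                  P[0, 1] >= 0]                                    # y * z >= 0
# Sanity check that the relaxation is feasible and solvable. In a full network,
# one shared lifted matrix couples all neurons, and these per-ReLU constraints
# tighten the bound on the output.
print(cp.Problem(cp.Minimize(y - 0.5 * z), sdp).solve(solver=cp.SCS))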
Nuclear Norm and Rank Constraints¶
An alternative SDP formulation [Raghunathan et al., 2018] uses the nuclear norm (sum of singular values) to encourage low-rank structure in the lifted matrix \(X\).
Nuclear Norm Regularization¶
Motivation: The exact relationship \(X = zz^T\) implies \(X\) is rank-1. Relaxing this constraint makes the problem convex, but allowing arbitrary \(X\) can be loose.
Middle ground: Minimize the nuclear norm \(\|X\|_* = \sum_i \sigma_i(X)\) where \(\sigma_i\) are singular values.
SDP formulation with nuclear norm:
\[\min_{z,\,X} \; z_k + \lambda \, \|X\|_* \quad \text{subject to the linear, input, ReLU, and PSD constraints above}\]
Effect: The nuclear norm penalty encourages \(X\) to be low-rank, closer to the rank-1 structure of true outer products. The parameter \(\lambda\) balances tightness (higher \(\lambda\)) and conservativeness (lower \(\lambda\)).
Nuclear norm as SDP: The nuclear norm minimization can be reformulated as an SDP constraint:
\[\|X\|_* \leq t \;\;\Longleftrightarrow\;\; \exists\, W_1, W_2 \;\text{symmetric such that}\; \begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0, \quad \tfrac{1}{2}\big(\text{trace}(W_1) + \text{trace}(W_2)\big) \leq t\]
This makes nuclear norm optimization tractable within the SDP framework.
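As a sanity check on this reformulation, the sketch below compares CVXPY's built-in nuclear-norm atom normNuc with the explicit SDP form; the matrix M is arbitrary illustrative data, and both values should equal the sum of the singular values of M:
import cvxpy as cp
import numpy as np
n = 4
M = np.random.default_rng(0).standard_normal((n, n))   # illustrative data matrix
# Direct form: CVXPY's built-in nuclear-norm atom.
X = cp.Variable((n, n))
direct = cp.Problem(cp.Minimize(cp.normNuc(X)), [X == M])
direct.solve(solver=cp.SCS)
# Equivalent SDP form: ||X||_* <= t iff there exist symmetric W1, W2 with
# [[W1, X], [X^T, W2]] PSD and (trace(W1) + trace(W2)) / 2 <= t.
X2 = cp.Variable((n, n))
W1 = cp.Variable((n, n), symmetric=True)
W2 = cp.Variable((n, n), symmetric=True)
t = cp.Variable()
sdp = cp.Problem(cp.Minimize(t),
                 [X2 == M,
                  cp.bmat([[W1, X2], [X2.T, W2]]) >> 0,
                  (cp.trace(W1) + cp.trace(W2)) / 2 <= t])
sdp.solve(solver=cp.SCS)
print(direct.value, sdp.value, np.linalg.svd(M, compute_uv=False).sum())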
Tightness Improvements¶
Empirical results [Raghunathan et al., 2018]:
Nuclear norm regularization provides 10-30% tighter bounds than unregularized SDP
SDP provides 20-50% tighter bounds than LP relaxation for typical networks
Tightness gain is most pronounced for networks with many uncertain ReLUs
Comparison with Other Methods¶
SDP vs LP-Based Methods¶
LP-based methods [Salman et al., 2019]:
Pros: Fast, scalable to large networks
Cons: Loose bounds (misses quadratic relationships)
SDP-based methods [Raghunathan et al., 2018]:
Pros: Tighter bounds (captures products via PSD constraints)
Cons: Slower (cubic to quartic complexity)
When SDP wins: Networks with strong neuron correlations, properties requiring very tight bounds, moderate network size (hundreds to ~1000 neurons).
SDP vs Multi-Neuron Relaxations¶
Multi-neuron relaxations [Müller et al., 2022]:
Approach: Add pairwise product terms explicitly for selected neuron pairs
Complexity: Depends on number of pairs \(k\); roughly \(O(n + k)\) where \(k \ll n^2\)
Tightness: Tight for selected pairs, but greedy pair selection may miss important correlations
SDP-based methods:
Approach: Encode all pairwise products via lifted matrix \(X\)
Complexity: \(O(n^{4.5})\) for general SDP solvers
Tightness: Captures all pairwise products, potentially tighter than selective multi-neuron
Tradeoff: PRIMA [Müller et al., 2022] is often faster with comparable tightness due to selective pairing. SDP provides theoretical tightness guarantees but at higher computational cost.
SDP vs Complete Methods¶
Complete methods [Katz et al., 2019, Wang et al., 2021]:
Pros: Exact answers (verified or counterexample)
Cons: Exponential worst-case complexity, don’t scale to large networks
SDP methods:
Pros: Polynomial time, tighter than LP/CROWN
Cons: Still incomplete (might return “unknown”)
When to choose SDP over complete: When the network is too large for complete methods but LP/CROWN are too loose. SDP fills the gap between fast incomplete and slow complete methods.
| Method | Tightness | Time Complexity | Network Size Limit |
|---|---|---|---|
| IBP | Loosest | O(n) | Millions of neurons |
| CROWN | Moderate | O(n) | Millions of neurons |
| LP-based | Moderate-Tight | O(n³) | Thousands of neurons |
| PRIMA | Tight | O(n + k pairs) | Thousands of neurons |
| SDP-based | Very Tight | O(n^4.5) | Hundreds to ~1000 |
| Complete (SMT/MILP) | Exact | Exponential | Hundreds of neurons |
Practical Implementation¶
Using SDP Solvers¶
Available solvers:
CVXPY [authors, 2020]: High-level Python interface, supports multiple SDP solvers
SCS: Splitting Conic Solver, fast for large-scale problems
MOSEK: Commercial solver, highly optimized
SDPT3/SeDuMi: MATLAB-based solvers
Typical workflow:
import cvxpy as cp
import numpy as np
# Problem setup: network weights, input bounds, etc. The sizes and bounds
# below are illustrative placeholders.
n_neurons = 100                      # total neurons (all layers)
input_dim = 10                       # number of input neurons
x_lower = -0.1 * np.ones(input_dim)  # input lower bounds
x_upper = 0.1 * np.ones(input_dim)   # input upper bounds
# Variables
z = cp.Variable(n_neurons)                               # all neuron activations
X = cp.Variable((n_neurons, n_neurons), symmetric=True)  # lifted matrix, X ~ z z^T
# Objective: minimize the output neuron to obtain a lower bound on it
objective = cp.Minimize(z[-1])
constraints = []
# Linear layer constraints: z_next == W @ z_prev + b
# (add one equality per layer; without them the output is unconstrained and the
#  problem is unbounded, so this scaffold must be completed for a real network)
# Input bounds
constraints.append(z[:input_dim] >= x_lower)
constraints.append(z[:input_dim] <= x_upper)
# ReLU constraints (triangle + quadratic). For each uncertain ReLU with
# pre-activation index i and post-activation index j, the quadratic constraint
# y_j * z_i >= 0 is written linearly on the lifted matrix (CVXPY does not allow
# products of variables directly):
# constraints.append(X[j, i] >= 0)
# PSD constraint (lifted consistency): [[X, z], [z^T, 1]] is PSD
z_col = cp.reshape(z, (n_neurons, 1))
lifted = cp.bmat([[X, z_col],
                  [z_col.T, np.ones((1, 1))]])
constraints.append(lifted >> 0)
# Solve
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)
lower_bound = problem.value
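Interpreting the result: because the SDP relaxes the true constraint set, lower_bound is a sound lower bound on the output neuron over the input region. If that neuron encodes a margin (say, the correct-class logit minus a competitor’s) and the bound is positive, the property is certified; otherwise the outcome is inconclusive rather than a counterexample.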
Performance Considerations¶
Scalability challenges:
Matrix size: \(X \in \mathbb{R}^{n \times n}\) has \(O(n^2)\) entries. For \(n = 1000\), that’s 1 million variables.
Solver time: SDP solvers have \(O(n^{4.5})\) complexity. Doubling network size increases time by ~22×.
Optimizations:
Sparsity exploitation: Many entries of \(X\) may be irrelevant (neurons in different layers don’t interact directly). Exploit this sparsity to reduce variable count.
Layer-by-layer solving: Similar to LP [Salman et al., 2019], solve SDP layer-by-layer rather than for the entire network. This keeps \(n\) small (neurons per layer) rather than large (total neurons).
Warm-starting: When solving multiple similar SDPs (different output neurons, slightly different bounds), initialize the solver with the previous solution; a minimal sketch appears after this list.
GPU acceleration: Recent work explores GPU-accelerated verification [Xu et al., 2020]; traditional SDP solvers remain CPU-based, so structure-exploiting, GPU-friendly SDP solving is still an open direction.
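As a concrete illustration of the warm-starting idea referenced above, a minimal CVXPY sketch; the cost matrix, the trace budget, and the parameter name radius are illustrative stand-ins for whatever changes between verification queries:
import cvxpy as cp
import numpy as np
# Parameterize the quantity that changes between queries so CVXPY keeps the
# compiled problem and SCS can reuse the previous solution.
n = 10
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n))
C = (C + C.T) / 2
radius = cp.Parameter(nonneg=True)               # the changing quantity
X = cp.Variable((n, n), symmetric=True)
problem = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                     [X >> 0, cp.trace(X) <= radius])
radius.value = 1.0
problem.solve(solver=cp.SCS)
radius.value = 1.1                               # slightly different instance
problem.solve(solver=cp.SCS, warm_start=True)    # reuse the previous iterate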
When to Use SDP-Based Verification¶
Use SDP when:
LP or CROWN [Weng et al., 2018, Zhang et al., 2018] return “unknown” and you need tighter bounds
Network is small to medium (hundreds to ~1000 neurons)
Properties are critical—extra computational cost justified for tightness
Strong neuron correlations (convolutional layers, residual connections)
Computational budget allows hours for verification
Don’t use when:
Network is very large (thousands to millions of neurons)—LP/CROWN/PRIMA faster
LP already verifies the property—no need for tighter bounds
Need complete verification—use SMT [Katz et al., 2019] or branch-and-bound [Wang et al., 2021] instead
Rapid iteration required—SDP too slow for tight feedback loops
Sweet Spot for SDP
SDP-based verification occupies a niche: tighter than LP but slower, faster than complete but still incomplete. It’s most valuable when:
Simple incomplete methods fail to verify
Complete methods are too slow
Network size is moderate (not tiny, not huge)
Tightness matters more than speed
Current Research and Extensions¶
Active research directions:
Tighter SDP relaxations: Beyond basic nuclear norm, researchers explore additional constraints (e.g., exploiting network structure like convolution, adding higher-order tensor relaxations) to tighten bounds further.
Faster solvers: Developing specialized SDP solvers for neural network verification that exploit problem structure for faster solving.
Hybrid SDP-MILP: Combine SDP relaxations with integer programming for select neurons, balancing tightness and completeness.
Integration with training: Train networks that are easier to verify via SDP (e.g., encouraging low nuclear norm in activations).
Probabilistic SDP: Extend SDP to handle probabilistic guarantees, combining with randomized smoothing [Cohen et al., 2019].
Limitations¶
Computational cost: \(O(n^{4.5})\) complexity limits SDP to moderately-sized networks. Large-scale verification requires faster methods.
Still incomplete: SDP provides tighter bounds than LP but might still return “unknown.” For definitive answers, complete methods [Katz et al., 2019, Wang et al., 2021] are needed.
Solver numerical issues: SDP solvers can face numerical instability, especially for large problems or poorly-scaled constraints. Careful problem formulation and solver tuning are required.
Not always tighter in practice: While theoretically tighter than LP, in practice the improvement depends on network structure. For some networks, the gain is marginal, not worth the extra cost.
Final Thoughts¶
SDP-based verification [Raghunathan et al., 2018] represents a sophisticated approach to incomplete verification: encoding neural networks as semi-definite programs captures quadratic relationships that simpler methods miss. The positive semi-definite constraint—a subtle but powerful tool from convex optimization—enables reasoning about products and correlations without sacrificing polynomial-time solvability.
While SDP doesn’t scale to massive networks like CROWN [Weng et al., 2018, Zhang et al., 2018] or IBP [Gowal et al., 2019] do, it fills an important niche. For critical properties on moderately-sized networks where LP is too loose and complete methods [Katz et al., 2019] are too slow, SDP provides the right balance.
Understanding SDP-based verification illuminates the tightness hierarchy of incomplete methods: from simple intervals (IBP) to linear bounds (CROWN) to linear programs (LP [Salman et al., 2019]) to semi-definite programs (SDP) to complete enumeration (SMT/MILP [Tjeng et al., 2019]). Each step up the hierarchy trades speed for tightness. The art of verification is choosing the right point on this spectrum for your application.
Further Reading
This guide provides comprehensive coverage of SDP-based verification for neural networks. For readers interested in diving deeper, we recommend the following resources organized by topic:
SDP-Based Verification Foundations:
The seminal work on SDP relaxations for neural network verification [Raghunathan et al., 2018] introduced the lifted matrix formulation and nuclear norm regularization, demonstrating that semi-definite programming can provide significantly tighter bounds than linear programming while maintaining polynomial-time complexity. This foundational paper establishes the theoretical tightness hierarchy: SDP relaxations are provably at least as tight as LP relaxations.
Convex Optimization Background:
For readers unfamiliar with semi-definite programming, CVXPY [authors, 2020] provides an accessible introduction to convex optimization and SDP solving with practical Python implementations. Understanding SDP fundamentals—positive semi-definiteness, matrix inner products, duality theory—is essential for applying SDP-based verification effectively.
Comparison with LP-Based Methods:
LP-based verification [Salman et al., 2019] provides the baseline for comparison. SDP extends LP by adding PSD constraints that capture quadratic relationships. Understanding LP formulations helps clarify what additional tightness SDP provides and why it costs more computationally.
Comparison with Multi-Neuron Methods:
PRIMA [Müller et al., 2022] and other multi-neuron relaxations offer an alternative path to tighter bounds: selectively add product terms rather than lifting the entire problem to \(X = zz^T\). In practice, PRIMA with careful pair selection often matches SDP tightness at lower computational cost. The tradeoff is theoretical: SDP provides worst-case tightness guarantees, while PRIMA’s tightness depends on heuristic pair selection.
Bound Propagation Baselines:
Fast incomplete methods—CROWN [Weng et al., 2018, Zhang et al., 2018], DeepPoly [Singh et al., 2019], IBP [Gowal et al., 2019]—provide the lower bound on the tightness-speed spectrum. SDP trades their speed for tighter bounds. Understanding these baselines clarifies when SDP’s extra cost is justified.
Complete Verification for Context:
When SDP is still too loose or you need definitive answers, complete methods are necessary. Marabou [Katz et al., 2019] provides SMT-based complete verification. MILP approaches [Tjeng et al., 2019] formulate verification as mixed-integer programming. Branch-and-bound [Wang et al., 2021, Zhang et al., 2022] combines incomplete bounds (potentially from SDP) with systematic refinement for completeness.
GPU Acceleration:
Recent work on GPU-accelerated verification [Xu et al., 2020] explores parallelizing bound computation. While traditional SDP solvers are CPU-based, specialized implementations could enable faster SDP-based verification through hardware acceleration.
Probabilistic Alternatives:
When deterministic verification becomes intractable, randomized smoothing [Cohen et al., 2019] provides probabilistic certification guarantees. This represents a fundamentally different tradeoff: accepting probabilistic rather than deterministic guarantees for better scalability.
Related Topics:
For understanding the LP methods that SDP extends, see Linear Programming for Verification. For multi-neuron alternatives to SDP, see Multi-Neuron Relaxation and PRIMA. For complete methods that SDP might replace or feed into, see Marabou and Reluplex: Extended Simplex for Verification and Branch-and-Bound Verification. For faster incomplete methods, see Bound Propagation Approaches. For understanding Lipschitz-based approaches that complement SDP, see Lipschitz Bounds and Curvature-Based Verification.
Next Guide
Continue to Lipschitz Bounds and Curvature-Based Verification to learn about Lipschitz constant estimation and spectral normalization for robustness certification.
The CVXPY authors. CVXPY 1.1 documentation. 2020. https://www.cvxpy.org/.
Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning. 2019.
Sven Gowal, Krishnamurthy Dj Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In Proceedings of the IEEE International Conference on Computer Vision, 4842–4851. 2019.
Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, and others. The marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 443–452. Springer, 2019.
Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. PRIMA: precise and general neural network certification via multi-neuron convex relaxations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, 2022.
Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 10877–10887. 2018.
Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, 9832–9842. 2019.
Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 15072–15083. 2019.
Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations. 2019.
Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-crown: efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems, 2021.
Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for relu networks. In International Conference on Machine Learning, 5276–5285. 2018.
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 2020.
Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J. Zico Kolter. General cutting planes for bound-propagation-based neural network verification. arXiv preprint arXiv:2208.05740, 2022.