Phase 2: Methods & Tools Guide 7

SDP-Based Verification: Semi-Definite Programming for Tighter Bounds

Semidefinite programming approaches for tighter neural network verification bounds, including lifted matrix formulations, nuclear norm regularization, and comparison with LP methods.

Linear programming relaxations provide polynomial-time verification but sometimes yield loose bounds. Complete methods provide exact answers but don’t scale. What if you want tighter bounds than LP without the exponential cost of complete methods?

Semi-definite programming (SDP) offers exactly this: relaxations tighter than LP while remaining tractable. SDP-based verification formulates bound computation as an SDP, exploiting quadratic relationships that LP misses. The tradeoff is computational cost---SDP solvers are slower than LP solvers---but for critical properties on moderately-sized networks, the tightness gain often justifies the expense.

This guide explores how SDP-based verification works, why it provides tighter bounds, and when the extra computational cost is worthwhile.

Why Semi-Definite Programming?

Linear programs optimize over polytopes (linear constraints). Semi-definite programs optimize over spectrahedra (positive semi-definite matrix constraints). This richer geometry captures relationships that linear constraints cannot.

Key advantage: SDP can represent quadratic constraints and products of variables naturally through positive semi-definite (PSD) matrix constraints. For neural networks, this captures neuron correlations and non-linear relationships more tightly than linear relaxations.

Semi-Definite Program (SDP)

A semi-definite program has the form:

$$
\begin{aligned}
\text{minimize} \quad & \langle C, X \rangle \\
\text{subject to} \quad & \langle A_i, X \rangle = b_i \quad \forall i \\
& X \succeq 0
\end{aligned}
$$

where:

  • $X$ is a symmetric matrix variable
  • $X \succeq 0$ means $X$ is positive semi-definite (all eigenvalues non-negative)
  • $\langle A, B \rangle = \text{trace}(A^T B)$ is the matrix inner product

Complexity: SDPs can be solved in polynomial time using interior-point methods. Practical tooling handles SDPs with thousands of variables: CVXPY as a modeling layer, with solvers such as SCS and MOSEK underneath.
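
To make the standard form concrete, here is a minimal sketch of a generic SDP in CVXPY with SCS as the backend solver. The cost matrix and the single trace constraint are toy placeholders, not taken from any verification problem.

import cvxpy as cp
import numpy as np

# Toy SDP in the standard form above: minimize <C, X> subject to <A_1, X> = b_1 and X PSD.
n = 3
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n))
C = (C + C.T) / 2                                 # symmetric cost matrix
X = cp.Variable((n, n), symmetric=True)

constraints = [X >> 0,                            # positive semi-definite constraint
               cp.trace(X) == 1]                  # <I, X> = trace(X) = 1
problem = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
problem.solve(solver=cp.SCS)

print(problem.value)                              # equals the smallest eigenvalue of C
print(np.linalg.eigvalsh(X.value))                # eigenvalues of the optimal X (non-negative)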

SDP vs LP for Verification

LP relaxations use linear inequalities to over-approximate the neural network’s behavior. They’re fast (polynomial time) but conservative.

SDP relaxations use positive semi-definite constraints to over-approximate behavior. They’re slower (higher-degree polynomial) but tighter.

Tightness hierarchy: For the same network, SDP relaxations provably provide tighter or equal bounds compared to LP relaxations. This theoretical guarantee comes from the fact that SDP can encode LP constraints (every LP is an SDP) plus additional quadratic constraints.

| Aspect | LP-Based Verification | SDP-Based Verification |
| --- | --- | --- |
| Constraints | Linear inequalities | PSD matrix constraints |
| Time Complexity | $O(n^3)$ interior-point | $O(n^{4.5})$ interior-point |
| Tightness | Moderate (single-neuron) | Tight (captures quadratic) |
| Scalability | Thousands of neurons | Hundreds to ~1000 neurons |
| Best Use | General verification | Critical properties, small networks |

SDP Formulation for Neural Networks

The SDP formulation for neural network verification extends the LP approach by introducing a matrix variable that captures second-order relationships between neurons.

Lifting to Quadratic Form

Standard approach: For a network with variables $z$ (all neuron activations), verify bounds on outputs.

SDP approach: Introduce a lifted variable $X = zz^T$, a matrix encoding all pairwise products of neuron activations.

Why this helps: ReLU and other non-linearities create quadratic relationships. By explicitly representing these products in $X$, the SDP can encode tighter constraints.

Basic SDP Formulation

Given: Neural network $f_\theta$, input region $\mathcal{X}$, output neuron $k$ to bound.

Variables:

  • $z \in \mathbb{R}^n$: All neuron pre/post-activations
  • $X \in \mathbb{R}^{n \times n}$: Lifted matrix with $X_{ij} = z_i z_j$

Objective: Minimize $z_k$ (output neuron $k$)

Constraints:

  1. Linear layer constraints: For $z_j = \sum_i W_{ji} z_i + b_j$:

    $z_j = \langle W_j, z \rangle + b_j$
  2. Input bounds: $\underline{x} \leq x \leq \overline{x}$ for input $x$

  3. ReLU constraints: For $y = \text{ReLU}(z)$ with bounds $\underline{z} \leq z \leq \overline{z}$:

    • If $\overline{z} \leq 0$: $y = 0$ (inactive)
    • If $\underline{z} \geq 0$: $y = z$ (active)
    • If $\underline{z} < 0 < \overline{z}$: Use SDP relaxation
  4. Lifted consistency: $X_{ij} = z_i z_j$ (exact in the non-convex problem; relaxed via the PSD constraint below)

  5. PSD constraint: $\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0$

Key insight: The PSD constraint $\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0$ enforces that $X$ is consistent with an outer product (specifically, $X \succeq zz^T$ by the Schur complement), making $X_{ij} \approx z_i z_j$ without requiring exact equality (which would be non-convex).
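
The NumPy check below illustrates this relationship with an arbitrary vector $z$: the block built from the exact lift is PSD, and any relaxed $X$ with $X - zz^T \succeq 0$ also satisfies the block constraint. It is a standalone illustration, not part of any verifier.

import numpy as np

z = np.array([1.0, -2.0, 0.5])
X_exact = np.outer(z, z)                               # the exact (rank-1) lift

# The block [[X, z], [z^T, 1]] with X = z z^T is PSD: it equals v v^T for v = [z; 1].
block = np.block([[X_exact, z[:, None]], [z[None, :], np.ones((1, 1))]])
print(np.linalg.eigvalsh(block))                       # all eigenvalues >= 0 (up to rounding)

# A relaxed X also satisfies the PSD block constraint as long as X - z z^T is PSD.
X_relaxed = X_exact + 0.1 * np.eye(3)
print(np.linalg.eigvalsh(X_relaxed - np.outer(z, z)))  # non-negative slack eigenvalues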

ReLU Relaxation in SDP

For uncertain ReLUs ($\underline{z} < 0 < \overline{z}$), the SDP formulation adds tighter constraints than LP:

LP triangle relaxation:

$$
\begin{aligned}
y &\geq 0 \\
y &\geq z \\
y &\leq \frac{\overline{z}}{\overline{z} - \underline{z}} \, (z - \underline{z})
\end{aligned}
$$

SDP quadratic relaxation (additional constraint beyond LP):

$y \cdot z \geq 0$

This constraint captures the fact that when the ReLU is active ($y = z$), both $y$ and $z$ are non-negative, so their product is non-negative. When inactive ($y = 0$), the product is zero.

Why it’s tighter: The quadratic constraint $y \cdot z \geq 0$ is non-linear, which LP cannot express. SDP encodes it via the lifted matrix $X$ with the PSD constraint.
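
The sketch below encodes these constraints for a single uncertain ReLU in CVXPY. The bounds l and u are placeholder values, and the 3x3 PSD block plays the role of the lifted matrix restricted to the pair $(z, y)$; it is an illustrative toy, not tied to any specific verifier.

import cvxpy as cp

# Single uncertain ReLU y = ReLU(z) with placeholder pre-activation bounds l < 0 < u.
l, u = -1.0, 2.0

# B approximates [[P, v], [v^T, 1]] with v = [z, y] and P_ij ~ v_i * v_j.
B = cp.Variable((3, 3), PSD=True)
zv, yv = B[0, 2], B[1, 2]            # z and y live in the last column of the block

constraints = [
    B[2, 2] == 1,                    # bottom-right entry fixed to 1
    zv >= l, zv <= u,                # pre-activation bounds
    yv >= 0, yv >= zv,               # LP triangle relaxation (lower bounds)
    yv <= u / (u - l) * (zv - l),    # LP triangle relaxation (upper bound)
    B[0, 1] >= 0,                    # SDP quadratic constraint: y * z >= 0 via the lifted entry
]

problem = cp.Problem(cp.Minimize(yv), constraints)
problem.solve(solver=cp.SCS)
print(problem.value)                 # lower bound on y under the relaxation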

Nuclear Norm and Rank Constraints

An alternative SDP formulation uses the nuclear norm (sum of singular values) to encourage low-rank structure in the lifted matrix $X$.

Nuclear Norm Regularization

Motivation: The exact relationship $X = zz^T$ implies $X$ is rank-1. Relaxing this constraint makes the problem convex, but allowing an arbitrary PSD $X$ can be loose.

Middle ground: Minimize the nuclear norm $\|X\|_* = \sum_i \sigma_i(X)$ where $\sigma_i$ are singular values.

SDP formulation with nuclear norm:

$$
\begin{aligned}
\text{minimize} \quad & z_k + \lambda \|X\|_* \\
\text{subject to} \quad & \text{(network constraints)} \\
& X \succeq 0
\end{aligned}
$$

Effect: The nuclear norm penalty encourages $X$ to be low-rank, closer to the rank-1 structure of true outer products. The parameter $\lambda$ balances tightness (higher $\lambda$) and conservativeness (lower $\lambda$).

Nuclear norm as SDP: The nuclear norm minimization can be reformulated as an SDP constraint:

$$
\|X\|_* \;=\; \min_{W_1, W_2} \ \tfrac{1}{2}\bigl(\text{trace}(W_1) + \text{trace}(W_2)\bigr) \quad \text{s.t.} \quad \begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0
$$

This makes nuclear norm optimization tractable within the SDP framework.
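
In CVXPY the reformulation is handled automatically by the built-in nuclear-norm atom, so a regularized objective can be written directly. The sketch below is illustrative only: the box constraints on z stand in for the real network constraints, and the weight lam is a placeholder.

import cvxpy as cp

n = 20
lam = 0.1                                        # placeholder regularization weight

z = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)

# Placeholder box constraints standing in for the network constraints.
constraints = [z >= -1, z <= 1, X >> 0]

# cp.normNuc is CVXPY's nuclear-norm atom; it is reformulated internally as an SDP.
objective = cp.Minimize(z[n - 1] + lam * cp.normNuc(X))
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)
print(problem.value)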

Tightness Improvements

Empirical results:

  • Nuclear norm regularization provides 10-30% tighter bounds than unregularized SDP
  • SDP provides 20-50% tighter bounds than LP relaxation for typical networks
  • Tightness gain is most pronounced for networks with many uncertain ReLUs

Comparison with Other Methods

SDP vs LP-Based Methods

LP-based methods:

  • Pros: Fast, scalable to large networks
  • Cons: Loose bounds (misses quadratic relationships)

SDP-based methods:

  • Pros: Tighter bounds (captures products via PSD constraints)
  • Cons: Slower (cubic to quartic complexity)

When SDP wins: Networks with strong neuron correlations, properties requiring very tight bounds, moderate network size (hundreds to ~1000 neurons).

SDP vs Multi-Neuron Relaxations

Multi-neuron relaxations (PRIMA):

  • Approach: Add pairwise product terms explicitly for selected neuron pairs
  • Complexity: Depends on number of pairs $k$; roughly $O(n + k)$ where $k \ll n^2$
  • Tightness: Tight for selected pairs, but greedy pair selection may miss important correlations

SDP-based methods:

  • Approach: Encode all pairwise products via lifted matrix $X$
  • Complexity: $O(n^{4.5})$ for general SDP solvers
  • Tightness: Captures all pairwise products, potentially tighter than selective multi-neuron

Tradeoff: PRIMA is often faster with comparable tightness due to selective pairing. SDP provides theoretical tightness guarantees but at higher computational cost.

SDP vs Complete Methods

Complete methods (Marabou, branch-and-bound):

  • Pros: Exact answers (verified or counterexample)
  • Cons: Exponential worst-case complexity, don’t scale to large networks

SDP methods:

  • Pros: Polynomial time, tighter than LP/CROWN
  • Cons: Still incomplete (might return “unknown”)

When to choose SDP over complete: When the network is too large for complete methods but LP/CROWN are too loose. SDP fills the gap between fast incomplete and slow complete methods.

| Method | Tightness | Time Complexity | Network Size Limit |
| --- | --- | --- | --- |
| IBP | Loosest | $O(n)$ | Millions of neurons |
| CROWN | Moderate | $O(n)$ | Millions of neurons |
| LP-based | Moderate-Tight | $O(n^3)$ | Thousands of neurons |
| PRIMA | Tight | $O(n + k)$ for $k$ pairs | Thousands of neurons |
| SDP-based | Very Tight | $O(n^{4.5})$ | Hundreds to ~1000 neurons |
| Complete (SMT/MILP) | Exact | Exponential | Hundreds of neurons |

Practical Implementation

Using SDP Solvers

Available solvers:

  • CVXPY: High-level Python interface, supports multiple SDP solvers
  • SCS: Splitting Conic Solver, fast for large-scale problems
  • MOSEK: Commercial solver, highly optimized
  • SDPT3/SeDuMi: MATLAB-based solvers

Typical workflow (a sketch with placeholder sizes; the layer and ReLU constraints are indicated as comments):

import cvxpy as cp
import numpy as np

# Hypothetical problem sizes and input box; in practice these come from the
# network and property being verified.
n_neurons = 100                        # total neurons across all layers
input_dim = 10                         # number of input variables
x_lower = -0.1 * np.ones(input_dim)    # input lower bounds
x_upper = 0.1 * np.ones(input_dim)     # input upper bounds

# Lifted PSD block M = [[X, z], [z^T, 1]]. Declaring M as a PSD variable
# enforces X >= z z^T via the Schur complement, a convex relaxation of X = z z^T.
M = cp.Variable((n_neurons + 1, n_neurons + 1), PSD=True)
X = M[:n_neurons, :n_neurons]          # lifted matrix of pairwise products
z = M[:n_neurons, n_neurons]           # neuron activations

constraints = [M[n_neurons, n_neurons] == 1]   # bottom-right entry fixed to 1

# Input bounds
constraints.append(z[:input_dim] >= x_lower)
constraints.append(z[:input_dim] <= x_upper)

# Linear layer constraints: z_j == W_j @ z_prev + b_j
# (add one equality per neuron using the actual network weights)

# ReLU constraints for each uncertain ReLU with pre-activation index i,
# post-activation index j, and bounds l_i < 0 < u_i:
#   constraints.append(z[j] >= 0)                                  # triangle: y >= 0
#   constraints.append(z[j] >= z[i])                               # triangle: y >= z
#   constraints.append(z[j] <= u_i / (u_i - l_i) * (z[i] - l_i))   # triangle upper bound
#   constraints.append(X[j, i] >= 0)                               # SDP quadratic: y * z >= 0

# Objective: minimize (lower-bound) the output neuron, here taken to be the last neuron
objective = cp.Minimize(z[n_neurons - 1])

# Solve
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)

lower_bound = problem.value

Performance Considerations

Scalability challenges:

  • Matrix size: $X \in \mathbb{R}^{n \times n}$ has $O(n^2)$ entries. For $n = 1000$, that’s 1 million variables.
  • Solver time: SDP solvers have $O(n^{4.5})$ complexity, so doubling the network size increases solve time by roughly $2^{4.5} \approx 22\times$.

Optimizations:

Sparsity exploitation: Many entries of $X$ may be irrelevant (neurons in different layers don’t interact directly). Exploit this sparsity to reduce variable count.

Layer-by-layer solving: Similar to LP, solve the SDP layer-by-layer rather than for the entire network. This keeps $n$ small (neurons per layer) rather than large (total neurons).

Warm-starting: When solving multiple similar SDPs (different output neurons, slightly different bounds), initialize solver with previous solution.
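
As a hedged sketch, CVXPY supports this by parameterizing the objective and passing warm_start=True so the solver reuses its previous solution. Here n_neurons, z, and constraints are assumed to come from a formulation like the earlier sketch, and output_indices is a hypothetical list of output-neuron positions.

import cvxpy as cp
import numpy as np

c = cp.Parameter(n_neurons)                          # selects which output neuron to bound
problem = cp.Problem(cp.Minimize(c @ z), constraints)

lower_bounds = {}
for k in output_indices:                             # hypothetical list of output-neuron indices
    c.value = np.eye(n_neurons)[k]                   # one-hot selector for neuron k
    problem.solve(solver=cp.SCS, warm_start=True)    # reuse the previous solution as a starting point
    lower_bounds[k] = problem.value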

GPU acceleration: Some recent work explores GPU-accelerated SDP solving, though traditional SDP solvers are CPU-based.

When to Use SDP-Based Verification

Use SDP when:

  • LP or CROWN return “unknown” and you need tighter bounds
  • Network is small to medium (hundreds to ~1000 neurons)
  • Properties are critical---extra computational cost justified for tightness
  • Strong neuron correlations (convolutional layers, residual connections)
  • Computational budget allows hours for verification

Don’t use when:

  • Network is very large (thousands to millions of neurons)---LP/CROWN/PRIMA faster
  • LP already verifies the property---no need for tighter bounds
  • Need complete verification---use SMT or branch-and-bound instead
  • Rapid iteration required---SDP too slow for tight feedback loops

Sweet Spot for SDP: SDP-based verification occupies a niche: tighter than LP but slower, faster than complete but still incomplete. It’s most valuable when:

  • Simple incomplete methods fail to verify
  • Complete methods are too slow
  • Network size is moderate (not tiny, not huge)
  • Tightness matters more than speed

Current Research and Extensions

Active research directions:

Tighter SDP relaxations: Beyond basic nuclear norm, researchers explore additional constraints (e.g., exploiting network structure like convolution, adding higher-order tensor relaxations) to tighten bounds further.

Faster solvers: Developing specialized SDP solvers for neural network verification that exploit problem structure for faster solving.

Hybrid SDP-MILP: Combine SDP relaxations with integer programming for select neurons, balancing tightness and completeness.

Integration with training: Train networks that are easier to verify via SDP (e.g., encouraging low nuclear norm in activations).

Probabilistic SDP: Extend SDP to handle probabilistic guarantees, combining with randomized smoothing.

Limitations

Computational cost: $O(n^{4.5})$ complexity limits SDP to moderately-sized networks. Large-scale verification requires faster methods.

Still incomplete: SDP provides tighter bounds than LP but might still return “unknown.” For definitive answers, complete methods are needed.

Solver numerical issues: SDP solvers can face numerical instability, especially for large problems or poorly-scaled constraints. Careful problem formulation and solver tuning are required.

Not always tighter in practice: While theoretically tighter than LP, in practice the improvement depends on network structure. For some networks, the gain is marginal, not worth the extra cost.

Final Thoughts

SDP-based verification represents a sophisticated approach to incomplete verification: encoding neural networks as semi-definite programs captures quadratic relationships that simpler methods miss. The positive semi-definite constraint---a subtle but powerful tool from convex optimization---enables reasoning about products and correlations without sacrificing polynomial-time solvability.

While SDP doesn’t scale to massive networks like CROWN or IBP do, it fills an important niche. For critical properties on moderately-sized networks where LP is too loose and complete methods are too slow, SDP provides the right balance.

Understanding SDP-based verification illuminates the tightness hierarchy of incomplete methods: from simple intervals (IBP) to linear bounds (CROWN) to linear programs (LP) to semi-definite programs (SDP) to complete enumeration (SMT/MILP). Each step up the hierarchy trades speed for tightness. The art of verification is choosing the right point on this spectrum for your application.

Further Reading

This guide provides comprehensive coverage of SDP-based verification for neural networks. For readers interested in diving deeper, we recommend the following resources organized by topic:

SDP-Based Verification Foundations:

The seminal work on SDP relaxations for neural network verification introduced the lifted matrix formulation and nuclear norm regularization, demonstrating that semi-definite programming can provide significantly tighter bounds than linear programming while maintaining polynomial-time complexity. This foundational paper establishes the theoretical tightness hierarchy: SDP relaxations are provably at least as tight as LP relaxations.

Convex Optimization Background:

For readers unfamiliar with semi-definite programming, CVXPY provides an accessible introduction to convex optimization and SDP solving with practical Python implementations. Understanding SDP fundamentals---positive semi-definiteness, matrix inner products, duality theory---is essential for applying SDP-based verification effectively.

Comparison with LP-Based Methods:

LP-based verification provides the baseline for comparison. SDP extends LP by adding PSD constraints that capture quadratic relationships. Understanding LP formulations helps clarify what additional tightness SDP provides and why it costs more computationally.

Comparison with Multi-Neuron Methods:

PRIMA and other multi-neuron relaxations offer an alternative path to tighter bounds: selectively add product terms rather than lifting the entire problem to $X = zz^T$. In practice, PRIMA with careful pair selection often matches SDP tightness at lower computational cost. The tradeoff is theoretical: SDP provides worst-case tightness guarantees, while PRIMA’s tightness depends on heuristic pair selection.

Bound Propagation Baselines:

Fast incomplete methods---CROWN, DeepPoly, IBP---provide the lower bound on the tightness-speed spectrum. SDP trades their speed for tighter bounds. Understanding these baselines clarifies when SDP’s extra cost is justified.

Complete Verification for Context:

When SDP is still too loose or you need definitive answers, complete methods are necessary. Marabou provides SMT-based complete verification. MILP approaches formulate verification as mixed-integer programming. Branch-and-bound combines incomplete bounds (potentially from SDP) with systematic refinement for completeness.

Related Topics:

For understanding the LP methods that SDP extends, see LP verification. For multi-neuron alternatives to SDP, see PRIMA verification. For complete methods that SDP might replace or feed into, see Marabou and Reluplex and branch and bound. For faster incomplete methods, see bound propagation. For understanding Lipschitz-based approaches that complement SDP, see Lipschitz verification.

Next Guide: Continue to Lipschitz Verification to learn about Lipschitz constant estimation and spectral normalization for robustness certification.