Phase 2: Methods & Tools Guide 7

SDP-Based Verification: Semi-Definite Programming for Tighter Bounds

Semidefinite programming approaches for tighter neural network verification bounds, including lifted matrix formulations, nuclear norm regularization, and comparison with LP methods.

Linear programming relaxations provide polynomial-time verification but sometimes yield loose bounds. Complete methods provide exact answers but don’t scale. What if you want tighter bounds than LP without the exponential cost of complete methods?

Semi-definite programming (SDP) offers exactly this: relaxations tighter than LP while remaining tractable. SDP-based verification formulates bound computation as an SDP, exploiting quadratic relationships that LP misses. The tradeoff is computational cost---SDP solvers are slower than LP solvers---but for critical properties on moderately-sized networks, the tightness gain often justifies the expense.

This guide explores how SDP-based verification works, why it provides tighter bounds, and when the extra computational cost is worthwhile.

Why Semi-Definite Programming?

Linear programs optimize over polytopes (linear constraints). Semi-definite programs optimize over spectrahedra (positive semi-definite matrix constraints). This richer geometry captures relationships that linear constraints cannot.

Key advantage: SDP can represent quadratic constraints and products of variables naturally through positive semi-definite (PSD) matrix constraints. For neural networks, this captures neuron correlations and non-linear relationships more tightly than linear relaxations.

Semi-Definite Program (SDP)

A semi-definite program has the form:

$$
\begin{aligned}
\text{minimize} \quad & \langle C, X \rangle \\
\text{subject to} \quad & \langle A_i, X \rangle = b_i \quad \forall i \\
& X \succeq 0
\end{aligned}
$$

where:

  • $X$ is a symmetric matrix variable
  • $X \succeq 0$ means $X$ is positive semi-definite (all eigenvalues non-negative)
  • $\langle A, B \rangle = \text{trace}(A^T B)$ is the matrix inner product

Complexity: SDPs can be solved in polynomial time using interior-point methods. Practical tooling handles SDPs with thousands of variables: CVXPY as a modeling layer, with solvers such as SCS and MOSEK underneath.
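
To make the standard form concrete, here is a minimal sketch of a generic SDP in CVXPY with SCS as the backend solver. The cost matrix and the single trace constraint are toy placeholders, not taken from any verification problem.

import cvxpy as cp
import numpy as np

# Toy SDP in the standard form above: minimize <C, X> subject to <A_1, X> = b_1 and X PSD.
n = 3
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n))
C = (C + C.T) / 2                                 # symmetric cost matrix
X = cp.Variable((n, n), symmetric=True)

constraints = [X >> 0,                            # positive semi-definite constraint
               cp.trace(X) == 1]                  # <I, X> = trace(X) = 1
problem = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
problem.solve(solver=cp.SCS)

print(problem.value)                              # equals the smallest eigenvalue of C
print(np.linalg.eigvalsh(X.value))                # eigenvalues of the optimal X (non-negative)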

SDP vs LP for Verification

LP relaxations use linear inequalities to over-approximate the neural network’s behavior. They’re fast (polynomial time) but conservative.

SDP relaxations use positive semi-definite constraints to over-approximate behavior. They’re slower (higher-degree polynomial) but tighter.

Tightness hierarchy: For the same network, SDP relaxations provably provide tighter or equal bounds compared to LP relaxations. This theoretical guarantee comes from the fact that SDP can encode LP constraints (every LP is an SDP) plus additional quadratic constraints.

| Aspect | LP-Based Verification | SDP-Based Verification |
| --- | --- | --- |
| Constraints | Linear inequalities | PSD matrix constraints |
| Time Complexity | $O(n^3)$ interior-point | $O(n^{4.5})$ interior-point |
| Tightness | Moderate (single-neuron) | Tight (captures quadratic) |
| Scalability | Thousands of neurons | Hundreds to ~1000 neurons |
| Best Use | General verification | Critical properties, small networks |

SDP Formulation for Neural Networks

The SDP formulation for neural network verification extends the LP approach by introducing a matrix variable that captures second-order relationships between neurons.

Lifting to Quadratic Form

Standard approach: For a network with variables $z$ (all neuron activations), verify bounds on outputs.

SDP approach: Introduce a lifted variable $X = zz^T$, a matrix encoding all pairwise products of neuron activations.

Why this helps: ReLU and other non-linearities create quadratic relationships. By explicitly representing these products in $X$, the SDP can encode tighter constraints.

Basic SDP Formulation

Given: Neural network $f_\theta$, input region $\mathcal{X}$, output neuron $k$ to bound.

Variables:

  • $z \in \mathbb{R}^n$: All neuron pre/post-activations
  • $X \in \mathbb{R}^{n \times n}$: Lifted matrix with $X_{ij} = z_i z_j$

Objective: Minimize $z_k$ (output neuron $k$)

Constraints:

  1. Linear layer constraints: For $z_j = \sum_i W_{ji} z_i + b_j$:

    $z_j = \langle W_j, z \rangle + b_j$
  2. Input bounds: $\underline{x} \leq x \leq \overline{x}$ for input $x$

  3. ReLU constraints: For $y = \text{ReLU}(z)$ with bounds $\underline{z} \leq z \leq \overline{z}$:

    • If $\overline{z} \leq 0$: $y = 0$ (inactive)
    • If $\underline{z} \geq 0$: $y = z$ (active)
    • If $\underline{z} < 0 < \overline{z}$: Use SDP relaxation
  4. Lifted consistency: $X_{ij} = z_i z_j$ (exact in the non-convex problem; relaxed via the PSD constraint below)

  5. PSD constraint: $\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0$

Key insight: The PSD constraint $\begin{bmatrix} X & z \\ z^T & 1 \end{bmatrix} \succeq 0$ enforces that $X$ is consistent with an outer product (specifically, $X \succeq zz^T$ by the Schur complement), making $X_{ij} \approx z_i z_j$ without requiring exact equality (which would be non-convex).
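
The NumPy check below illustrates this relationship with an arbitrary vector $z$: the block built from the exact lift is PSD, and any relaxed $X$ with $X - zz^T \succeq 0$ also satisfies the block constraint. It is a standalone illustration, not part of any verifier.

import numpy as np

z = np.array([1.0, -2.0, 0.5])
X_exact = np.outer(z, z)                               # the exact (rank-1) lift

# The block [[X, z], [z^T, 1]] with X = z z^T is PSD: it equals v v^T for v = [z; 1].
block = np.block([[X_exact, z[:, None]], [z[None, :], np.ones((1, 1))]])
print(np.linalg.eigvalsh(block))                       # all eigenvalues >= 0 (up to rounding)

# A relaxed X also satisfies the PSD block constraint as long as X - z z^T is PSD.
X_relaxed = X_exact + 0.1 * np.eye(3)
print(np.linalg.eigvalsh(X_relaxed - np.outer(z, z)))  # non-negative slack eigenvalues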

ReLU Relaxation in SDP

For uncertain ReLUs ($\underline{z} < 0 < \overline{z}$), the SDP formulation adds tighter constraints than LP:

LP triangle relaxation:

$$
\begin{aligned}
y &\geq 0 \\
y &\geq z \\
y &\leq \frac{\overline{z}}{\overline{z} - \underline{z}} \, (z - \underline{z})
\end{aligned}
$$

SDP quadratic relaxation (additional constraint beyond LP):

$y \cdot z \geq 0$

This constraint captures the fact that when the ReLU is active ($y = z$), both $y$ and $z$ are non-negative, so their product is non-negative. When inactive ($y = 0$), the product is zero.

Why it’s tighter: The quadratic constraint $y \cdot z \geq 0$ is non-linear, which LP cannot express. SDP encodes it via the lifted matrix $X$ with the PSD constraint.
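
The sketch below encodes these constraints for a single uncertain ReLU in CVXPY. The bounds l and u are placeholder values, and the 3x3 PSD block plays the role of the lifted matrix restricted to the pair $(z, y)$; it is an illustrative toy, not tied to any specific verifier.

import cvxpy as cp

# Single uncertain ReLU y = ReLU(z) with placeholder pre-activation bounds l < 0 < u.
l, u = -1.0, 2.0

# B approximates [[P, v], [v^T, 1]] with v = [z, y] and P_ij ~ v_i * v_j.
B = cp.Variable((3, 3), PSD=True)
zv, yv = B[0, 2], B[1, 2]            # z and y live in the last column of the block

constraints = [
    B[2, 2] == 1,                    # bottom-right entry fixed to 1
    zv >= l, zv <= u,                # pre-activation bounds
    yv >= 0, yv >= zv,               # LP triangle relaxation (lower bounds)
    yv <= u / (u - l) * (zv - l),    # LP triangle relaxation (upper bound)
    B[0, 1] >= 0,                    # SDP quadratic constraint: y * z >= 0 via the lifted entry
]

problem = cp.Problem(cp.Minimize(yv), constraints)
problem.solve(solver=cp.SCS)
print(problem.value)                 # lower bound on y under the relaxation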

Nuclear Norm and Rank Constraints

An alternative SDP formulation uses the nuclear norm (sum of singular values) to encourage low-rank structure in the lifted matrix $X$.

Nuclear Norm Regularization

Motivation: The exact relationship $X = zz^T$ implies $X$ is rank-1. Relaxing this constraint makes the problem convex, but allowing an arbitrary PSD $X$ can be loose.

Middle ground: Minimize the nuclear norm $\|X\|_* = \sum_i \sigma_i(X)$ where $\sigma_i$ are singular values.

SDP formulation with nuclear norm:

$$
\begin{aligned}
\text{minimize} \quad & z_k + \lambda \|X\|_* \\
\text{subject to} \quad & \text{(network constraints)} \\
& X \succeq 0
\end{aligned}
$$

Effect: The nuclear norm penalty encourages $X$ to be low-rank, closer to the rank-1 structure of true outer products. The parameter $\lambda$ balances tightness (higher $\lambda$) and conservativeness (lower $\lambda$).

Nuclear norm as SDP: The nuclear norm minimization can be reformulated as an SDP constraint:

$$
\|X\|_* \;=\; \min_{W_1, W_2} \ \tfrac{1}{2}\bigl(\text{trace}(W_1) + \text{trace}(W_2)\bigr) \quad \text{s.t.} \quad \begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0
$$

This makes nuclear norm optimization tractable within the SDP framework.
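
In CVXPY the reformulation is handled automatically by the built-in nuclear-norm atom, so a regularized objective can be written directly. The sketch below is illustrative only: the box constraints on z stand in for the real network constraints, and the weight lam is a placeholder.

import cvxpy as cp

n = 20
lam = 0.1                                        # placeholder regularization weight

z = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)

# Placeholder box constraints standing in for the network constraints.
constraints = [z >= -1, z <= 1, X >> 0]

# cp.normNuc is CVXPY's nuclear-norm atom; it is reformulated internally as an SDP.
objective = cp.Minimize(z[n - 1] + lam * cp.normNuc(X))
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)
print(problem.value)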

Tightness Improvements

Empirical results:

  • Nuclear norm regularization provides 10-30% tighter bounds than unregularized SDP
  • SDP provides 20-50% tighter bounds than LP relaxation for typical networks
  • Tightness gain is most pronounced for networks with many uncertain ReLUs

Comparison with Other Methods

SDP vs LP-Based Methods

LP-based methods:

  • Pros: Fast, scalable to large networks
  • Cons: Loose bounds (misses quadratic relationships)

SDP-based methods:

  • Pros: Tighter bounds (captures products via PSD constraints)
  • Cons: Slower (cubic to quartic complexity)

When SDP wins: Networks with strong neuron correlations, properties requiring very tight bounds, moderate network size (hundreds to ~1000 neurons).

SDP vs Multi-Neuron Relaxations

Multi-neuron relaxations (PRIMA):

  • Approach: Add pairwise product terms explicitly for selected neuron pairs
  • Complexity: Depends on number of pairs $k$; roughly $O(n + k)$ where $k \ll n^2$
  • Tightness: Tight for selected pairs, but greedy pair selection may miss important correlations

SDP-based methods:

  • Approach: Encode all pairwise products via lifted matrix $X$
  • Complexity: $O(n^{4.5})$ for general SDP solvers
  • Tightness: Captures all pairwise products, potentially tighter than selective multi-neuron

Tradeoff: PRIMA is often faster with comparable tightness due to selective pairing. SDP provides theoretical tightness guarantees but at higher computational cost.

SDP vs Complete Methods

Complete methods (Marabou, branch-and-bound):

  • Pros: Exact answers (verified or counterexample)
  • Cons: Exponential worst-case complexity, don’t scale to large networks

SDP methods:

  • Pros: Polynomial time, tighter than LP/CROWN
  • Cons: Still incomplete (might return “unknown”)

When to choose SDP over complete: When the network is too large for complete methods but LP/CROWN are too loose. SDP fills the gap between fast incomplete and slow complete methods.

| Method | Tightness | Time Complexity | Network Size Limit |
| --- | --- | --- | --- |
| IBP | Loosest | $O(n)$ | Millions of neurons |
| CROWN | Moderate | $O(n)$ | Millions of neurons |
| LP-based | Moderate-Tight | $O(n^3)$ | Thousands of neurons |
| PRIMA | Tight | $O(n + k)$ for $k$ pairs | Thousands of neurons |
| SDP-based | Very Tight | $O(n^{4.5})$ | Hundreds to ~1000 neurons |
| Complete (SMT/MILP) | Exact | Exponential | Hundreds of neurons |

Practical Implementation

Using SDP Solvers

Available solvers:

  • CVXPY: High-level Python interface, supports multiple SDP solvers
  • SCS: Splitting Conic Solver, fast for large-scale problems
  • MOSEK: Commercial solver, highly optimized
  • SDPT3/SeDuMi: MATLAB-based solvers

Typical workflow (a sketch with placeholder sizes; the layer and ReLU constraints are indicated as comments):

import cvxpy as cp
import numpy as np

# Hypothetical problem sizes and input box; in practice these come from the
# network and property being verified.
n_neurons = 100                        # total neurons across all layers
input_dim = 10                         # number of input variables
x_lower = -0.1 * np.ones(input_dim)    # input lower bounds
x_upper = 0.1 * np.ones(input_dim)     # input upper bounds

# Lifted PSD block M = [[X, z], [z^T, 1]]. Declaring M as a PSD variable
# enforces X >= z z^T via the Schur complement, a convex relaxation of X = z z^T.
M = cp.Variable((n_neurons + 1, n_neurons + 1), PSD=True)
X = M[:n_neurons, :n_neurons]          # lifted matrix of pairwise products
z = M[:n_neurons, n_neurons]           # neuron activations

constraints = [M[n_neurons, n_neurons] == 1]   # bottom-right entry fixed to 1

# Input bounds
constraints.append(z[:input_dim] >= x_lower)
constraints.append(z[:input_dim] <= x_upper)

# Linear layer constraints: z_j == W_j @ z_prev + b_j
# (add one equality per neuron using the actual network weights)

# ReLU constraints for each uncertain ReLU with pre-activation index i,
# post-activation index j, and bounds l_i < 0 < u_i:
#   constraints.append(z[j] >= 0)                                  # triangle: y >= 0
#   constraints.append(z[j] >= z[i])                               # triangle: y >= z
#   constraints.append(z[j] <= u_i / (u_i - l_i) * (z[i] - l_i))   # triangle upper bound
#   constraints.append(X[j, i] >= 0)                               # SDP quadratic: y * z >= 0

# Objective: minimize (lower-bound) the output neuron, here taken to be the last neuron
objective = cp.Minimize(z[n_neurons - 1])

# Solve
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)

lower_bound = problem.value

Performance Considerations

Scalability challenges:

  • Matrix size: $X \in \mathbb{R}^{n \times n}$ has $O(n^2)$ entries. For $n = 1000$, that’s 1 million variables.
  • Solver time: SDP solvers have $O(n^{4.5})$ complexity, so doubling the network size increases solve time by roughly $2^{4.5} \approx 22\times$.

Optimizations:

Sparsity exploitation: Many entries of $X$ may be irrelevant (neurons in different layers don’t interact directly). Exploit this sparsity to reduce variable count.

Layer-by-layer solving: Similar to LP, solve the SDP layer-by-layer rather than for the entire network. This keeps $n$ small (neurons per layer) rather than large (total neurons).

Warm-starting: When solving multiple similar SDPs (different output neurons, slightly different bounds), initialize solver with previous solution.
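
As a hedged sketch, CVXPY supports this by parameterizing the objective and passing warm_start=True so the solver reuses its previous solution. Here n_neurons, z, and constraints are assumed to come from a formulation like the earlier sketch, and output_indices is a hypothetical list of output-neuron positions.

import cvxpy as cp
import numpy as np

c = cp.Parameter(n_neurons)                          # selects which output neuron to bound
problem = cp.Problem(cp.Minimize(c @ z), constraints)

lower_bounds = {}
for k in output_indices:                             # hypothetical list of output-neuron indices
    c.value = np.eye(n_neurons)[k]                   # one-hot selector for neuron k
    problem.solve(solver=cp.SCS, warm_start=True)    # reuse the previous solution as a starting point
    lower_bounds[k] = problem.value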

GPU acceleration: Some recent work explores GPU-accelerated SDP solving, though traditional SDP solvers are CPU-based.

When to Use SDP-Based Verification

Use SDP when:

  • LP or CROWN return “unknown” and you need tighter bounds
  • Network is small to medium (hundreds to ~1000 neurons)
  • Properties are critical---extra computational cost justified for tightness
  • Strong neuron correlations (convolutional layers, residual connections)
  • Computational budget allows hours for verification

Don’t use when:

  • Network is very large (thousands to millions of neurons)---LP/CROWN/PRIMA faster
  • LP already verifies the property---no need for tighter bounds
  • Need complete verification---use SMT or branch-and-bound instead
  • Rapid iteration required---SDP too slow for tight feedback loops

Sweet Spot for SDP: SDP-based verification occupies a niche: tighter than LP but slower, faster than complete but still incomplete. It’s most valuable when:

  • Simple incomplete methods fail to verify
  • Complete methods are too slow
  • Network size is moderate (not tiny, not huge)
  • Tightness matters more than speed

Current Research and Extensions

Active research directions:

Tighter SDP relaxations: Beyond basic nuclear norm, researchers explore additional constraints (e.g., exploiting network structure like convolution, adding higher-order tensor relaxations) to tighten bounds further.

Faster solvers: Developing specialized SDP solvers for neural network verification that exploit problem structure for faster solving.

Hybrid SDP-MILP: Combine SDP relaxations with integer programming for select neurons, balancing tightness and completeness.

Integration with training: Train networks that are easier to verify via SDP (e.g., encouraging low nuclear norm in activations).

Probabilistic SDP: Extend SDP to handle probabilistic guarantees, combining with randomized smoothing.

Limitations

Computational cost: $O(n^{4.5})$ complexity limits SDP to moderately-sized networks. Large-scale verification requires faster methods.

Still incomplete: SDP provides tighter bounds than LP but might still return “unknown.” For definitive answers, complete methods are needed.

Solver numerical issues: SDP solvers can face numerical instability, especially for large problems or poorly-scaled constraints. Careful problem formulation and solver tuning are required.

Not always tighter in practice: While theoretically tighter than LP, in practice the improvement depends on network structure. For some networks, the gain is marginal, not worth the extra cost.

Final Thoughts

SDP-based verification represents a sophisticated approach to incomplete verification: encoding neural networks as semi-definite programs captures quadratic relationships that simpler methods miss. The positive semi-definite constraint---a subtle but powerful tool from convex optimization---enables reasoning about products and correlations without sacrificing polynomial-time solvability.

While SDP doesn’t scale to massive networks like CROWN or IBP do, it fills an important niche. For critical properties on moderately-sized networks where LP is too loose and complete methods are too slow, SDP provides the right balance.

Understanding SDP-based verification illuminates the tightness hierarchy of incomplete methods: from simple intervals (IBP) to linear bounds (CROWN) to linear programs (LP) to semi-definite programs (SDP) to complete enumeration (SMT/MILP). Each step up the hierarchy trades speed for tightness. The art of verification is choosing the right point on this spectrum for your application.

Further Reading

This guide provides comprehensive coverage of SDP-based verification for neural networks. For readers interested in diving deeper, we recommend the following resources organized by topic:

SDP-Based Verification Foundations:

The seminal work on SDP relaxations for neural network verification introduced the lifted matrix formulation and nuclear norm regularization, demonstrating that semi-definite programming can provide significantly tighter bounds than linear programming while maintaining polynomial-time complexity. This foundational paper establishes the theoretical tightness hierarchy: SDP relaxations are provably at least as tight as LP relaxations.

Convex Optimization Background:

For readers unfamiliar with semi-definite programming, CVXPY provides an accessible introduction to convex optimization and SDP solving with practical Python implementations. Understanding SDP fundamentals---positive semi-definiteness, matrix inner products, duality theory---is essential for applying SDP-based verification effectively.

Comparison with LP-Based Methods:

LP-based verification provides the baseline for comparison. SDP extends LP by adding PSD constraints that capture quadratic relationships. Understanding LP formulations helps clarify what additional tightness SDP provides and why it costs more computationally.

Comparison with Multi-Neuron Methods:

PRIMA and other multi-neuron relaxations offer an alternative path to tighter bounds: selectively add product terms rather than lifting the entire problem to $X = zz^T$. In practice, PRIMA with careful pair selection often matches SDP tightness at lower computational cost. The tradeoff is theoretical: SDP provides worst-case tightness guarantees, while PRIMA’s tightness depends on heuristic pair selection.

Bound Propagation Baselines:

Fast incomplete methods---CROWN, DeepPoly, IBP---provide the lower bound on the tightness-speed spectrum. SDP trades their speed for tighter bounds. Understanding these baselines clarifies when SDP’s extra cost is justified.

Complete Verification for Context:

When SDP is still too loose or you need definitive answers, complete methods are necessary. Marabou provides SMT-based complete verification. MILP approaches formulate verification as mixed-integer programming. Branch-and-bound combines incomplete bounds (potentially from SDP) with systematic refinement for completeness.

Related Topics:

For understanding the LP methods that SDP extends, see LP verification. For multi-neuron alternatives to SDP, see PRIMA verification. For complete methods that SDP might replace or feed into, see Marabou and Reluplex and branch and bound. For faster incomplete methods, see bound propagation. For understanding Lipschitz-based approaches that complement SDP, see Lipschitz verification.

Next Guide: Continue to Lipschitz Verification to learn about Lipschitz constant estimation and spectral normalization for robustness certification.