Phase 2: Methods & Tools Guide 4

Linear Programming for Verification

LP formulations for neural network verification, including triangle relaxation, duality theory, complexity analysis, and comparison with other bound propagation methods.

Neural network verification is fundamentally about computing bounds: given input constraints, what are the tightest possible bounds on the output? Linear programming offers a powerful approach---formulate bound computation as an LP, solve it efficiently, and get tight bounds without the exponential cost of complete methods.

LP-based verification sits between fast but loose methods (interval bound propagation) and expensive complete methods (SMT, MILP). It provides a sweet spot: tighter bounds than simple propagation, polynomial-time complexity, and well-understood optimization theory. This guide explores how LP-based verification works, when to use it, and how it compares to other approaches.

Why Linear Programming?

Linear programs are efficiently solvable. Interior-point methods run in polynomial time, and the simplex algorithm, despite its exponential worst case, is fast on most practical instances. Mature solvers (CPLEX, Gurobi, open-source alternatives) handle LPs with millions of variables.

Key advantage: Unlike MILP (which adds integer variables for ReLU states and becomes NP-hard), LP relaxes these constraints. You lose completeness but gain tractability.

LP vs MILP

  • MILP (Mixed-Integer LP): Encodes ReLU states exactly with binary variables → NP-hard, exponential worst-case
  • LP (Linear Program): Replaces the exact ReLU encoding with continuous linear relaxations → Polynomial time, sound but incomplete

LP trades exactness for speed. It provides sound over-approximations---bounds are guaranteed to contain the true output range, but might be conservative.

The LP Formulation for Neural Networks

LP-based verification computes bounds on network outputs by formulating constraints as a linear program.

Network as Linear Constraints

Linear layers: For z = Wx + b, these are native linear equality constraints:

z_i = \sum_j W_{ij} x_j + b_i \quad \forall i

Input constraints: For the \ell_\infty ball \|x - x_0\|_\infty \leq \epsilon:

x_0 - \epsilon \leq x \leq x_0 + \epsilon

Objective: To find bounds on output dimension k, solve two LPs:

\begin{aligned} \text{Lower bound:} \quad & \min \quad f_\theta(x)_k \\ \text{Upper bound:} \quad & \max \quad f_\theta(x)_k \end{aligned}

The challenge: ReLU activations introduce non-linearity.

ReLU Relaxation

The ReLU function y = \text{ReLU}(z) = \max(0, z) isn’t linear. LP-based methods use linear relaxations:

Triangle relaxation (standard approach):

For a ReLU with pre-activation bounds \underline{z} \leq z \leq \overline{z}:

\begin{aligned} y &\geq 0 \\ y &\geq z \\ y &\leq \frac{\overline{z}}{\overline{z} - \underline{z}} (z - \underline{z}) \quad \text{(if } \underline{z} < 0 < \overline{z}\text{)} \end{aligned}

This creates a convex polytope over-approximating the ReLU:

  • If \overline{z} \leq 0: the ReLU is always inactive, y = 0
  • If \underline{z} \geq 0: the ReLU is always active, y = z
  • If \underline{z} < 0 < \overline{z}: the ReLU is uncertain; use the triangle relaxation

Geometric Interpretation: The triangle relaxation creates a polytope containing the true ReLU function. The tighter the pre-activation bounds [\underline{z}, \overline{z}], the tighter the polytope. Loose bounds → loose relaxation → conservative output bounds.
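As a concrete sketch of this case analysis, the small helper below returns the linear constraints to impose on (z, y) for one ReLU; the function name and the (a, c, d) constraint encoding are illustrative choices for this guide, not any particular tool’s API.

def relu_relaxation(z_lo, z_hi):
    """Constraints on (z, y) for y = ReLU(z), given z_lo <= z <= z_hi.

    Each returned tuple (a, c, d) encodes the inequality a*y + c*z <= d.
    """
    if z_hi <= 0:                        # stably inactive: y = 0
        return [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    if z_lo >= 0:                        # stably active: y = z
        return [(1.0, -1.0, 0.0), (-1.0, 1.0, 0.0)]
    slope = z_hi / (z_hi - z_lo)         # unstable: triangle relaxation
    return [
        (-1.0, 0.0, 0.0),                # y >= 0
        (-1.0, 1.0, 0.0),                # y >= z
        (1.0, -slope, -slope * z_lo),    # y <= slope * (z - z_lo)
    ]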

Layer-by-Layer LP Solving

Naive approach: Encode the entire network as one massive LP with all layer variables.

Problem: For deep networks, this creates huge LPs (millions of variables), becoming computationally expensive.

Better approach: Solve layer-by-layer:

  1. Layer 1: Compute bounds on all neurons using input constraints
  2. Layer 2: Use Layer 1 bounds as constraints, compute Layer 2 bounds
  3. Repeat for all layers
  4. Output layer: Final bounds on output

Each layer’s LP is much smaller---only variables for that layer. This is faster than encoding the full network.
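A minimal sketch of this loop, assuming a hypothetical solve_bounds_lp(layer, in_lo, in_hi) helper that builds the single-layer LP and returns per-neuron (lower, upper) bound arrays:

def propagate_bounds(layers, x_lo, x_hi, solve_bounds_lp):
    # layers: per-layer descriptions (weights, biases, activation);
    # solve_bounds_lp: solves one small LP per neuron bound in the given layer.
    lo, hi = x_lo, x_hi
    for layer in layers:
        lo, hi = solve_bounds_lp(layer, lo, hi)
    return lo, hi   # bounds on the output layer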

Comparison with Other Bound Propagation Methods

LP-Based vs Interval Bound Propagation (IBP)

IBP: Propagates intervals [\underline{z}, \overline{z}] through layers analytically.

  • Speed: Very fast (closed-form propagation)
  • Tightness: Loose (doesn’t optimize, just computes worst-case)

LP-based: Solves optimization problem for each bound.

  • Speed: Slower (requires LP solver calls)
  • Tightness: Tighter (optimizes over feasible region)

Tradeoff: LP is worth the extra cost when tighter bounds are needed. For rapid screening, IBP suffices.
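For concreteness, here is a minimal NumPy sketch of IBP through one affine layer followed by a ReLU (the weights and input box are illustrative). The bounds come from a closed-form worst case; nothing is optimized.

import numpy as np

def ibp_affine_relu(W, b, x_lo, x_hi):
    # Worst-case interval propagation through z = Wx + b, then ReLU.
    center = (x_lo + x_hi) / 2
    radius = (x_hi - x_lo) / 2
    z_center = W @ center + b
    z_radius = np.abs(W) @ radius                      # maximum deviation per output
    z_lo, z_hi = z_center - z_radius, z_center + z_radius
    return np.maximum(z_lo, 0), np.maximum(z_hi, 0)    # ReLU is monotone

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.0, -0.1])
x0, eps = np.array([0.3, -0.2]), 0.1
lo, hi = ibp_affine_relu(W, b, x0 - eps, x0 + eps)
print(lo, hi)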

LP-Based vs CROWN

CROWN: Backward linear bound propagation with optimized coefficients.

  • Speed: Fast (backward pass, analytical optimization)
  • Tightness: Tight (optimizes linear relaxation parameters)

LP-based: Solves explicit LP for each bound.

  • Speed: Moderate (LP solver overhead)
  • Tightness: Can be tighter for some networks (depends on relaxation quality)

In practice: CROWN is often faster with comparable tightness. LP-based verification is useful when you need the explicit LP formulation (e.g., for custom constraints or objectives).

| Method | Time Complexity | Tightness | Best Use Case |
| --- | --- | --- | --- |
| IBP | O(network size) | Loose | Rapid screening, certified training |
| CROWN | O(network size) | Tight | General verification, production use |
| LP-based | O(network size × LP solve) | Very tight | When explicit optimization needed |
| SDP-based | O(network size^3) | Tightest | Small networks, research |

Tightening LP Relaxations

Standard triangle relaxation can be loose. Several techniques tighten it:

Pre-Activation Bound Tightening

Key observation: Tighter pre-activation bounds [\underline{z}, \overline{z}] → tighter ReLU relaxation → tighter output bounds.

Strategy: Before constructing the LP, use cheaper methods (IBP, CROWN) to compute initial bounds. Then use these to initialize the LP formulation.

Iterative refinement: Solve LP for each neuron’s bounds, use those to tighten subsequent layers, repeat until convergence.

Duality Theory in Depth

The dual formulation provides a powerful alternative perspective on LP-based verification. Instead of optimizing over network variables directly (primal), we optimize over dual variables representing constraint sensitivities. Understanding duality theory reveals why LP-based verification works and when it’s tight.

Primal LP Formulation:

For a neural network verification problem, the primal LP optimizes over neuron activations:

\begin{aligned} \min \quad & c^T z \\ \text{s.t.} \quad & Az \geq b \\ & z \geq 0 \end{aligned}

where z collects all neuron variables, Az \geq b encodes the linear-layer constraints and ReLU relaxations (equalities appear as pairs of inequalities, and any \leq constraint is negated into \geq form), and c selects the output neuron to bound.

Dual LP Formulation:

The Lagrangian dual introduces dual variables \lambda \geq 0, one for each constraint:

\begin{aligned} \max \quad & b^T \lambda \\ \text{s.t.} \quad & A^T \lambda \leq c \\ & \lambda \geq 0 \end{aligned}

Lagrangian Relaxation:

The dual arises from the Lagrangian function:

L(z, \lambda) = c^T z + \lambda^T (b - Az)

For any feasible z and \lambda \geq 0:

c^T z \geq L(z, \lambda) = c^T z + \lambda^T (b - Az) \geq \min_{z \geq 0} L(z, \lambda)

The dual objective b^T \lambda equals \min_{z \geq 0} L(z, \lambda) exactly when the dual constraint A^T \lambda \leq c holds. This provides a lower bound on the primal objective.
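Spelling out the inner minimization over z \geq 0 shows where the dual constraint comes from:

\min_{z \geq 0} L(z, \lambda) = b^T \lambda + \min_{z \geq 0} (c - A^T \lambda)^T z = \begin{cases} b^T \lambda & \text{if } A^T \lambda \leq c \\ -\infty & \text{otherwise} \end{cases}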

Strong Duality for LP:

For linear programs, strong duality holds: if both primal and dual are feasible, then:

\min_{z} c^T z = \max_{\lambda} b^T \lambda

The optimal values are equal. This is a special property of LP---not true for general non-convex problems.
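Strong duality is easy to check numerically. The sketch below uses a tiny generic LP (not a network) and assumes SciPy’s HiGHS-backed linprog; it solves the primal and the dual separately and compares the optima.

import numpy as np
from scipy.optimize import linprog

# Primal:  min c^T z   s.t.  A z >= b,  z >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 0.5]])
b = np.array([1.0, 1.0])

# linprog expects "<=" constraints, so negate A z >= b into -A z <= -b.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2, method="highs")

# Dual:  max b^T lam   s.t.  A^T lam <= c,  lam >= 0  (maximize via negated min)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")

print("primal optimum:", primal.fun)    # both print 1.0: strong duality
print("dual optimum:  ", -dual.fun)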

Why the Dual is Often Easier to Solve:

  1. Fewer variables: The dual has one variable per constraint. For neural networks with sparse connectivity, the dual can be much smaller.

    • Primal: Variables for every neuron activation across all layers
    • Dual: Variables for each constraint (many constraints might be implicit in forward propagation)
  2. Structure exploitation: The dual constraints A^T \lambda \leq c directly reveal which primal constraints “matter” for the bound. Inactive constraints (those with \lambda_i = 0) don’t affect the solution.

  3. Warm-starting: When computing bounds for multiple outputs, dual variables from previous solutions provide good initializations.

Connection to Outer Approximation:

The dual formulation connects to outer approximation techniques. The primal LP computes bounds by propagating constraints forward. The dual computes bounds by backpropagating sensitivity---how much each constraint contributes to the final bound.

Theorem (LP Duality for Neural Network Verification): For the LP relaxation of a ReLU network verification problem:

\min_{x, z, y} \{ f_\theta(x)_k : x \in X, z = Wx + b, y = \text{ReLU-relax}(z) \}

the dual problem:

\max_{\lambda} \{ \text{dual objective}(\lambda) : \text{dual constraints} \}

has the same optimal value (strong duality), and solving the dual provides the same lower bound as the primal.

Practical Implication: Methods like CROWN implicitly solve the dual formulation through backward bound propagation. The “optimized linear bounds” in CROWN are dual variables optimized to maximize the lower bound.

Cutting Planes

Standard LP relaxation uses only basic constraints. Cutting planes add additional linear inequalities that:

  • Don’t exclude any feasible solutions (sound)
  • Tighten the relaxation (remove infeasible regions)

Example: For two ReLUs with correlated pre-activations, add constraints capturing their relationship beyond independent relaxations.

Challenge: Identifying useful cutting planes is problem-specific. Generic approaches exist but require careful tuning.

Practical Implementation

Using LP Solvers

Commercial solvers: Gurobi, CPLEX (fast, highly optimized, licensing costs)

Open-source solvers: GLPK, CLP, SoPlex (free, decent performance)

Interface: Most solvers provide Python APIs (Gurobi Python, PuLP, CVXPY)

Example workflow:

import numpy as np
import gurobipy as gp

# Toy layer data so the example runs standalone (placeholders; a real verifier
# would take these from the network being checked).
input_dim, layer_size = 3, 4
rng = np.random.default_rng(0)
W, b = rng.standard_normal((layer_size, input_dim)), rng.standard_normal(layer_size)
x0, eps = np.zeros(input_dim), 0.1
lower_bounds, upper_bounds = x0 - eps, x0 + eps

# Create model
model = gp.Model()

# Input variables restricted to the epsilon-ball
x = model.addVars(input_dim, lb=lower_bounds.tolist(), ub=upper_bounds.tolist())

# Pre-activation variables and linear layer constraints
z = model.addVars(layer_size, lb=-gp.GRB.INFINITY)
for i in range(layer_size):
    model.addConstr(z[i] == gp.quicksum(float(W[i, j]) * x[j] for j in range(input_dim)) + float(b[i]))

# Cheap interval pre-activation bounds decide which ReLU case applies
z_lo = W @ x0 + b - np.abs(W) @ np.full(input_dim, eps)
z_hi = W @ x0 + b + np.abs(W) @ np.full(input_dim, eps)

# Post-activation variables with the ReLU relaxation
y = model.addVars(layer_size, lb=0)
for i in range(layer_size):
    lo, hi = float(z_lo[i]), float(z_hi[i])
    if hi <= 0:                       # stably inactive: y = 0
        model.addConstr(y[i] == 0)
    elif lo >= 0:                     # stably active: y = z
        model.addConstr(y[i] == z[i])
    else:                             # uncertain ReLU: triangle relaxation
        model.addConstr(y[i] >= z[i])
        slope = hi / (hi - lo)
        model.addConstr(y[i] <= slope * (z[i] - lo))

# Optimize for a lower bound on one output neuron (here simply y[0])
model.setObjective(y[0], gp.GRB.MINIMIZE)
model.optimize()
lower_bound = model.objVal
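To get the matching upper bound, re-solve the same model with the objective sense flipped to gp.GRB.MAXIMIZE (the two LPs from the formulation above). A full verifier repeats this pair of solves for every neuron whose bounds it needs, keeping the model and changing only the objective.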

Performance Optimization

Warm-starting: When solving multiple similar LPs (e.g., bounds on different neurons), initialize solver with previous solution. This speeds up convergence.

Parallel solving: Compute bounds for different neurons in parallel. Each LP is independent---trivial parallelization.
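A minimal sketch of this pattern with Python’s standard library, where solve_neuron_bound is a placeholder standing in for one LP build-and-solve (for example, the Gurobi model above with the objective pointed at a single neuron):

from concurrent.futures import ProcessPoolExecutor

def solve_neuron_bound(layer_index, neuron_index, sense):
    # Placeholder: build and solve the LP for this neuron, minimizing or
    # maximizing depending on `sense`, and return the optimal value.
    return 0.0

def layer_bounds(layer_index, layer_size):
    # One independent LP per (neuron, min/max) pair: trivial to parallelize.
    with ProcessPoolExecutor() as pool:
        futures = {
            (i, sense): pool.submit(solve_neuron_bound, layer_index, i, sense)
            for i in range(layer_size)
            for sense in ("min", "max")
        }
        return {key: f.result() for key, f in futures.items()}

if __name__ == "__main__":
    print(layer_bounds(layer_index=1, layer_size=4))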

Selective bounding: Only compute bounds for neurons that matter (those affecting the final property). Skip intermediate neurons where crude bounds suffice.

GPU acceleration: Some recent work explores GPU-accelerated LP solving, though traditional LP solvers are CPU-based.

Theoretical Guarantees of LP-Based Verification

LP-based verification provides formal guarantees despite being incomplete. Understanding these guarantees clarifies when LP methods are reliable and when they may fail.

Soundness Guarantee

Theorem (Soundness of LP Relaxation): For a neural network f_\theta with ReLU activations, input region X, and LP-based bound computation yielding [\underline{y}, \overline{y}] on output y:

\forall x \in X: \underline{y} \leq f_\theta(x) \leq \overline{y}

The LP relaxation never underestimates lower bounds or overestimates upper bounds. All true outputs lie within the computed bounds.

Proof Sketch:

  1. Linear layers are exact: Affine transformations z = Wx + b are encoded exactly as linear equality constraints. No approximation.

  2. ReLU relaxation is an over-approximation: The triangle relaxation creates a convex polytope P such that:

    \forall z: \text{ReLU}(z) \in P(z)

    For any z \in [\underline{z}, \overline{z}], the true ReLU output lies within the polytope defined by the linear constraints.

  3. Composition preserves soundness: Layerwise application of sound over-approximations yields a sound overall over-approximation.

  4. LP optimum is conservative: The LP maximizes/minimizes over the over-approximated feasible region, which contains the true feasible region. Therefore bounds are conservative.

Consequence: LP-based verification never produces spurious certificates. If it certifies a property (e.g., y_1 > y_2 for all x \in X), the property truly holds. But it may fail to certify a true property due to loose relaxation, returning “unknown” instead.

Over-Approximation Guarantees

LP relaxation over-approximates the true network output region. The quality of this approximation depends on the relaxation tightness.

Definition (Relaxation Tightness): For a neuron with true output region R_{\text{true}} and LP-relaxed region R_{\text{LP}}:

R_{\text{true}} \subseteq R_{\text{LP}}

The volume ratio \text{Vol}(R_{\text{LP}}) / \text{Vol}(R_{\text{true}}) measures looseness. Larger ratio → looser relaxation.

Tightness Analysis:

For a single ReLU with z \in [\underline{z}, \overline{z}] where \underline{z} < 0 < \overline{z}:

  • True ReLU output region: R_{\text{true}} is the union of two line segments (non-convex)
  • Triangle relaxation region: R_{\text{LP}} is the triangle bounded by y \geq 0, y \geq z, and y \leq \frac{\overline{z}}{\overline{z} - \underline{z}}(z - \underline{z})

Area of over-approximation:

The triangle region has area:

\text{Area}(R_{\text{LP}}) = \frac{1}{2} \cdot |\underline{z}| \cdot \overline{z}

This is the wasted space not covered by the true ReLU but included in the relaxation.

Tightness depends on bounds:

  • If |\underline{z}| \ll \overline{z} or |\underline{z}| \gg \overline{z}, the triangle is relatively tight
  • If |\underline{z}| \approx \overline{z}, the triangle area is maximal for a given interval width (loosest case)

Consequence: Pre-activation bound quality directly determines LP relaxation tightness. Tighter initial bounds → tighter LP → better verification.
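A quick numeric illustration of this dependence, holding the interval width fixed at 2 and sliding the crossing point:

def triangle_area(z_lo, z_hi):
    # Area of the triangle relaxation region: (1/2) * |z_lo| * z_hi
    return 0.5 * abs(z_lo) * z_hi

for z_lo, z_hi in [(-0.2, 1.8), (-1.0, 1.0), (-1.8, 0.2)]:
    print(f"[{z_lo}, {z_hi}] -> area {triangle_area(z_lo, z_hi):.2f}")
# Prints 0.18, 0.50, 0.18: loosest when |z_lo| and z_hi are comparable.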

When LP Relaxation is Exact

LP relaxation is exact (no over-approximation) in special cases:

Case 1: All ReLUs are stable

If every ReLU has either \underline{z} \geq 0 (always active) or \overline{z} \leq 0 (always inactive), then:

  • No triangle relaxation needed
  • ReLU is replaced by y = z or y = 0 (exact)
  • LP computes exact bounds

Implication: Networks trained for verifiability aim to maximize stable ReLUs, reducing unstable neurons where LP relaxation is loose.

Case 2: Network is piecewise-linear and property is linear

For a fully piecewise-linear network (no smooth activations) with linear property specification, if all ReLU activation patterns are known, LP computes exact bounds.

Practical observation: In practice, deep networks have many unstable ReLUs, so LP relaxation is rarely exact. The goal is to minimize looseness, not eliminate it entirely.

Connection to Convex Barrier

The triangle relaxation for ReLU is the optimal convex hull (see theoretical barriers). This means:

Theorem (LP Optimality for Single-Neuron ReLU): The triangle relaxation is the tightest possible convex linear relaxation of a single ReLU over any interval [\underline{z}, \overline{z}] with \underline{z} < 0 < \overline{z}.

Consequence: Standard LP-based verification using triangle relaxation cannot be improved for individual ReLUs without either:

  1. Multi-neuron analysis (analyze multiple ReLUs jointly, e.g., PRIMA)
  2. Non-linear relaxations (e.g., quadratic constraints in SDP)
  3. Sacrificing soundness (incomplete methods that might miss violations)

LP-based verification hits the convex barrier---a fundamental limit on single-neuron linear relaxation tightness. This explains why methods like DeepPoly and CROWN (both using triangle relaxation) have similar tightness---they’re all at the barrier.

Complexity Analysis

Understanding the computational complexity of LP-based verification clarifies its scalability and practical applicability.

LP Solving Complexity

Worst-case complexity: Simplex algorithm is exponential in the worst case (known for decades). Interior-point methods are polynomial but with high degree.

Average-case complexity: Simplex is polynomial on average for most practical LPs. This is why it remains the dominant LP algorithm despite exponential worst-case.

Theorem (Interior-Point Complexity): Interior-point methods solve an LP with m constraints and n variables in:

O(\sqrt{n} \cdot (m + n) \cdot \text{barrier-iterations})

where the number of barrier iterations is typically O(\log(1/\epsilon)) for an \epsilon-optimal solution.

For neural network LPs with n neurons per layer and L layers:

  • Variables: O(nL) (one per neuron)
  • Constraints: O(nL) (linear layers + ReLU relaxations)
  • Complexity per LP: O(n^{1.5} L^{1.5})

For layer-by-layer solving: Solve O(nL) LPs (one per neuron per layer), each with O(n) variables:

\text{Total complexity} = O(nL) \times O(n^{1.5}) = O(n^{2.5} L)

This is polynomial in network size, unlike MILP (exponential) or SAT (exponential).

How Network Size Affects LP Size

Width scaling: For a network with n neurons per layer:

  • Primal LP: O(n^2) constraints (fully-connected layers have n^2 weights)
  • Dual LP: O(n^2) variables

Doubling network width → 4x LP size → roughly 8x solving time (for interior-point).

Depth scaling: For a network with L layers:

  • Monolithic LP (encoding full network): O(nL) variables and constraints
  • Layer-by-layer LP: Each layer’s LP has O(n) variables, solved L times

Doubling depth → 2x more LPs to solve (layer-by-layer) or 2x LP size (monolithic).

Practical observation: Layer-by-layer scaling is better for deep networks. Monolithic LPs become intractable beyond ~10 layers.

Trade-offs: Tightness vs LP Size

Tighter relaxations require more constraints, increasing LP size:

Triangle relaxation: 3 constraints per unstable ReLU

y \geq 0, \quad y \geq z, \quad y \leq \frac{\overline{z}}{\overline{z} - \underline{z}}(z - \underline{z})

Multi-neuron relaxation (PRIMA): For k neurons analyzed jointly, O(2^k) constraints.

  • k = 2: 4x more constraints
  • k = 3: 8x more constraints
  • k = 4: 16x more constraints (quickly intractable)

SDP relaxation: O(n^2) variables in the semidefinite matrix, O(n^6) solving complexity.

Tradeoff table:

| Relaxation Type | Constraints per Neuron | Solving Complexity | Tightness |
| --- | --- | --- | --- |
| Triangle (single-neuron) | O(1) | O(n^{2.5} L) | Moderate |
| Multi-neuron (k = 2) | O(n) | O(n^{3.5} L) | Tight |
| Multi-neuron (k = 3) | O(n^2) | O(n^{4.5} L) | Very tight |
| SDP | O(n^2) matrix | O(n^6 L) | Tightest |

Practical strategy: Use single-neuron LP for large networks, multi-neuron for small critical regions, SDP only for tiny networks (fewer than 100 neurons).

Why Multi-Neuron LP is Exponentially Larger

Single-neuron LP: Each ReLU is relaxed independently. For n ReLUs, O(n) constraints.

Multi-neuron LP: k ReLUs analyzed jointly require encoding all possible activation patterns for those k neurons.

  • k ReLUs have 2^k activation patterns (each active/inactive)
  • Each pattern requires separate constraints to encode the corresponding linear region
  • Total constraints: O(2^k \cdot \binom{n}{k}) for all k-tuples of neurons

Example: For n = 1000 neurons and k = 3:

  • Number of 3-tuples: \binom{1000}{3} \approx 1.66 \times 10^8
  • Constraints per tuple: 2^3 = 8
  • Total: ~1.3 billion constraints (completely intractable)
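A two-line check of these counts:

from math import comb
print(comb(1000, 3), comb(1000, 3) * 2**3)   # 166,167,000 tuples, ~1.33 billion constraints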

PRIMA’s solution: Don’t analyze all tuples. Use heuristics to select the most important k-tuples (those with the highest impact on final bounds), and analyze only those.

Limitations of LP-Based Verification

Scalability: Each layer requires solving multiple LPs (one per neuron bound). For large networks, this becomes expensive despite polynomial-time LP solving.

Relaxation looseness: Triangle relaxation is conservative. For networks with many uncertain ReLUs, accumulated looseness reduces effectiveness.

Incomplete: LP-based verification is incomplete---it might return “unknown” when tighter methods (or complete methods) would verify.

Comparison with alternatives:

  • CROWN is often faster with similar tightness---backward propagation avoids explicit LP solving
  • Branch-and-bound with CROWN achieves better results than pure LP for difficult properties
  • For very tight bounds: Multi-neuron relaxations or SDP needed

When to Use LP-Based Verification

Use LP-based when:

  • Need tighter bounds than IBP/CROWN provide
  • Want explicit optimization formulation (for custom constraints)
  • Verifying small to medium networks where LP overhead is acceptable
  • Research context exploring LP-based techniques
  • Integrating with other LP-based tools/frameworks

Don’t use when:

  • Network is very large (millions of neurons)---CROWN or IBP better
  • Need complete verification---use SMT, MILP, or branch-and-bound instead
  • Speed is critical---IBP or CROWN faster
  • Need tightest possible bounds---SDP or multi-neuron relaxations tighter

Current Research Directions

Tighter relaxations: Beyond triangle relaxation---exploring non-polyhedral convex relaxations, learned relaxations.

Adaptive relaxation: Choose relaxation tightness per-layer based on impact on final bounds.

Integration with learning: Train networks that are easier to verify via LP (e.g., encouraging activations to be clearly active/inactive).

Hybrid LP-SDP: Combine LP efficiency with SDP tightness for critical neurons.

Final Thoughts

LP-based verification occupies a valuable middle ground: tighter than simple bound propagation, faster than complete methods. While CROWN and other specialized bound propagation methods often outperform generic LP approaches in practice, understanding LP formulations provides theoretical grounding for verification.

The LP perspective clarifies what bound propagation methods are actually doing: they’re solving relaxations of the verification problem, trading exactness for tractability. Whether you use explicit LP solvers or specialized algorithms like CROWN, the underlying principles---convex relaxation, sound approximation, optimization over feasible regions---remain the same.

Further Reading

This guide provides comprehensive coverage of linear programming approaches to neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:

LP-Based Verification Foundations:

Convex relaxations provide the theoretical foundation for LP-based verification, showing how ReLU networks can be over-approximated with linear constraints while maintaining soundness. The key insight is trading completeness for polynomial-time solvability through convex relaxation.

Comparison with Other Incomplete Methods:

IBP provides the fastest but loosest bounds through simple interval propagation. CROWN achieves tighter bounds through backward linear propagation with optimized coefficients, often outperforming explicit LP solving in practice. Abstract interpretation offers a general framework encompassing both interval and polyhedral relaxations.

Tighter Relaxations:

Multi-neuron relaxations (PRIMA) consider multiple ReLUs jointly rather than independently, capturing correlations that single-neuron LP relaxations miss. SDP-based approaches provide even tighter bounds through semi-definite programming, at higher computational cost. These methods represent the tightness frontier for incomplete verification.

Complete Methods for Comparison:

When LP-based incomplete verification returns “unknown,” complete methods provide definitive answers. MILP extends LP with integer variables for exact ReLU encoding. SMT provides alternative complete verification. Branch-and-bound combines incomplete bounds with systematic refinement.

GPU Acceleration:

Recent work on GPU-accelerated verification demonstrates massive speedups through parallelization. While traditional LP solvers are CPU-based, specialized bound propagation methods benefit greatly from GPU acceleration, often making them more practical than explicit LP solving for large networks.

Related Topics:

For understanding the bound propagation methods that LP-based verification compares against, see bound propagation. For multi-neuron relaxations that extend LP approaches, see PRIMA verification. For SDP-based methods that provide tighter bounds, see SDP verification. For complete methods that LP relaxes, see SMT and MILP verification.

Next Guide: Continue to PRIMA Verification to learn about multi-neuron relaxation techniques that capture correlations between neurons for tighter verification bounds.