Phase 2: Methods & Tools Guide 4

Linear Programming for Verification

LP formulations for neural network verification, including triangle relaxation, duality theory, complexity analysis, and comparison with other bound propagation methods.

Neural network verification is fundamentally about computing bounds: given input constraints, what are the tightest possible bounds on the output? Linear programming offers a powerful approach---formulate bound computation as an LP, solve it efficiently, and get tight bounds without the exponential cost of complete methods.

LP-based verification sits between fast but loose methods (interval bound propagation) and expensive complete methods (SMT, MILP). It provides a sweet spot: tighter bounds than simple propagation, polynomial-time complexity, and well-understood optimization theory. This guide explores how LP-based verification works, when to use it, and how it compares to other approaches.

Why Linear Programming?

Linear programs are efficiently solvable. Interior-point methods run in polynomial time, and the simplex algorithm, despite its exponential worst case, is fast on most practical instances. Mature solvers (CPLEX, Gurobi, open-source alternatives) handle LPs with millions of variables.

Key advantage: Unlike MILP (which adds integer variables for ReLU states and becomes NP-hard), LP relaxes these constraints. You lose completeness but gain tractability.

LP vs MILP

  • MILP (Mixed-Integer LP): Encodes ReLU states exactly with binary variables → NP-hard, exponential worst-case
  • LP (Linear Program): Replaces the exact ReLU encoding with continuous linear relaxations → Polynomial time, sound but incomplete

LP trades exactness for speed. It provides sound over-approximations---bounds are guaranteed to contain the true output range, but might be conservative.

The LP Formulation for Neural Networks

LP-based verification computes bounds on network outputs by formulating constraints as a linear program.

Network as Linear Constraints

Linear layers: For z = Wx + b, these are native linear equality constraints:

z_i = \sum_j W_{ij} x_j + b_i \quad \forall i

Input constraints: For the \ell_\infty ball \|x - x_0\|_\infty \leq \epsilon:

x_0 - \epsilon \leq x \leq x_0 + \epsilon

Objective: To find bounds on output dimension k, solve two LPs:

\begin{aligned} \text{Lower bound:} \quad & \min \quad f_\theta(x)_k \\ \text{Upper bound:} \quad & \max \quad f_\theta(x)_k \end{aligned}

The challenge: ReLU activations introduce non-linearity.

ReLU Relaxation

The ReLU function y = \text{ReLU}(z) = \max(0, z) isn’t linear. LP-based methods use linear relaxations:

Triangle relaxation (standard approach):

For a ReLU with pre-activation bounds \underline{z} \leq z \leq \overline{z}:

\begin{aligned} y &\geq 0 \\ y &\geq z \\ y &\leq \frac{\overline{z}}{\overline{z} - \underline{z}} (z - \underline{z}) \quad \text{(if } \underline{z} < 0 < \overline{z}\text{)} \end{aligned}

This creates a convex polytope over-approximating the ReLU:

  • If \overline{z} \leq 0: the ReLU is always inactive, y = 0
  • If \underline{z} \geq 0: the ReLU is always active, y = z
  • If \underline{z} < 0 < \overline{z}: the ReLU is uncertain; use the triangle relaxation

Geometric Interpretation: The triangle relaxation creates a polytope containing the true ReLU function. The tighter the pre-activation bounds [\underline{z}, \overline{z}], the tighter the polytope. Loose bounds → loose relaxation → conservative output bounds.
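As a concrete sketch of this case analysis, the small helper below returns the linear constraints to impose on (z, y) for one ReLU; the function name and the (a, c, d) constraint encoding are illustrative choices for this guide, not any particular tool’s API.

def relu_relaxation(z_lo, z_hi):
    """Constraints on (z, y) for y = ReLU(z), given z_lo <= z <= z_hi.

    Each returned tuple (a, c, d) encodes the inequality a*y + c*z <= d.
    """
    if z_hi <= 0:                        # stably inactive: y = 0
        return [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    if z_lo >= 0:                        # stably active: y = z
        return [(1.0, -1.0, 0.0), (-1.0, 1.0, 0.0)]
    slope = z_hi / (z_hi - z_lo)         # unstable: triangle relaxation
    return [
        (-1.0, 0.0, 0.0),                # y >= 0
        (-1.0, 1.0, 0.0),                # y >= z
        (1.0, -slope, -slope * z_lo),    # y <= slope * (z - z_lo)
    ]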

Layer-by-Layer LP Solving

Naive approach: Encode the entire network as one massive LP with all layer variables.

Problem: For deep networks, this creates huge LPs (millions of variables), becoming computationally expensive.

Better approach: Solve layer-by-layer:

  1. Layer 1: Compute bounds on all neurons using input constraints
  2. Layer 2: Use Layer 1 bounds as constraints, compute Layer 2 bounds
  3. Repeat for all layers
  4. Output layer: Final bounds on output

Each layer’s LP is much smaller---only variables for that layer. This is faster than encoding the full network.
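A minimal sketch of this loop, assuming a hypothetical solve_bounds_lp(layer, in_lo, in_hi) helper that builds the single-layer LP and returns per-neuron (lower, upper) bound arrays:

def propagate_bounds(layers, x_lo, x_hi, solve_bounds_lp):
    # layers: per-layer descriptions (weights, biases, activation);
    # solve_bounds_lp: solves one small LP per neuron bound in the given layer.
    lo, hi = x_lo, x_hi
    for layer in layers:
        lo, hi = solve_bounds_lp(layer, lo, hi)
    return lo, hi   # bounds on the output layer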

Comparison with Other Bound Propagation Methods

LP-Based vs Interval Bound Propagation (IBP)

IBP: Propagates intervals [\underline{z}, \overline{z}] through layers analytically.

  • Speed: Very fast (closed-form propagation)
  • Tightness: Loose (doesn’t optimize, just computes worst-case)

LP-based: Solves optimization problem for each bound.

  • Speed: Slower (requires LP solver calls)
  • Tightness: Tighter (optimizes over feasible region)

Tradeoff: LP is worth the extra cost when tighter bounds are needed. For rapid screening, IBP suffices.
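For concreteness, here is a minimal NumPy sketch of IBP through one affine layer followed by a ReLU (the weights and input box are illustrative). The bounds come from a closed-form worst case; nothing is optimized.

import numpy as np

def ibp_affine_relu(W, b, x_lo, x_hi):
    # Worst-case interval propagation through z = Wx + b, then ReLU.
    center = (x_lo + x_hi) / 2
    radius = (x_hi - x_lo) / 2
    z_center = W @ center + b
    z_radius = np.abs(W) @ radius                      # maximum deviation per output
    z_lo, z_hi = z_center - z_radius, z_center + z_radius
    return np.maximum(z_lo, 0), np.maximum(z_hi, 0)    # ReLU is monotone

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.0, -0.1])
x0, eps = np.array([0.3, -0.2]), 0.1
lo, hi = ibp_affine_relu(W, b, x0 - eps, x0 + eps)
print(lo, hi)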

LP-Based vs CROWN

CROWN: Backward linear bound propagation with optimized coefficients.

  • Speed: Fast (backward pass, analytical optimization)
  • Tightness: Tight (optimizes linear relaxation parameters)

LP-based: Solves explicit LP for each bound.

  • Speed: Moderate (LP solver overhead)
  • Tightness: Can be tighter for some networks (depends on relaxation quality)

In practice: CROWN is often faster with comparable tightness. LP-based verification is useful when you need the explicit LP formulation (e.g., for custom constraints or objectives).

| Method | Time Complexity | Tightness | Best Use Case |
| --- | --- | --- | --- |
| IBP | O(network size) | Loose | Rapid screening, certified training |
| CROWN | O(network size) | Tight | General verification, production use |
| LP-based | O(network size × LP solve) | Very tight | When explicit optimization needed |
| SDP-based | O(network size^3) | Tightest | Small networks, research |

Tightening LP Relaxations

Standard triangle relaxation can be loose. Several techniques tighten it:

Pre-Activation Bound Tightening

Key observation: Tighter pre-activation bounds [\underline{z}, \overline{z}] → tighter ReLU relaxation → tighter output bounds.

Strategy: Before constructing the LP, use cheaper methods (IBP, CROWN) to compute initial bounds. Then use these to initialize the LP formulation.

Iterative refinement: Solve LP for each neuron’s bounds, use those to tighten subsequent layers, repeat until convergence.

Duality Theory in Depth

The dual formulation provides a powerful alternative perspective on LP-based verification. Instead of optimizing over network variables directly (primal), we optimize over dual variables representing constraint sensitivities. Understanding duality theory reveals why LP-based verification works and when it’s tight.

Primal LP Formulation:

For a neural network verification problem, the primal LP optimizes over neuron activations:

\begin{aligned} \min \quad & c^T z \\ \text{s.t.} \quad & Az \geq b \\ & z \geq 0 \end{aligned}

where z collects all neuron variables, Az \geq b encodes the linear-layer constraints and ReLU relaxations (equalities appear as pairs of inequalities, and any \leq constraint is negated into \geq form), and c selects the output neuron to bound.

Dual LP Formulation:

The Lagrangian dual introduces dual variables \lambda \geq 0, one for each constraint:

\begin{aligned} \max \quad & b^T \lambda \\ \text{s.t.} \quad & A^T \lambda \leq c \\ & \lambda \geq 0 \end{aligned}

Lagrangian Relaxation:

The dual arises from the Lagrangian function:

L(z, \lambda) = c^T z + \lambda^T (b - Az)

For any feasible z and \lambda \geq 0:

c^T z \geq L(z, \lambda) = c^T z + \lambda^T (b - Az) \geq \min_{z \geq 0} L(z, \lambda)

The dual objective b^T \lambda equals \min_{z \geq 0} L(z, \lambda) exactly when the dual constraint A^T \lambda \leq c holds. This provides a lower bound on the primal objective.
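Spelling out the inner minimization over z \geq 0 shows where the dual constraint comes from:

\min_{z \geq 0} L(z, \lambda) = b^T \lambda + \min_{z \geq 0} (c - A^T \lambda)^T z = \begin{cases} b^T \lambda & \text{if } A^T \lambda \leq c \\ -\infty & \text{otherwise} \end{cases}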

Strong Duality for LP:

For linear programs, strong duality holds: if both primal and dual are feasible, then:

\min_{z} c^T z = \max_{\lambda} b^T \lambda

The optimal values are equal. This is a special property of LP---not true for general non-convex problems.
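Strong duality is easy to check numerically. The sketch below uses a tiny generic LP (not a network) and assumes SciPy’s HiGHS-backed linprog; it solves the primal and the dual separately and compares the optima.

import numpy as np
from scipy.optimize import linprog

# Primal:  min c^T z   s.t.  A z >= b,  z >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 0.5]])
b = np.array([1.0, 1.0])

# linprog expects "<=" constraints, so negate A z >= b into -A z <= -b.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2, method="highs")

# Dual:  max b^T lam   s.t.  A^T lam <= c,  lam >= 0  (maximize via negated min)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")

print("primal optimum:", primal.fun)    # both print 1.0: strong duality
print("dual optimum:  ", -dual.fun)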

Why the Dual is Often Easier to Solve:

  1. Fewer variables: The dual has one variable per constraint. For neural networks with sparse connectivity, the dual can be much smaller.

    • Primal: Variables for every neuron activation across all layers
    • Dual: Variables for each constraint (many constraints might be implicit in forward propagation)
  2. Structure exploitation: The dual constraints A^T \lambda \leq c directly reveal which primal constraints “matter” for the bound. Inactive constraints (those with \lambda_i = 0) don’t affect the solution.

  3. Warm-starting: When computing bounds for multiple outputs, dual variables from previous solutions provide good initializations.

Connection to Outer Approximation:

The dual formulation connects to outer approximation techniques. The primal LP computes bounds by propagating constraints forward. The dual computes bounds by backpropagating sensitivity---how much each constraint contributes to the final bound.

Theorem (LP Duality for Neural Network Verification): For the LP relaxation of a ReLU network verification problem:

\min_{x, z, y} \{ f_\theta(x)_k : x \in X, z = Wx + b, y = \text{ReLU-relax}(z) \}

the dual problem:

\max_{\lambda} \{ \text{dual objective}(\lambda) : \text{dual constraints} \}

has the same optimal value (strong duality), and solving the dual provides the same lower bound as the primal.

Practical Implication: Methods like CROWN implicitly solve the dual formulation through backward bound propagation. The “optimized linear bounds” in CROWN are dual variables optimized to maximize the lower bound.

Cutting Planes

Standard LP relaxation uses only basic constraints. Cutting planes add additional linear inequalities that:

  • Don’t exclude any feasible solutions (sound)
  • Tighten the relaxation (remove infeasible regions)

Example: For two ReLUs with correlated pre-activations, add constraints capturing their relationship beyond independent relaxations.

Challenge: Identifying useful cutting planes is problem-specific. Generic approaches exist but require careful tuning.

Practical Implementation

Using LP Solvers

Commercial solvers: Gurobi, CPLEX (fast, highly optimized, licensing costs)

Open-source solvers: GLPK, CLP, SoPlex (free, decent performance)

Interface: Most solvers provide Python APIs (Gurobi Python, PuLP, CVXPY)

Example workflow:

import numpy as np
import gurobipy as gp

# Toy layer data so the example runs standalone (placeholders; a real verifier
# would take these from the network being checked).
input_dim, layer_size = 3, 4
rng = np.random.default_rng(0)
W, b = rng.standard_normal((layer_size, input_dim)), rng.standard_normal(layer_size)
x0, eps = np.zeros(input_dim), 0.1
lower_bounds, upper_bounds = x0 - eps, x0 + eps

# Create model
model = gp.Model()

# Input variables restricted to the epsilon-ball
x = model.addVars(input_dim, lb=lower_bounds.tolist(), ub=upper_bounds.tolist())

# Pre-activation variables and linear layer constraints
z = model.addVars(layer_size, lb=-gp.GRB.INFINITY)
for i in range(layer_size):
    model.addConstr(z[i] == gp.quicksum(float(W[i, j]) * x[j] for j in range(input_dim)) + float(b[i]))

# Cheap interval pre-activation bounds decide which ReLU case applies
z_lo = W @ x0 + b - np.abs(W) @ np.full(input_dim, eps)
z_hi = W @ x0 + b + np.abs(W) @ np.full(input_dim, eps)

# Post-activation variables with the ReLU relaxation
y = model.addVars(layer_size, lb=0)
for i in range(layer_size):
    lo, hi = float(z_lo[i]), float(z_hi[i])
    if hi <= 0:                       # stably inactive: y = 0
        model.addConstr(y[i] == 0)
    elif lo >= 0:                     # stably active: y = z
        model.addConstr(y[i] == z[i])
    else:                             # uncertain ReLU: triangle relaxation
        model.addConstr(y[i] >= z[i])
        slope = hi / (hi - lo)
        model.addConstr(y[i] <= slope * (z[i] - lo))

# Optimize for a lower bound on one output neuron (here simply y[0])
model.setObjective(y[0], gp.GRB.MINIMIZE)
model.optimize()
lower_bound = model.objVal
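To get the matching upper bound, re-solve the same model with the objective sense flipped to gp.GRB.MAXIMIZE (the two LPs from the formulation above). A full verifier repeats this pair of solves for every neuron whose bounds it needs, keeping the model and changing only the objective.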

Performance Optimization

Warm-starting: When solving multiple similar LPs (e.g., bounds on different neurons), initialize solver with previous solution. This speeds up convergence.

Parallel solving: Compute bounds for different neurons in parallel. Each LP is independent---trivial parallelization.
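A minimal sketch of this pattern with Python’s standard library, where solve_neuron_bound is a placeholder standing in for one LP build-and-solve (for example, the Gurobi model above with the objective pointed at a single neuron):

from concurrent.futures import ProcessPoolExecutor

def solve_neuron_bound(layer_index, neuron_index, sense):
    # Placeholder: build and solve the LP for this neuron, minimizing or
    # maximizing depending on `sense`, and return the optimal value.
    return 0.0

def layer_bounds(layer_index, layer_size):
    # One independent LP per (neuron, min/max) pair: trivial to parallelize.
    with ProcessPoolExecutor() as pool:
        futures = {
            (i, sense): pool.submit(solve_neuron_bound, layer_index, i, sense)
            for i in range(layer_size)
            for sense in ("min", "max")
        }
        return {key: f.result() for key, f in futures.items()}

if __name__ == "__main__":
    print(layer_bounds(layer_index=1, layer_size=4))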

Selective bounding: Only compute bounds for neurons that matter (those affecting the final property). Skip intermediate neurons where crude bounds suffice.

GPU acceleration: Some recent work explores GPU-accelerated LP solving, though traditional LP solvers are CPU-based.

Theoretical Guarantees of LP-Based Verification

LP-based verification provides formal guarantees despite being incomplete. Understanding these guarantees clarifies when LP methods are reliable and when they may fail.

Soundness Guarantee

Theorem (Soundness of LP Relaxation): For a neural network f_\theta with ReLU activations, input region X, and LP-based bound computation yielding [\underline{y}, \overline{y}] on output y:

\forall x \in X: \underline{y} \leq f_\theta(x) \leq \overline{y}

The LP relaxation never underestimates lower bounds or overestimates upper bounds. All true outputs lie within the computed bounds.

Proof Sketch:

  1. Linear layers are exact: Affine transformations z = Wx + b are encoded exactly as linear equality constraints. No approximation.

  2. ReLU relaxation is an over-approximation: The triangle relaxation creates a convex polytope P such that:

    \forall z: \text{ReLU}(z) \in P(z)

    For any z \in [\underline{z}, \overline{z}], the true ReLU output lies within the polytope defined by the linear constraints.

  3. Composition preserves soundness: Layerwise application of sound over-approximations yields a sound overall over-approximation.

  4. LP optimum is conservative: The LP maximizes/minimizes over the over-approximated feasible region, which contains the true feasible region. Therefore bounds are conservative.

Consequence: LP-based verification never produces spurious certificates. If it certifies a property (e.g., y_1 > y_2 for all x \in X), the property truly holds. But it may fail to certify a true property due to loose relaxation, returning “unknown” instead.

Over-Approximation Guarantees

LP relaxation over-approximates the true network output region. The quality of this approximation depends on the relaxation tightness.

Definition (Relaxation Tightness): For a neuron with true output region R_{\text{true}} and LP-relaxed region R_{\text{LP}}:

R_{\text{true}} \subseteq R_{\text{LP}}

The volume ratio \text{Vol}(R_{\text{LP}}) / \text{Vol}(R_{\text{true}}) measures looseness. Larger ratio → looser relaxation.

Tightness Analysis:

For a single ReLU with z \in [\underline{z}, \overline{z}] where \underline{z} < 0 < \overline{z}:

  • True ReLU output region: R_{\text{true}} is the union of two line segments (non-convex)
  • Triangle relaxation region: R_{\text{LP}} is the triangle bounded by y \geq 0, y \geq z, and y \leq \frac{\overline{z}}{\overline{z} - \underline{z}}(z - \underline{z})

Area of over-approximation:

The triangle region has area:

\text{Area}(R_{\text{LP}}) = \frac{1}{2} \cdot |\underline{z}| \cdot \overline{z}

This is the wasted space not covered by the true ReLU but included in the relaxation.

Tightness depends on bounds:

  • If |\underline{z}| \ll \overline{z} or |\underline{z}| \gg \overline{z}, the triangle is relatively tight
  • If |\underline{z}| \approx \overline{z}, the triangle area is maximal for a given interval width (loosest case)

Consequence: Pre-activation bound quality directly determines LP relaxation tightness. Tighter initial bounds → tighter LP → better verification.
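A quick numeric illustration of this dependence, holding the interval width fixed at 2 and sliding the crossing point:

def triangle_area(z_lo, z_hi):
    # Area of the triangle relaxation region: (1/2) * |z_lo| * z_hi
    return 0.5 * abs(z_lo) * z_hi

for z_lo, z_hi in [(-0.2, 1.8), (-1.0, 1.0), (-1.8, 0.2)]:
    print(f"[{z_lo}, {z_hi}] -> area {triangle_area(z_lo, z_hi):.2f}")
# Prints 0.18, 0.50, 0.18: loosest when |z_lo| and z_hi are comparable.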

When LP Relaxation is Exact

LP relaxation is exact (no over-approximation) in special cases:

Case 1: All ReLUs are stable

If every ReLU has either \underline{z} \geq 0 (always active) or \overline{z} \leq 0 (always inactive), then:

  • No triangle relaxation needed
  • ReLU is replaced by y = z or y = 0 (exact)
  • LP computes exact bounds

Implication: Networks trained for verifiability aim to maximize stable ReLUs, reducing unstable neurons where LP relaxation is loose.

Case 2: Network is piecewise-linear and property is linear

For a fully piecewise-linear network (no smooth activations) with linear property specification, if all ReLU activation patterns are known, LP computes exact bounds.

Practical observation: In practice, deep networks have many unstable ReLUs, so LP relaxation is rarely exact. The goal is to minimize looseness, not eliminate it entirely.

Connection to Convex Barrier

The triangle relaxation for ReLU is the optimal convex hull (see theoretical barriers). This means:

Theorem (LP Optimality for Single-Neuron ReLU): The triangle relaxation is the tightest possible convex linear relaxation of a single ReLU over any interval [\underline{z}, \overline{z}] with \underline{z} < 0 < \overline{z}.

Consequence: Standard LP-based verification using triangle relaxation cannot be improved for individual ReLUs without either:

  1. Multi-neuron analysis (analyze multiple ReLUs jointly, e.g., PRIMA)
  2. Non-linear relaxations (e.g., quadratic constraints in SDP)
  3. Sacrificing soundness (incomplete methods that might miss violations)

LP-based verification hits the convex barrier---a fundamental limit on single-neuron linear relaxation tightness. This explains why methods like DeepPoly and CROWN (both using triangle relaxation) have similar tightness---they’re all at the barrier.

Complexity Analysis

Understanding the computational complexity of LP-based verification clarifies its scalability and practical applicability.

LP Solving Complexity

Worst-case complexity: Simplex algorithm is exponential in the worst case (known for decades). Interior-point methods are polynomial but with high degree.

Average-case complexity: Simplex is polynomial on average for most practical LPs. This is why it remains the dominant LP algorithm despite exponential worst-case.

Theorem (Interior-Point Complexity): Interior-point methods solve an LP with m constraints and n variables in:

O(\sqrt{n} \cdot (m + n) \cdot \text{barrier-iterations})

where the number of barrier iterations is typically O(\log(1/\epsilon)) for an \epsilon-optimal solution.

For neural network LPs with n neurons per layer and L layers:

  • Variables: O(nL) (one per neuron)
  • Constraints: O(nL) (linear layers + ReLU relaxations)
  • Complexity per LP: O(n^{1.5} L^{1.5})

For layer-by-layer solving: Solve O(nL) LPs (one per neuron per layer), each with O(n) variables:

\text{Total complexity} = O(nL) \times O(n^{1.5}) = O(n^{2.5} L)

This is polynomial in network size, unlike MILP (exponential) or SAT (exponential).

How Network Size Affects LP Size

Width scaling: For a network with n neurons per layer:

  • Primal LP: O(n^2) constraints (fully-connected layers have n^2 weights)
  • Dual LP: O(n^2) variables

Doubling network width → 4x LP size → roughly 8x solving time (for interior-point).

Depth scaling: For a network with L layers:

  • Monolithic LP (encoding full network): O(nL) variables and constraints
  • Layer-by-layer LP: Each layer’s LP has O(n) variables, solved L times

Doubling depth → 2x more LPs to solve (layer-by-layer) or 2x LP size (monolithic).

Practical observation: Layer-by-layer scaling is better for deep networks. Monolithic LPs become intractable beyond ~10 layers.

Trade-offs: Tightness vs LP Size

Tighter relaxations require more constraints, increasing LP size:

Triangle relaxation: 3 constraints per unstable ReLU

y \geq 0, \quad y \geq z, \quad y \leq \frac{\overline{z}}{\overline{z} - \underline{z}}(z - \underline{z})

Multi-neuron relaxation (PRIMA): For k neurons analyzed jointly, O(2^k) constraints.

  • k = 2: 4x more constraints
  • k = 3: 8x more constraints
  • k = 4: 16x more constraints (quickly intractable)

SDP relaxation: O(n^2) variables in the semidefinite matrix, O(n^6) solving complexity.

Tradeoff table:

| Relaxation Type | Constraints per Neuron | Solving Complexity | Tightness |
| --- | --- | --- | --- |
| Triangle (single-neuron) | O(1) | O(n^{2.5} L) | Moderate |
| Multi-neuron (k = 2) | O(n) | O(n^{3.5} L) | Tight |
| Multi-neuron (k = 3) | O(n^2) | O(n^{4.5} L) | Very tight |
| SDP | O(n^2) matrix | O(n^6 L) | Tightest |

Practical strategy: Use single-neuron LP for large networks, multi-neuron for small critical regions, SDP only for tiny networks (fewer than 100 neurons).

Why Multi-Neuron LP is Exponentially Larger

Single-neuron LP: Each ReLU is relaxed independently. For n ReLUs, O(n) constraints.

Multi-neuron LP: k ReLUs analyzed jointly require encoding all possible activation patterns for those k neurons.

  • k ReLUs have 2^k activation patterns (each active/inactive)
  • Each pattern requires separate constraints to encode the corresponding linear region
  • Total constraints: O(2^k \cdot \binom{n}{k}) for all k-tuples of neurons

Example: For n = 1000 neurons and k = 3:

  • Number of 3-tuples: \binom{1000}{3} \approx 1.66 \times 10^8
  • Constraints per tuple: 2^3 = 8
  • Total: ~1.3 billion constraints (completely intractable)
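A two-line check of these counts:

from math import comb
print(comb(1000, 3), comb(1000, 3) * 2**3)   # 166,167,000 tuples, ~1.33 billion constraints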

PRIMA’s solution: Don’t analyze all tuples. Use heuristics to select the most important k-tuples (those with the highest impact on final bounds), and analyze only those.

Limitations of LP-Based Verification

Scalability: Each layer requires solving multiple LPs (one per neuron bound). For large networks, this becomes expensive despite polynomial-time LP solving.

Relaxation looseness: Triangle relaxation is conservative. For networks with many uncertain ReLUs, accumulated looseness reduces effectiveness.

Incomplete: LP-based verification is incomplete---it might return “unknown” when tighter methods (or complete methods) would verify.

Comparison with alternatives:

  • CROWN is often faster with similar tightness---backward propagation avoids explicit LP solving
  • Branch-and-bound with CROWN achieves better results than pure LP for difficult properties
  • For very tight bounds: Multi-neuron relaxations or SDP needed

When to Use LP-Based Verification

Use LP-based when:

  • Need tighter bounds than IBP/CROWN provide
  • Want explicit optimization formulation (for custom constraints)
  • Verifying small to medium networks where LP overhead is acceptable
  • Research context exploring LP-based techniques
  • Integrating with other LP-based tools/frameworks

Don’t use when:

  • Network is very large (millions of neurons)---CROWN or IBP better
  • Need complete verification---use SMT, MILP, or branch-and-bound instead
  • Speed is critical---IBP or CROWN faster
  • Need tightest possible bounds---SDP or multi-neuron relaxations tighter

Current Research Directions

Tighter relaxations: Beyond triangle relaxation---exploring non-polyhedral convex relaxations, learned relaxations.

Adaptive relaxation: Choose relaxation tightness per-layer based on impact on final bounds.

Integration with learning: Train networks that are easier to verify via LP (e.g., encouraging activations to be clearly active/inactive).

Hybrid LP-SDP: Combine LP efficiency with SDP tightness for critical neurons.

Final Thoughts

LP-based verification occupies a valuable middle ground: tighter than simple bound propagation, faster than complete methods. While CROWN and other specialized bound propagation methods often outperform generic LP approaches in practice, understanding LP formulations provides theoretical grounding for verification.

The LP perspective clarifies what bound propagation methods are actually doing: they’re solving relaxations of the verification problem, trading exactness for tractability. Whether you use explicit LP solvers or specialized algorithms like CROWN, the underlying principles---convex relaxation, sound approximation, optimization over feasible regions---remain the same.

Further Reading

This guide provides comprehensive coverage of linear programming approaches to neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:

LP-Based Verification Foundations:

Convex relaxations provide the theoretical foundation for LP-based verification, showing how ReLU networks can be over-approximated with linear constraints while maintaining soundness. The key insight is trading completeness for polynomial-time solvability through convex relaxation.

Comparison with Other Incomplete Methods:

IBP provides the fastest but loosest bounds through simple interval propagation. CROWN achieves tighter bounds through backward linear propagation with optimized coefficients, often outperforming explicit LP solving in practice. Abstract interpretation offers a general framework encompassing both interval and polyhedral relaxations.

Tighter Relaxations:

Multi-neuron relaxations (PRIMA) consider multiple ReLUs jointly rather than independently, capturing correlations that single-neuron LP relaxations miss. SDP-based approaches provide even tighter bounds through semi-definite programming, at higher computational cost. These methods represent the tightness frontier for incomplete verification.

Complete Methods for Comparison:

When LP-based incomplete verification returns “unknown,” complete methods provide definitive answers. MILP extends LP with integer variables for exact ReLU encoding. SMT provides alternative complete verification. Branch-and-bound combines incomplete bounds with systematic refinement.

GPU Acceleration:

Recent work on GPU-accelerated verification demonstrates massive speedups through parallelization. While traditional LP solvers are CPU-based, specialized bound propagation methods benefit greatly from GPU acceleration, often making them more practical than explicit LP solving for large networks.

Related Topics:

For understanding the bound propagation methods that LP-based verification compares against, see bound propagation. For multi-neuron relaxations that extend LP approaches, see PRIMA verification. For SDP-based methods that provide tighter bounds, see SDP verification. For complete methods that LP relaxes, see SMT and MILP verification.

Next Guide: Continue to PRIMA Verification to learn about multi-neuron relaxation techniques that capture correlations between neurons for tighter verification bounds.