Bound Propagation Approaches
Comprehensive guide to bound propagation methods including IBP, zonotopes, CROWN, DeepPoly, and abstract interpretation theory for neural network verification.
Neural network verification is an exciting field that essentially frames the verification problem as an optimization problem. The core idea? Estimating the possible output range of a neural network given an input range. More precisely, we work with an over-approximation, meaning that our estimated output range must fully contain the actual range.
But it doesn’t stop at the output: this containment must hold for every neuron, including those in the intermediate layers. Any approach that guarantees this property is called sound. Today, instead of diving deep into verification theory, let’s focus on one of the most powerful techniques in this field: bound propagation.
Bound Propagation
Bound propagation methods form the foundation of scalable neural network verification. These approaches compute over-approximations of the reachable output range by propagating bounds layer by layer through the network.
Interval Bound Propagation (IBP)
Interval Bound Propagation (IBP) is the most straightforward and efficient bound propagation approach, though it produces looser bounds than more sophisticated methods.
Definition: Interval Bound Propagation
Given a perturbed input region $\mathcal{X} = \{x : \|x - x_0\|_\infty \le \epsilon\}$, for the first layer we have $\underline{\hat{z}}^{(0)} = x_0 - \epsilon$ and $\overline{\hat{z}}^{(0)} = x_0 + \epsilon$.
Let $[\underline{z}^{(i)}, \overline{z}^{(i)}]$ represent the numerical interval for layer $i$. We derive $[\underline{z}^{(i)}, \overline{z}^{(i)}]$ from $[\underline{z}^{(i-1)}, \overline{z}^{(i-1)}]$ as follows:
For the affine transformation $z^{(i)} = W^{(i)} \hat{z}^{(i-1)} + b^{(i)}$, the bounds are:

$$\underline{z}^{(i)} = [W^{(i)}]^{+} \underline{\hat{z}}^{(i-1)} + [W^{(i)}]^{-} \overline{\hat{z}}^{(i-1)} + b^{(i)}, \qquad \overline{z}^{(i)} = [W^{(i)}]^{+} \overline{\hat{z}}^{(i-1)} + [W^{(i)}]^{-} \underline{\hat{z}}^{(i-1)} + b^{(i)}$$

where $[W]^{+} = \max(W, 0)$ sets negative elements to 0, and $[W]^{-} = \min(W, 0)$ sets positive elements to 0.
For the ReLU activation $\hat{z}^{(i)} = \max(z^{(i)}, 0)$:

$$\underline{\hat{z}}^{(i)} = \max(\underline{z}^{(i)}, 0), \qquad \overline{\hat{z}}^{(i)} = \max(\overline{z}^{(i)}, 0)$$
IBP performs only four matrix-vector products per layer, making it extremely scalable. However, for normally-trained DNNs, IBP often yields trivial bounds due to its looseness. Interestingly, for models specifically trained with IBP, the certified robustness can approach state-of-the-art results.
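To make the recurrence above concrete, here is a minimal NumPy sketch of IBP for a two-layer ReLU network; the weights, input point, and perturbation radius are hypothetical values chosen only for illustration.

```python
import numpy as np

def ibp_affine(l, u, W, b):
    """Interval bounds through the affine map z = W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b

def ibp_relu(l, u):
    """ReLU is monotone, so bounds pass through elementwise."""
    return np.maximum(l, 0), np.maximum(u, 0)

# Hypothetical 2-layer network and an L_inf ball of radius eps around x0
x0, eps = np.array([0.5, -0.2]), 0.1
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)

l, u = x0 - eps, x0 + eps
l, u = ibp_relu(*ibp_affine(l, u, W1, b1))
l, u = ibp_affine(l, u, W2, b2)
print("certified output bounds:", l, u)
```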
Zonotope Abstraction
Instead of intervals, we can use zonotopes, a geometric abstraction commonly used in affine arithmetic. Zonotopes represent sets as affine combinations of generator vectors, allowing for more precise tracking of dependencies between variables. This approach originates from abstract interpretation and has been widely adopted in verification, though it comes with higher memory costs compared to simpler methods.
Formal Zonotope Definition and Theory
Definition (Zonotope): A zonotope in $\mathbb{R}^n$ is a set of the form:

$$Z = \left\{ c + \sum_{i=1}^{p} \epsilon_i g_i \;:\; \epsilon_i \in [-1, 1] \right\}$$

where $c \in \mathbb{R}^n$ is the center and $g_1, \dots, g_p \in \mathbb{R}^n$ are generator vectors.
Compact notation: $Z = \langle c, G \rangle$, where $G = [g_1 \; \cdots \; g_p] \in \mathbb{R}^{n \times p}$ is the generator matrix.
Geometric interpretation: A zonotope is the Minkowski sum of line segments. It’s a centrally symmetric convex polytope. In 2D, zonotopes are zonogons (centrally symmetric convex polygons); in 3D, they are zonohedra.
Zonotope Abstract Domain Theory
As an abstract domain (see the abstract interpretation section below), zonotopes provide a sweet spot between intervals and general polyhedra:
Abstract Elements: $\langle c, G \rangle$ with center $c \in \mathbb{R}^n$ and generator matrix $G \in \mathbb{R}^{n \times p}$
Concretization: $\gamma(\langle c, G \rangle) = \{ c + G\epsilon \;:\; \epsilon \in [-1, 1]^p \}$
Abstract Transformers:
- Affine transformation $z \mapsto Wz + b$: $\langle c, G \rangle \mapsto \langle Wc + b,\; WG \rangle$. Exact! No over-approximation for affine functions. This is zonotope’s key advantage over intervals.
- ReLU transformation $\hat{z} = \max(z, 0)$: For neuron $z_j$ with bounds $[l_j, u_j]$, where $l_j = c_j - \|G_{j,:}\|_1$ and $u_j = c_j + \|G_{j,:}\|_1$:
  - Case 1 ($l_j \ge 0$): Active, $\hat{z}_j = z_j$ (exact)
  - Case 2 ($u_j \le 0$): Inactive, $\hat{z}_j = 0$ (exact)
  - Case 3 ($l_j < 0 < u_j$): Unstable, over-approximate with new generators
For the unstable case, zonotopes use a relaxation similar to the polyhedral triangle relaxation, but encode it using the generator representation.
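As a sketch of how these transformers look in code, the following NumPy snippet implements the exact affine transformer and a common single-generator ReLU over-approximation (one fresh noise symbol per unstable neuron); treat it as an illustration of the idea rather than a reference implementation, and note that the example weights are hypothetical.

```python
import numpy as np

def zono_bounds(c, G):
    """Concrete interval bounds implied by the zonotope <c, G>."""
    r = np.abs(G).sum(axis=1)
    return c - r, c + r

def zono_affine(c, G, W, b):
    """Affine maps are exact: <c, G> -> <W c + b, W G>."""
    return W @ c + b, W @ G

def zono_relu(c, G):
    """Sound ReLU transformer; each unstable neuron gets a fresh generator."""
    l, u = zono_bounds(c, G)
    n, _ = G.shape
    c_new, G_new, fresh = c.copy(), G.copy(), []
    for j in range(n):
        if u[j] <= 0:                 # inactive: output is exactly 0
            c_new[j], G_new[j, :] = 0.0, 0.0
        elif l[j] >= 0:               # active: identity, exact
            continue
        else:                         # unstable: y = lam*z + mu plus a new +/- mu generator
            lam = u[j] / (u[j] - l[j])
            mu = -lam * l[j] / 2.0
            c_new[j] = lam * c[j] + mu
            G_new[j, :] *= lam
            g = np.zeros(n)
            g[j] = mu                 # new generator captures the relaxation slack
            fresh.append(g)
    if fresh:
        G_new = np.hstack([G_new, np.stack(fresh, axis=1)])
    return c_new, G_new

# Input L_inf ball as a zonotope: center x0, one generator per input dimension
x0, eps = np.array([0.5, -0.2]), 0.1
c, G = x0, eps * np.eye(2)
c, G = zono_relu(*zono_affine(c, G, np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)))
print("bounds after one layer:", zono_bounds(c, G))
```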
Comparison with Polyhedra
Zonotopes vs. Polyhedra (DeepPoly, CROWN) for neural network verification:
| Property | Zonotopes | Polyhedra (Symbolic Bounds) |
|---|---|---|
| Representation | Center + generator matrix | Linear inequalities per neuron |
| Affine exactness | Yes (exact transformation) | Yes (exact composition) |
| ReLU handling | New generators for unstable | Triangle relaxation constraints |
| Memory per neuron | $O(p)$ ($p$ generators) | $O(n)$ ($n$ symbolic coefficients) |
| Time complexity | One matrix-matrix product per layer (forward) | Back-substitution through all earlier layers per layer (backward) |
| Tightness | Slightly looser than polyhedra | Tighter (can express arbitrary linear constraints) |
Theoretical Analysis: Why Zonotopes are Slightly Looser
Theorem: For the same ReLU network, polyhedra abstraction (DeepPoly) produces tighter or equal bounds compared to zonotope abstraction.
Reason:
Zonotopes are a strict subset of general polyhedra:
- Zonotope: Centrally symmetric polytope with specific structure (Minkowski sum of segments)
- Polyhedra: Arbitrary convex polytopes (intersection of half-spaces)
For unstable ReLUs, polyhedra can express the exact triangle relaxation with arbitrary linear constraints. Zonotopes must approximate the triangle using their generator representation, which may introduce additional over-approximation.
Example: Consider a single unstable ReLU $\hat{z} = \max(z, 0)$ with pre-activation bounds $l < 0 < u$. The triangle relaxation requires expressing: $\hat{z} \ge 0$, $\hat{z} \ge z$, and $\hat{z} \le \frac{u}{u - l}(z - l)$.
Polyhedra represent this exactly. Zonotopes must add generators that over-approximate this region, potentially including points outside the triangle.
When Zonotopes Excel:
Despite slight looseness, zonotopes have advantages:
- Affine exactness without symbolic tracking: Direct matrix operations (no symbolic expression management)
- Memory predictability: Generator count grows predictably
- Parallelizability: Generator operations are highly parallelizable (GPU-friendly)
Practical Tradeoff:
- Zonotopes: Slightly looser than DeepPoly, but simpler implementation and memory management
- DeepPoly/CROWN: Tighter, but requires careful symbolic bound management
This explains the observation that zonotopes achieve “similar tightness” to backward symbolic propagation with “high memory costs”---the similarity is due to both using linear relaxations, but zonotope’s generator representation has higher memory overhead than sparse symbolic bounds.
Polyhedra Abstraction: CROWN and DeepPoly
Polyhedra abstraction-based approaches represent a significant advancement over IBP. Methods like Fast-Lin, CROWN, and DeepPoly replace the two lower bounds in the ReLU polytope with a single linear lower bound, resulting in one lower and one upper bound per neuron.
The key advantages:
- Scalability: Linear bounds propagate through layers efficiently without solving LP problems
- Tightness: Maintains interactions between components, typically tighter than IBP
- Generality: Supports general activation functions beyond ReLU
These approaches compute polyhedra domain abstraction for DNNs. Rather than using explicit numerical intervals, they track symbolic expressions (typically linear) that represent bounds in terms of input variables.
The Convex Barrier:
An important theoretical result, known as the convex barrier, shows that verification approaches based on per-neuron linear relaxations cannot be tighter than the corresponding linear programming relaxation. This establishes a fundamental limit on the tightness achievable by methods like CROWN and DeepPoly.
Abstract Interpretation Theory: Formal Foundation
The bound propagation methods above---IBP, zonotopes, and polyhedra---are all instances of abstract interpretation, a formal framework from program analysis. Understanding this theoretical foundation clarifies why these methods work and what guarantees they provide.
Formal Abstract Domain Definition:
An abstract domain is a mathematical structure that represents sets of concrete values in an over-approximated form.
Definition (Abstract Domain for Neural Networks): For a neural network layer with $n$ neurons, an abstract domain $\mathcal{A}$ provides:
- Abstract elements $a \in \mathcal{A}$ representing sets of concrete values in $\mathbb{R}^n$
- A concretization function $\gamma : \mathcal{A} \to \mathcal{P}(\mathbb{R}^n)$ mapping abstract elements to sets of concrete values
- Abstract transformers $T_f^{\#} : \mathcal{A} \to \mathcal{A}$ for each neural network operation $f$

Soundness Requirement:

$$f(\gamma(a)) \subseteq \gamma(T_f^{\#}(a))$$

This means: if we apply the abstract transformer to an abstract element $a$, the result must over-approximate all possible concrete outputs $f(x)$ for $x \in \gamma(a)$.
Examples of Abstract Domains:
Interval Domain (IBP):
- Abstract elements: $[l, u]$ with $l, u \in \mathbb{R}^n$ (lower and upper bounds per dimension)
- Concretization: $\gamma([l, u]) = \{x \in \mathbb{R}^n : l \le x \le u\}$
- Abstract affine transformer: $[\,[W]^{+} l + [W]^{-} u + b,\; [W]^{+} u + [W]^{-} l + b\,]$ (as in the IBP definition)
- Abstract ReLU transformer: $[\max(l, 0),\; \max(u, 0)]$
Zonotope Domain:
- Abstract elements: $\langle c, G \rangle$ where $c \in \mathbb{R}^n$ is the center and $G \in \mathbb{R}^{n \times p}$ is the generator matrix
- Concretization: $\gamma(\langle c, G \rangle) = \{c + G\epsilon : \epsilon \in [-1, 1]^p\}$
- Abstract affine transformer: $\langle c, G \rangle \mapsto \langle Wc + b,\; WG \rangle$ (exact for affine functions!)
- Abstract ReLU transformer: Over-approximation using new generators
Polyhedra Domain (DeepPoly, CROWN):
- Abstract elements: symbolic linear bounds per neuron $z_j$, expressed over the input $x$ (or over earlier layers):
  - Lower: $\underline{a}_j^{\top} x + \underline{b}_j \le z_j$
  - Upper: $z_j \le \overline{a}_j^{\top} x + \overline{b}_j$
- Concretization: all vectors satisfying the linear constraints for some $x$ in the input region
- Abstract transformers: Compose linear bounds through layers
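To make the soundness requirement $f(\gamma(a)) \subseteq \gamma(T_f^{\#}(a))$ concrete, here is a small NumPy sanity check (random sampling, not a proof) for the interval domain's ReLU transformer; the box bounds and sample count are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
l, u = np.array([-1.0, 0.5]), np.array([0.3, 2.0])   # abstract element a = [l, u]

# Abstract ReLU transformer for the interval domain: T#([l, u]) = [max(l,0), max(u,0)]
l_out, u_out = np.maximum(l, 0), np.maximum(u, 0)

# Sample concrete points from gamma(a) and check f(x) stays inside gamma(T#(a))
xs = rng.uniform(l, u, size=(10_000, 2))
ys = np.maximum(xs, 0)                                # f = ReLU, applied concretely
assert np.all((ys >= l_out - 1e-12) & (ys <= u_out + 1e-12))
print("all sampled outputs lie inside the abstract output box")
```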
Galois Connections and Soundness Guarantees
The abstract interpretation framework guarantees soundness through Galois connections.
Definition (Galois Connection): For a concrete domain $(\mathcal{P}(\mathbb{R}^n), \subseteq)$ and an abstract domain $(\mathcal{A}, \sqsubseteq)$, a Galois connection $(\alpha, \gamma)$ consists of:
- An abstraction function $\alpha : \mathcal{P}(\mathbb{R}^n) \to \mathcal{A}$ (maps concrete sets to abstract elements)
- A concretization function $\gamma : \mathcal{A} \to \mathcal{P}(\mathbb{R}^n)$ (maps abstract elements to concrete sets)

satisfying: $\alpha(S) \sqsubseteq a \iff S \subseteq \gamma(a)$ for all $S \subseteq \mathbb{R}^n$ and $a \in \mathcal{A}$.
Soundness Theorem (Abstract Interpretation for Neural Networks):
If the abstract transformers satisfy $f(\gamma(a)) \subseteq \gamma(T_f^{\#}(a))$
for all network operations $f$, then the composed abstract transformers through all layers produce sound over-approximations of the true output range.
Proof Sketch:
By induction on layers:
- Base case: The input abstraction $a_0$ satisfies $\mathcal{X} \subseteq \gamma(a_0)$ by construction.
- Inductive step: If layer $i$ satisfies $\mathcal{Z}_i \subseteq \gamma(a_i)$, then:

$$\mathcal{Z}_{i+1} = f_{i+1}(\mathcal{Z}_i) \subseteq f_{i+1}(\gamma(a_i)) \subseteq \gamma(T_{f_{i+1}}^{\#}(a_i)) = \gamma(a_{i+1})$$

- Conclusion: The output abstraction satisfies $\mathcal{Z}_L \subseteq \gamma(a_L)$ (soundness).
Tightness Ordering of Abstract Domains:
Abstract domains can be ordered by tightness (how close the over-approximation is to the true set):
Tightness hierarchy for neural network verification:

$$\text{Interval (IBP)} \;\succeq\; \text{Zonotope} \;\succeq\; \text{Polyhedra (DeepPoly, CROWN)}$$

where $A \succeq B$ means the over-approximations produced by domain $A$ contain those produced by domain $B$.
This hierarchy (from theoretical barriers) shows that polyhedra (DeepPoly, CROWN) are tighter than zonotopes, which are tighter than intervals (IBP).
Why Abstract Interpretation Guarantees Soundness:
The Galois connection framework ensures that:
- Compositional soundness: Composing sound abstract transformers yields sound overall verification
- No false negatives: If abstract domain says “safe”, the concrete network is truly safe
- Possible false positives: Abstract domain may say “unsafe” even if concrete network is safe (incompleteness)
This is the soundness-completeness tradeoff: abstract interpretation sacrifices completeness (may fail to verify true properties) to gain tractability (polynomial-time verification).
However, all these methods share one characteristic: they propagate bounds forward. What if we could achieve even tighter approximations? That’s where back-substitution comes into play!
Bound Propagation with Back-Substitution
Bound propagation with back-substitution extends symbolic bound propagation, but instead of moving forward, it works backward---hence the name backward bound propagation.
Forward vs. Backward Bound Propagation
- Forward: Start with input variables and propagate bounds layer by layer toward the output.
- Backward: Start with the output variables and back-propagate their symbolic expressions toward the input.
This shift in direction allows us to refine bounds at earlier layers, leading to tighter estimates. But to make it work, we need one crucial ingredient: relaxation of nonlinear functions.
Handling Nonlinearities
Neural networks contain nonlinear functions, such as activation functions. Since bound propagation relies on linear expressions, we need to approximate these nonlinear functions with two linear bounds---one upper and one lower. When propagating bounds, we carefully choose which bound to apply based on weight signs to maintain the soundness property.
Fast-Lin and CROWN:
Fast-Lin introduced efficient backward bound propagation for ReLU networks. CROWN extended this to general activation functions and introduced optimizable linear bounds with learnable parameters, laying the foundation for modern state-of-the-art verifiers like alpha-beta-CROWN.
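The following NumPy sketch illustrates the flavor of CROWN-style backward propagation for a hypothetical two-layer ReLU network $f(x) = W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2$: intermediate pre-activation bounds come from IBP, each unstable ReLU gets one linear upper and one linear lower bound, and back-substitution selects between them based on the sign of the output coefficients. The adaptive lower-bound slope (0 or 1) is one common heuristic; real CROWN implementations optimize this choice, so treat this purely as an illustrative sketch.

```python
import numpy as np

def interval_affine(l, u, W, b):
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b

def relu_relaxation(l, u):
    """Per-neuron linear bounds a_low*z <= ReLU(z) <= a_up*z + c_up on [l, u]."""
    a_up = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, u / np.maximum(u - l, 1e-12)))
    c_up = np.where((l < 0) & (u > 0), -a_up * l, 0.0)
    a_low = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, (u > -l).astype(float)))
    return a_low, a_up, c_up

def backward_upper_bound(W1, b1, W2, b2, x0, eps):
    """Upper bound on each output over the L_inf ball of radius eps around x0."""
    l1, u1 = interval_affine(x0 - eps, x0 + eps, W1, b1)   # hidden pre-activations
    a_low, a_up, c_up = relu_relaxation(l1, u1)
    # Back-substitute through ReLU: upper relaxation where W2 >= 0, lower where W2 < 0
    W2p, W2n = np.maximum(W2, 0), np.minimum(W2, 0)
    A = W2p * a_up + W2n * a_low                           # coefficients w.r.t. z1
    const = W2p @ c_up + b2
    # Back-substitute through the first affine layer: now a linear function of x
    A_in, const = A @ W1, const + A @ b1
    # Concretize the linear function over the input region
    return A_in @ x0 + eps * np.abs(A_in).sum(axis=1) + const

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)
print("upper bound:", backward_upper_bound(W1, b1, W2, b2, x0=np.zeros(2), eps=0.1))
```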
DeepPoly:
DeepPoly formulated bound propagation using abstract interpretation theory. While CROWN and DeepPoly are conceptually similar, DeepPoly updates bounds during back-substitution at each layer, whereas CROWN back-substitutes all the way to the input layer. This subtle difference can improve tightness for certain architectures (e.g., networks with adjacent nonlinear layers like MaxPool after ReLU).
Symbolic Dependency Tracking: Theoretical Advantage
The key advantage of polyhedra-based methods (DeepPoly, CROWN) over intervals (IBP) is symbolic dependency tracking. Understanding this theoretically clarifies why backward propagation is tighter.
Problem with Interval Arithmetic: Dependency Loss
Interval arithmetic (IBP) treats each neuron independently:
When computing $y = z_1 - z_2$ with $z_1 \in [l_1, u_1]$ and $z_2 \in [l_2, u_2]$, IBP produces: $y \in [l_1 - u_2,\; u_1 - l_2]$
Issue: If $z_1$ and $z_2$ depend on the same input variable (e.g., both derived from $x_1$), IBP ignores this correlation and over-approximates.
Example (Dependency Loss):
Consider $x \in [-1, 1]$, and two neurons: $z_1 = x$ and $z_2 = x$, with output $y = z_1 - z_2$.
True range: $y = x - x = 0$, so $y \in \{0\}$ (always zero!)
IBP computes: $z_1 \in [-1, 1]$, $z_2 \in [-1, 1]$, so $y \in [-1 - 1,\; 1 - (-1)] = [-2, 2]$
Error: IBP’s interval has width 4, while the true range is a single point. This kind of error compounds exponentially through layers (see theoretical barriers dependency loss theorem).
Solution: Symbolic Bounds (DeepPoly, CROWN)
Instead of concrete intervals, maintain symbolic expressions in terms of input variables, e.g. $z_1 = a_1 x + b_1$ and $z_2 = a_2 x + b_2$:
When computing $y = z_1 - z_2$, substitute symbolically: $y = (a_1 - a_2)x + (b_1 - b_2)$
For our example ($z_1 = x$, $z_2 = x$): $y = x - x = 0$, recovering the exact range $\{0\}$.
Theorem (Symbolic Bounds Preserve Dependencies):
For a network with $L$ layers, symbolic bound propagation (DeepPoly, CROWN) maintains exact affine dependencies through all layers. Dependencies are lost only at ReLU relaxations.
Proof Sketch:
- Affine layers: Composition of linear functions is linear → symbolic bounds remain exact
- ReLU layers: Triangle relaxation introduces over-approximation, but symbolic form is preserved
- Composition: Dependency loss occurs only at ReLU relaxations, not during affine propagation
Consequence: Interval arithmetic loses dependency information at every layer (even affine layers), while symbolic bounds lose information only at ReLU relaxations. This explains why symbolic methods are exponentially tighter for deep networks.
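A tiny numerical illustration of the dependency-loss example above (assuming $x \in [-1, 1]$ and $z_1 = z_2 = x$): interval arithmetic combines the two neurons without remembering they are copies of the same variable, while composing the affine maps symbolically before concretizing recovers the exact answer.

```python
import numpy as np

W = np.array([[1.0], [1.0]])      # two "neurons" that both equal the input x
v = np.array([1.0, -1.0])         # output y = z1 - z2

# IBP: propagate intervals, then combine without tracking the correlation
l_z, u_z = W @ np.array([-1.0]), W @ np.array([1.0])
vp, vn = np.maximum(v, 0), np.minimum(v, 0)
print("IBP bounds:     ", vp @ l_z + vn @ u_z, vp @ u_z + vn @ l_z)   # -2.0 2.0

# Symbolic: compose the affine maps first, then concretize once
coeff = v @ W                      # y = (v @ W) x, and v @ W = [0.0]
lo, hi = 0.0 - np.abs(coeff).sum(), np.abs(coeff).sum()
print("Symbolic bounds:", lo, hi)                                     # 0.0 0.0
```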
Complexity Trade-offs
Symbolic dependency tracking comes at computational cost:
| Method | Dependency Tracking | Memory per Neuron | Computation per Layer | Tightness |
|---|---|---|---|---|
| IBP | None (concrete intervals) | $O(1)$ (2 scalars: $l$, $u$) | $O(nm)$ (matrix-vector) | Loose (exponential loss) |
| DeepPoly/CROWN | Full (symbolic expressions) | $O(d)$ ($d$ input dimensions) | $O(nmd)$ (matrix-matrix) | Tight (loss only at ReLU) |

Here $n$ and $m$ denote the widths of the current and previous layer, and $d$ the input dimension.
Backward Propagation and Refinement
DeepPoly’s layer-by-layer back-substitution provides additional refinement opportunities:
Forward-only: Bounds at layer $i$ depend only on layers $1, \dots, i-1$
Backward with refinement: Bounds at layer $i$ can be refined using tighter bounds from later layers
Algorithm (Iterative Backward Refinement):
- Forward pass: Compute initial bounds layer-by-layer
- Backward pass: Back-substitute from output to input
- Refinement: At each layer during backward pass, use the tighter bounds from forward pass to refine symbolic expressions
- Iterate: Repeat until convergence (bounds stop improving)
Why this works: Later layers might have tighter bounds due to specific network structure. Back-propagating these tighter bounds can improve early layer estimates.
Theorem (Convergence of Iterative Refinement):
For a finite network with ReLU activations, iterative backward refinement converges to a fixed point in at most $O(L \cdot n)$ iterations, where $L$ is the depth and $n$ is the maximum width.
Practical Reality: Typically 1-3 iterations suffice. Diminishing returns after that.
Backtracking Theory
Advanced verifiers use backtracking to selectively refine bounds:
- Identify critical neurons: Neurons with widest unstable bounds (greatest impact on output)
- Refine selectively: Back-substitute only for critical neurons, not all
- Trade-off: Reduced computation while maintaining most tightness gains
This explains the difference between CROWN and DeepPoly:
- CROWN: Back-substitutes all the way to input (full dependency tracking, simpler algorithm)
- DeepPoly: Refines at each layer (potentially tighter, more complex, better for specific architectures like adjacent non-linear layers)
Both are theoretically sound and asymptotically similar in tightness. The practical choice depends on network architecture and computational budget.
Why Use Bound Propagation?
- Efficiency: Bound propagation is fast, thanks to parallel computing. Most operations are basic linear algebra operations, which are heavily optimized in modern libraries and can be accelerated using GPUs.
- Scalability: Unlike other verification approaches that require solving complex optimization problems, bound propagation works efficiently on large networks.
Comparing Different Bound Propagation Methods
Bound propagation is a powerful tool in neural network verification. While interval arithmetic is simple and fast, it’s often too loose to be practical. More advanced techniques like zonotope-based and symbolic methods offer much better approximations, with backward symbolic bound propagation being among the tightest available methods. There’s no formal proof ranking these methods in terms of tightness, but we have some general observations:
- Interval arithmetic is the most conservative.
- Forward symbolic bound propagation is tighter than interval arithmetic but still somewhat conservative.
- Zonotope-based approaches can sometimes achieve similar tightness to backward symbolic bound propagation, since the zonotope ReLU relaxation also relies on parallel lower and upper linear bounds, but they come with high memory costs.
- Backward symbolic bound propagation generally provides the tightest bounds, though it has higher computational complexity.
Backward symbolic bound propagation is generally the most accurate, but it comes at the cost of increased computational effort due to repeated back-substitutions.
| Approach | Tightness | Memory Usage | Time Complexity (in the number of layers) |
|---|---|---|---|
| Interval Arithmetic | Loose | Low | Fast (Linear) |
| Forward Symbolic | Moderate | Low | Fast (Linear) |
| Zonotope | Good | High | Fast (Linear) |
| Backward Symbolic | Best | Moderate | Slower (Quadratic) |
Final Thoughts
Is there any proof that guarantees the tightness of bound propagation?
Unfortunately, there is no universal proof that one bound propagation method will always produce tighter bounds than another across all networks and input domains. This is because:
-
Network-specific behavior: Different network architectures (varying depth, width, activation function types) interact with bound propagation methods differently. A method that excels on shallow networks with ReLUs may perform poorly on networks with mixed activation functions.
-
Input domain dependency: The tightness of bounds can vary significantly depending on the input region being verified. A method may be tight for some inputs but loose for others.
-
Approximation trade-offs: The linear relaxations of nonlinear activation functions themselves are approximations. The quality of these relaxations depends on how well the linear bounds capture the nonlinear function’s behavior in the input region.
However, in practice, we have strong empirical evidence that backward symbolic bound propagation methods (CROWN, ab-CROWN) outperform simpler approaches on most benchmarks. The theoretical guarantee comes from soundness, not tightness---all these methods guarantee that the approximated bounds contain the true output range, but they don’t guarantee how close they are.
Choosing a Method
If you’re implementing or using a verification tool, here’s practical guidance:
- For prototyping or educational purposes: Use interval arithmetic for simplicity.
- For reasonable accuracy on large networks: Use forward symbolic bound propagation.
- For best results on networks you care about: Use backward symbolic bound propagation (CROWN-based methods like ab-CROWN).
The choice ultimately depends on your specific requirements: Do you need maximum accuracy, or is a fast approximate solution sufficient? Is your network small enough to handle the computational cost of tighter methods?
Looking Forward
Bound propagation continues to evolve. Recent advances include:
- Multi-neuron relaxation that considers multiple neurons together for tighter bounds
- Adaptive methods that choose between different relaxation techniques dynamically
- Learning-based approaches that optimize linear bounds during verification
- GPU acceleration that makes even tighter methods practical for larger networks
- Coupling with branch-and-bound for complete verification
Whether you’re using these methods for robustness certification, formal verification, or trustworthiness analysis, understanding the tradeoffs between different bound propagation approaches is essential for making informed choices about your verification strategy.
Further Reading
This guide provides comprehensive coverage of bound propagation approaches for neural network verification. For readers interested in diving deeper, we recommend the following resources organized by topic:
Foundational Methods:
Interval Bound Propagation (IBP) provides the foundation for understanding bound propagation. While simple, IBP demonstrates the core principles of sound over-approximation.
Linear Relaxation Methods:
Fast-Lin introduced efficient backward bound propagation for ReLU networks. CROWN extended this to general activation functions and introduced optimizable bounds. DeepPoly formulated these techniques using abstract interpretation theory.
Theoretical Foundations:
The convex barrier result establishes fundamental limits on the tightness of linear relaxation-based approaches. Understanding this theoretical result helps set realistic expectations for bound propagation methods.
Advanced Techniques:
Multi-neuron relaxation methods (PRIMA) overcome some limitations of single-neuron relaxation. The auto_LiRPA library provides automatic bound propagation for arbitrary network architectures. alpha-beta-CROWN combines bound propagation with branch-and-bound for state-of-the-art complete verification.
Related Topics:
For understanding how these bounds are used in practice, see activation functions for activation-specific bounding strategies, soundness and completeness for theoretical guarantees, and verification scalability for runtime considerations.
Next Guide: Continue to LP Verification to explore LP relaxation techniques, dual formulations, and triangle relaxation for tighter verification bounds.