My Research Journey

How I Got Here

Here’s the thing: we’re putting neural networks everywhere. Self-driving cars. Medical diagnosis systems. Financial trading. Even systems that decide who gets a loan or bail. And honestly? Most of the time, we have no idea what these networks are actually doing. Sure, the math works. The accuracy looks great on test sets. But can we prove these systems won’t make catastrophic mistakes? Usually not.

That’s what got me into verification research. I kept seeing this gap between “this model works really well in practice” and “can we guarantee it’s safe?” It’s like the difference between a bridge that seems sturdy and one that an engineer has actually certified won’t collapse. For software controlling cars or diagnosing diseases, “seems sturdy” isn’t good enough.

The Problem

We’re in this weird situation where AI deployment is moving way faster than our ability to verify it. Companies are racing to put neural networks into production—and I get it, the technology is amazing—but we’re basically crossing our fingers and hoping nothing goes wrong. It’s like building skyscrapers without structural engineers.

The challenge isn’t just academic hand-wringing. When a neural network misclassifies a stop sign as a speed limit sign, that’s not a cute bug—it’s potentially fatal. When a medical AI misses a tumor, someone doesn’t get treatment they need. When a loan approval system discriminates unfairly, real people’s lives are affected. And right now, we don’t have great ways to prove these things won’t happen before we deploy.

Research Focus

My research is about building practical tools that let us formally verify neural networks. Not just “test it a lot and see what happens,” but actually prove mathematical guarantees about what a network can and can’t do. The goal is simple: make it possible for practitioners to deploy AI systems with confidence, backed by real guarantees instead of just hope.

What I’m Working On

🎯 Making AI Systems You Can Actually Trust

Think of neural networks like a chain of mathematical functions stacked on top of each other. Data goes in one end, gets transformed by each layer, and predictions come out the other end. The tricky part is that most of these transformations (we call them activation functions) are non-linear—they curve and bend in ways that make them hard to reason about mathematically.

The traditional approach treats these functions like black boxes. We don’t really know what they’ll do, so we just… run them and see. I’m working on something different: wrapping tight geometric boundaries around what these functions can possibly do. It’s like putting a safety fence around the function’s behavior, but making that fence as tight as possible so we don’t lose precision.
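
To make the “safety fence” idea concrete, here’s the textbook single-neuron case (much looser than the multi-neuron hulls my tools compute, but the same spirit): if a ReLU’s input x is known to lie in an interval [l, u] with l < 0 < u, its output y = max(0, x) is trapped inside a triangle described by three linear constraints:

```latex
% Triangle relaxation of y = max(0, x) for x \in [l, u] with l < 0 < u
\begin{aligned}
  y &\ge 0, \\
  y &\ge x, \\
  y &\le \frac{u}{u - l}\,(x - l).
\end{aligned}
```

Every value the ReLU can actually produce lies inside that triangle, so anything we prove about the triangle also holds for the real network. The whole game is making that fence (and its higher-dimensional cousins) as tight as possible without blowing up the computation.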

Featured Tools

I built two tools, WraLU and WraAct, that do exactly this. Instead of generic, loose boundaries that don’t tell you much, these tools exploit the specific geometry of activation functions (ReLU, Sigmoid, Tanh, etc.) to construct minimal constraint sets. For the technical folks: we’re using convex hull approximation to get polytopes that are orders of magnitude tighter than standard relaxations. But the practical upshot is simple: better bounds with less computation.

Turns out mathematical elegance and actually working in practice don’t have to be enemies. WraLU has verified networks with tens of thousands of neurons—modern ResNets and CNNs that other methods choke on. And it’s fast enough that you can actually use it in a development workflow instead of waiting days for results.

⚡ Verification That Scales

Here’s the frustrating part about neural network verification: you can either do it perfectly and wait three weeks for your computer to finish, or you can get an answer fast but maybe miss something important. It’s the classic security audit problem—thorough but slow, or quick but incomplete.

I’m trying to find ways to get both. Or at least, to intelligently decide when to use which approach. Some ideas I’m exploring:

  • Adaptive bound propagation: Start with fast approximations, then refine only where you need precision

  • GPU-accelerated verification: Using auto-differentiation frameworks (such as PyTorch) to make verification run on the same hardware that trains networks (a minimal sketch follows this list)

  • Compositional verification: Break big networks into pieces, verify each piece separately, then combine the results

  • Hybrid approaches: Mix deterministic guarantees for critical parts with probabilistic certification for less critical components
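
To make the GPU bullet concrete, here’s a minimal interval bound propagation sketch in plain PyTorch. It’s a simplified stand-in for illustration, not the implementation behind my tools, and the tiny two-layer network and the 0.01 perturbation radius are made up for the example.

```python
import torch

def linear_interval(weight, bias, lower, upper):
    """Push an input box [lower, upper] through a linear layer.

    Positive weights drag the upper bound up, negative weights drag it down,
    so split the weight matrix by sign and combine accordingly.
    """
    w_pos = weight.clamp(min=0)
    w_neg = weight.clamp(max=0)
    new_lower = lower @ w_pos.T + upper @ w_neg.T + bias
    new_upper = upper @ w_pos.T + lower @ w_neg.T + bias
    return new_lower, new_upper

device = "cuda" if torch.cuda.is_available() else "cpu"

# A made-up two-layer network: 10 inputs -> 64 hidden units -> 2 outputs.
w1, b1 = torch.randn(64, 10, device=device), torch.randn(64, device=device)
w2, b2 = torch.randn(2, 64, device=device), torch.randn(2, device=device)

# Box around an input point: every coordinate may move by at most 0.01.
x = torch.randn(1, 10, device=device)
lower, upper = x - 0.01, x + 0.01

lower, upper = linear_interval(w1, b1, lower, upper)
lower, upper = lower.clamp(min=0), upper.clamp(min=0)  # ReLU is monotone
lower, upper = linear_interval(w2, b2, lower, upper)

print(lower, upper)  # guaranteed box around every possible output
```

Because this is ordinary tensor arithmetic, the same batching, GPU kernels, and autograd machinery used for training apply directly to computing (and even optimizing) the bounds.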

The goal is making formal verification practical not just for toy academic benchmarks, but for the diverse, messy architectures people actually use in industry. Because verification research is pretty useless if it only works on networks from 2015.

🔧 Building Tools People Actually Use

Look, verification research is worthless if nobody uses it. I can publish elegant theory papers all day, but if practitioners can’t actually apply the techniques, what’s the point?

That’s why I’m focused on the broader ecosystem of trustworthy AI, not just pure verification:

  • Certified adversarial training: Methods that bake robustness into networks during training, using randomized smoothing and formal training procedures. It’s way easier to verify a network that was designed to be verifiable.

  • Formal specifications: Translating domain requirements into mathematical properties we can actually check. Like, when you’re building a medical AI system, you need to specify something along the lines of “for cases in the critical category, the probability of missing a malignant tumor must stay below 1%.” Then we can verify that property formally (a toy sketch follows this list).

  • Real-world applications: Working with actual safety-critical systems—autonomous vehicles, medical AI, cyber-physical systems—to understand what guarantees they actually need (not what academic papers say they should need).

  • Practical tooling: Open-source libraries like wraact, propdag, and torchvnnlib that let people integrate verification into their existing PyTorch workflows without learning a whole new ecosystem.
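
As a toy illustration of what “checking a property” looks like once a verifier has done its job (hypothetical class indices and bound values, and deliberately not the API of any library above): a deterministic version of the tumor property reduces to comparing certified output bounds.

```python
# Toy spec check over verified output bounds (all numbers are made up).
# Assume a verifier has already produced guaranteed lower/upper bounds on
# each output score, valid for EVERY input in some region around a scan.
MALIGNANT, BENIGN = 0, 1          # hypothetical class indices

lower = [4.2, -1.0]               # certified lower bound per class
upper = [6.8, 3.9]                # certified upper bound per class

# Property: over the whole input region, the malignant score always beats
# the benign score. The worst-case malignant score exceeds the best-case
# benign score, so no input in the region can flip the prediction.
holds = lower[MALIGNANT] > upper[BENIGN]
print("property verified" if holds else "property NOT verified")
```

Probabilistic specifications like the 1% miss rate above need extra machinery (probabilistic or statistical certification), but the basic pattern is the same: state the property precisely, then check it against guaranteed bounds rather than test-set behavior.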

The conviction underlying all this: verification should guide training, not just validate results after the fact. If we build networks with verification in mind from the start, we can get both better performance and better guarantees.

The Hard Parts

Of course, none of this is easy. Here are the things that make this research genuinely hard (and keep me up at night):

🔄 The Curse of Approximation

Every verification method involves some approximation—we can’t perfectly capture what a neural network does without basically re-running it. And as networks get deeper, approximation errors compound exponentially. It’s like playing telephone with a hundred people: by the end, the message is barely recognizable.

I’ve made progress with tighter convex approximations, but there’s a fundamental limit to how much bound propagation can scale. We might need entirely new approaches for really deep networks.
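
A stylized way to see the compounding (the numbers here are purely illustrative, not measurements): if each layer’s relaxation inflates the uncertainty it receives by even a modest factor, the slack grows geometrically with depth.

```latex
% If each of L layers inflates the incoming bound width by a factor (1 + \epsilon),
% an input width w_0 becomes
w_L = (1 + \epsilon)^{L} \, w_0
% e.g. \epsilon = 0.1 over L = 50 layers already gives w_L \approx 117\, w_0.
```

That geometric blow-up is why tighter per-layer relaxations matter so much, and also why they can’t be the whole answer for very deep networks.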

🏗️ Modern Architecture Chaos

Verification was born in the era of simple feedforward networks. Now we’ve got Transformers with attention mechanisms, Graph Neural Networks with complex message passing, Vision Transformers with patches and embeddings, and hybrid architectures that do weird things I didn’t even know were possible.

And don’t even get me started on attention: those attention matrices multiply input-dependent quantities together, so the tidy “linear layer, then activation” structure most verifiers rely on simply isn’t there, and that makes them a verification nightmare. The field needs architecture-specific solutions that actually understand how these modern networks work.

⚖️ Scalability-Completeness Tradeoff

Complete verification gives you 100% certainty but doesn’t scale past small networks. Incomplete methods scale to big networks but you might miss vulnerabilities. It’s like choosing between a full code review by security experts (takes forever, finds everything) or automated scanning (fast, might miss stuff).

We need smarter hybrid approaches that allocate computational budget intelligently—maybe fast approximate methods for initial screening, complete verification for critical samples, and probabilistic certification where deterministic guarantees are too expensive.
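
One way to picture “allocate computational budget intelligently” is a tiered dispatcher. This is a hypothetical sketch (the verifier arguments are placeholder callables supplied by the caller, not real APIs from any particular tool):

```python
def verify_sample(sample, spec, fast_verifier, complete_verifier,
                  smoothing_certifier, complete_budget_s=60.0):
    """Tiered verification: cheap screening first, expensive certainty only when needed.

    All three verifier arguments are placeholders for illustration.
    """
    # Tier 1: fast incomplete method. If it already proves the property, stop.
    if fast_verifier(sample, spec) == "verified":
        return "verified (incomplete method)"

    # Tier 2: complete verification under a time budget, reserved for the
    # samples the cheap screen could not decide.
    result = complete_verifier(sample, spec, timeout=complete_budget_s)
    if result in ("verified", "falsified"):
        return f"{result} (complete method)"

    # Tier 3: fall back to a probabilistic certificate when exact methods
    # run out of budget.
    if smoothing_certifier(sample) >= spec.required_radius:
        return "certified with high probability (randomized smoothing)"
    return "undecided"
```

The interesting research questions are in the policy: which samples deserve the expensive tiers, and how to learn that allocation instead of hard-coding it.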

🌉 Bridging Theory and Practice

Academic benchmarks don’t reflect real-world constraints. Production networks operate under latency budgets, memory constraints, and continuous update cycles. You can’t just halt deployment for three days to verify your model.

Verification needs to become a runtime concern, not a post-hoc validation step. This requires fundamentally rethinking certification—not as a one-time check before deployment, but as an ongoing guarantee maintained throughout the system’s lifecycle.

And honestly? Even when we build mature verification tools, people don’t use them enough. Adoption is as much about documentation, standards, and making things easy as it is about algorithmic breakthroughs. That’s why I write educational content alongside research—the best verification method in the world is useless if practitioners don’t know how to apply it or why it matters.

What I’ve Built So Far

So what do I have to show for all this? A few things I’m pretty proud of:

🚀 WraLU and WraAct

Fast, precise convex hull approximation tools that are orders of magnitude more efficient than existing methods while also improving precision. They’ve been used to verify real production networks, not just academic toy examples.

The thing that excites me most: practitioners actually use these tools in their workflows.

[WraLU] | [WraAct]

📚 Educational Content

I maintain A Guide to NNV, a comprehensive resource that makes neural network verification accessible to people who aren’t formal methods experts.

Because honestly, this field has a serious jargon problem, and if we want widespread adoption, we need to explain things in ways normal software engineers can understand.

🔧 Open-Source Tooling

Libraries like wraact, propdag, and torchvnnlib that integrate into the existing PyTorch ecosystem.

I built these because practitioners kept asking “how do I actually use verification in my ML pipeline?” Now they can just pip install and go.

📄 Research Publications

Papers in top venues (POPL, OOPSLA, CCS, S&P) demonstrating impact across verification, security, and programming languages.

But beyond publication metrics, what matters is whether the ideas actually move the field forward and help people build safer AI.

The real goal, though, isn’t just papers or tools. It’s building a community of verification-aware practitioners who can responsibly deploy AI in safety-critical systems. We’re not there yet, but we’re making progress.

Where This Is All Going

Here’s what I’m really excited about for the future:

🤖 Learning-Based Verification

Using neural networks to help verify other neural networks. Sounds circular, but there’s real potential here—networks that predict verification outcomes to guide search, or learn to refine bounds more intelligently than hand-crafted heuristics. It’s still early, but I think there’s something there.

🔗 Symbolic Reasoning at Scale

We’ve got powerful symbolic methods (SAT/SMT solvers, MILP) that don’t scale, and machine learning that scales but isn’t interpretable. What if we combine them in smarter ways? Use learning to guide symbolic search, or symbolic methods to provide guarantees for learned components.

⏱️ Real-Time Verification

Continuous certification during deployment with dynamic adaptation. Imagine a self-driving car that’s constantly re-verifying its own safety guarantees as the environment changes. We’re nowhere near this yet, but that’s where it needs to go for truly safety-critical deployment.

🌍 Beyond Robustness

Most verification work focuses on adversarial robustness—can an attacker fool the network? But we also need formal methods for fairness, equity, privacy, and other critical properties. The techniques generalize, but the specifications are harder to formalize.

🎓 Verification-Aware Training

Making verification part of the optimization objective from day one. Not “train a network, then try to verify it,” but “train a network while ensuring it will be verifiable.” This might mean accepting slightly lower accuracy for dramatically better guarantees—a tradeoff that’s worth it for safety-critical systems.

My bet is that in ten years, formal verification will be as standard in ML development as unit testing is in software engineering. We’re not there yet—not even close. But the technical foundations are solid, practical tools are emerging, and the real-world demand is only growing. The challenge now is making these techniques accessible, scalable, and actually usable by the people building AI systems.

That’s the vision, anyway. Ask me in ten years if we got there.

Learn More

Interested in neural network verification? Check out A Guide to NNV for comprehensive tutorials, from foundational concepts to cutting-edge research techniques. Or explore WraLU and WraAct to see the tools in action.