Why change role to AI Engineering

April 10, 2025

What does switching to AI Engineering actually mean for your career? Drawing from my experience pivoting to SRE, I offer an honest assessment of the opportunities and challenges for software engineers considering this move—from high-impact, high-pressure work to the realities of slower progress and less visible product surface.

AI Innovator's Dilemma

March 13, 2025

Working in AI today, I'm seeing the innovator's dilemma play out in real time. While larger companies carefully plan deployments that work for their entire customer base, smaller teams like ours can ship, learn, and improve our AI products through actual usage. This isn't just about moving faster—it's about fundamental advantages in how AI products develop that favor startups, regardless of the resources incumbents can deploy. The dynamics surprised me, and they might surprise you too.

You don't need Python to build AI products

February 16, 2025

I've met teams who switched to Python just to build AI features, abandoning their normal stack for the ecosystem. But it's really not worth it! At incident.io we stuck with Go and it's been great. It turns out static typing and proper concurrency are exactly what you want when building AI systems, provided you build some nice abstractions to go with them.

Beyond the AI MVP: What it really takes

February 1, 2025

The gap between demo-ready AI products and production-grade systems is much larger than most realise. This post explains the four stages of AI product maturity, what tooling you actually need to build reliable AI systems, and how to recognise if you're stuck in the MVP trap.

Looking back at 2024

December 31, 2024

From reliability engineering to wrestling with LLMs, my fourth year at incident.io pushed me harder than I'd expected. We launched On-call, weathered some tough times as a team, and I ended the year diving fully into AI.

When Game Days go wrong

December 8, 2024

A story about how incident response training went wrong, with valuable lessons about pod priorities, isolation, and the importance of a healthy incident response culture.

Learn one thing at a time

April 26, 2024

Of the mental models and rules I use in my life, by far the most useful is to learn only one thing at any given time.

Looking back at 2023

December 29, 2023

My reflections on 2023, now my second full year at incident.io. Doubled the team this year (34 to 77), launched Status Pages and Catalog, and spent the last six months building a really exciting new product.

Adding concurrency control to HTTP APIs

October 14, 2023

Whenever a system has access to a consistent store, you can extend that consistency to the system's users through compare-and-swap (CAS). This post shows how to add CAS to an HTTP API, with example code and real-world use cases.
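
As a rough illustration of the idea (not the code from the post itself), here is a minimal sketch of compare-and-swap exposed through an HTTP-style handler. The `Store` class, the `handle_put` function, and the use of an integer version carried in `ETag` / `If-Match` are all assumptions made for this example; a real API would sit on a database and a web framework.

```python
import threading
from dataclasses import dataclass


@dataclass
class Resource:
    version: int
    body: dict


class Store:
    """In-memory stand-in for a consistent store (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def get(self, key):
        with self._lock:
            return self._items.get(key)

    def compare_and_swap(self, key, expected_version, body):
        """Write only if the stored version still equals expected_version.

        A missing key is treated as version 0. Returns the new Resource,
        or None when the caller's view of the resource is stale.
        """
        with self._lock:
            current = self._items.get(key)
            current_version = current.version if current else 0
            if current_version != expected_version:
                return None
            updated = Resource(version=current_version + 1, body=body)
            self._items[key] = updated
            return updated


def handle_put(store, key, if_match, body):
    """Hypothetical PUT handler returning (status, headers, body)."""
    if if_match is None:
        # 428 Precondition Required: refuse unconditional writes.
        return 428, {}, {"error": "If-Match header required"}
    try:
        expected = int(if_match.strip('"'))
    except ValueError:
        return 400, {}, {"error": "If-Match must be a quoted integer version"}
    updated = store.compare_and_swap(key, expected, body)
    if updated is None:
        # 412 Precondition Failed: someone else wrote first.
        return 412, {}, {"error": "version mismatch, re-read and retry"}
    return 200, {"ETag": f'"{updated.version}"'}, updated.body
```

A client reads the resource, remembers its `ETag`, and sends it back as `If-Match` on the write. If another client wrote in between, the request gets a 412 and the client re-reads before retrying.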

Use your database to power state machines

September 16, 2023

If you build a state machine on top of a relational database you can abstract concurrency problems away from your business logic and allow developers to write safe-by-default code without dealing with concurrency concerns.

This post explains how to build a library that offers those protections, and how they work under the hood.
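
To give a flavour of the technique (this is a sketch under assumptions, not the library described in the post), the example below uses a guarded `UPDATE` so the database itself rejects a transition when another writer has already moved the row. The `orders` table, the `TRANSITIONS` map, and the exception names are hypothetical, and SQLite stands in for whatever relational database you use.

```python
import sqlite3

# Hypothetical transition map: state -> states it may move to.
TRANSITIONS = {
    "pending": {"active", "cancelled"},
    "active": {"completed", "cancelled"},
}


class InvalidTransition(Exception):
    """The requested transition is not allowed by the state machine."""


class StaleState(Exception):
    """The row was no longer in the expected state when we tried to write."""


def setup():
    """Create an in-memory database with one order in the 'pending' state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO orders (id, state) VALUES (1, 'pending')")
    conn.commit()
    return conn


def transition(conn, order_id, from_state, to_state):
    """Move an order from from_state to to_state, or raise."""
    if to_state not in TRANSITIONS.get(from_state, set()):
        raise InvalidTransition(f"{from_state} -> {to_state}")
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            # The WHERE clause is the guard: the update only applies if
            # the row is still in the state the caller believes it is in.
            "UPDATE orders SET state = ? WHERE id = ? AND state = ?",
            (to_state, order_id, from_state),
        )
    if cur.rowcount != 1:
        raise StaleState(f"order {order_id} is not in state {from_state}")
```

Because the check and the write happen in a single statement, two concurrent callers attempting the same transition cannot both succeed: one updates the row, the other sees zero rows affected and gets `StaleState`.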