Your company probably has a lot of data. When you expose all of these
different sources under a tool that makes complex analysis as fast as
thought, you'll create a load of opportunities to make data-driven
decisions.
By sharing an example where two hours of analysis helped prioritise 2-4
weeks of engineering work, I'm going to try to convince you that the value
of a connected dataset is far more than the sum of its parts.
Continue reading
Tips and tricks to better handle incidents, learned over years of dealing
with production issues. Included are opinions on strategy, process, tools
and how to handle the all-important human element.
Read this if you're new to incident response and want a starter-pack of
advice, or to contrast your own perspective with another.
Continue reading
As a team's infrastructure estate grows, it becomes increasingly beneficial
to create a global registry of all people, services, and components. Once
you do, you can integrate with tools like Terraform, Chef, and Kubernetes to
help provision your infrastructure according to a single authoritative
source.
This post explains how we built our registry at GoCardless, and some of the
uses we’ve put it to.
Continue reading
Most Prometheus metrics recording durations are subject to a
time-of-measurement bias, causing misleading graphs that can derail
investigations. See how an open-source Tracer can help solve this problem.
Continue reading
This post covers the implementation of pgreplay-go, a tool to realistically
simulate captured Postgres traffic. I'll explain why existing tools didn't
fit and walk through some challenges in the implementation, focusing on what
I learned personally from the process.
Continue reading