Topics
- Managing incidents in highly regulated environments (FinTech).
- Why penalties for downtime demand rigor and discipline in incident response.
- incident.io’s evolution: monolithic Go binary on Heroku → GCP with native security primitives (Secret Manager, Kubernetes).
- Dynamically generating ephemeral runbooks by crawling GitHub PRs, Slack, telemetry, and past post-mortems.
- Technical challenges of using RAG (Retrieval-Augmented Generation) for incident context.
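
To make the retrieval idea in the last two topics concrete, here is a minimal sketch of the "retrieve" step of RAG over incident sources. It is not incident.io's implementation: the `Snippet` type, the source labels, the sample snippets, and the bag-of-words cosine scoring are all assumptions chosen for illustration. A production system would more likely use embeddings and a vector index.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    """A piece of crawled context (hypothetical shape, for illustration only)."""
    source: str  # e.g. "github_pr", "slack", "telemetry", "postmortem"
    text: str


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Return the k snippets most similar to the incident description."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda s: cosine(q, tokenize(s.text)), reverse=True)
    return ranked[:k]


def build_context(query: str, corpus: list[Snippet], k: int = 2) -> str:
    """Assemble retrieved snippets into text a model could draft a runbook from."""
    lines = [f"Incident: {query}", "Relevant context:"]
    lines += [f"- [{s.source}] {s.text}" for s in retrieve(query, corpus, k)]
    return "\n".join(lines)


# Made-up sample data, purely illustrative.
corpus = [
    Snippet("github_pr", "Deploy changes connection pool size for payments api"),
    Snippet("slack", "Lunch menu for Friday"),
    Snippet("postmortem", "Past incident: payments api latency spike caused by exhausted connection pool"),
    Snippet("telemetry", "Disk usage normal on batch workers"),
]
print(build_context("payments api latency spike after deploy", corpus))
```

Even this toy version hints at the challenges the episode raises: ranking quality depends heavily on how sources are chunked and scored, and irrelevant chatter has to be filtered out before it reaches the model.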
Context
Recorded in the wake of one of the worst AWS incidents in history, this is a timely conversation about resilience and incident response.