We’re getting to the last mile with our investigations system and digging into message format: how we package the ‘RCA’ and the next steps we propose to responders.
Wanted to share an example of the differences between Sonnet 3.7 and GPT-4.1 in formatting the investigation message.
Things to note are:
- Sonnet 3.7 is much more concise than GPT-4.1, and if you look carefully at the messages, almost no information is lost; it just speaks more plainly
- GPT-4.1 is more verbose and restates technical detail. We’ve found that useful in other parts of our investigation system (we’re using a lot of GPT-4.1 to build the data behind this message!) but it doesn’t translate well to a human-readable message
- GPT-4.1 is more likely to explain reasoning and caveats, and it has downgraded the confidence just slightly (high -> medium). That caution is consistent with our experience of the model elsewhere
In this case I much prefer the Sonnet version. When you’ve just been paged, you want a concise, human-friendly message to complement your error reports and stack traces, so we’re going to stick with Claude for this prompt!