I've started a page where I intend to collect links to various materials that I liked.
Today's addition is John Allspaw's talk "Incident Analysis: How Learning is Different Than Fixing". The talk itself is only about 20 minutes long; the rest is Q&A, which I didn't find particularly interesting. In the talk, John describes common pitfalls of incident analysis.
A few points that resonated with me:
- The severity of an incident has nothing to do with how difficult or interesting it was
- Good postmortems are stories - they use typical storytelling techniques and are interesting to read
- Often postmortems become box-ticking exercises. Many postmortems are written to be filed, not read. This happens even in pioneering SRE orgs like Google.
Spend 20 minutes of your time on it; it may be worth it.