Friday, May 11, 2018

Actionable Production Escalations

I've long considered the following items the basics of an actionable production escalation. These were taught to me by Googlers (mostly when I violated these understated values). The fundamentals of any production escalation require the documentation of the following from SREs:
1. An exception, call graph, logs or metrics showing the problem
2. A first pass characterization of the problem (what is it / how much impact)
3. Why me? (Do we need a PoC that you wouldn't know otherwise?) 
4. What have you already tried. 
5. Things that you have noted that are out of the ordinary.
6. How specifically can I help solve this problem? (Find a PoC? look at the code? Judge downstream impact? Validate severity?)

Following the above process keeps a check on the level of due diligence needed before a Dev escalation. It also helps formulate concrete action items as part of the escalation process. I've found that this helps resolve issues quicker and keeps the prod overhead low for devs. What do you think?