05 · Reflection

What I'd do differently.

Honest gaps from the timebox. Two are tactical next steps. The third is where this project points toward a platform.

Next iteration

Latency SLIs

I measured available but not fast enough. A slow-but-200 response counted as good. Natural next step.

Next iteration

Burn-rate alerting

Look at how the budget is being consumed and page on burn rate, not on individual errors. Would close the loop on the culture shift.

The platform version

SLI definitions as code

Push from a one-off measurement system into a self-service reliability platform: any team declares an SLO the same way, gets the pipeline and dashboards for free. The platform-engineering version of what I built.

The through-line

The pattern.

The pipeline is the artifact. The repeatable thing is spotting a reliability gap and closing it — with or without a mandate.

iZettle
with the mandate

Platform Services Tech Lead; org-wide observability strategy.

Einride
without one

Founded an SRE Working Group; became company-wide KPIs.

Fever
without one

This system; the org's first factual reliability baseline.

Reliability as an on-call chore → a measured property of the system.