04 · Impact

Caught something the service itself was mislabelling.

A 500-for-4xx mislabelling burned our error budget on failures that weren't ours and hid a real client-facing problem: retroactive proof that the black-box principle mattered.

The strongest outcome

Caught a service reporting client errors as server failures

Digging into an anomaly on the SLO dashboards, I found one service returning HTTP 500 for problems that were really 4xx-class bad-request issues. That mattered on two fronts. A 5xx means we failed; a 4xx means the client sent something invalid. So we were burning our error budget on failures that weren't ours, and the mislabelling hid a real client-facing problem behind a generic “server error.” Every one of the service's self-reported metrics showed what looked like a legitimate 5xx spike.

I investigated each failing case using the distributed tracing I'd set up (a wrapped OpenTelemetry library with trace-to-log correlation across our Go services), cross-referencing traces, logs and code paths to confirm the cause was malformed input rather than a real server fault. As a pragmatic fix I re-classified those routes, and the finding drove the work to correct the status codes at source, so the metric measured what it was supposed to.

The dashboards surfaced the anomaly; digging into it revealed the mislabelling. Retroactive proof the black-box principle mattered.
Figure: Monthly error budget for a 99.9% SLO, a 43-minute budget shown on a scale from 0 min burned through 50% and 85% to an SLO breach at 43 min. A minute is good only if both the API (no 5xx) and the controllers (<5% down) pass.
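The good-minute rule and the size of the budget can be made concrete. This sketch assumes a 30-day month (43,200 minutes); at 99.9% that allows 43.2 bad minutes, consistent with the 43-minute budget once rounded down. `minuteSample`, its fields and the function names are hypothetical, and how the real pipeline derives "controllers down" is not described in this write-up.

```go
package main

import (
	"fmt"
	"math"
)

// minuteSample is a hypothetical per-minute record; the real pipeline's
// schema is not described here.
type minuteSample struct {
	API5xx              int     // count of 5xx responses from the API in this minute
	ControllersDownFrac float64 // fraction of controllers down, 0.0 to 1.0
}

// isGoodMinute applies the dashboard's rule: a minute is good only if the
// API served no 5xx AND fewer than 5% of controllers were down.
func isGoodMinute(s minuteSample) bool {
	return s.API5xx == 0 && s.ControllersDownFrac < 0.05
}

// budgetMinutes returns how many bad minutes the SLO allows over the window,
// rounded down to whole minutes.
func budgetMinutes(slo float64, windowMinutes int) int {
	return int(math.Floor((1 - slo) * float64(windowMinutes)))
}

// remaining counts bad minutes in the samples and returns the unspent budget.
func remaining(samples []minuteSample, budget int) int {
	bad := 0
	for _, s := range samples {
		if !isGoodMinute(s) {
			bad++
		}
	}
	return budget - bad
}

func main() {
	const monthMinutes = 30 * 24 * 60 // 43,200 minutes in a 30-day month
	budget := budgetMinutes(0.999, monthMinutes)
	samples := []minuteSample{
		{API5xx: 0, ControllersDownFrac: 0.00}, // good
		{API5xx: 3, ControllersDownFrac: 0.00}, // bad: API served 5xx
		{API5xx: 0, ControllersDownFrac: 0.10}, // bad: 10% of controllers down
	}
	fmt.Println(budget)                     // 43
	fmt.Println(remaining(samples, budget)) // 41
}
```

Counting whole minutes rather than individual requests means a minute with many 5xx costs the same as a minute with one, which matches the dashboard's good-minute framing.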

Shifted reliability culture

Teams stopped asking 'who's on-call?' and started asking 'is this worth burning budget for?' Reliability moved from an on-call behaviour to a property of the system, with a shared vocabulary, and that shift outlived my involvement.

First canonical source of truth

The org's first factual basis for whether the API and controllers were actually working — not an opinion, a number everyone referenced.

Visibility into the key user journeys

For the first time we could see, at minute granularity, whether the interactions that mattered to customers were actually working. This surfaced issues that were invisible to the services' own metrics.

Durability without me

I left shortly after shipping. The pipeline has stayed solid, and the BigQuery data is still queried and investigated frequently.