serverless-techno-druid

@serverless-techno-druid

  • http://pronoun.is/they

Techno-druid, death floof, cloudie. Tracker, pythonista.


for about 40 minutes there was an issue where the dashboard would fail to load with an "output validation failed" error. the rest of the site was unaffected.

the outage was caused in the first place by a couple of unfortunate events lining up:

  • at some point several months ago, we silently wrote an API method such that if you provided it a query which would return zero results, it would return an output validation error (causing the page load to completely fail) rather than returning an empty result set. this was missed in code review, and took the form of using a validation called .positive() instead of the correct .nonnegative().
  • today we changed the input escaping on this method to fix issues related to loading tag pages for tags containing percent signs, but forgot to change a lingering use of it which would now return zero results; poor test coverage for the API method and one of our devs' failure to perform a manual smoke test in addition to an automated test pass let it through.

we also had some operations problems reverting to a last-known-good build, caused by another bugfix in the deployment pipeline at the same time, that required manual troubleshooting and resulted in the outage lasting 40 minutes instead of 10-15.

we'll be fixing that API method and writing better tests for it so this doesn't happen again.