
member of @staff, lapsed linguist and drummer, electronics hobbyist
zip's bf
no supervisor but ludd means the thread's any good
Centralia's VOL 1+2 Kickstarter is 75% funded!! Wowie zowies!
Here's the first of 4 postcard print designs that are available with the postcard print set ☆ the set is included in all bundle pledges and is available as an add-on to all pledge levels!
I've written a big incident report after Honeycomb had its biggest outage since having paying customers. There's a short copy on the blog, but I encourage people to read the long-form report [PDF].
It's got a bit of everything, but the TL;DR of it is that we migrated across clusters to avoid a bug, inadvertently discovered that a feature flag we had used many times did not work how we thought (because deploys were not running this one time, and the flag somehow depended on deploys), which in turn killed the update feed that was used to keep the ingest cache warm, which ended up creating heavy load on a DB, which ended up having an internal deadlock (in the MySQL implementation, not our transactions), which took down 100% of the system, which required a full ingest clamp and some SQL surgery to bring everything back to life.
What's specifically "fun" about this incident is that pretty much all the contributing factors were attempts to do good things and keep the situation from getting worse, and they actually kept making it worse, right up to the full outage.
It is a bit ironic how feature flags, frequent deploys, suspending deploys during incidents, and learning from prior near-misses all technically contributed to this incident, while being some of the most trusted practices we have to make our system safer.
Hopefully there are interesting lessons in there for our readers.
thank u jimmy buffett for ur everlasting impact on nonbinary fashion 🙏🏻 rest easy buddy n may u have an infinite supply of cheeseburgers in paradise