Stegosaur
@Stegosaur

I am lucky enough to have the day off, so I don't have to deal with thousands of Windows endpoints running CrowdStrike requiring recovery mode reboots and BitLocker keys to fix a bad update. So instead, I'm going to offer some insights from my ~15 years of professional experience for the rest of y'all as a learning experience.

Here are my hot takes:

  • This outage is a demonstrator of the overcentralization of IT in general. One OS with one security agent has managed to take out banks, hospitals, airlines, government agencies, and media outlets. This should be terrifying to the casual observer, because it shows a single point of failure for modern economies.
  • This is also just a small preview of an eventual public cloud outage. What happens when AWS goes down globally? Or Azure? Or Google? This isn't doomsaying either, as all three have already had outages on regional scales before.
  • Too much of modern tech is just building bandaids for the problems of bigger tech companies - like CrowdStrike as a bandaid for Microsoft's abysmal product security. There are few to no companies actually championing a brand new way of doing things that removes dependencies on older technologies. Even Microsoft hasn't stopped supporting on-prem Active Directory, let alone eliminated it, as every major Cloud Identity Provider inevitably hooks into it for Authentication and, often, Authorization.
  • Tech stocks have been overvalued for years, and this incident really shows their weakness. These aren't "radical innovators", they're just really good at selling minor improvements as gargantuan features (like Apple) or outright snake oil, and pumping their valuations accordingly.
  • The real innovators are being derided for not pandering to the status quo. Outfits like Oxide Computer are trying fresh, new approaches to longstanding problems, but because their products force businesses to behave better (by removing legacy systems, modernizing infrastructure, or just investing in their fucking workers), they're often passed over in favor of whatever lets the company keep the status quo a little longer and outsource abroad.
  • Speaking of outsourcing, that is absolutely an underlying contributor to today's events. Microsoft and CrowdStrike are popular precisely because they're a known factor for outsourced workers. Companies don't keep Microsoft because it's a good value or a high-quality product, they keep it so they can fire the expensive worker in the USA and ship that role to India for pennies on the dollar, all because CrowdStrike "makes it safer". They keep it because it's something "everyone already knows".
  • On the topic of outsourcing, the present solution to the CrowdStrike fiasco involves booting to recovery mode and juggling encryption keys (if enabled - and they better be enabled if you're good at your job). Companies that outsourced everything abroad will have slower recovery times, because those folks can't physically assist their employees with recovery and in many cases have to read or transmit these keys and instructions out-of-band (meaning, outside normal communications channels). Having a domestic workforce, even in a remote-only working environment, better connects workers and allows faster recovery times.
  • One more on outsourcing: these outsourcing and MSP firms often support dozens, maybe hundreds of customers. How do you know where you sit in the priority list for a global outage like this? How long was your recovery delayed because you outsourced in the first place? In-house staff puts you first, while outsourced staff has to prioritize whoever pays the most first.

TL;DR Version

Even non-tech people should be horrified at what's happening today, because actually competent IT people (like myself) have been screaming about this for over a decade now, and these outages are only growing more frequent, larger in scope, and more severe in intensity. The ultimate root cause of this is centralization of technology and business into the hands of very few companies, and the lack of both incentives to proactively fix their fucking shit and meaningful punishments for failing to do so. It's a matter of global defense to decentralize our infrastructure, diversify across multiple vendors, and improve regulations so that the next time something like this happens, the executives and shareholders who supported share buybacks and layoffs over R&D or improvements are held to account.

This was not a failure of technical controls, it was a failure of basic competency.



in reply to @Stegosaur's post:

I had this conversation w someone who was convinced it was some secret hacker situation going on or whatever but like. I worked in tech and this kinda just happens when the tools you're forced to use fail. We weren't particularly affected by this kind of stuff bc we had our own servers, but I think a lot of people just aren't aware that like, especially because of some very unfortunate recent trends in development cycles, pushing something out to production and it just breaking immediately just happens all the freaking time. Heck, I work at Starbucks now and our region got a new register system and both the first update and the second non-fix update had every store in our region down for half a day lmao. I think a lot of people love to think this stuff is always freak accident type situations, there's comfort in thinking this doesn't happen very often, but the reality is that it's always the same issue over and over again, it happens constantly. I have caused similar (much smaller scale) issues just by being a bit hungover while doing code review or messing with the database lmao.

"but I think a lot of people just aren't aware that like, especially because of some very unfortunate recent trends in development cycles, pushing something out to production and it just breaking immediately just happens all the freaking time."

Yup, can confirm this is a frequent occurrence in modern development operations (DevOps). It's rarely because of a lack of devtest or staging environments, though, and often because of a lack of customer response (if you're the one making the change) or a push from above to just get it done. I'm presently in an organization where the mantra is "just stack wins", because an "almost finished deployment of X" counts as a win to leadership who just need to check a box for their KPIs.

Specific examples aside, this was the general public's first real exposure to just how fragile the modern tech stack is. It'll only get worse from here, so long as governments fail to incentivize better operations - or at least punish companies who fail to build resilient and secure systems or products.

"It's rarely because of a lack of devtest or staging environments, though, and often because of a lack of customer response (if you're the one making the change) or a push from above to just get it done."

Oh yeah absolutely. Most of the issues we had were just like our project manager saying something needs to get done in an incredibly unrealistic time frame, but because we were working on software for the state government, we were far more careful about making sure things actually worked, because the consequence would be losing an inconceivable amount of money. Even though it meant we occasionally had to stay late just to fix stuff, it was kind of nice because... I worked on web accessibility for a long while and about once a week our pm would come up to me and go "why did you change the colors you have to make it look like the design document" and I'd get to go "please change the design document to have color contrasts that won't get us in trouble with the state". Bizarre type of conversation to have to have but it sure is evidence that through the threat of loss of large amounts of money, anything is possible.