
hecate
@hecate

i learned something cool today: steam apparently uses the google cloud cdn now (too? in conjunction with akamai? unclear. they still use akamai for steam client downloads, and possibly some portion of game downloads?)

i learned this because: i work on google's cdn and baldur's gate 3 launched today, and did not have a preload period because of its early access :)

(everything was fine, we just did notice an uptick on our graphs lol)


hecate
@hecate

without breaking nda, i can say that obviously, youtube traffic is the largest part of the cdn traffic generally, and it makes up a Significant Portion of global internet traffic (by bytes; im sure that if you want to measure by packets or other metrics there's other stuff that takes the cake. but bytes throughput is generally the limiting factor in global networking!)

our team does not actually provide any support for non-youtube serving - we own the bytes-serving service that runs on every cdn node, and that also handles serving bytes for stuff from other backends, too, but it's always just been a "we dont notice the chrome downloads and android updates and whatever" situation bc the traffic for those is just a constant trickle. video traffic is Big, and it is Constant.

video games are one of the few other Really Big Files things that modern internet speeds enable, but they dont generally have really large patches everyone downloads at once. or if they do, they generally have preload periods

...except baldur's gate 3 had early access, which meant no preloading. if you bought it early, you got what was functionally a demo, and then had an 80gib patch this morning (instead of a 100gb preload at some point this week, that could have been staggered or queued for off-peak times)

(and that still wasn't enough for anything to go wrong! it went just fine! i dont think it would be fine if this became a common practice and happened frequently, but it's cool nonetheless)


hecate
@hecate

also anyways the public sources on what i'm talking about:

  • steam cdn on netify -> you can see a lil bit more here (such as, say, the fact that the domain google2.cdn.steampipe.steamcontent.com exists is indicative of hey, google cdn being used for steamcontent)
  • you can see the egress peak on steam's POV here, or below:

it looks like steam still largely uses akamai, but they also do use an amalgamation of other cdn providers depending on geographic location, i think? based on this

curious day


hecate
@hecate

weirdly, i am seeing different download bandwidth graphs on that steam page based on which computer i am using:

unsure what that's about, but still, lol. 150Tbps spike.

my general insight here is that the download issues people saw this morning were basically just 'local cdn node gets hammered by druids [18+]' and like - there is fundamentally nothing you can do about that outside of, yknow, having preloads and staggering downloads prior to launch

cdn networking and capacity management is not like the capacity management inside DCs and clusters and whatever that most engineers are used to now, thanks to The Cloud and Containers and just spinning up new replicas and whatever. CDN network capacity is not fungible. you can't use your off-peak nodes in india or switzerland to serve people in chile - those links don't exist, and even if they did, the speeds would be impacted, etc etc. upgrading that capacity takes time (since just dropping a new node or rack of nodes somewhere may not really work - it depends on all the peering and how saturated those peer links already are, if those links will need upgrading to support a new node with significant egress, how long those upgrades will take, etc etc)
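a rough way to picture the "not fungible" part, as a toy sketch: the region names come from the examples above, but every number, plus the function names `unserved_demand` and `pooled_shortfall`, is made up for illustration. this is not how any real cdn models capacity, it just contrasts "each region can only serve itself" against "pretend capacity is one big pool".

```python
def unserved_demand(capacity_gbps, demand_gbps):
    """Per-region demand (Gbps) that exceeds that region's own egress capacity.

    Assumes no region can borrow capacity from any other region.
    """
    return {
        region: max(0, demand - capacity_gbps.get(region, 0))
        for region, demand in demand_gbps.items()
    }


def pooled_shortfall(capacity_gbps, demand_gbps):
    """Shortfall (Gbps) if capacity were fungible, i.e. one shared global pool."""
    return max(0, sum(demand_gbps.values()) - sum(capacity_gbps.values()))


# Hypothetical numbers, purely for illustration.
capacity = {"chile": 100, "india": 500, "switzerland": 300}
demand = {"chile": 250, "india": 200, "switzerland": 100}

print(unserved_demand(capacity, demand))   # chile is short even though others have spare
print(pooled_shortfall(capacity, demand))  # a global pool would look fine on paper
```

with these made-up numbers the global totals look comfortable, but chile is still 150 Gbps short, because the spare capacity elsewhere can't reach those users.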

thankfully a lot of this stuff is incredibly ephemeral, too, since a lot of the Huge Load on CDNs is also, to some degree, weathered by the nature of The CDN - if everyone is requesting the same stuff, it will all be living in-cache (generally), which at least ensures relatively speedy local downloads that don't need trips back to DCs for cachefill, so while the downloads for everyone on that node might get throttled as its total egress gets split up between users, there is a very finite amount of data the users want, and it will end up being delivered over a Relatively Reasonable amount of time (hopefully).
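the "finite amount of data, split across a node's egress" idea can be put into back-of-envelope arithmetic. a minimal sketch, assuming the node's total egress is the only bottleneck, it's shared evenly, and everything is already in cache. the user count and node egress below are invented numbers (only the 80 GiB patch size comes from the thread), and the function names are hypothetical.

```python
def per_user_mbps(num_users, node_egress_gbps):
    """Even share of a node's egress per user, in Mbps."""
    return node_egress_gbps * 1000 / num_users


def drain_time_seconds(num_users, file_gib, node_egress_gbps):
    """Seconds until every user on the node has the whole file.

    Assumes the node serves from cache at full egress the entire time.
    """
    total_bits = num_users * file_gib * 2**30 * 8  # GiB -> bytes -> bits
    return total_bits / (node_egress_gbps * 1e9)   # Gbps -> bits per second


# Hypothetical: 10,000 users on one node, 80 GiB each, 400 Gbps of egress.
users, size_gib, egress_gbps = 10_000, 80, 400
print(f"{per_user_mbps(users, egress_gbps):.0f} Mbps per user")
print(f"{drain_time_seconds(users, size_gib, egress_gbps) / 3600:.1f} hours to drain")
```

under those assumptions each user gets throttled to a slice of the node's egress, but the whole backlog still clears in a bounded number of hours, since nobody needs more than the one file.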

(i kinda love my job & love my lil perch where i get to Observe the youtube cdn all the time. it kinda does underpin the internet. i don't think the internet as we know it would really exist without CDNs being what they are now)



in reply to @hecate's post:

huh, that's interesting! I wonder if they mix cdn providers to maximize their geographic coverage to get low latencies everywhere? or like more for cost efficiency? I'd be curious to learn how they distribute their content with mixed cdns!

if you're able to share: did you learn this from an incident response? :eggbug-devious:

i learned it from someone going "huh, this small jump on this graph came from one customer, is there a routing bug or something?". the gamers in the chat figured out what was up pretty quickly

and yeah, it looks like they mix cdn providers now (eg alibaba cdn in china)