nys

definitely a human

  • she/it

30-ish, definitely not a personality construct running on an android. nope.
S/N: 3113

not everything is 18+, but this is an adult page

pfp is by moonlitvesper

This user is an it


lexyeevee
@lexyeevee

is a process, not a stat, not even a progress bar

please can the Gamers stop saying things have "bad optimization" or whatever. i don't know how to kindly explain to you that the way it sounds lands squarely on a spectrum between clueless and jackass. consider the following:

  1. it is a MIRACLE that video games work. like, at all. i make video games and i can hardly believe they exist. and that's before you even get to the graphics!

    here, i'll give you all the vertex coordinates of ten thousand objects in 3D space. tell me which ones are touching. you have: 0.005 seconds. good luck

  2. making something faster while behaving the same is hard. it is really hard. it is really really fucking hard. it is one of the hardest things. if you're lucky there's some obvious low-hanging fruit that gets you far enough. beyond that it is a niche so specialized that it has no name, performed by 1 witch at your studio that no one's sure they've actually met in person, who disappears for three weeks only to reveal they've improved load times by 9% by forking ext4 to store entries in alphabetical order, whatever the fuck that even means, while the rest of you are desperately scouring stackoverflow for something bozotic like whether double-quotes are "faster than" single-quotes

  3. there is no such thing as "fast". there is only "faster", and "fast enough". you don't just keep going until you have Fully Optimized The Code because that is not a real thing. you plunk away at one thing at a time and watch your update and render times drop by microseconds, sometimes unsure whether you've even made a difference because it's drowned out by noise. maybe you made things slightly worse, even. maybe better on one platform but worse on another. cross your fingers i guess. how long until we're supposed to ship, again?

this just really grates on me because like

"it's slow for me" is a factual observation. even "it's slow for everyone" is a factual observation. "it has poor performance" is still a factual observation. "it's fucking unplayable", "runs like ass", sure

but "it's badly optimized" is a value judgement of skilled work that someone did (or lacked time/expertise to do) on code you have never read. it seems to have come out of the same vortex that produces insights like "[game] was made with the unity engine, which is why [non sequitur]" from people who are inexplicably compelled to talk about the nuts and bolts despite having never seen either a nut or bolt themselves

you can just, have opinions on video games. you don't need to try to fake sounding like maybe a programmer


nys
@nys

i am, what you might consider to be, a pretty good programmer. i am very holistic, which is why i'm in my internal tools/devops world: i get to wear lots of hats. i have written “performant” code that i “optimized” to handle decent workloads.

i say all this to introduce my best friend from college, who i learned so much from. he is a performance engineer. he reads database whitepapers for fun. he makes roughly 2.5x my grossly inflated salary and he deserves it.

let’s set up a little example. it is, fittingly for the above post, a non-distributed parallel computing 400-level class (which i took in parallel with graphics, and linear, ha). in this class[^1] the top 3 performing submissions for an assignment got 20, 10, and 5 percent extra credit respectively.


our first assignment was ezpz matrix multiplication. bog-standard stuff, using a simple space/newline delimited input format with raw text files. the time required to get full marks on the 10k by 10k matrix? like 15s (before you say anything, general parallel computing in the 2010s was moving fast). the prof warned that most of that is from parsing the bad data format.

my proud time, after an hour or two of painstaking optimization (mostly in parsing)? 8 something seconds. that got me a firm second place by a few-second margin, and then 3rd through 10th were separated by a second.

so you have:

  1. ????
  2. 8ish
  3. ~10
  4. ~11

what did my bff get? less than 2s.

my little parsing optimizations boiled down to threading the parser and doing some manual parsing not using the stdlib.

he wrote the whole parser in assembly

it took him nearly 6x as long as me to do it, but he emerged from our bedroom with code that parsed the stupid file format insanely fast. he even optimized (note the lack of quotations) the code to work on the instruction sets available to the bot that ran our code (our laptops had that new AVX instruction set).

this pattern, him way ahead, me leading the rest with a smaller but comfy margin, continued through the year. we used to compete on performance per time spent, and i beat him handily on that, which is why i am a decent engineer who can “optimize.” but even if i spent the same time he did on the problem, it wouldn’t approach his times.

this is all to say that while we convince sand to think, performance engineers take advantage of the layout of the sand to make it think even faster which is literal technomancy. and sometimes they literally realize the sand works faster on odd numbered days so now that bit of sand thinks it’s always the first.

also in this class we didn’t realize we had a presentation due so i, quite literally, was writing the google slides as he presented them. (we crushed it)

[^1]: this was very common for 400+ classes. my networks class showed your performance on the class distribution (this footnote kept not rendering and it is @646 so i am tired and confused, wow i am programmer such good)



in reply to @lexyeevee's post:

Also yes to the witch. Everyone's been fighting to pare cycles from the inner game loop for three weeks and they turn up in awful shape at standup one morning having not slept in four days. The new routine is a bunch of raw opcodes in a byte array that, insanely, means the same thing in armhf as it does in x64. It is 50% faster but no one understands how it works, them included. The CR has the original assembly, but it doesn't assemble to the same set of opcodes from either the ARM or Intel assembly source. It passed all the tests, however. They do not turn up at any more team meetings. A month after launch you find out they moved to Alaska.

you don't just keep going until you have Fully Optimized The Code because that is not a real thing.

Wanted to add to this, because optimizing graphics is currently my job, and explaining things like this to my coworkers is therefore also my job.

Doing an optimization is several kinds of hard. Doing a "full optimization" is several kinds of impossible, coupled with several kinds of not well defined.

I can make benchmarks that try to represent real-world stress cases such that "code that performs well on these benchmarks will seem fast to users". Inevitably some of the assumptions I made will turn out to have been wrong. I'll have missed some rare but important real-world case and have to make updates to the benchmarks.

But let's suppose my assumptions are solid, and that I can use the benchmarks as ground truth. There's still a sense in which "optimization" is not well defined. Doing well on one test-case in one execution environment doesn't guarantee doing well on all of them. There's room in the design space to make tradeoffs, like "what if I did some extra computation to prune some potentially unneeded branches? It would make some cases faster and others slower". If I want some way to compare apples and oranges, I could make up a composite score that flattens all the results into one metric. Again, I'd be making assumptions that are definitely arbitrary and probably wrong.

But let's suppose that I've made the "right" assumptions (whatever that means). There's now some sense in which code could be said to be "optimal" with respect to that metric. Once I've fixed a scoring system, there exists some implementation out there that gets the best score. Of course, I still don't know what the best score is, and finding out is an undecidable problem.

I am currently involved in a performance project (not games, just boring business software), and the question of “are we done?” is of course obviously ridiculous, but even the question of “is it even any faster on anyone’s computer other than my own, in any situation outside of this synthetic test case?” is soooooooo much harder to answer than it seems like it should be

This reminds me of the old days when I had to regularly engage with programmers who had big-O Opinions on the relative "efficiency" of different programming languages. Because yes, the problem with your code is definitely that you're running on top of a virtual machine and not the fact that your data is stored with no plan of how to connect related elements, and you should definitely rewrite the whole thing in assembly language to trim half a percent off the runtime, assuming that it doesn't produce so much code that you have more cache misses...

This is a fair criticism and I will be altering my language accordingly.

Except with one exception: "Final Fantasy XIV 1.0 was badly optimized" is a factual statement because of shit like the flowerpot that had more polygons than a player character.

in reply to @nys's post:

yeah that would be like 500mb each… maybe it was lower. eyes memory suspiciously

and the format was annoying. one decimal point but no .0 and negatives were allowed so it could be like 4 or -3.2 or something. and i wanna say values were <100

maybe that was on his macbook (which had pcie ssd then) cause i’m pretty sure the server it ran on was SSD. then again i remember thinking it was like barely possible with disk read times.