small week because jae continues to be sick in ways that are annoying, but with some business news and big changes!
- business news: @aidan is now officially a co-owner of the company! congrats and/or condolences!
- HUGE change to how we cache data about posts that makes everything way faster. we’d had a blind spot in our knowledge of database development that has now been filled in.
- deployment took longer than expected and unfortunately made posting Not Work for like 15 minutes but it’s fine now.
- lesson learned: run big database changes against more faithful clones of the production environment, to get reasonable estimates of their performance characteristics instead of just guessing based on how they ran on a slice of the database from months and months ago.
- fixed some issues with embeds causing a lot of flashing
- tweaked the long post collapsing threshold based on feedback
- fixed a bug where notifications would sometimes display “ERROR STATE” when there simply wasn’t any text to show
- made fairly substantial changes to how post publishing works
  - this fixes a bug that most people would never have experienced but that the three of us experienced literally every single time we posted
that’s all for this week, thanks for using cohost! 
what was wrong?
there are a few pieces of data about each post which are rendered (almost) everywhere a post is, but aren't stored with the post for various reasons:
- how many comments it has
- which tags are applied to it
- which attachments it has
all of these things were being recomputed every time a post was rendered, and even though we'd done all of the easy things to make each individual computation fairly quick (a handful of milliseconds), there were a bunch of them to do on certain pages (the tag page, dashboard page, etc.) that added up to entire seconds overall.
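for the curious, here's roughly the shape of that per-render work -- a sketch only, with made-up table and column names (posts, comments, post_tags, post_attachments) rather than our actual schema. each subquery is cheap on its own; the problem is running all of them for every post on a page, on every page load.

```sql
-- approximately the per-render work described above, per post:
SELECT
  p.post_id,
  (SELECT count(*)
     FROM comments c
    WHERE c.post_id = p.post_id)                         AS comment_count,
  (SELECT coalesce(array_agg(t.tag), '{}')
     FROM post_tags t
    WHERE t.post_id = p.post_id)                         AS tags,
  (SELECT coalesce(array_agg(a.attachment_id), '{}')
     FROM post_attachments a
    WHERE a.post_id = p.post_id)                         AS attachment_ids
FROM posts p
WHERE p.post_id = ANY($1);  -- $1: array of the post ids on the page
```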
how did you fix it?
same way anyone fixes basically anything slow in computing: compute the result once, store it, and only recompute it when you have to (i.e., when someone makes a post, comments on a post, changes the tags on a post, adds an attachment to a post, etc.). it's called a space-time tradeoff and it's extremely cool.
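in concrete terms (again a sketch with the same made-up names as above, not our real schema): the "space" half is just a plain table holding the precomputed answers, and the page query turns into a single indexed join.

```sql
-- a plain table holding the precomputed per-post stats:
CREATE TABLE post_stats (
  post_id        bigint PRIMARY KEY REFERENCES posts (post_id),
  comment_count  integer  NOT NULL DEFAULT 0,
  tags           text[]   NOT NULL DEFAULT '{}',
  attachment_ids bigint[] NOT NULL DEFAULT '{}'
);

-- rendering a page is now one indexed join instead of a pile of
-- per-post subqueries:
SELECT p.*, s.comment_count, s.tags, s.attachment_ids
FROM posts p
JOIN post_stats s USING (post_id)
WHERE p.post_id = ANY($1);
```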
why didn't you fix it earlier?
with our level of understanding of relational databases at the time[1], we assumed that we would have to do something super complicated and fragile involving other services. we also knew that the queries that built these pages were slow, but we hadn't dug into exactly which parts of the queries were slow, and didn't know how much faster the slow parts would get if we fixed them. so we put it all off for a while, hoping we'd find something else that improved performance while being less of a headache to implement.
but then by chance, reading through the PostgreSQL documentation, we discovered that (duh) these problems aren't new, so people had seen fit to make this technique straightforward in-database decades ago, in the form of materialized views. we didn't get to use PostgreSQL's built-in materialized views for this, because they have to be manually refreshed across the whole view at once -- i.e., any time anyone posted, commented, etc., we would have to recompute these statistics across the whole database[2] -- but this is a well-known limitation, and if you know the magic phrase, there are plenty of intermediate-level blog posts about building your own materialized view by hand that works the way you want it to. here's the one we used.
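here's a sketch of both flavors, keeping the made-up schema from above and handling just comment counts -- this follows the general shape of the technique, not the exact code from that article:

```sql
-- the built-in version: REFRESH recomputes the whole view at once,
-- which is exactly the dealbreaker described above.
CREATE MATERIALIZED VIEW post_stats_mv AS
  SELECT p.post_id,
         (SELECT count(*) FROM comments c
           WHERE c.post_id = p.post_id) AS comment_count
    FROM posts p;
REFRESH MATERIALIZED VIEW post_stats_mv;  -- every post, every time

-- the hand-rolled version: a trigger recomputes only the one row
-- that a new/edited/deleted comment actually touches.
CREATE FUNCTION refresh_comment_count() RETURNS trigger AS $$
DECLARE
  pid bigint;
BEGIN
  -- figure out which post was touched; NEW isn't available on
  -- DELETE, and OLD isn't available on INSERT.
  IF TG_OP = 'DELETE' THEN
    pid := OLD.post_id;
  ELSE
    pid := NEW.post_id;
  END IF;

  INSERT INTO post_stats (post_id, comment_count)
  VALUES (pid, (SELECT count(*) FROM comments c WHERE c.post_id = pid))
  ON CONFLICT (post_id) DO UPDATE
    SET comment_count = EXCLUDED.comment_count;
  RETURN NULL;  -- AFTER triggers ignore the return value
END $$ LANGUAGE plpgsql;

CREATE TRIGGER comments_refresh_stats
AFTER INSERT OR UPDATE OR DELETE ON comments
FOR EACH ROW EXECUTE FUNCTION refresh_comment_count();

-- plus a one-time backfill for existing posts -- the slow, run-once
-- part (see footnote 2):
INSERT INTO post_stats (post_id, comment_count)
SELECT p.post_id,
       (SELECT count(*) FROM comments c WHERE c.post_id = p.post_id)
  FROM posts p
ON CONFLICT (post_id) DO NOTHING;
```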
no new techniques here, no strokes of genius, no tech talk material, just knowing the right two words to punch into google to find a how-to article.
-
[1] I had the opportunity to take a databases class back in college but discarded the idea as boring[3], so I didn't learn it back then, and while we've used databases plenty over our careers so far, this is our first time dealing with database-bound operations as a core part of our day job.
[2] this process of computing the statistics for the entire database, incidentally, is what took almost 20 minutes to run during the deploy on Wednesday -- so you can see how that's not something we want to do a few thousand times a day.
[3] still 100% sure I was right about it being boring.[4]
[4] should've still taken it anyway, probably. I took classes in operating systems and artificial intelligence instead and I've barely used either of those.
