i'm sorry, i'm techposting again, but at least this time i'm not posting about representational state transfer. anyway, on with the show
- use uuids not ids (for posts)
- use strings not booleans (for state)
- use everything (for pagination)
1. autoincrement bad
the devil has given us tools for our own destruction, and one of the more innocuous ones is an auto-incrementing key. in less database terms, it's a number that gets assigned to every post, that you add one to each time.
the problem with autoincrement keys is very simple: people will use them to infer other information. most notably, they pretend the key is a timestamp.
here's a simple way it can break. you create two drafts, one gets key=100, one gets key=101. you publish them in the opposite order, and now clients will show them in the order they were created, not the order they were published.
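here's the two-drafts example as a tiny python sketch. the `Post` class, its fields, and the dates are made up for illustration; the point is just that sorting by the key and sorting by publish time give different answers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    id: int                # hypothetical autoincrement key, assigned when the draft is created
    title: str
    published_at: datetime  # hypothetical field: when the post actually went live


# draft "a" is created first (id=100) but published second
posts = [
    Post(id=100, title="a", published_at=datetime(2024, 1, 2)),
    Post(id=101, title="b", published_at=datetime(2024, 1, 1)),
]

# what a client does when it pretends the id is a timestamp
by_id = [p.title for p in sorted(posts, key=lambda p: p.id)]

# what the client actually wanted
by_published = [p.title for p in sorted(posts, key=lambda p: p.published_at)]

print(by_id)         # ['a', 'b']
print(by_published)  # ['b', 'a']
```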
there are other issues that i'll get to, but it's worth noting how badly this backfired for twitter. they also used an autoincrementing id for tweets, which meant every time you hit post, you had to fight the database to increment a single global number. then, when they got big, they had to "scale it out".
so they wrote a special "number go up" service, which handed out batches of numbers to servers, to assign to new posts. unfortunately, this meant that "number bigger = post is newer" stopped being true globally. so a lot of api clients broke, showing posts in weird orders, replies happening before the actual post, that sort of thing. it was bad.
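a deliberately simplified, hypothetical version of batch allocation shows why "number bigger = post is newer" stops holding. this is not how twitter's real service worked; `allocate_batches` and the batch size are invented just to show the effect of servers drawing from different blocks of numbers.

```python
def allocate_batches(num_servers, batch_size):
    """hypothetical "number go up" service: each server gets its own contiguous block of ids."""
    return [
        iter(range(i * batch_size + 1, (i + 1) * batch_size + 1))
        for i in range(num_servers)
    ]


server_a, server_b = allocate_batches(2, 1000)  # a gets 1..1000, b gets 1001..2000

timeline = []  # (text, id) in the order the posts were actually made
timeline.append(("b: first post", next(server_b)))   # id 1001
timeline.append(("a: reply to it", next(server_a)))  # id 1

# a client sorting by id now shows the reply before the post it replies to
by_id = [text for text, _ in sorted(timeline, key=lambda t: t[1])]
print(by_id)  # ['a: reply to it', 'b: first post']
```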
the thing is, because they used a number in the api responses, clients broke anyway. first it was too big for a 32-bit integer. then it was too big to fit exactly in a double-precision floating point number (which is what javascript uses for every number), so ids got silently rounded. eventually they gave up and put it inside a string field.
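you can see the floating point problem in a couple of lines of python. the id below is just 2^53 + 1, the first integer a double can't represent exactly; real tweet ids were larger, which only makes it worse. `float()` stands in for any client that stores numbers as doubles, the way javascript's `Number` does.

```python
import json

tweet_id = 2**53 + 1  # 9007199254740993, one past the largest "safe" integer in a double

# a client that stores every number as an ieee 754 double, like javascript's Number
as_double = float(tweet_id)
print(int(as_double))              # 9007199254740992, off by one
print(int(as_double) == tweet_id)  # False

# sending the id as a string side-steps the problem entirely
payload = json.dumps({"id": str(tweet_id)})
print(json.loads(payload)["id"])   # 9007199254740993, untouched because it's a string
```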
you can save yourself a lot of hassle by using uuids (as god intended) and returning them as opaque strings (as foretold in the prophecy).
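a minimal sketch of "uuid, returned as an opaque string", using python's standard `uuid` module. `new_post` and the response shape are hypothetical; the key choice is random uuid4, so the id says nothing about creation order or how many posts exist.

```python
import json
import uuid


def new_post(title):
    """hypothetical constructor for a post as it would appear in an api response."""
    # uuid4 is random: no ordering, no counting, no guessing the next one
    return {"id": str(uuid.uuid4()), "title": title}


post = new_post("hello")
body = json.dumps(post)

# clients should treat the id as opaque: store it, compare it for equality,
# send it back, but never parse it, do arithmetic on it, or sort by it
received = json.loads(body)
assert isinstance(received["id"], str)
```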
there are other problems with autoincrement keys: people can tell how many of a thing you have, and they can guess the next number you're about to use. that's how someone made a tweet that quote tweeted itself, which was pretty cool, but there are more destructive examples.
anyway: use a uuid. it's good for you.
2. booleans smell
let's say you have something like a background task table in a database. you have an "is_complete" field, and set it to true when it's done. then you discover some of them have errored, so you add a "has_errors" field and set that to true if there are errors. then you find out that sometimes a thing completes with warnings, so you add a "has_warnings" field. the poor sod of an api client ends up having to write "if complete and ((not errors) or (errors and warnings))" just to turn a lump of boolean fields into a reasonable state value.
if you find yourself writing long chains of if-then statements to work out what state a thing is in, save yourself some time and make a field called state. in the previous example, state would be one of "enqueued", "active", "successful", "warning", or "failed".
and what's more: adding new states is pretty trivial. you can even add a "state_description" field to present in a client, with a human language description of where something is at.
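here's one way the task example could look with a single state field plus a description, sketched in python. the `TaskState` enum, the description wording, and the `to_api` helper are all hypothetical; the five states are the ones listed above.

```python
from enum import Enum


class TaskState(str, Enum):
    # one field instead of is_complete / has_errors / has_warnings
    ENQUEUED = "enqueued"
    ACTIVE = "active"
    SUCCESSFUL = "successful"
    WARNING = "warning"
    FAILED = "failed"


# hypothetical human-readable text for a "state_description" field
DESCRIPTIONS = {
    TaskState.ENQUEUED: "waiting for a worker to pick it up",
    TaskState.ACTIVE: "running now",
    TaskState.SUCCESSFUL: "finished without problems",
    TaskState.WARNING: "finished, but with warnings",
    TaskState.FAILED: "stopped because of an error",
}


def to_api(task_id, state):
    """serialize a task the way a client would see it (field names are illustrative)."""
    return {
        "id": task_id,
        "state": state.value,
        "state_description": DESCRIPTIONS[state],
    }


task = to_api("task-1", TaskState.WARNING)

# the client's chain of boolean checks collapses into one comparison
if task["state"] == "warning":
    print(task["state_description"])
```

adding, say, a "retrying" state later is one new enum member and one new description, and a client that doesn't recognise it can still show the description text.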
in general: booleans smell bad. real bad. unless you can be sure a thing will always be in one of two states, independently of any other property, you're going to have a bad time, because sooner or later you'll need a third state.
a friend wrote a cfp (call for proposals) submission system, and i told him this wisdom beforehand. "use a field called state" and he shrugged and said ok. a week later, he'd discovered a new state a talk submission could be in: not just submitted, not just accepted or even rejected, but "needs more information".
the trick with api design is never making something absolutely final unless you're absolutely sure it will never change.
3. pagination is just tricky
in the bad old days, people would paginate through posts by passing in a "skip 30 posts" parameter to the query. then someone would make a new post, and all of your pages would be off by one, or even two or three. throw in infinite scroll, and this is why sometimes you see duplicate posts when scrolling through a list of posts.
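a small python simulation of the "skip 30 posts" problem, using a plain list as a stand-in for the database. `page_by_offset` is a hypothetical helper; the post names are arbitrary.

```python
def page_by_offset(posts, offset, limit):
    """the "skip n posts" approach. posts is ordered newest first."""
    return posts[offset:offset + limit]


posts = ["p5", "p4", "p3", "p2", "p1"]  # newest first
page1 = page_by_offset(posts, 0, 2)     # ['p5', 'p4']

posts.insert(0, "p6")                   # someone posts while you're reading page 1
page2 = page_by_offset(posts, 2, 2)     # ['p4', 'p3']: p4 shows up twice

print(page1, page2)
```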
there are other problems with this approach too: it's bad for the database. instead of going straight to the records you want, it starts at the first record and skims ahead n records. so the first page of results is fast, the second is a bit slower, and by page fifty your database begins to crawl, especially when there are a lot of clients hammering away catching up on old missed posts.
so, you decide "i will use a timestamp"
so you use a timestamp. unfortunately, this creates a small edge case: two posts are created at the same time and lie across a pagination boundary. when you hit next, you might skip over one of the posts, or see the same post again.
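here's the timestamp-only edge case in miniature. the integer timestamps and post names are made up, and `page_before` is a hypothetical helper whose cursor is just the timestamp of the last post seen.

```python
# (timestamp, name) pairs, newest first; two posts share timestamp 200
posts = [(300, "c"), (200, "b2"), (200, "b1"), (100, "a")]


def page_before(posts, cursor, limit):
    """return posts strictly older than the cursor timestamp (or from the top if None)."""
    older = [p for p in posts if cursor is None or p[0] < cursor]
    return older[:limit]


page1 = page_before(posts, None, 2)           # [(300, 'c'), (200, 'b2')]
page2 = page_before(posts, page1[-1][0], 2)   # [(100, 'a')]: b1 was skipped

# using <= instead of < just swaps the bug: b2 would show up again on page 2
print(page1, page2)
```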
the solution? you pass in the uuid and the timestamp of the last post you saw, and the server can then accurately pick the next one in the chain to send back. timestamps don't provide a total ordering of posts, but a timestamp+uuid combination will.
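a minimal sketch of timestamp+uuid pagination (often called keyset or cursor pagination) using python's built-in `sqlite3`. the table layout, integer timestamps, and `next_page` helper are assumptions for illustration. the `WHERE` clause asks for rows that come strictly after the cursor in (created_at, id) order, so ties on the timestamp are broken by the uuid.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, created_at INTEGER NOT NULL)")
# an index on both columns lets the database seek to the cursor instead of skimming
db.execute("CREATE INDEX posts_order ON posts (created_at, id)")

# four posts, two of them created at the same time
for ts in (100, 200, 200, 300):
    db.execute("INSERT INTO posts VALUES (?, ?)", (str(uuid.uuid4()), ts))


def next_page(cursor, limit):
    """cursor is the (created_at, id) of the last post seen, or None for the first page."""
    if cursor is None:
        rows = db.execute(
            "SELECT created_at, id FROM posts "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        )
    else:
        ts, post_id = cursor
        rows = db.execute(
            "SELECT created_at, id FROM posts "
            "WHERE created_at < ? OR (created_at = ? AND id < ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (ts, ts, post_id, limit),
        )
    return rows.fetchall()


seen, cursor = [], None
while True:
    page = next_page(cursor, 2)
    if not page:
        break
    seen.extend(page)
    cursor = page[-1]  # the last (timestamp, uuid) pair becomes the next cursor

print(len(seen), len(set(seen)))  # 4 4: every post exactly once
```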
you could send just the uuid and have the server look up its timestamp, but it's kinda nicer to send through everything the server needs to find its place, too.
bonus: screenscraping is great
just put some semantic class names in your html. let people scrape the pages that come out. a website is a lot easier to maintain than a website and an api. screenscraping is also a lot more robust to changes: you put new data in and clients just shrug and ignore it.
it also means you spend less time arguing about urls and http methods. you just make a website, people click on links, things work. even when automated.
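a rough sketch of a scraper that relies only on semantic class names, using python's standard `html.parser`. the markup, the "post" / "post-..." class naming scheme, and the `PostScraper` class are all hypothetical. notice the second post has no likes and nothing breaks, and a client that only reads titles never notices the likes field existed.

```python
from html.parser import HTMLParser

# hypothetical page: the semantic class names are the "api"
PAGE = """
<article class="post">
  <h2 class="post-title">hello</h2>
  <span class="post-author">alice</span>
  <span class="post-likes">3</span>
</article>
<article class="post">
  <h2 class="post-title">again</h2>
  <span class="post-author">bob</span>
</article>
"""


class PostScraper(HTMLParser):
    """collect text from elements classed "post-<field>" into one dict per "post"."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self.field = None  # the field we're currently inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "post" in classes:
            self.posts.append({})
        for c in classes:
            if c.startswith("post-") and self.posts:
                self.field = c[len("post-"):]

    def handle_endtag(self, tag):
        self.field = None

    def handle_data(self, data):
        if self.field and data.strip():
            self.posts[-1][self.field] = data.strip()


scraper = PostScraper()
scraper.feed(PAGE)
print(scraper.posts)
# [{'title': 'hello', 'author': 'alice', 'likes': '3'}, {'title': 'again', 'author': 'bob'}]

titles = [p["title"] for p in scraper.posts]  # unknown fields are simply ignored
```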
summary
api design isn't about doing the right thing, it's about avoiding the same old wrong things over and over again. but wait. there's one more thing.
edit: there wasn't one more thing. the above sentence was about screen scraping before i jiggled the paragraphs around. i'm leaving it in because it's funny, not because of dogme'95