tef

here's the short version: what's the difference between doing GET /root?walk=file/path and GET /root/file/path?

you might guess "not a lot" at first, and maybe "another layer of indirection" with an accompanying eyeroll. the actual answer, for what it's worth, is that it makes relative links opaque to the client. which does kinda change the game, ever so slightly.

honestly, i'm not even sure how good an idea this is, but there are some merits.


here's the big idea: when things are opaque to the client, the server can change the details without changing the client to handle them.

this might not seem like much, but it's a very powerful tool in protocol design. it might be easier to see with an example: take iterating through a list in a networked service.

  • a client asks the server for all the files in a directory, but there are too many of them, and so the request needs to be broken up into smaller chunks.

  • this pagination requires keeping track of where the client is in the list, and either the client or server has to keep track of the state:

  • server side state is bad for scaling, but the client api is simple. if i change things on the server side, to add new fields or search options, old clients can continue working as normal.

  • client side state is good for scaling, but it isn't as fun. every time the underlying details change, the client needs to change too. the client code can get pretty hairy, pretty quickly, as the state needs to be extracted and rewoven into each subsequent request.

thankfully, it isn't an either-or situation. most apis opt for something less clumsy: state passing, a fancy term for an opaque token (a string) passed back and forth between requests.

although we're storing the state on the client, the state is managed by the server. this is a real "have your cake and eat it" situation, and it's all over the web. if you've ever clicked "next" on a forum page, you've likely used it without realising it. it's the same big idea. i click "next" instead of editing in "?offset=40" in the url.
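here's a minimal sketch of state passing in python, under some loud assumptions: the FILES list, the page size of 40, and the token format (base64-wrapped json) are all made up for illustration. a real server would likely sign or encrypt the token so clients can't tamper with it.

```python
import base64
import json
from typing import List, Optional

FILES = [f"file-{i:03d}" for i in range(95)]  # hypothetical directory contents
PAGE_SIZE = 40                                # hypothetical page size


def encode_token(state: dict) -> str:
    """server side: pack the paging state into an opaque string."""
    raw = json.dumps(state).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> dict:
    """server side: unpack a token that was handed back by a client."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def list_files(token: Optional[str] = None) -> dict:
    """server side: return one page, plus a token for the next page if any."""
    offset = decode_token(token)["offset"] if token else 0
    next_offset = offset + PAGE_SIZE
    more = next_offset < len(FILES)
    return {
        "items": FILES[offset:next_offset],
        "next": encode_token({"offset": next_offset}) if more else None,
    }


def fetch_all() -> List[str]:
    """client side: never looks inside the token, just hands it back."""
    items: List[str] = []
    token = None
    while True:
        response = list_files(token)
        items.extend(response["items"])
        token = response["next"]
        if token is None:
            return items
```

the client loop only knows "pass back whatever you were given". if the server later swaps the offset for a last-seen filename, or adds a sort order, fetch_all doesn't change.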

when you make things opaque to the client, you decouple it from whatever the server is doing.

ok that's one big idea down. now we can talk about relative links, but it'll be a little longer before we bring it all together.

here's the next big idea: when you make relative links opaque, then the server handles navigating them.

yes, this means i have to talk about filesystems. i'm sorry about this, but please bear with me.


i need you to imagine a very simple fileservice.

this service is a rudimentary array, containing either File or Directory objects. the server protocol is even simpler, with only one action: get this numbered File/Directory. to download a file, i begin by downloading the root inode. next i search the directory listing inside for the filename i want, and download it, repeating as necessary.

it might not be immediately clear, but this is another example of "client side state", or client side processing.

here, the client keeps track of which file or directory it's at, and handles things like searching through directory indexes by downloading the entire file. meanwhile, something more "server side state" might involve sending over "change directory" commands, and it might involve servers keeping track of which byte of a file a user is currently looking at too.
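a rough sketch of that imaginary fileservice, assuming inode 0 is the root and that a directory is just a mapping from names to inode numbers. the class names, the sample store, and the request counter are all invented here to make the cost visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class File:
    data: bytes


@dataclass
class Directory:
    entries: Dict[str, int] = field(default_factory=dict)  # name -> inode number


class FileService:
    """the whole server: an array of objects and a single 'get' action."""

    def __init__(self, store: List[Union[File, Directory]]) -> None:
        self.store = store
        self.requests = 0  # counts round trips, purely for illustration

    def get(self, inode: int) -> Union[File, Directory]:
        self.requests += 1
        return self.store[inode]


def download(service: FileService, path: str) -> bytes:
    """client side: fetch the root, then look up each name in turn."""
    node = service.get(0)
    for name in path.split("/"):
        if not isinstance(node, Directory):
            raise NotADirectoryError(name)
        node = service.get(node.entries[name])
    if not isinstance(node, File):
        raise IsADirectoryError(path)
    return node.data


service = FileService([
    Directory({"docs": 1}),       # inode 0: the root
    Directory({"notes.txt": 2}),  # inode 1
    File(b"hello"),               # inode 2
])
```

calling download(service, "docs/notes.txt") costs one request per path component, plus one for the root. all the navigation logic, and all the state, lives in the client.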

again, this isn't the full picture. real life doesn't lend itself to dichotomies, and most protocols opt for something a little in-between.

take plan9's 9p. the client has to use inodes and has to keep track of where it is in a file, but the server handles things like navigating directories, and also keeps track of which files a user has open. it offers a Walk(inode, path) -> inode operation that lets a client skip the expensive part of navigating directories, without the server having to track any extra client state.

in summary: absolute addresses (like inodes) force navigation state to be stored client-side, but handling relative addresses (like an inode and a path) can be added to the protocol without further burdening the server with state.
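here's what a walk operation might look like bolted onto a toy fileservice. this is only loosely modelled on the Walk(inode, path) -> inode shape above, and it is not the real 9p wire format. the store layout (bytes for files, dicts for directories) and the request counter are assumptions for the sketch.

```python
from typing import Dict, List, Union

# a file is its contents; a directory maps names to inode numbers
Node = Union[bytes, Dict[str, int]]


class WalkingService:
    def __init__(self, store: List[Node]) -> None:
        self.store = store
        self.requests = 0  # counts round trips, purely for illustration

    def get(self, inode: int) -> Node:
        self.requests += 1
        return self.store[inode]

    def walk(self, inode: int, path: List[str]) -> int:
        """resolve path relative to inode, returning the inode it ends at.

        the loop runs server side, so the whole path costs one round trip,
        and nothing is remembered between calls.
        """
        self.requests += 1
        for name in path:
            node = self.store[inode]
            if not isinstance(node, dict):
                raise NotADirectoryError(name)
            inode = node[name]
        return inode


service = WalkingService([
    {"docs": 1},       # inode 0: the root
    {"notes.txt": 2},  # inode 1
    b"hello",          # inode 2
])
```

now service.get(service.walk(0, ["docs", "notes.txt"])) is two requests no matter how deep the path goes, and the server didn't have to remember anything about the client in between.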


ok ok, enough about filesystems, what about hypermedia, what about relative urls? we've already been talking about them.

  • a hypermedia service (i.e., the web) is like the client-side filesystem protocol. to resolve a link, i must download each html page in turn before getting to the end of the chain.

  • server side state would avoid having to download the files, but tracking state is expensive

  • alternatively, we can handle relative addresses server side, by adding a Walk-like operation that resolves links for the client.

  • we could even accept (base, path) arguments wherever a path is passed in the protocol, supporting relative paths in every operation.

  • yep, that's right. what if we did GET /root?walk=file instead of GET /root/file?

let's do a worked example. here's an imaginary client, trying to find a given file, searching through directory listings.

GET /  
GET /Namespace/
GET /Namespace/ResourceName/
GET /Namespace/ResourceName/111233-44844

you can return extra results in the first call, and skip subsequent lookup calls, but you still have to make two calls.

GET / # directory listing is recursive
GET /Namespace/ResourceName/111233-44844

with a relative link? one. bliss.

GET /?walk=Namespace,ResourceName,111233-44844 
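to make that concrete, here's a sketch of a handler that accepts a base path plus a walk parameter. the nested-dict TREE, the handle_get name, and the comma-separated walk syntax are assumptions lifted from the example above, not a real framework's api. only urlsplit and parse_qs come from python's standard library.

```python
from urllib.parse import parse_qs, urlsplit

# hypothetical resource tree: dicts are directories, strings are resources
TREE = {
    "Namespace": {
        "ResourceName": {
            "111233-44844": "resource body",
        },
    },
}


def handle_get(url: str):
    """resolve the url's path as the base, then follow ?walk= from there."""
    parts = urlsplit(url)
    base = [name for name in parts.path.split("/") if name]
    walk_param = parse_qs(parts.query).get("walk", [""])[0]
    walk = [name for name in walk_param.split(",") if name]

    node = TREE
    for name in base + walk:
        if not isinstance(node, dict):
            raise NotADirectoryError(name)
        node = node[name]  # a KeyError here would become a 404

    # directories answer with a listing, everything else with its body
    return sorted(node) if isinstance(node, dict) else node
```

handle_get("/?walk=Namespace,ResourceName,111233-44844") and handle_get("/Namespace/ResourceName/111233-44844") land on the same resource in this toy, but only the second one requires the client to know how the server lays out its urls.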

as far as i can make out, three really nice consequences fall out of this design.

  • we don't have to download every item in turn to get to a page. this is the biggie.

  • we can do state transfer and smuggle state inside the relative path, and the client is totally oblivious to it. if you already have query params, this is a little moot, but it's worth mentioning anyway.

  • the third consequence is a little more involved. in filesystem-like protocols, there's a 'conflict' between expressing operations in terms of do(filename) and do(directory, filename). some operations work on the file (like updating or fetching) and others (delete and create) work on the directory. meanwhile, when every operation accepts relative paths as well as absolute paths, every operation ends up being expressed as do(base, filename), which is kinda neat.
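the third point might be easier to see in code. this is a toy sketch over nested dicts, and the function names (resolve, fetch, create, delete) are invented for illustration. the thing to notice is that every operation takes the same (base, path) pair, whether it really acts on the file or on its parent directory.

```python
from typing import Any, List, Tuple


def resolve(root: dict, base: List[str], path: List[str]) -> Tuple[dict, str]:
    """walk base + path, returning (parent directory, final name)."""
    names = base + path
    if not names:
        raise ValueError("nothing to resolve")
    node = root
    for name in names[:-1]:
        node = node[name]
    return node, names[-1]


def fetch(root: dict, base: List[str], path: List[str]) -> Any:
    parent, name = resolve(root, base, path)
    return parent[name]


def create(root: dict, base: List[str], path: List[str], content: Any) -> None:
    parent, name = resolve(root, base, path)
    parent[name] = content


def delete(root: dict, base: List[str], path: List[str]) -> None:
    parent, name = resolve(root, base, path)
    del parent[name]
```

an absolute path is just the case where base is empty, so fetch(root, [], ["docs", "a.txt"]) and fetch(root, ["docs"], ["a.txt"]) mean the same thing.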

the first consequence is the biggest one. now, yes, you might be saying "well, you can just hardcode the urls in the client" and my brother in christ, that is not what a hypermedia system is. the relative url might not resolve so neatly in a real world example.

the point is that a client gets to pretend it's hardcoding urls, but it's actually hardcoding names of links to follow. heh heh heh. i'll make you bastards restful one day.

