jamesmunns

Doing embedded stuff

Be nice, work hard. Build tools, tell stories. Start five year fights. Kein Ort für Nazis.


A list of all the other services I'm on around the internet


Posts with code highlighted with codehost by @wavebeem.


All posts licensed under CC-BY-SA 4.0, unless otherwise stated.


jamesmunns
@jamesmunns

This is the rough version of a post I'd like to polish up later. Forgive me for not defining terms, or giving good didactical explanations. There will be a better post later, when I have more time and energy.

I work a lot with systems that talk to other systems. Technically, this puts them in the class of distributed systems, but unlike databases or something similarly complex, it's usually some kind of PC (laptop, webserver), talking to some kind of embedded system (a USB device, a sensor).

Regardless of complexity, these systems need to talk to each other to achieve some goal: obtaining sensor data, changing configuration values, etc.


jamesmunns
@jamesmunns

I went back and edited the post above, and put it on my consulting blog. If you missed it last time, I'd suggest reading the new version:

But there were also some good comments on the rough draft above :)


You must log in to comment.

in reply to @jamesmunns's post:

I had someone else mention session types! The initial examples I can find are... pretty dry and not very illuminating, if you have any good resources (blog posts, papers) that are more practical than theoretical, I'd love to read them!

Interested to hear what you think when you've finished reading the post.

It’s been a while since I last tried to mess with session types (it was probably like 2019), the Idris examples are the best I’ve seen. My impression I’m piecing back together is that basically you’d write the type the same way you wrote bootstrap, and it would produce a type that can be split in two and both halves used to write the two ends of the channel. I’ll have to try messing with the Idris session types library and see if I can write up this protocol with it.

"But how the hell do I specify a "protocol"? And how the hell do I write unfragile code that lets me write something that has a good chance of working out of the box?

I think talking about a stateful protocol in terms of a state machine is a good way to describe it to a human, in terms of how it's supposed to work. But I don't think it really gets us directly to unfragile implementations. The core issue, at least in the domain where protocols typically matter, is that each end of the conversation may have a different idea of protocol state. This could be due to buggy implementation, lossy communications, a malicious actor on the other end, cosmic rays, who knows.

The way I've dealt with this in the past is by starting with a protocol state machine, as you did here, but then moving into specific state machine specifications for each end of the communication. This is the sweet spot for model checkers; they're designed for exactly this kind of verification. I've used TLA+ for this in the past, pretty successfully.

Of course, a TLA+ model is not the code. It may be that if you write your endpoints against the right abstraction, you could verify the properties you care about with https://github.com/model-checking/kani or similar. Not sure, but it would be pretty sweet if you could.

w.r.t. session types: I've used this in the past as a way to make sure the types in your code are a direct reflection of an endpoint's observation of the protocol state. It doesn't solve the 'shit on the wire is painful' design problem, but it does let you statically check that you're only taking allowed transitions for one of the endpoint state machines. BUT, at least in Rust, I've session typing to be more trouble than it's worth. I inevitably build the non-session-typed version of a protocol client as well, which I always seem to need because the types don't /quite/ work out otherwise. Ymmv, of course.

Honestly

shit on the wire is painful

is actually super untrue for me! I use postcard for everything, so I typically don't even worry about the wire format - it's just data types in and data types out. "what data to send, when, and what messages are valid at what stage" are the much harder questions.

TLA+ is a good point, and I'll have to check out kani!

I've been reading the erlang paper lately, and am really coming around to "shits always gunna break, just handle brokenness at the system level", which is honestly very appealing. We still need a good way to specify the correct "happy path" tho, even if we defer all "non happy path" steps as "immediately bail and reset the state machine (exceptions in erlang, error returns in Rust).

The core issue, at least in the domain where protocols typically matter, is that each end of the conversation may have a different idea of protocol state.

I think this is exactly the thrust that I'm getting at - I want to be able to specify the protocol as an atomic unit, rather than implement it in two places. At least for my current work - where I'm writing the software for both ends of the wire.

In degenerate cases, where the state machines HAVE fallen out of phase, we declare that is an error!

While it is sometimes necessary to separate the different parts (different langs, protocol versions, etc.), IMO that should be the weird case, not the default option.

Funny enough, I have a case almost exactly like the one you described.

I can see this being particularly useful for implementing chatty stateful protocols that encapsulate a single complex operation (a "procedure").

[Reasoning out loud] *However you must be absolutely sure the state machine "contract" matches for both the client and server (either by versioning or as a checksum).

Defer any reliability concerns to a protocol layer like TCP. If someone violates the protocol, the other end resets the connection.*

I wonder if this would possibly also be useful in-process.