Time To Refactor
I've been spending a bunch of time doing refactors and writing extra little bitmask utilities & unit tests etc to get thing set up for proper reliability behavior, but I finally hit a wall today where I need to step back and do a bigger refactor.
My MVP of MVPs here had a very simple approach for unreliable sends, and I was kicking some cans down the road in a "eh I can figure it out later, I think this interface will work" approach.
In true pilot-programming fashion I've realized how much undesirable coupling and side effects I was beginning to create as I began shuffling stuff around to support our reliability concerns. Namely there's three "layers" that have to coordinate to route all the inputs/outputs and state management:
┌──────────────────────────┐
│ GameConnection<Active> │
├──────────────────────────┤
│ GameStream │
├──────────────────────────┤
│ dyn ReliabilityStrategy │
└──────────────────────────┘
In the current architecture, the plan was to have a lot of common behavior live in the non-generic, concrete GameStream type, which includes a lot of little behaviors for processing stream headers and their ACK masks, sequence counters, etc. The hope was to let all GameStreams do this work by default, and to then have a family of dyn ReliabilityStrategy elements that GameStream can send raw stream-part data into and get raw stream-part data out of.
In my journey to get the reliability stuff going, I've realized that there are just too many stateful concerns between the reliable approach and unreliable approaches to share a common base without just a ton of special case spaghetti and plumbing across all 3 layers.
The New Plan
I strayed away from this originally but I'm happy to come back to it, the new plan is basically just to turn GameStream into a trait and have concrete implementations per each type, and forego the 3rd ReliabilityStrategy layer.
This will mean having a Box<dyn GameStream> in GameConnection, but the vtable lookup is not an actual concern of mine.
This also means I need to spend a few evenings just breaking things apart and shuffling code around. GameStream was getting a little big, and even bigger with reliability concerns/specializations, so with this lesson learned I've realized a decent new shape for things.
Previously the GameStream struct looked like this:
pub struct GameStream {
reliability_strategy: Box<dyn ReliabilityStrategy>,
/// Current sequence counter for outgoing sends.
send_sequence: u16,
/// The ack mask my peer has observed
send_ack_mask: u32,
/// The most recent sequence id our remote peer has seen from us
send_ack_seq: u16,
/// The ack mask i've calculated based on messages
/// received from my peer.
recv_ack_mask: u32,
latest_recv_seq: u16,
last_recv: Instant,
// outgoing packet buffer
// TODO: these may be worth getting rid of as we have the "reliability strategy"
// traits as a component of streams now
packet_buf_seq: [u16; PACKET_BUF_SIZE],
packet_buf: [Option<Bytes>; PACKET_BUF_SIZE],
// the sending side of the channel that will ultimately want raw
// bytes off of this channel.
recv_handler: Box<dyn Fn(Bytes) -> Result<(), ()> + Sync + Send>,
}Despite some renaming, I continually found it a little headache-inducing to divide up which state I was supposed to be messing with when jumping around in my work. More than once I found myself touching the wrong mask, for example.
My current high level sketch for the refactor here looks like this:
pub trait NewGameStream {
fn enqueue_send(&mut self, payload: Bytes) -> Result<(), EnqueueError>;
fn get_emissions(&mut self, num: usize) -> Option<Vec<Bytes>>;
fn recv(&mut self, payload: Bytes);
}
#[derive(Debug, Error)]
pub enum EnqueueError {
#[error("This stream doesn't support fragmentation.")]
FragmentationUnsupported,
}
/// As a message sender we need to continually monitor some ideas,
/// such as our send sequence counter, a bit mask of which of our
/// traffic our remote peer has successfully received (by processing
/// the ack header)
struct SenderState {
/// Current sequence counter for outgoing sends.
send_sequence: u16,
/// The ack mask my peer has observed
send_ack_mask: u32,
/// The most recent sequence id our remote peer has seen from us
send_ack_seq: u16,
}
/// Observations of our remote peer, namely tracking their most
/// recent send_id (for THEIR sends), and realizing a bit mask that
/// best represents their recent sends to us, which is all state
/// we'll reflect to them so they can constantly be informed of
/// our perspective of their behavior. This "reflection" of their
/// state is vital for them to make choices, or at least help them
/// be aware of the health of the connection.
struct ReceiverState {
/// The ack mask i've calculated based on messages
/// received from the remote peer.
recv_ack_mask: u32,
/// the latest sequence id i've seen from the remote peer
latest_recv_seq: u16,
/// The last point in time we've received any traffic from them
last_recv: Instant,
}
pub struct UnreliableBestEffort {
send_state: SenderState,
recv_state: ReceiverState,
}
pub struct ReliableOrdered {
send_state: SenderState,
recv_state: ReceiverState,
}Another mistake I made is trying to be too clever with the API for when GameConnection wants to ask the GameStream to take an arbitrarily sized (possibly larger than MTU) Bytes payload for sending. GameStream would do what was most appropriate and return a collection of Bytes to send out immediately to the UDP Socket:
pub fn get_sends(&mut self, payload: Bytes) -> Vec<Bytes> {..}This was fine for unreliable streams in my MVP work, since I was always sending less than our MTU ceiling (1430 bytes) and we could always send immediately. With reliability we may want to enqueue a ~15kilobyte or 15mb payload (god forbid). We may be already SENDING such a huge payload from a previous request, and the current API didn't lend itself well to that sort of enqueuing for a ton of ugly reasons internal to GameStream's coupled behavior with GameConnection and the ever-growing ReliabilityStrategy trait I just trashed.
The Good News
The good news is that overall I did a good job separating concerns, and a rewrite of my approach to GameStream does not dramatically impact GameConnection, and will actually simplify some little "ah fuck it"s I sprinkled in there.
I think I ended up here because I'm still growing as a Rust programmer. I write it in small fits once a blue moon and haven't had a project get complicated enough like this to tackle some rust-understanding problems. I had the wrong intuition with the tools available to me when I first sketched out these APIs and relationships, and also needed to get into the weeds more with the reliability algorithms.
I don't regret the path I took so far though, this is a "trust the process" stepping stone. Getting the MVP working e2e, for example, such that I could write a simple echo server, has been an incredible boon to my motivation and also a valuable "checkpoint" test when I want something more than unit tests to make sure nothing unexpected is happening.
Getting an MVP working and learning some lessons from that and revisiting some components is a healthy and normal part of just figuring stuff out.