a little bug preps refactors to start building reliability into the current architecture

Next Steps

Currently we have a very stubbed-out but functional end-to-end MVP working. We can create connections, the default "system" stream is configured as part of setup, and we can send messages both ways on it.

This "system stream" concept is absolutely vital. It MUST be a reliably ordered stream, as it will be the library-user's (future me) only rock in the storm. It will ALWAYS be available so you can structure critical parts of your game's netcode around it. Super important meta categories of messages like scene-migrations, party invites, item trading, certificate rotations, general setup/teardown of game state, player disconnect messages, etc. Since this is so important, and we've no longer got obstacles, it's time to start building out the reliability components of our protocol to achieve this dream. Let's outline our objectives:

  • Reliable retransmissions
  • Contiguous and ordered message presentation to the application
  • Fragmentation & queuing of stream-sent messages exceeding MTU

Where we're at now

The GameConnection struct can hold 32 streams (I should probably rename them to channels). This layout is apparent in the struct itself:

pub struct GameConnection<State = Pending> {
    // ..snip
    // stream organization
    system_stream: Option<Box<GameStream>>, // system stream is enabled by configure_system_stream
    streams: [Option<Box<GameStream>>; 31],
    // ..snip
}
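
Nothing shown yet maps a stream id onto that layout, so just to illustrate the shape, here's a hypothetical accessor (stream_mut is a made-up name; a real version would live somewhere in impl GameConnection): id 0 is the system stream, ids 1 through 31 index the fixed array.

pub fn stream_mut(&mut self, id: usize) -> Option<&mut GameStream> {
    // hypothetical helper: 0 = system stream, 1..=31 = the fixed array
    match id {
        0 => self.system_stream.as_deref_mut(),
        1..=31 => self.streams[id - 1].as_deref_mut(),
        _ => None,
    }
}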

Our objective is to add new specializations in and around the GameStream type to allow us to have a few approaches for data reliability. GameStream looks like this currently:

pub struct GameStream {
    send_sequence: u16,
    /// The ack mask my peer has observed
    send_ack_mask: u32,
    /// Our sequence as observed by our peer with regard to this packet
    send_ack_seq: u16,

    /// The ack mask I've calculated based on messages
    /// received from my peer.
    recv_ack_mask: u32,
    latest_recv_seq: u16,
    last_recv: Instant,

    // outgoing packet buffer
    packet_buf_seq: [u16; PACKET_BUF_SIZE],
    packet_buf: [Option<Bytes>; PACKET_BUF_SIZE],

    // the sending side of the channel that will ultimately want raw
    // bytes off of this channel.
    recv_handler: Box<dyn Fn(Bytes) -> Result<(), ()> + Sync + Send>,
}

I don't want to get too far into it in this post, but I'm making a principled choice here to have all streams share and track the state shown here. Things like processing the ACK masks are cheap and will more or less always be useful, even if just for analytical purposes. So: a dead-simple unreliable stream might not use the masks to do anything complicated, but it's easy to imagine the benefit of stats about the health of our peer's channels being derivable from these ack concepts.
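
As a tiny illustration of that point, a hypothetical helper like this (not in the codebase; recent_delivery_ratio is a made-up name) could derive a rough recent delivery ratio for a channel straight from the mask:

/// Each set bit in the 32-bit ack mask is a packet our peer has confirmed,
/// so a popcount over the mask gives a crude delivered-packet ratio for
/// the last 32 sends.
fn recent_delivery_ratio(send_ack_mask: u32) -> f32 {
    send_ack_mask.count_ones() as f32 / 32.0
}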

My current plan is to tackle three types of "reliability strategies":

  • ReliableOrdered - Your classic TCP-like stream. Will send and receive fragmented messages, and will enqueue both outgoing and incoming traffic until receipt and reconstruction can be verified. Only produces complete Bytes payloads once incoming fragments have been fully reassembled.
  • UnreliableLiveEdge - An aptly named and common approach on unreliable UDP channels: always discard traffic that arrives older than the newest already seen. So if packets arrive in the order ABCDFE, the application on the receiving peer only gets the bytes ABCDF; E is discarded because the newer F already arrived. I'm undecided on if/how I'll handle fragmentation here.* One building block this strategy needs is sketched right after this list.
  • UnreliableBestEffort - This approach makes no special considerations and always delivers to the application. It's a "use with caution, but useful in some contexts" stream for when you have a lot of sparse game state that already handles idempotency, and out-of-order data is better than no data. Game netcode programmers (future me) using this will probably want to encode application-relevant counters into packets on this stream, such as the frame the data represents, so you can do your own filtering.
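
That live-edge check (and the reliable strategy too) will need an "is this sequence newer?" comparison that survives u16 wraparound. Nothing like it exists in the codebase yet; this is just the standard serial-arithmetic trick, sketched for reference:

/// Sketch: s1 counts as "newer" than s2 if it's less than half the u16
/// space ahead, which keeps the comparison correct across the 65535 -> 0 wrap.
fn seq_greater_than(s1: u16, s2: u16) -> bool {
    ((s1 > s2) && (s1 - s2 <= 32768)) || ((s1 < s2) && (s2 - s1 > 32768))
}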

Codin' Time

My current outline for these refactors is as follows:

/// This trait's goal is to allow us to handle categories of reliably
/// receiving information, while also being a place to provide some
/// signals to the users of the whole `Stream` concept.
///
/// One potential signal is how big of a 'window' we want to send
/// traffic for, before awaiting ACKs for those messages (if any).
pub trait ReliabilityStrategy: Send + Sync {
    /// Takes `seq_num` and `data`, and is responsible for deciding when
    /// to emit a `Bytes` object.
    ///
    /// One example could be in a reliably-ordered stream, we may want
    /// to hold onto an out of sequence element until the previous
    /// pieces arrive.
    fn marshal_stream_part(&mut self, seq_num: u16, data: Bytes) -> StreamPartResult;
}

pub enum StreamPartResult {
    /// We are ready to present Bytes to the application, and are "done"
    /// receiving the current unit of work.
    Ready(Bytes),
    /// We are assembling data and waiting on more before we are [StreamPartResult::Ready].
    Waiting(Instant),
    /// Something is very wrong!
    Error(&'static str),
}
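
To make the Waiting variant concrete before we get there, here's a hedged sketch (none of this is written yet, and ReliableOrdered's fields are my own guesses) of how a reliably-ordered strategy might buffer out-of-order parts against this API:

use std::collections::BTreeMap;
use std::time::Instant;

use bytes::Bytes;

/// Sketch only: park out-of-order parts until the next expected sequence
/// number shows up, then emit. A real version also needs to drain `pending`
/// once a gap fills, which probably means Ready grows to carry multiple
/// payloads (one of the open design questions for the real implementation).
pub struct ReliableOrdered {
    next_expected: u16,
    pending: BTreeMap<u16, Bytes>,
}

impl ReliabilityStrategy for ReliableOrdered {
    fn marshal_stream_part(&mut self, seq_num: u16, data: Bytes) -> StreamPartResult {
        if seq_num == self.next_expected {
            self.next_expected = self.next_expected.wrapping_add(1);
            StreamPartResult::Ready(data)
        } else {
            // hold it until the earlier pieces arrive
            self.pending.insert(seq_num, data);
            StreamPartResult::Waiting(Instant::now())
        }
    }
}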

Before I got too far, I refactored the current stream.rs code to handle the ideas of this API, starting with the simplest strategy to implement, which is technically what we're already doing:

#[derive(Default)]
pub struct UnreliableBestEffort {}

/// Will just always present the bytes back to the application in the
/// order they arrive in. This can be useful in complex scenes with lots
/// of sparse data for disparate objects where you've gone out of your way
/// to handle idempotency problems yourself.
impl ReliabilityStrategy for UnreliableBestEffort {
    fn marshal_stream_part(&mut self, _seq_num: u16, data: Bytes) -> StreamPartResult {
        // no buffering, no ordering: hand the payload straight back
        StreamPartResult::Ready(data)
    }
}

All this does is immediately consume and return the Bytes passed into it, because its job is to do zero thinking and just always give the application the latest information.
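
A minimal test sketch to pin that behavior down (assuming the bytes crate; the test is mine, not from the codebase):

#[cfg(test)]
mod tests {
    use super::*;
    use bytes::Bytes;

    #[test]
    fn best_effort_always_emits() {
        let mut strategy = UnreliableBestEffort::default();
        let payload = Bytes::from_static(b"hello");
        // even a wildly out-of-order sequence number passes straight through
        match strategy.marshal_stream_part(9999, payload.clone()) {
            StreamPartResult::Ready(bytes) => assert_eq!(bytes, payload),
            _ => panic!("best effort should always be Ready"),
        }
    }
}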

Let's add this new type to the GameStream struct. I'm choosing to eat the dyn cost here so multiple stream strategies can share a single type and GameConnection can keep a single collection:

/// The most important primitive of this entire crate. This lets us manage the
/// state of the individual channels and their reliability strategies.
pub struct GameStream {
    reliability_strategy: Box<dyn ReliabilityStrategy>,
    // ..snip
}
Our GameStream::new() inside impl GameStream {..} now needs to be generic over the new strategy types:

impl GameStream {
    pub fn new<T>(recv_handler: Box<dyn Fn(Bytes) -> Result<(), ()> + Sync + Send>) -> GameStream
    where
        T: ReliabilityStrategy + Default + 'static,
    {
        GameStream {
            reliability_strategy: Box::new(T::default()),
            // ..snip
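
Call sites then pick a strategy via turbofish; a quick usage sketch (the handler body here is illustrative):

let stream = GameStream::new::<UnreliableBestEffort>(Box::new(|bytes| {
    println!("got {} bytes off this channel", bytes.len());
    Ok(())
}));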

And at the very end of GameStream::recv(..), after all the ACK mask tracking and sequence number bookkeeping we need to do, we finally need to emit Bytes to the rest of the application.

Before this refactor we just sent the bytes immediately with no extra thought or considerations:

// finally transmit the packet
if (self.recv_handler)(application_bytes).is_err() {
    panic!("TODO: handle the receiver shutting down gracefully (clean up stream?)")
}

But now we have our Box<dyn ReliabilityStrategy> locked and loaded and can do the following:

match self
    .reliability_strategy
    .marshal_stream_part(remote_send_seq, payload)
{
    StreamPartResult::Ready(application_bytes) => {
        // finally transmit the packet
        if (self.recv_handler)(application_bytes).is_err() {
            panic!("TODO: handle the receiver shutting down gracefully (clean up stream?)")
        }
    }
    StreamPartResult::Waiting(_) => todo!(),
    StreamPartResult::Error(_) => {}
}

There are a few other small areas touched by this change, but these are the important behavioral pieces. Overall I'm feeling good about this approach, and we'll see where it goes in the coming days! I think we now have the pieces we need to build out the ordering, reliability, and fragmentation strategy for the ReliableOrdered specialization.

Things still work fine! Test echo server receiving "hello" packets from the test client:

     Running `target\debug\test_server.exe`
UDP Socket listening on 0.0.0.0:27015
Starting ingress...
Built new pending connection!
Query incoming
Promoting connection Ready for Some(127.0.0.1:60862)
New player added! id: 1 from: 127.0.0.1:60862
Adopted active connection for 127.0.0.1:60862
Received system event from 1(127.0.0.1:60862): "hello"
