
I like making little games with friends :] Mostly shitposting and vibing.


I've embarked on a fool's journey to write my own little multi-purpose, ground-up, pure-Rust game networking library for specific multiplayer game ideas I want to pursue that would be onerous with Unity's ecosystem (Mirror, UNet, Photon, etc.) or Unreal's OOTB networking.

I'm gonna blog about it free-form a bit, and if people care I can write up more formal stuff, or share very specific implementations and code-sample walk-throughs, etc.

These game ideas are generally predicated on long-lived, single- or multi-scene authoritative servers with state machines & side effects that won't risk leaking or becoming unstable after being online for long periods (months, years), but that still provide options for realtime state propagation (players jumping, running, etc., in real time) and flexibility for each game to make its own tradeoffs about how authoritative to be. One game might need basic rollback considerations; another might be tile-based, with every state change validated; another might be "strand"-based and need to trivially create/destroy temporary connections to small radii of player activity based on highly conditional relevancy.

You can certainly find ways to do all these things in UE5 or Unity, but if you're still reading this post you probably understand the allure of "Not Invented Here" in this specific case. Having built up your own toolbox of components for complex systems is a very big "pro" in the pros/cons list here. My stance is that game networking is never independent of a game's design goals, and it can be a pain to fight the assumptions an OOTB networking library has made for you when they eliminate, or make difficult, certain game design possibilities. Full awareness of my own toolkit lets me keep the ideas I want to pursue in mind as I structure this library's architecture.

Outline

The primary goals for a useful MVP are simple:

  • Don't over-prescribe application-level concerns (incl. binary payloads).
  • Provide a "connection" abstraction over UDP.
  • Pure Rust for the core impl & compilation units.
  • Provide e2e encryption.
  • Provide a "stream" or "channel" abstraction within the "connection" abstraction so different constructs of the networked game application can isolate concerns.
  • "Streams" can be reliable-ordered or unreliable-with-partial-retransmission (and maybe other modes in the future); see the sketch just below.

[diagram: basic protocol guarantees]

Writing it in Rust 🦀💻

Rust is cool, I like Rust. I have written a lot of C and C++ in my life, and for a long time their siren call beckoned me toward this specific task. Thankfully Rust came along, and what better language to wrangle a bunch of bytes and write cross-platform-friendly networking code in? (Zig is also cool.)

As of tonight I have a basic test echo server & client working, with some of our core concerns accomplished: e2e encryption, plus a "stream" abstraction at both the protocol level and the programming-API level. This little milestone was my inspiration to brain-dump about this somewhere.

Echo Server

I want to share some code in this post, so let me show how the highest-level part of the echo server is written, starting from main().

Before that, the tiniest bit of background on the layers at play. There are several "low level" Rust crates I've written: protocol, networking, and server_traffic. server_traffic composes protocol and networking into server-oriented concerns, and a future client_traffic crate will do the same for clients.

The application, in this case the "Echo Server", needs to spin up a tokio task in the server_traffic crate in order to start receiving new connections and present this information to the application. We must do some bookkeeping first.

End-To-End Encryption

There's an immediate caveat we have to talk about though: I chose not to attempt DTLS 1.3 or any of the little mTLS-lite approaches that exist on top of UDP. I'm talking out of my ass here just a bit, but the impression I got from weeks of poking around the wide world of internet protocols was that DTLS 1.3 and things like it (QUIC) are not really intended for the sort of "pick and choose" reliability concerns of real-time game protocols. Many of the available secure-UDP options are additionally concerned with offering TCP's benefits (reliable ordering and retransmission) with less handshake overhead, which rules out levels of control we need for games and more or less puts us back into many of the same problems TCP would give us: namely head-of-line blocking, when we intentionally want some classes of payloads to be unreliable/unordered.

I took a tour of a number of networking libraries out there, and something I kept comin' across was ChaCha20. As long as you never revisit the same space in the cipher stream, use a nonce to randomize where you land in it, and rotate keys often, ChaCha20 appears both to have multiple pure-Rust implementations and to be easy to work with. Another concern was AEAD, which thankfully this really good-seeming crate offers in pure Rust, along with processor optimizations and an audit (neat!). I'm not a cybersecurity expert so I'm sure I'll eat my hat here, but the chacha20poly1305 crate has a very straightforward and sane interface and clear documentation, along with good suggestions on best use.
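To give a flavor of that interface, here's roughly what an encrypt/decrypt round trip looks like with the crate. This mirrors the crate's own documented example rather than my library's wrapper around it:

use chacha20poly1305::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    ChaCha20Poly1305,
};

fn roundtrip() -> Result<(), chacha20poly1305::Error> {
    // Fresh random 256-bit key and 96-bit nonce.
    let key = ChaCha20Poly1305::generate_key(&mut OsRng);
    let cipher = ChaCha20Poly1305::new(&key);
    let nonce = ChaCha20Poly1305::generate_nonce(&mut OsRng); // must never repeat per key

    // AEAD: the ciphertext carries an authentication tag, so tampering
    // makes decrypt() fail instead of yielding garbage.
    let ciphertext = cipher.encrypt(&nonce, b"plaintext".as_ref())?;
    let plaintext = cipher.decrypt(&nonce, ciphertext.as_ref())?;
    assert_eq!(&plaintext, b"plaintext");
    Ok(())
}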

To take advantage of ChaCha20Poly1305 we need to trade secrets in some manner, and bare Diffie-Hellman over unauthenticated UDP isn't good enough! We need some PKI! Do we write or adopt some crazy stuff to resolve domains, certs, etc. over UDP to do all this? No! My solution is to make application-writers (i.e.: the "Echo Server") responsible for providing the building blocks necessary to construct a "connection" with the library's APIs, initially in a "pending" stage, with the initial secrets etc. necessary. This way all UDP conversations, even the initial packets, are already encrypted, and we get the added benefit of using trusted and low-maintenance pathways (HTTPS) for trading these initial secrets. There are other benefits here too: a lot of authorization/login mechanisms are already marshalled over HTTP, such as OAuth, and something like JWTs could be used very conveniently in the future.

To achieve this with our simple Echo Server, however, we're going to just make a simple REST endpoint that in a production environment would run over HTTPS (TCP+TLS) but locally runs unencrypted over plain HTTP. I'll spare you some axum boilerplate:

async fn join(
    State(state): State<Arc<UnboundedSender<IngressCmd>>>,
) -> (StatusCode, Json<HandshakeExchange>) {
    let ingress_tx = state.clone();

    let new_slot_id: u64 = rand::thread_rng().gen();
    let (key, b64key) = networking::generate_key();
    let challenge = networking::generate_challenge();

    ingress_tx
        .send(IngressCmd::BuildConnection(
            key,
            new_slot_id,
            challenge.clone(),
        ))
        .expect("Not handling errors on send yet.");

    // .. snip
    // the rest just makes sure the connection gets book-kept
    // appropriately and is in a "pending" state, then the server
    // returns a 200 OK with a JSON body containing the slot_id,
    // the chacha20 secret, and a 1024-byte challenge
}

Our little Echo Server stands up a tiny axum REST service with a POST /join endpoint that returns a JSON payload like this:

{
	"slot_id": 556751621135530415,
	"key": "gu95p66voXmkinf75TwhhgrHAXzKHQdHG8EtqEvYU3I",
	"join_challenge": "QQoMqxBDi460wRRb[..snip]"
}
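The HandshakeExchange type returned by the axum handler earlier presumably just mirrors this JSON. A minimal serde sketch, with field names taken from the payload and everything else a guess:

use serde::{Deserialize, Serialize};

// Guessing at the shape from the JSON above; not the actual definition.
#[derive(Serialize, Deserialize)]
struct HandshakeExchange {
    slot_id: u64,
    // base64-encoded ChaCha20 key
    key: String,
    // base64-encoded one-time 1024-byte challenge
    join_challenge: String,
}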

The client and server (via the library API) will both ultimately build the same ChaCha20 ciphers from the base64-decoded "key", and then the client can send the specialized one-time "join" payload, which is encrypted and contains only the byte contents of the "join_challenge" here.

The client must then immediately try to send the "join" payload over UDP, which is basically:

// loosely illustrating the packet structure 
[slot_id: u64] // unencrypted
[chacha20encrypt(join_challenge): [u8]] // encrypted/ciphertext
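Assembling that on the client might look something like the sketch below. The framing details (big-endian slot_id, nonce placement) and the assumption of an already-connect()ed tokio socket are mine, not the protocol crate's actual layout:

use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
use chacha20poly1305::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    ChaCha20Poly1305, Key,
};
use tokio::net::UdpSocket;

// Sketch only: the real join framing lives in the protocol crate.
async fn send_join(
    socket: &UdpSocket, // assumed already connect()ed to the server
    slot_id: u64,
    key_b64: &str,
    join_challenge_b64: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    // Rebuild the same cipher the server derived during the HTTPS handshake.
    let key_bytes = STANDARD_NO_PAD.decode(key_b64)?;
    let cipher = ChaCha20Poly1305::new(Key::from_slice(&key_bytes));

    let challenge = STANDARD_NO_PAD.decode(join_challenge_b64)?;
    let nonce = ChaCha20Poly1305::generate_nonce(&mut OsRng);
    let ciphertext = cipher
        .encrypt(&nonce, challenge.as_slice())
        .map_err(|_| "join payload encryption failed")?;

    // [slot_id: u64][nonce][ciphertext] -- slot_id stays unencrypted so the
    // server can find the right pending connection (more on that below).
    let mut packet = Vec::with_capacity(8 + nonce.len() + ciphertext.len());
    packet.extend_from_slice(&slot_id.to_be_bytes());
    packet.extend_from_slice(nonce.as_slice());
    packet.extend_from_slice(&ciphertext);

    socket.send(&packet).await?;
    Ok(())
}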

"slot_id" is not encrypted, and is only usable for 5 seconds, it's job is to help the server resolve which pending connection should be able to decrypt the challenge that was traded over HTTPS earlier.

If both sides have the correct set of secrets, challenges, slot_ids, etc., the server will promote the connection from a "pending" collection to an "active" collection and write down the "from" address the valid join message arrived from; from then on that connection is bound to that IP:PORT within the server_traffic crate's constructs.
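Very roughly, that promotion step is a lookup-decrypt-compare. All the names in this sketch are hypothetical; the real bookkeeping lives inside server_traffic:

use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

use chacha20poly1305::{aead::Aead, ChaCha20Poly1305, Nonce};

// Hypothetical shape of a not-yet-promoted connection.
struct PendingConn {
    cipher: ChaCha20Poly1305,
    challenge: Vec<u8>, // the 1024-byte challenge traded over HTTPS
    created_at: Instant,
}

const SLOT_TTL: Duration = Duration::from_secs(5);

fn try_promote(
    pending: &mut HashMap<u64, PendingConn>,
    active: &mut HashMap<SocketAddr, ChaCha20Poly1305>,
    slot_id: u64,
    from: SocketAddr,
    nonce: &[u8], // 12 bytes, pulled from the packet framing
    ciphertext: &[u8],
) -> bool {
    let valid = match pending.get(&slot_id) {
        // slot_ids are only honored briefly after the HTTPS handshake;
        // expired entries get swept by a separate janitor task (not shown).
        Some(conn) if conn.created_at.elapsed() <= SLOT_TTL => {
            match conn.cipher.decrypt(Nonce::from_slice(nonce), ciphertext) {
                Ok(plaintext) => plaintext == conn.challenge,
                Err(_) => false,
            }
        }
        _ => false,
    };

    if valid {
        // Promote: from now on this connection is bound to that IP:PORT.
        let conn = pending.remove(&slot_id).unwrap();
        active.insert(from, conn.cipher);
    }
    valid
}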

Spinning up UDP Ingress

We're necessarily jumping around a bit, but at this point we have a little HTTP REST /join endpoint that'll create pending connections, and now we need to actually receive UDP traffic. At the application level we do this via server_traffic's start_udp_ingress function, whose signature looks like this:

pub async fn start_udp_ingress(
    bind_address: &str,
    on_conn_promote: impl Fn(SocketAddr, Arc<RwLock<GameConnection>>) -> Result<(), ()>
        + Send
        + Sync
        + 'static,
) -> Result<UnboundedSender<IngressCmd>, Box<dyn std::error::Error>> {

This API is asking which IP+port to start listening on for this machine, and what function it should call to announce new connections. This isn't dissimilar to basically every TCP listener socket's behavior in any networking library you've used: some lower-layer work happens to make the connection "ready" before the application begins interacting with it. (Under the hood the OS/kernel has done work to configure the TCP state machine before even telling you about it.) This library has to clear similar hurdles: in this implementation a connection gets created as a result of the POST /join handshake on the HTTP server, and isn't "promoted" to active until a client sends a valid join payload.

#[tokio::main]
async fn main() {
    let (state_tx, state_rx) = unbounded_channel::<GlobalCmd>();

    let state_tx_clone = state_tx.clone(); // immediately moved into following closure
    let command_tx = match server_traffic::start_udp_ingress(
        "0.0.0.0:27015",
        move |addr: SocketAddr, conn: Arc<RwLock<GameConnection>>| {
            // this closure is basically our "on connection is fully promoted to active"
            // callback.
            //
            // it's main goal is to return a closure that will handle whatever side effects
            // the application writer needs for any bytes arriving off of the default and
            // required "system stream".
            //
            // nothing stops you from triggering additional side effects when a new connection
            // is formed, in our case here we're informing our "global state" construct of
            // a new player (which is a prescribed assumption from a fully promoted connection!)
            state_tx_clone
                .send(GlobalCmd::NewPlayer(addr, conn.clone()))
                .map_err(|_| ())?;

            Ok(())
        },
    )
    .await
    {
        Ok(channel_tx) => channel_tx,
        Err(e) => {
            println!("Couldn't start up udp ingress, error: {:?}", e);
            exit(1)
        }
    };

We can see we're passing a closure as the value for the on_conn_promote parameter of start_udp_ingress(..); this allows the application to take whatever arbitrary actions it wants once a connection becomes "active" and can send/receive application traffic. In our case the Echo Server application has a "global state" construct, a little tokio task that sends/receives command tokens via simple mpsc unbounded channels. This is a lot like an actor framework (Akka, Erlang, etc.), and I think it's an attractive pattern when working with async Rust and tokio.
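Concretely, that "global state" task is just an enum of command tokens and a receive loop. Roughly (the variant payloads are inferred from the snippets below; the real enum may differ):

use std::net::SocketAddr;
use std::sync::Arc;
use tokio::sync::{mpsc::UnboundedReceiver, RwLock};

use networking::game_connection::GameConnection; // path assumed

// Inferred from usage later in this post; not the exact definition.
enum GlobalCmd {
    NewPlayer(SocketAddr, Arc<RwLock<GameConnection>>),
    SystemStreamEvent(u64, Vec<u8>),
}

async fn global_state_pump(mut state_rx: UnboundedReceiver<GlobalCmd>) {
    while let Some(cmd) = state_rx.recv().await {
        match cmd {
            GlobalCmd::NewPlayer(addr, conn) => { /* handled below */ }
            GlobalCmd::SystemStreamEvent(from_id, bytes) => { /* handled below */ }
        }
    }
}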

Configuring the "system stream"

An early architecture choice I made with this networking library was to enforce a "default" stream, the "system stream", which must always exist and be configured for initial traffic to start flowing. If applications want to configure additional streams, they can do it either by hard-coded convention or by configuration via commands sent over the "system stream".

In the Echo Server's "global state" construct, which is really a tokio task that's just an event pump, things are wired up such that promoted connections emit a "New Player" token, and we build a simple struct to represent that "player" and tuck them away in a hashmap on the task's stack. Shown here:

            GlobalCmd::NewPlayer(addr, conn) => {
                // we have a new player!
                global_id_counter += 1;

                {
                    let mut conn = conn.write().await;
                    let tx = tx.clone();
                    conn.configure_system_stream(Box::new(move |b| {
                        // all byte payloads arriving in a connection's default and required "system stream"
                        // will get routed back to this same event pump via the [SystemStreamEvent]
                        // envelope.
                        tx.send(GlobalCmd::SystemStreamEvent(global_id_counter, b))
                            .map_err(|_x| ())?;
                        Ok(())
                    }))
                }

                let new_player = ActivePlayer {
                    game_conn: conn,
                    remote_addr: addr,
                    global_id: global_id_counter,
                };

                println!(
                    "New player added! id: {:?} from: {:?}",
                    &global_id_counter, &addr
                );
                active_players.insert(global_id_counter, new_player);
            }

What we see here is important to this section's heading: promoting the connection to active means the "global state" pump will almost immediately configure the system stream to send bytes back into the pump wrapped in a SystemStreamEvent, which is handled like so:

            GlobalCmd::SystemStreamEvent(from_id, bytes) => {
                let player = &active_players[&from_id];
                let string_bytes = String::from_utf8_lossy(&bytes);
                println!(
                    "Received system event from {:?}({:?}): {:?}",
                    player.global_id, player.remote_addr, string_bytes
                );

                {
                    // echo it back
                    player
                        .game_conn
                        .write()
                        .await
                        .send(networking::game_connection::StreamSelect::System, bytes)
                        .await;
                }
            }

As you can see, we log the received payload and then send it right back to the same "player" along the system stream.

Server:

UDP Socket listening on 0.0.0.0:27015
Starting ingress...
Built new connection!
Query incoming
Promoting connection from pending to active: 127.0.0.1:54509
New player added! id: 1 from: 127.0.0.1:54509
Received system event from 1(127.0.0.1:54509): "hello"

Client:

Attempting connection to: http://127.0.0.1:8080/join
Connection was built!
Received bytes: 'hello'

Wrapping this up

We covered an outline of this library's objectives, a very narrow sample of it working, and some small details of how we accomplished some of the more basic objectives (e2e encryption having the largest impact on the overall architecture). There's a lot I could write about how the client is assembled with very different concerns using the same networking and protocol crates, but this is already way longer than I intended to write.

Feel free to drop a note or pick apart any problems you see; I'm just doing this for fun, and I'm not an expert Rust programmer by any means.


in reply to @mido's post:

As long as you never revisit the same space in the cipher stream, use a nonce to randomize where you land in it, and rotate keys often,

the only risk associated with unrotated keys in chacha20 is nonce reuse! if your protocol is safe against nonce reuse among all parties, you never need to rotate an already-established key. you can do this by any method, including sequential ranges assigned to each party, fully random nonces in 128 bits (good for a lot of "messages" which can each be very very large, such as a whole stream of up to 64<<64 bytes if you already know the order) or 192 bits with xchacha20, which is enough to be safe from collisions forever in practice.

you should generally not design in key rotations during a session unless there is a really specific need to do so, as the surface risk from the added complexity strongly outweighs any benefit of added secrecy.