
akhcade
@akhcade

apparently <stdatomic.h> is just a header file and can't hurt you :3
(assuming the CPU you're compiling for supports atomicity)

ok extremely underrated feature of C imo, the fact that some standard headers (the "freestanding" ones) work even with the -nostdlib flag, like YESSSSSsss

(why is -nostdlib important to me? read this short post!!)

click here if you're a NERD
#include <stdatomic.h>

// atomic_flag is the one atomic type guaranteed to be lock-free
#define mutex_t volatile atomic_flag
#define mutex_init(m) atomic_flag_clear(m)
// spin until the previous value was 0 (test_and_set returns the old value)
#define mutex_get(m) while (atomic_flag_test_and_set(m))
#define mutex_free(m) atomic_flag_clear(m)

mutex_t m;

void _start()
{
  mutex_init(&m);
  volatile int exampleData = 0;
  mutex_get(&m);   // acquire
  exampleData = 5; // critical section
  mutex_free(&m);  // release
}

^ this compiles with -nostdlib !!!!!!!!!!!! \o/

(x86 has the atomic xchg instruction, which makes this work: it atomically writes a value and returns the previous value. the trick is to loop until the previous value was 0, which means you just changed it to a 1 so now it's your turn, setting it back to 0 once you're done)

("but what about ARM cpus" im pretty sure every cpu should have a version of the xchg instruction ok i explicitly checked ARM and it does have an equivalent set of instructions that guarantees atomicity for modern ARM (and i think armv8.1 added a single instruction for it), basically any CPU that can do multi-threading is expected to have a lock sync mechanism like this ^.^ should be okay!)

BONUS: an earlier version that compiles the same (at least on x86) but more clearly shows what it's doing
#include <stdatomic.h>

#define mutex_t volatile atomic_uint_fast8_t
#define mutex_init(m) *(m) = 0
#define mutex_get(m) while (atomic_exchange((m), 1))
#define mutex_free(m) *(m) = 0

// ... (same as before)

it's literally an atomic byte that gets set to 0, and that atomic_exchange function returns the previous value of *(m) while setting it to 1


PolyWolf
@PolyWolf

i love talking about mutexes so now i'm going to make that everyone else's problem


ok so. indeed, that code is a relatively straightforward implementation of mutexes. i could digress into stuff about volatile and memory orderings, but instead, there is something more important we need to talk about before considering if this mutex is "good":

What Is A Mutex?

By its name, a mutex is something that provides Mutual Exclusion of a shared resource for concurrent tasks. In plain language:

Only one task can access the thing at a time

Programmers call it "acquiring" or "locking" the mutex when you attempt to access that shared resource, and "releasing" or "unlocking" the mutex when you have ownership of the resource and are giving it up for others to use.
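
To make the vocabulary concrete, here's what a full round trip looks like using the macros from the post I'm replying to (just a sketch):

#include <stdatomic.h>

#define mutex_t volatile atomic_flag
#define mutex_init(m) atomic_flag_clear(m)
#define mutex_get(m) while (atomic_flag_test_and_set(m))
#define mutex_free(m) atomic_flag_clear(m)

mutex_t m;
int shared = 0;

void task(void)
{
  mutex_get(&m);  // "acquiring" / "locking"
  shared += 1;    // only one task can be in here at a time
  mutex_free(&m); // "releasing" / "unlocking"
}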

Besides mutual exclusion, there's another property (not in the name) that's required for something to be a proper mutex:

Progress

No matter how many tasks are trying to acquire the mutex, if the resource isn't owned, someone will be able to get it

It's hard to mess this one up; in fact I'd say most buggy mutex implementations skew the other way by not correctly providing mutual exclusion. To give a concrete example of how progress could be violated, consider the following implementation: (for some reason i couldn't put it inside a <details>, sorry)

#include <stdatomic.h>

typedef struct {
  queue_t q; // queue_t, push_back, pop_front, deschedule, reschedule are assumed primitives
  atomic_int owner;
} mutex_t;

static const int EMPTY = 0;

void lock(mutex_t *m, int pid) {
  int expected = EMPTY; // compare_exchange writes back here on failure
  if (!atomic_compare_exchange_strong(&m->owner, &expected, pid)) {
    push_back(&m->q, pid);
    deschedule(); // Pauses execution until reschedule(pid) called
  }
  m->owner = pid;
}

void unlock(mutex_t *m) {
  int pid = pop_front(&m->q);
  if (pid != EMPTY) {
    reschedule(pid);
  } else {
    atomic_store(&m->owner, EMPTY);
  }
}
Can you spot an error with this? I'll hide the intended solution in case you want to puzzle it out.

Between the atomic_compare_exchange_strong line and the push_back line, the current owner could release the mutex without noticing that we were trying to put ourselves on the queue. This violates progress because a single task trying to acquire the lock was not able to.

There are certainly many other bugs, such as being able to race on m->q if it is not atomic. That is the thing about bugs, more pop up than u intend :)
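
For what it's worth, here's a sketch of one way to close that window (just a sketch, not a vetted implementation: it keeps the same hypothetical queue/scheduler primitives, adds a little guard spinlock so checking the owner and getting in line happen as one step, and assumes a reschedule(pid) issued before the target reaches deschedule() is not lost):

#include <stdatomic.h>

static atomic_flag guard = ATOMIC_FLAG_INIT; // protects q and owner together

void lock(mutex_t *m, int pid) {
  while (atomic_flag_test_and_set(&guard)) {} // enter guard
  if (atomic_load(&m->owner) == EMPTY) {
    atomic_store(&m->owner, pid);
    atomic_flag_clear(&guard);
    return;
  }
  push_back(&m->q, pid); // enqueue *inside* the guard, so unlock can't miss us
  atomic_flag_clear(&guard);
  deschedule(); // we're on the queue, so unlock() is guaranteed to wake us
}

void unlock(mutex_t *m) {
  while (atomic_flag_test_and_set(&guard)) {}
  int pid = pop_front(&m->q);
  if (pid != EMPTY) {
    atomic_store(&m->owner, pid); // hand ownership directly to the next waiter
    atomic_flag_clear(&guard);
    reschedule(pid);
  } else {
    atomic_store(&m->owner, EMPTY);
    atomic_flag_clear(&guard);
  }
}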

Progress and mutual exclusion are basically required for a correct mutex, but this last one is just nice to have:

Bounded Waiting

If a task tries to acquire the mutex, it will eventually be the owner of the resource

This can be strengthened further by putting bounds on how long it will take (e.g., a bound proportional to the number of competing tasks), or weakened to deal with probabilistic algorithms. In any case, it's common for implementations attempting to go for bounded waiting to do so at the cost of progress or mutual exclusion (by accident, as was the case in the example). So it is refreshing to see a simple spinlock that won't even try to deal with it.

However! Bounded waiting is still a very important property, and any non-specialized implementation should have it. Consider the following code:

void mainloop(void) {
  while (!should_stop) {
    lock(&m);
    process(&shared_resource);
    unlock(&m);
  }
}

To be clear, this is a terrible, effectively single-threaded design. Still, it's not all that uncommon in real designs for threads to relatively quickly acquire and release the same mutex over and over, which can starve other threads if there isn't bounded waiting on the mutex.

Can We Achieve Bounded Waiting With Only <stdatomic.h>?

My claim is yes! Here is an example of such a mutex, known as a ticket lock:

#include <stdatomic.h>

typedef struct {
  atomic_int next_ticket;
  atomic_int now_serving;
} mutex_t;

void lock(mutex_t *m) {
  // Get our ticket
  int ticket = atomic_fetch_add_explicit(&m->next_ticket, 1, memory_order_relaxed);
  // Spin until we are being served
  while (ticket != m->now_serving) {}
  atomic_thread_fence(memory_order_acquire);
}

void unlock(mutex_t *m) {
  // Let the next ticket be served
  atomic_fetch_add_explicit(&m->now_serving, 1, memory_order_release);
}

How it works is fairly simple, and pretty much entirely explained in the comments already. It's inspired by how you'd wait in line at a deli or pharmacy.

Some great things about this lock:

  • just one atomic operation to lock and unlock
  • small (8 bytes)

Some not-great things about this lock:

  • spin waiting :(
  • technically, ABA, but with 32-bit tickets this shouldn't be an issue in practice
  • doesn't detect misuse

Most of these deficits could be solved by better design, but i am too lazy rn :P
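
If you want to poke at the ticket lock, here's a minimal smoke test sketch (my addition; uses C11 <threads.h>, so this one is decidedly not -nostdlib friendly):

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

// assumes the ticket-lock mutex_t, lock(), and unlock() from above
static mutex_t m = { 0, 0 }; // next_ticket = 0, now_serving = 0
static long counter = 0;

static int worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 100000; i++) {
    lock(&m);
    counter++; // critical section
    unlock(&m);
  }
  return 0;
}

int main(void) {
  thrd_t t[4];
  for (int i = 0; i < 4; i++) thrd_create(&t[i], worker, NULL);
  for (int i = 0; i < 4; i++) thrd_join(t[i], NULL);
  printf("counter = %ld (expect 400000)\n", counter); // no lost increments
  return 0;
}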

Now, you may still have some questions like "wait why didn't you use volatile?" and "what's with that memory order stuff?", but this post is a little long as it stands, so I figure I'll leave the discussion about How Your Compiler And CPU Lie To You for a future post. Or just read Tower Of Weakenings because it answers or has links to answers for those questions and more.

And by the way, it's entirely possible I missed something and this code has a bug I am unaware of, so please do ask questions if you think something is up! edit: I did get it wrong in multiple ways the first time. After talking with a few friends I am now more confident it is correct :) will explain in the later post edit 2: I got it wrong again! Definitely making a post because explaining how I got things wrong is quite a journey.



in reply to @PolyWolf's post:

!!! first time i've accidentally nerd-sniped someone!!

i haven't thought about mutex implementations nearly as much as you have, as literally my first attempt was in that post of mine, but for a simple short-lived mutex lock, i figured my implementation didn't need any queue system for fairness (in my project, the lock is only meant to be used for fetching the next work instruction supplied by the main thread, the actual processing is done out-of-lock)

but at the same time i never thought about how important a pid queue system would be for long-lived mutex locks!! so thank you for the insight!!! the one thing i don't get is how volatile can be optional, since optimizers tend to stop re-checking variables in spin-loops if they "aren't modified in the loop", but i'm not super aware of how optimizers work so idk, i just do it out of caution

anyway, thank you for your informative post!!!! \o/

Yup that is exactly what you should be using mutexes for, short-lived critical sections :)

I double-checked the while loop just to be sure, and that did lead me to find a bug in my original implementation.

As you recall, in the original implementation, I had now_serving be a non-atomic variable. This meant that the while loop and the non-atomic update had a data race, where one thread could be reading at the same time another was modifying it.

In my assembly-centric view, this was fine, since either result of the race was an acceptable outcome (write before read -> see immediately, read before write -> see on next iteration). However, according to the C standard, section 5.1.2.4, paragraph 35, "any such data race results in undefined behavior".

So indeed, the optimizer would see the two options of "variable is mutated from another thread => data race UB" and "variable is never mutated => infinite loop UB", and proceed to blow our feet off.

The fix was to make the write atomic, since now that is no longer a data race according to the standard, and we get the behavior we want.
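
To illustrate what the optimizer can do with the racy version (a reconstruction; my original differed only in now_serving being a plain int):

// BEFORE (racy): now_serving is a plain, non-atomic int
while (ticket != m->now_serving) {}

// AFTER (a transformation the compiler may legally apply, since no
// race-free execution can change now_serving during the loop):
int cached = m->now_serving; // load hoisted out of the loop
while (ticket != cached) {}  // spins forever unless we already held the lock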

Do note that all of the above had nothing to do with volatile. volatile is purely for forcing the compiler to produce code that behaves like the C abstract machine with respect to reads and writes, and it does not fix UB. I'll maybe tackle this a little in my next post (if i make one lol)
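
Here's a tiny sketch of that distinction (hypothetical flag variable):

volatile int flag = 0;

void spin(void) {
  // volatile forces the compiler to actually re-load flag every iteration...
  while (!flag) {}
  // ...but if another thread writes flag with a plain store, that is still a
  // data race on a non-atomic object, and still UB. volatile doesn't fix that.
}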

I don't think about synchronization primitives all that much, so this was fun. Also I love "spot the bugs". I didn't catch the one you intended because I got distracted by two other possibilities:

  1. I didn't see it mentioned, but queue_t needs to be able to do an atomic push_back and pop_front.
  2. Is it possible for an interrupt to occur between deschedule returning and the assignment to m->owner? And then someone else could acquire the mutex and then we'd stomp on the owner.

Glad you had fun thinking about it! Here's my assessment:

  1. Breaks both progress and mutual exclusion, even disregarding UB. While we could use an atomic queue, the only one I'm aware of requires allocation, and ideally you aren't allocating memory inside your mutex.
  2. Not a bug. In the case that a thread is woken from deschedule(), m->owner is non-empty the entire time. There may still be something there, since lock() writes m->owner separately from (and unconditionally after) the compare-exchange, but I think atomic_compare_exchange_strong ensures we never get into a state where "one thread sees EMPTY and clobbers m->owner but another was just woken up"
    Also if it's possible to enable/disable interrupts, that's basically a mutex already (minus memory ordering stuff), albeit not a great one.

I should probably be more clear that the bug I highlighted isn't the only bug, will edit that.

Thanks for reading btw!