eniko
@eniko

There's a lot of crap out there about memory allocators. Try searching for explanations about slab allocators and there'll be a lot of prattling on about OS resources, which is something slab allocators can be used for, but it's not what they are. So I'm going to quickly cover the basics of several simple memory allocators!

Bump allocator

The bump allocator is the fastest allocator there is. All it does is hand you a pointer to its current position, then advance that pointer by the requested size. Super simple. It can't free individual allocations, however. Bump allocators are useful because you can give them a fixed-size chunk to allocate from and then reset them at some point where you know you're done with all the resources. For example, in gamedev you can allocate stuff into one that only needs to last until the end of the frame, then reset it. Very fast, very efficient, not great for general purpose.
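To make that concrete, here's a minimal sketch of a bump allocator in C (all the names are mine, not from any particular library):

```c
#include <stddef.h>

/* A bump allocator parcels out memory from a fixed buffer by moving an
   offset forward. It can't free individual allocations; you reset the
   whole thing at once. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t offset;
} bump_allocator;

void bump_init(bump_allocator *a, void *buffer, size_t capacity) {
    a->base = buffer;
    a->capacity = capacity;
    a->offset = 0;
}

void *bump_alloc(bump_allocator *a, size_t size) {
    /* round up to 8 bytes so returned pointers stay reasonably aligned */
    size_t aligned = (size + 7) & ~(size_t)7;
    if (a->offset + aligned > a->capacity)
        return NULL;                 /* out of space; caller must handle it */
    void *p = a->base + a->offset;
    a->offset += aligned;
    return p;
}

/* "free" everything in one shot, e.g. at the end of a frame */
void bump_reset(bump_allocator *a) {
    a->offset = 0;
}
```

That's the entire allocator: an allocation is one comparison and one addition.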

Buddy allocator

A buddy allocator is assigned some power-of-two memory region and builds a binary tree over it, allocating the smallest power-of-two-sized chunk that will fit, often with a minimum size of 4 KB. So if you have a completely clear buddy allocator of 256 KB and you request 16 KB, it'll split into two chunks of 128 KB, then split one of those into two of 64 KB, then 32 KB, then 16 KB, and allocate one of those. When freeing, you check if the buddy of the freed chunk is also free; if it is, you merge the two back into the larger size.

Buddy allocators are relatively simple and have the advantage that you can store a bitfield, outside the memory being managed, tracking which chunks at every level are free or allocated, so the actual memory inside the allocator remains contiguous.
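The core of the buddy scheme is just bit math. Here's a sketch of the two key calculations, assuming the 256 KB region and 4 KB minimum from above (the function names are mine):

```c
#include <stddef.h>

/* orders are log2 of chunk sizes: 4 KB = 1 << 12, 256 KB = 1 << 18 */
#define MIN_ORDER 12
#define MAX_ORDER 18

/* smallest power-of-two order that fits a request */
static unsigned order_for(size_t size) {
    unsigned order = MIN_ORDER;
    while (((size_t)1 << order) < size)
        order++;
    return order;
}

/* A chunk's buddy lives at its offset XORed with its size: splitting a
   2^(n+1) block yields two 2^n halves whose offsets differ in exactly
   that one bit. This is what makes the free/merge check cheap. */
static size_t buddy_offset(size_t offset, unsigned order) {
    return offset ^ ((size_t)1 << order);
}
```

A 16 KB request maps to order 14, and the allocator splits 18 → 17 → 16 → 15 → 14 to satisfy it; on free, `buddy_offset` tells you exactly which bit in the bitfield to check before merging back up.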

Slab allocator

Don't believe anything you read on Google about OS resources and other bullshit, a slab allocator is literally just a fucking array with a singly linked list pointing at the next free item. That is, slab allocators are meant to give you single elements of one size, which often means they're used for one specific type of item.

They're very fast and efficient and unlike a bump allocator can free items, but do need to maintain a linked list and are limited by a single size of allocation for a single item. You can fancy these up by creating an additional linked list that points at multiple buffers rather than using a single buffer, so you can dynamically grow and shrink the amount of memory the allocator is using.
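Here's a minimal slab sketch with the free list threaded through the free items themselves, so the list costs no extra memory (names are illustrative; item size must be at least `sizeof(void *)` to hold the link):

```c
#include <stddef.h>

/* One array of fixed-size items; the free list lives inside the free
   items, so allocation and free are both O(1) head operations. */
typedef struct {
    void *free_head;   /* first free item, or NULL when full */
} slab;

/* chain every item into the free list; stride is the item size */
void slab_init(slab *s, void *buffer, size_t stride, size_t count) {
    unsigned char *p = buffer;
    s->free_head = NULL;
    for (size_t i = count; i > 0; i--) {
        void *item = p + (i - 1) * stride;
        *(void **)item = s->free_head;  /* link stored inside the free item */
        s->free_head = item;
    }
}

void *slab_alloc(slab *s) {
    void *item = s->free_head;
    if (item)
        s->free_head = *(void **)item;  /* pop the head */
    return item;
}

void slab_free(slab *s, void *item) {
    *(void **)item = s->free_head;      /* push back onto the head */
    s->free_head = item;
}
```

Note that nothing is ever traversed: allocation pops the head, freeing pushes onto the head.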

Freelist allocator

By far the most complex, a freelist takes a region of memory and divides it up into chunks that fit the requested allocations, keeping a linked list of free chunks (hence the name) so that it only has to search that list to find a chunk that fits when allocating. When previously allocated chunks are freed they're merged with any free chunks that neighbour them.

The tricky bit is that since you're building a memory allocator you generally don't have a linked list implementation just sitting around (cause that requires a memory allocator, wahey!) so you have to implement the list inside the memory region that holds the allocations. This can be done many ways, but the easiest is for each chunk to include a small header before the memory intended to be used by the calling code, storing whether the chunk is free or allocated, its size (including the header) so we can reach the next chunk, and:

  • if an allocated block, the size of the previous block, so it's easier to merge with a preceding block of free memory when being freed
  • if a free block, where the previous/next free blocks in the linked list can be found

You can get fancy about this. Some freelists use a singly linked list that's kept sorted (so previous is always behind and next is always ahead in memory), and some use a single bit for allocated chunks to store whether the previous chunk is free or not, then store a footer at the end of free chunks which also contains the size, to save on memory wasted. But that's all garnish, the basics aren't that bad.
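One possible layout for that intrusive header, sketched in C (field and type names are mine; for simplicity this version stores all the fields in every chunk rather than overlapping the allocated-chunk and free-chunk variants):

```c
#include <stddef.h>
#include <stdbool.h>

/* Header placed in front of every chunk's usable memory. */
typedef struct chunk_header {
    size_t size;        /* total chunk size including this header */
    size_t prev_size;   /* size of the preceding chunk, for backward merging */
    bool   free;
    /* only meaningful while the chunk is free: the free-list links,
       stored in space that would otherwise hold user data */
    struct chunk_header *next_free;
    struct chunk_header *prev_free;
} chunk_header;

/* walk forward to the next chunk using the stored size */
static chunk_header *next_chunk(chunk_header *c) {
    return (chunk_header *)((unsigned char *)c + c->size);
}

/* walk backward using prev_size, e.g. to merge with a free predecessor */
static chunk_header *prev_chunk(chunk_header *c) {
    return (chunk_header *)((unsigned char *)c - c->prev_size);
}
```

Because each header knows its own size and its predecessor's size, both neighbours of a freed chunk are reachable in constant time, which is exactly what the merge step needs.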

Conclusion

These are the basics of several memory allocator types. General purpose memory allocators will tend to be a mixture or composition of multiple types of allocators. Feel free to ask questions and share if you found this helpful, cause someone else may too.

Additional reading:

Edit: oh and read the comments, people are saying helpful stuff in there!



in reply to @eniko's post:

Yep. I've usually used the term "pool allocator" for what's here called a slab allocator. And a former coworker of mine used to call it a "slot heap"—though I've never heard the latter term from anyone else!

If there's a distinction (since as Tom said elsewhere terms are all over the place) then pool seems to be used for one large chunk for all elements and "slab" for multiple chunks. Personally I think about both as being pools though.

arenas are a related but slightly different concept; the given example of a use case for a bump allocator is a type of arena. an arena is just a bigger chunk of memory you part out bits of to memory related to your current task, then when you're done you throw away the whole arena in a very small number of very fast operations, rather than having to do work to deallocate each of the things you were using in there.

this means that anything inside the arena should never be responsible for managing memory inside it; if you have string references, that data had better also be inside the same arena because there won't be any teardown.

fortunately most of the time stacks are stacks, usually the callstack (windows even has alloca, which can do unwindable dynamically-sized allocation on the callstack)

all very true though. in a field where most people are only the invocation of a few mystic incantations away from making their ideas real, repeated rediscovery and chaotic naming are the game

Oh, something I noticed - you say slabs "do need to maintain a linked list" but the trick is that the list lives inside the free items. So the linked list doesn't take any extra space - because it only needs to join up free items. And when you do an allocation you always hand out the first element in the linked list. And when an item is freed it's always added to the start of the list. So you never need to actually traverse the list, and it takes up no extra space. I love them, I use them everywhere!

This is one of the things that for some reason my mind cannot understand. These are my two problems with C.

  • I don't understand how to use allocators, when where. I dunno it confuses me.
  • I'm afraid of chars and strings and I don't get them at all.

I need something in my head to make all of this click, but I have no idea how. I guess actually try to do something and force myself to do it even if I don't get it at first.

Or accept that I might just want to stick to garbage collected stuff and remain there :P.

Honestly even as someone who enjoys C, I think manual memory management is terrible. I think the best way to enjoy C is to, if at all possible, avoid it entirely. That means that when you do use it it's for very specific things that can't be done any other way, and since that'll be a more limited scenario you're less likely to forget to free memory when you're done with it.

One thing you could try to get better at C is, paradoxically, to use qbasic. Qbasic was memory managed but really the only thing that's actually garbage collected is strings. Everything else can only exist on the stack. And once you know how to code in QB you know how to code without allocating memory, which is transferable to C. And you'll also know the times where in QB you'd be like "god if only I could manually allocate some memory and then free it later", and then you'll know when to do it in C.

Most of that stuff is seriously last-5% stuff. Important if you're doing OSes, but for an application you can just ignore it. The above stuff is all you need (and honestly the Freelist-style allocators are also overkill - pow2 is all you really need)

Yeah I guess I should've made the point clearer: it's interesting to folks who need to or are curious about how to squeeze out those last-5% cases.

Would not advise seriously mucking with these unless you're running a bunch of tests under reasonably-real-ish loads (i.e. bots).

[Or you are playing benchmark games because you've somehow exhausted the Zachtronics catalogue 😆]

I think nobody who cares about security maintains the free list inside the allocated region anymore: it makes use-after-frees and buffer overflows turn into widespread memory corruption much too easily. Instead, a security conscious freelist allocator is going to have a second region of memory to maintain the freelist (a slab allocator to manage this sounds reasonable but I don’t actually know what real ones are using).

When I wrote a toy allocator I definitely did the intrusive freelist and also intrusive metadata (allocation sizes placed right before the allocation) and I know that kind of metadata is also usually maintained externally for production allocators now.

Oh this reminds me of another reason why non-intrusive allocators are good: if you can afford to zero the memory on free, you can compress that memory better which pays for the cost of the zeroing by reducing system memory pressure.

For my toy kernel I implemented a bump allocator that allows freeing memory.

In front of each chunk there is a header with the length and flags. The allocator has a pointer to a valid chunk header that can be allocated. When a piece of memory is allocated, the allocator checks if the current chunk is either uninitialized, or has a length that fits the requested size. If the chunk is uninitialized, it gets initialized, the allocator pointer gets incremented, and the reference to the chunk is returned.

If the allocator gets to the end of the memory, it simply wraps around, pointing at the very first allocated chunk. When a piece of memory gets deallocated, it marks the chunk as free to allocate. This has the added benefit that I can check if the memory is freed or not, which works as long as the bump allocator hasn't gone full-circle.

When the allocator tries to allocate a piece of memory that is larger than the current chunk, it checks if it can merge the chunk with the next chunk. If it can, it clears the second header and increases the length of the first header. It does this until the requested allocation fits.

The worst case performance is if the allocator can't allocate on the current chunk, and has to iterate (possibly the entire memory) to find a new chunk. If you're running on a system with a high memory usage, this means very bad performance.
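The scheme described above can be sketched roughly like this (all the names are mine, and the wraparound and full search loop are omitted for brevity; this only shows the header, the fresh-allocation path, and the forward merge):

```c
#include <stddef.h>
#include <stdbool.h>

/* Each chunk starts with a header holding its payload length and a flag;
   a length of 0 means untouched memory past the bump point. */
typedef struct {
    size_t size;
    bool   in_use;
} hdr;

typedef struct {
    unsigned char *cur;   /* current candidate chunk header */
    unsigned char *end;   /* end of the managed region */
} ring_bump;

void *ring_alloc(ring_bump *a, size_t size) {
    hdr *h = (hdr *)a->cur;
    if (h->size == 0) {
        /* untouched memory: initialize a fresh chunk here */
        if (a->cur + sizeof(hdr) + size > a->end)
            return NULL;               /* the real version would wrap around */
        h->size = size;
        h->in_use = true;
        a->cur += sizeof(hdr) + size;
        return h + 1;
    }
    /* freed chunk: absorb following free chunks until the request fits */
    while (!h->in_use && h->size < size) {
        hdr *next = (hdr *)((unsigned char *)(h + 1) + h->size);
        if ((unsigned char *)next >= a->end || next->in_use || next->size == 0)
            return NULL;               /* the real version would keep searching */
        h->size += sizeof(hdr) + next->size;   /* clear the second header */
    }
    if (h->in_use || h->size < size)
        return NULL;
    h->in_use = true;
    a->cur = (unsigned char *)(h + 1) + h->size;
    return h + 1;
}

void ring_free(void *p) {
    ((hdr *)p - 1)->in_use = false;    /* just mark the chunk free */
}
```

This keeps the bump allocator's cheap happy path while letting chunks be reclaimed, at the cost of the linear search in the worst case described above.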