I know I was supposed to talk about memory allocators, but I'm feeling self-indulgent, so instead I'm going to talk about the C strings API I created for my fantasy console, which runs on a Motorola 68000 CPU and goes by the working title Mega68k.
Because, you see, C strings suck. In fact C strings don't even exist. They're just, like, a gentleman's agreement to think of things which aren't strings as strings. And that sucks. They're also unsafe af (when you hear about a buffer overrun or underrun security issue, C strings are very often the culprit) but that's not even my concern. They just feel bad to work with. C strings have bad vibes.
Part of that is that C has no automatic memory management. Strings can't refer to anything else, which means no reference cycles, so they're actually perfect candidates for reference counting, but C doesn't have anything like C++'s shared_ptr<T>... or does it? GCC and Clang actually have the cleanup attribute, which runs a function of your choosing when a variable goes out of scope. It's used like so:
#include <stdio.h>

// cleanup handlers receive a pointer to the variable that is going out of scope
void destroy_int(int* value) {
    printf("destroying int %d\n", *value);
}
void foo(void) {
    printf("before scope\n");
    {
        int __attribute__ ((cleanup(destroy_int))) i = 12;
        printf("i = %d\n", i);
    }
    printf("after scope\n");
}
// output:
// before scope
// i = 12
// destroying int 12
// after scope
This means we can "clean up" a named variable once it goes out of scope, which brings us 80% of the way to automatic memory management. And that, my good reader, is where the fuckery starts in earnest.
Because you might think "well you can just use this to implement reference counting and that's super easy" and you'd be mostly right. But the cleanup routine only runs when the variable goes out of scope. If you assign to a variable with a cleanup attribute, nothing gets called for the value you just overwrote, so that's dangerous and gives us our first action item.
- Assignment to a local cleanable variable shouldn't be allowed
Of course, you can also copy a cleanable variable somewhere else longer lived than it is. You could assign it to a global variable, which will either wind up with a dangling reference to a destroyed object, or, because we can't apply the cleanup attribute to global variables, cause it to never be deallocated.
- Assignment to global variables shouldn't be allowed
There's also the problem of structs. If I stick the variable into a struct, well, that's like a combination of the first two problems. You'd have to track the struct for cleanup too, and this is infectious (like IDisposable in C#) and would need to apply to each struct that contains a struct that contains a struct that contains a variable that needs to be cleaned up.
- Cleanable variables can't be stored in structs
Passing them as arguments to functions could be fine (as long as they can't escape off the stack anyhow) but returning them is also terrible so...
- Cleanable variables can't be returned from functions
Now you could just do the generally accepted C "thing" and say that these are footguns and you should simply never accidentally discharge the footguns and you'll be fine. I don't think that's acceptable. So how do you create a local variable that conforms to these rules? You can't. You can't define a type in C that only lives in the local scope.
OR CAN YOU?
There is actually a way to do a noop statement in C. (void)0; basically compiles to nothing. You know what a cool property of statements is? It's that they're only allowed inside functions! Not in the global scope, not inside structs! So we can do something like this:
typedef struct _sharedint_t {
    int value;
} _sharedint_t;
#define sharedint_t(name) (void)0; \
    _sharedint_t __attribute__ ((cleanup(destroy_int))) name;
By adding in a poison pill, the noop statement, we guarantee this can only be used inside the local scope of a function. You can't use this macro as an argument or a return value either, and it can't be put inside a struct. You could put _sharedint_t in those but that's a "mistake" that requires a conscious effort to make.
Unfortunately you can still assign to this variable. That's bad. It's hard to disallow assignment to a variable, but with enough macro fuckery you can totally do it by just boxing it into an array of length 1, because C doesn't let you assign to an array. So the macro changes like this:
#define sharedint_t(name) (void)0; \
    _sharedint_t name[] = {{0}}; \
    _sharedint_t* __attribute__ ((cleanup(destroy_int))) temp = name;
Well that won't do. The hidden pointer is always called temp, so you can only use the macro once per scope. We need more macro fuckery.
// this just concatenates two things in a macro without spaces
// (the extra level makes sure __COUNTER__ is expanded before it gets pasted)
#define CONCAT_INNER(a,b) a ## b
#define CONCAT(a,b) CONCAT_INNER(a,b)
#define SHAREDINT(name, tmp) (void)0; \
    _sharedint_t name[] = {{0}}; \
    _sharedint_t* __attribute__ ((cleanup(destroy_int))) tmp = name;
// create a temporary name by adding a unique number to the end
#define sharedint_t(name) SHAREDINT(name, CONCAT(name, __COUNTER__))
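Put together, the boxed version can be sketched as a complete program. This is an illustration under assumptions rather than the real Mega68k code: destroy_sharedint, the destroyed counter and sharedint_demo are invented names, the attribute is written in front of the type (a placement both GCC and Clang accept), and CONCAT goes through a helper macro so that __COUNTER__ is expanded before pasting.

```c
#include <assert.h>
#include <stdio.h>

typedef struct _sharedint_t {
    int value;
} _sharedint_t;

static int destroyed = 0;   /* counts handler calls, for the demo only */

/* The guarded variable is the hidden pointer, so the handler
   receives a pointer to that pointer. */
static void destroy_sharedint(_sharedint_t **p) {
    assert(p != NULL && *p != NULL);
    printf("destroying sharedint %d\n", (*p)->value);
    destroyed++;
}

#define CONCAT_INNER(a, b) a ## b
#define CONCAT(a, b) CONCAT_INNER(a, b)

#define SHAREDINT(name, tmp) (void)0; \
    _sharedint_t name[1] = {{0}}; \
    __attribute__((cleanup(destroy_sharedint))) _sharedint_t *tmp = name

#define sharedint_t(name) SHAREDINT(name, CONCAT(name, __COUNTER__))

/* returns how many times the handler ran */
int sharedint_demo(void) {
    destroyed = 0;
    {
        sharedint_t(a);
        sharedint_t(b);
        a->value = 1;   /* the array decays to a pointer, so -> works */
        b->value = 2;
        /* a = b;  would not compile: arrays are not assignable */
    }                   /* handlers run in reverse order: b, then a */
    return destroyed;
}
```

Both variables get their own hidden pointer, and sharedint_demo() reports two handler calls.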
Now we can't assign to our variable using =, so that solves all our problems! Of course our cleanup routine now needs to take a _sharedint_t**, since it receives a pointer to the temporary, which is itself a pointer to the array's first element. And of course if you add refcounting then you need all instances to be pointers to a single shared struct, so you actually get a type*** in the cleanup function, and using the variable by name results in a type**. In my string implementation I've also added a block comment in there in case someone uses the macro in the wrong place, so my code looks like this:
#define _TMPSTR_IMPL(name, tmp) ((void)/* Invalid scope for temporary strings!! */0); \
    _string_t* name[] = {NULL}; \
    _string_t** __attribute__ ((cleanup(_cleanup_string))) CONCAT(_string_, tmp) = name;
#define string(name) _TMPSTR_IMPL(name, CONCAT(name, __COUNTER__))
// string destructor
void _cleanup_string(_string_t*** _s) {
    _string_t* s = **_s;
    if (s == NULL) {
        return;
    }
    s->refcount--;
    if (s->refcount <= 0) {
        strfree(s);
        **_s = NULL;
    }
}
Then the actual API works by using various methods instead of operators, always specifying a destination if the result is a string, so that it can be stored into a refcounted local variable. The sset function is actually the only place that needs to call _cleanup_string and decrease the reference count.
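Here is roughly how the setters could fit together with the destructor. Treat it as a sketch under assumptions, not the actual implementation: strnew, release, live_strings and refcount_demo are names invented for this example, allocation is plain malloc, the destructor is simplified to share a release helper, and the two setters are called directly instead of through the sset macro.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _string_t {
    size_t length;
    int refcount;
    char data[];
} _string_t;

static int live_strings = 0;   /* demo instrumentation only */

/* hypothetical allocator: plain malloc, no storage pool */
static _string_t *strnew(const char *text) {
    size_t n = strlen(text);
    _string_t *s = malloc(sizeof *s + n + 1);
    assert(s != NULL);
    s->length = n;
    s->refcount = 1;            /* the destination slot holds the first reference */
    memcpy(s->data, text, n + 1);
    live_strings++;
    return s;
}

static void strfree(_string_t *s) {
    free(s);
    live_strings--;
}

/* drop the reference held in *slot; free the string if it was the last one */
static void release(_string_t **slot) {
    _string_t *s = *slot;
    *slot = NULL;
    if (s != NULL && --s->refcount <= 0) {
        strfree(s);
    }
}

/* destructor: gets a pointer to the hidden temporary, which points at the slot */
static void _cleanup_string(_string_t ***tmp) {
    release(*tmp);
}

#define CONCAT_INNER(a, b) a ## b
#define CONCAT(a, b) CONCAT_INNER(a, b)
#define _TMPSTR_IMPL(name, tmp) (void)0; \
    _string_t *name[1] = {NULL}; \
    __attribute__((cleanup(_cleanup_string))) _string_t **CONCAT(_string_, tmp) = name
#define string(name) _TMPSTR_IMPL(name, CONCAT(name, __COUNTER__))

/* string-to-string set: share the struct, bump its refcount */
static void _sset(_string_t **dst, _string_t **src) {
    _string_t *s = *src;
    if (s != NULL) s->refcount++;   /* retain first so setting a string to itself is safe */
    release(dst);
    *dst = s;
}

/* C-string set: build a fresh struct owned by dst */
static void _sset_c(_string_t **dst, const char *text) {
    _string_t *s = strnew(text);
    release(dst);
    *dst = s;
}

/* returns the number of strings still allocated after the scope ends */
int refcount_demo(void) {
    {
        string(a);
        string(b);
        _sset_c(a, "hello");           /* a holds the only reference */
        _sset(b, a);                   /* b shares it */
        assert((*a)->refcount == 2);
        _sset_c(a, "world");           /* a lets go, "hello" lives on through b */
        assert((*b)->refcount == 1);
        assert(live_strings == 2);
    }                                  /* both destructors run here */
    return live_strings;
}
```

After the inner scope closes, both strings have been freed and refcount_demo() returns 0.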
The data for each string is stored in a very simple struct that gets allocated on the heap. Well, actually, I want Mega68k to be able to run without using the heap (no malloc/free) since that seems very important to some people, so strings actually have a dedicated 4096-byte storage area that they get allocated from first, and only if that runs out do they get allocated on the heap. The string struct looks like this:
typedef struct _string_t {
    size_t length;
    int refcount;
    char data[];
} _string_t;
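The pool-first, heap-second allocation described above might look something like the following. Everything here except the struct layout and the 4096-byte size is an assumption: stralloc, in_pool, pool_used and pool_live are hypothetical names, and the pool is a deliberately naive bump allocator that only reclaims space once every pool string has been freed. A real implementation would likely be smarter about reuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct _string_t {
    size_t length;
    int refcount;
    char data[];
} _string_t;

#define POOL_SIZE 4096

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;   /* bytes handed out from the pool so far */
static size_t pool_live = 0;   /* pool strings that have not been freed yet */

/* true if p points into the static pool rather than the heap */
static bool in_pool(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)pool;
    return addr >= start && addr < start + POOL_SIZE;
}

/* allocate room for `length` characters plus the terminator */
_string_t *stralloc(size_t length) {
    size_t align = _Alignof(max_align_t);
    size_t need = sizeof(_string_t) + length + 1;
    need = (need + align - 1) / align * align;   /* keep later pool blocks aligned */

    _string_t *s;
    if (need <= POOL_SIZE - pool_used) {
        s = (_string_t *)(pool + pool_used);     /* bump-allocate from the pool */
        pool_used += need;
        pool_live++;
    } else {
        s = malloc(need);                        /* pool exhausted: fall back to the heap */
        if (s == NULL) {
            return NULL;
        }
    }
    s->length = length;
    s->refcount = 0;
    s->data[0] = '\0';
    return s;
}

void strfree(_string_t *s) {
    if (s == NULL) {
        return;
    }
    if (!in_pool(s)) {
        free(s);
        return;
    }
    assert(pool_live > 0);
    if (--pool_live == 0) {
        pool_used = 0;   /* last pool string gone: the whole pool is reusable */
    }
}
```

Short strings land in the pool and never touch malloc, while a 5000-character string is too big for the pool and goes to the heap.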
The only remaining problem then is that _string_t is by definition short-lived. It can't be stored anywhere, it can't even be passed to functions as arguments (that one's kind of a shame but I can live with it, and I could make an API for that later) so once a string goes off the stack it's gone forever. To that end the ssave function exists, which copies a string into a C array of characters masquerading as a string. So you do your string transformation work using the string API and then once you have your result you ssave it into a char* for storage.
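A function along the lines of ssave could be sketched like this. The argument order (destination, size, string) follows the way it gets called later in this post, but the body is guesswork: ssave_sketch and make_string are hypothetical names, and truncating when the caller's buffer is too small is an assumption about the behaviour, not something the real function is known to do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _string_t {
    size_t length;
    int refcount;
    char data[];
} _string_t;

/* hypothetical helper so the sketch has something to save */
static _string_t *make_string(const char *text) {
    size_t n = strlen(text);
    _string_t *s = malloc(sizeof *s + n + 1);
    assert(s != NULL);
    s->length = n;
    s->refcount = 1;
    memcpy(s->data, text, n + 1);
    return s;
}

/* Copy *s into dest (capacity `size`), or into a fresh malloc'd
   buffer when dest is NULL. Returns the buffer, or NULL if malloc fails. */
char *ssave_sketch(char *dest, size_t size, _string_t **s) {
    const _string_t *str = *s;
    size_t len = (str != NULL) ? str->length : 0;

    if (dest == NULL) {
        size = len + 1;
        dest = malloc(size);
        if (dest == NULL) {
            return NULL;
        }
    }
    if (size == 0) {
        return dest;              /* no room for even the terminator */
    }
    if (len > size - 1) {
        len = size - 1;           /* assumption: truncate instead of failing */
    }
    if (len > 0) {
        memcpy(dest, str->data, len);
    }
    dest[len] = '\0';
    return dest;
}
```

When the destination is NULL the caller owns the returned buffer and has to free it, which matches how the saved string is freed in the usage code further down.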
Also, because working with strings without being able to use string literals easily would suck, I abuse the hell out of C11's _Generic selection to coalesce differently typed string functions into a single macro like this:
#define sset(s, value) _Generic((value), \
    _string_t**: _sset, \
    char*: _sset_c, \
    const char*: _sset_c, \
    const char[sizeof(value)]: _sset_c, \
    char[sizeof(value)]: _sset_c)(s, value)
This means that sset(mystr, "Hello world!"); is totally valid code, you don't have to switch between sset and ssetc or something like that cause that sucks.
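If you want to poke at the dispatch trick on its own, here is a small self-contained sketch. box_t, box_set and the two setter functions are invented stand-ins for the real string type and functions. It relies on current GCC and Clang converting a string literal to char* before _Generic picks a branch, so it leaves out the array branches from the macro above.

```c
#include <assert.h>
#include <string.h>

/* hypothetical stand-in for the real string type */
typedef struct {
    char text[32];
} box_t;

static const char *last_call = "";   /* records which function was picked */

static void set_from_box(box_t *dst, box_t *src) {
    *dst = *src;
    last_call = "box";
}

static void set_from_cstr(box_t *dst, const char *src) {
    size_t n = strlen(src);
    if (n >= sizeof dst->text) {
        n = sizeof dst->text - 1;    /* truncate to fit the fixed buffer */
    }
    memcpy(dst->text, src, n);
    dst->text[n] = '\0';
    last_call = "cstr";
}

/* pick the setter from the static type of `value` */
#define box_set(dst, value) _Generic((value), \
    box_t *: set_from_box, \
    char *: set_from_cstr, \
    const char *: set_from_cstr)(dst, value)

/* returns 1 if both calls dispatched to the expected function */
int dispatch_demo(void) {
    box_t a = {{0}};
    box_t b = {{0}};
    int ok = 1;

    box_set(&a, "hello");            /* literal converts to char*: C-string setter */
    ok = ok && strcmp(last_call, "cstr") == 0;

    box_set(&b, &a);                 /* box_t*: box setter */
    ok = ok && strcmp(last_call, "box") == 0;
    ok = ok && strcmp(b.text, "hello") == 0;
    return ok;
}
```

The selection happens entirely at compile time, so passing a type with no matching branch is a compile error rather than a runtime surprise.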
Anyway, in the end the code to work with strings looks something like this:
string(longer);
{
    string(str1);
    string(str2);
    sset(str1, "Hello "); // str1 = "Hello ";
    sset(str2, "world"); // str2 = "world";
    sadd(str1, str1, str2); // str1 = str1 + str2;
    sset(longer, str1); // longer = str1;
    println("%L", slen(str1));
    string(sub);
    // sub = substr(str1, 3, 5);
    ssub(sub, str1, 3, 5);
    println("%s", sdata(sub));
    println("end of scope");
}
println("after scope");
for (int i = 0; i < 3; i++) {
    sadd(longer, longer, ".");
}
// sdata just returns (*s)->data
println("%s", sdata(longer));
string(boolean);
// generic tostr that works for numbers and bools
tostr(boolean, sequal(longer, "Hello world...")); println("%s", sdata(boolean));
tostr(boolean, true); println("%s", sdata(boolean));
tostr(boolean, false); println("%s", sdata(boolean));
string(num);
tostr(num, 655359999); println("%s", sdata(num));
tostr(num, -655359999); println("%s", sdata(num));
tohex(num, 0xDEADBEEF, true, true); println("%s", sdata(num));
println("Splitting...");
string(token);
splitpos_t index = 0;
while (index >= 0) {
    index = ssplit(token, longer, "od", index);
    println("%s", sdata(token));
}
// you can pass in an existing string or NULL to malloc a new one
char* cstring = ssave(NULL, 0, longer);
println("saved string: %s", cstring);
free(cstring);
// char* version of tostr
char* cnumstr = tocstr(NULL, 0, 655359999);
println("cnumstr: %s", cnumstr);
free(cnumstr);
// num = tryparse_int("-2147483648");
sset(num, "-2147483648");
int32_t convertednum;
if (tonum(num, false, &convertednum)) {
    println("Converted: %l", convertednum);
    println("Unsigned: %L", convertednum);
    println("Hex: %H", convertednum);
}
else {
    println("Failed to convert");
}
You'll have to forgive the fact that my format strings are different from printf's. GCC for m68k can be run with int as either a 16-bit or a 32-bit type, and I couldn't come up with any way around that.
Anyway hope you enjoyed my post about strings that don't suck ass in C! If you did like it please consider clicking that share button 🙏

