out of the thirty-odd variables being generated, only around six (the named ones) are actually user-declared. the rest are auto-generated temporary values, many of which are plain redundant. let's talk about that!
due to the CIR generator's modular code structure, it's not uncommon to see code like:
```
_0 = undef:Test;
_0.0 = 23:i32;
_0.1 = 35:i32;
_1 = _0:Test;
_2 = _1:Test;
```
this is mostly due to the generator having to convert between lvalues and rvalues at various points in the code, generally by storing intermediate results in temporaries. here's roughly how the IR above gets generated, for instance:
source code:
```
foo(Test { a: 23, b: 35 }.sum());
```
IR generation steps (from memory; it's been a long few months):
- generate function call to `foo` with one argument
- generate argument zero using generate_expr(), which returns an rvalue
- it's a struct literal, so insert a temporary (_0) to initialize the members on
- after initializing each member, return the value at _0 as an rvalue
- honestly man i don't even remember where _1 is coming from. i'm tired, boss
- an argument may only be an lvalue, so insert a temporary (_2) to store it into
- finalize the function call
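the steps above can be sketched as a toy generator. everything here (the `Gen` struct, `generate_expr`, the string-based IR) is a hypothetical stand-in for illustration, not the actual codebase — but it shows mechanically where the extra temporaries come from: every rvalue gets materialized into a fresh temp.

```rust
// Toy model of the walkthrough above. All names are illustrative.
#[derive(Debug)]
enum Expr {
    StructLit(Vec<i32>),     // Test { a: ..., b: ... }
    Call(String, Box<Expr>), // foo(arg)
}

struct Gen {
    next_temp: u32,
    code: Vec<String>,
}

impl Gen {
    fn fresh(&mut self) -> u32 {
        let t = self.next_temp;
        self.next_temp += 1;
        t
    }

    // Returns the index of the temporary holding the expression's value
    // (an "rvalue").
    fn generate_expr(&mut self, e: &Expr) -> u32 {
        match e {
            Expr::StructLit(fields) => {
                // struct literal: insert a temporary to initialize the
                // members on, then hand that temporary back
                let t = self.fresh();
                self.code.push(format!("_{t} = undef:Test;"));
                for (i, v) in fields.iter().enumerate() {
                    self.code.push(format!("_{t}.{i} = {v}:i32;"));
                }
                t
            }
            Expr::Call(name, arg) => {
                let a = self.generate_expr(arg);
                // arguments may only be lvalues: copy the rvalue into
                // yet another temporary before emitting the call
                let t = self.fresh();
                self.code.push(format!("_{t} = _{a}:Test;"));
                self.code.push(format!("call {name}(_{t});"));
                t
            }
        }
    }
}
```

feeding `foo(Test { a: 23, b: 35 })` through this produces the `_0 = undef` / member-init / `_1 = _0` copy chain from earlier — the copy exists purely because of the rvalue-to-lvalue hop, not because the program needs it.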
there are some utilities in the codebase to mitigate this slightly, but the resulting IR is still considerably more bloated than i'd like. while LLVM is very good at optimizing these temporaries away, mostly hiding the performance cost they would otherwise introduce, i don't really consider that a solution. the bloat hurts readability and debuggability, and it increases the cost of analyzing and optimizing the IR in the middle-end — which is the main reason the IR exists in the first place. i just don't see this scaling well.

wait, no em-dashes allowed. let me rephrase: the bloat hurts readability and debuggability, and it increases the cost of analyzing and optimizing the IR in the middle-end, which is the main reason the IR exists in the first place. i just don't see this scaling well.
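one common middle-end answer to chains like `_1 = _0; _2 = _1;` is a copy-propagation pass over your own IR, so the cleanup happens before LLVM ever sees it. here's a minimal sketch over a made-up straight-line IR (the `Inst` enum and names are my assumptions, not your CIR); it assumes each temporary is assigned exactly once, which a real pass would have to verify or handle:

```rust
use std::collections::HashMap;

// Minimal stand-in IR: either a copy between temporaries, or any
// other instruction that reads temporaries.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Copy { dst: u32, src: u32 }, // _dst = _src:Test;
    Use { operands: Vec<u32> },  // e.g. a call reading temporaries
}

// Forward copy propagation: record that `dst` is just an alias of
// `src`, rewrite later uses to the root value, and drop the now-dead
// copies. A real pass would also handle reassignment and control flow.
fn propagate_copies(body: Vec<Inst>) -> Vec<Inst> {
    let mut alias: HashMap<u32, u32> = HashMap::new();

    // Chase alias links until we reach a temporary that isn't a copy.
    fn resolve(alias: &HashMap<u32, u32>, mut v: u32) -> u32 {
        while let Some(&s) = alias.get(&v) {
            v = s;
        }
        v
    }

    let mut out = Vec::new();
    for inst in body {
        match inst {
            Inst::Copy { dst, src } => {
                let root = resolve(&alias, src);
                alias.insert(dst, root); // the copy itself is elided
            }
            Inst::Use { operands } => {
                let operands = operands.into_iter().map(|v| resolve(&alias, v)).collect();
                out.push(Inst::Use { operands });
            }
        }
    }
    out
}
```

on the example from the top of the post, this collapses `_1 = _0; _2 = _1; call foo(_2)` down to `call foo(_0)` with both copies gone.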
compiler devs, any advice? i've learned a lot about language development these past few months, but it is still my first major compiler project, so i'd love to know if this is a common problem and if there are any recommended approaches here.
