In the cargo xtask build for my micro:bit project, I run arm-none-eabi-size after the build to keep tabs on my code size. It's pretty fun to implement a feature, think "well, that's a bit much, isn't it?", and start looking for ways to do the same job with less code while you still understand why you wrote it.
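As a rough idea of what that post-build step could look like, here is a minimal sketch using only the standard library. The function names `size_command` and `report_size` are hypothetical, and the ELF path passed in is a placeholder. My actual xtask may be wired up differently.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds the `arm-none-eabi-size` invocation for one ELF file.
/// (Hypothetical helper; the tool must be on PATH for it to run.)
pub fn size_command(elf: &Path) -> Command {
    let mut cmd = Command::new("arm-none-eabi-size");
    cmd.arg(elf);
    cmd
}

/// Runs the size tool and returns its stdout as text.
/// Returns an error if the tool cannot be started or exits unsuccessfully.
pub fn report_size(elf: &Path) -> io::Result<String> {
    let output = size_command(elf).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "arm-none-eabi-size exited with a failure status",
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Printing that report at the end of every build is what puts the size change from each edit right in front of you.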
It can also make the most mundane-seeming optimizations really pop out. Consider the following code I had written, which multiplies a period by either 12.5%, 25%, 50%, or 75% to get the intended duration of a pulse wave:
#[derive(Clone, Copy)]
pub enum PulseDuration {
    Twelve,
    TwentyFive,
    Fifty,
    SeventyFive,
}

impl PulseDuration {
    pub fn apply(self, period: u16) -> u16 {
        match self {
            PulseDuration::Twelve => period >> 3,
            PulseDuration::TwentyFive => period >> 2,
            PulseDuration::Fifty => period >> 1,
            PulseDuration::SeventyFive => (period >> 2) + (period >> 1),
        }
    }
}
The first optimization I made here was recognizing that three of those branches differ only in the right-hand side of the bit shift. I assigned explicit discriminants and used the value of the enum as the shift amount:
#[derive(Clone, Copy)]
pub enum PulseDuration {
    Twelve = 3,
    TwentyFive = 2,
    Fifty = 1,
    SeventyFive = 0,
}

impl PulseDuration {
    pub fn apply(self, period: u16) -> u16 {
        match self {
            PulseDuration::SeventyFive => (period >> 2) + (period >> 1),
            _ => period >> self as u8,
        }
    }
}
This change reduced my code size by 1,172 bytes.
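A refactor like this is cheap to check exhaustively, since there are only 65,536 possible periods. The sketch below keeps the original four-arm match around under the hypothetical name `apply_reference` and compares it to the discriminant version for every input. `versions_agree` is likewise a made-up helper, not something in the project.

```rust
#[derive(Clone, Copy)]
pub enum PulseDuration {
    Twelve = 3,
    TwentyFive = 2,
    Fifty = 1,
    SeventyFive = 0,
}

impl PulseDuration {
    /// Discriminant-as-shift version.
    pub fn apply(self, period: u16) -> u16 {
        match self {
            PulseDuration::SeventyFive => (period >> 2) + (period >> 1),
            // `as` binds tighter than `>>`, so this is `period >> (self as u8)`.
            _ => period >> self as u8,
        }
    }

    /// Original four-arm match, kept only as a reference (hypothetical name).
    pub fn apply_reference(self, period: u16) -> u16 {
        match self {
            PulseDuration::Twelve => period >> 3,
            PulseDuration::TwentyFive => period >> 2,
            PulseDuration::Fifty => period >> 1,
            PulseDuration::SeventyFive => (period >> 2) + (period >> 1),
        }
    }
}

/// Returns true if both versions agree for every variant and every u16 period.
pub fn versions_agree() -> bool {
    let variants = [
        PulseDuration::Twelve,
        PulseDuration::TwentyFive,
        PulseDuration::Fifty,
        PulseDuration::SeventyFive,
    ];
    (0..=u16::MAX).all(|period| {
        variants
            .iter()
            .all(|&d| d.apply(period) == d.apply_reference(period))
    })
}
```

Because the 75% arm is matched first, the `SeventyFive = 0` discriminant never reaches the shift, so its value only has to be distinct from the other three.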
Then, recognizing that (period >> 2) + (period >> 1), the equivalent of "50% of the period plus 25% of the period", is three operations (two shifts and an add), I considered the two-operation alternative, "25% of the period times 3":
        match self {
            PulseDuration::SeventyFive => (period >> 2) * 3,
            _ => period >> self as u8,
        }
That somehow reduced my code size by another 80 bytes (presumably this function is now small enough that it's getting inlined, so this is probably about 4x the benefit I'd see if I were calling this function only once).
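One thing worth checking with a rewrite like this is whether the two expressions really compute the same value. Because the shifts truncate, they do not always: the "times 3" form can come out one lower. The sketch below uses hypothetical free functions, not the project's code, to compare the two over every `u16` period. Neither form can overflow, since a quarter of `u16::MAX` times 3 still fits in a `u16`.

```rust
/// "50% plus 25%": two shifts and an add.
pub fn three_quarters_add(period: u16) -> u16 {
    (period >> 2) + (period >> 1)
}

/// "25% times 3": one shift and a multiply.
pub fn three_quarters_mul(period: u16) -> u16 {
    (period >> 2) * 3
}

/// Largest absolute difference between the two forms over all u16 periods.
pub fn max_difference() -> u16 {
    (0..=u16::MAX)
        .map(|p| three_quarters_add(p).abs_diff(three_quarters_mul(p)))
        .max()
        .unwrap_or(0)
}
```

The two agree whenever the period is a multiple of 4, or one more than a multiple of 4, and differ by exactly 1 otherwise (for a period of 6, the first gives 4 and the second gives 3). For a pulse-wave duration that is probably an acceptable rounding difference, but it is an assumption worth making deliberately.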
I'm still pretty early in the code for this project, so these somewhat subtle optimizations to this single function ended up decreasing my code size by nearly 25%.