

i have a struct with multiple fields, each of which has meaning associated with specific bits. so, it looks something like this:

#[repr(C)]
pub struct Instruction {
    first_item: u8,
    second_item: u8,
}

impl Instruction {
    fn is_cool(&self) -> bool {
        self.first_item & 0x80 != 0
    }

    fn is_nice(&self) -> bool {
        self.second_item & 0xc0 != 0
    }
}

now, i have some logic to run only when an instruction is_cool and is_nice, which would look like

pub fn unfortunate_check(x: &Instruction) -> bool {
    x.is_cool() && x.is_nice()
}

why is this unfortunate you ask? rustc will turn this into...

example::unfortunate_check:
        cmp     byte ptr [rdi], 0
        sets    cl
        cmp     byte ptr [rdi + 1], 64
        setae   al
        and     al, cl
        ret

which is to say, it checks is_cool, then is_nice, and ands the results together. but as long as the bytes are sequential you could actually check both conditions at once! looking at the layout of the struct fields and masks...

byte 0       byte 1
first_item   second_item

mask 0       mask 1
0x80         0xc0

!!!!
mask (u16):  0xc080
!!!!
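
a tiny sketch of where 0xc080 comes from, assuming a little-endian target (x86 here) so that byte 0 lands in the low half of the u16. the constant names are made up for illustration, they're not from the real code:

```rust
// hypothetical names for the per-byte masks from the post
const MASK_FIRST: u8 = 0x80; // byte 0, first_item
const MASK_SECOND: u8 = 0xc0; // byte 1, second_item

// on a little-endian target, byte 0 is the low byte of the u16
const COMBINED: u16 = u16::from_le_bytes([MASK_FIRST, MASK_SECOND]);

fn main() {
    assert_eq!(COMBINED, 0xc080);
    // same number, written the way the disassembly prints it
    assert_eq!(COMBINED, 49280);
    println!("{:#06x}", COMBINED);
}
```

on a big-endian target the two bytes would swap places and the constant would have to be 0x80c0 instead.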

you could mask the whole instruction by 0xc080 in one step and be done with it! check both conditions at once! get two bytes with one test! and in fact:

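pub fn check_but_better(x: &Instruction) -> bool {
    // caution: Instruction has alignment 1 and u16 has alignment 2, so this
    // reference may be misaligned. also assumes a little-endian target.
    let lol: &u16 = unsafe { std::mem::transmute(x) };
    *lol & 0xc080 != 0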
}

then compiles to

example::check_but_better:
        movzx   eax, word ptr [rdi]
        test    eax, 49280
        setne   al
        ret

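two fewer instructions, presumably ever-so-slightly faster. one catch: `!= 0` on the combined mask is true if any masked bit is set, so as written this computes is_cool || is_nice, not &&.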
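
a quick way to check what the one-load test actually computes is to brute-force every byte pair. this is a sketch, not the original code: the free functions are stand-ins for the methods above, and the bytes are read with u16::from_le_bytes instead of transmute (my substitution) so alignment and endianness don't come into it.

```rust
// stand-ins for the post's two checks; names are made up for this sketch
fn is_cool(first: u8) -> bool {
    first & 0x80 != 0
}

fn is_nice(second: u8) -> bool {
    second & 0xc0 != 0
}

// the single-mask test, with byte 0 as the low byte of the u16
fn single_mask(first: u8, second: u8) -> bool {
    u16::from_le_bytes([first, second]) & 0xc080 != 0
}

// returns (matches_or, matches_and) over all 65536 byte pairs
fn compare_all() -> (bool, bool) {
    let mut matches_or = true;
    let mut matches_and = true;
    for first in 0..=u8::MAX {
        for second in 0..=u8::MAX {
            let got = single_mask(first, second);
            matches_or &= got == (is_cool(first) || is_nice(second));
            matches_and &= got == (is_cool(first) && is_nice(second));
        }
    }
    (matches_or, matches_and)
}

fn main() {
    let (or_ok, and_ok) = compare_all();
    println!("matches ||: {}, matches &&: {}", or_ok, and_ok);
}
```

the single-mask test agrees with || on every input and disagrees with && on some (first = 0x80, second = 0x00 is one). the one-load trick carries over exactly only when the combined condition can be written as a single mask-and-compare.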



in reply to @iximeow's post:

looks like llvm only combines/promotes loads during isel

https://rust.godbolt.org/z/Mc5MnTe98 https://llvm.godbolt.org/z/vGM8jr1fK

*** IR Dump After Module Verifier (verify) ***
; Function Attrs: argmemonly mustprogress nofree norecurse nosync nounwind nonlazybind readonly willreturn uwtable
define i16 @_ZN7example3foo17h89a63081efa2868dE(ptr noalias nocapture noundef readonly align 1 dereferenceable(2) %arg) unnamed_addr #0 {
  %i = load i8, ptr %arg, align 1
  %i1 = getelementptr inbounds { i8, i8 }, ptr %arg, i64 0, i32 1
  %i2 = load i8, ptr %i1, align 1
  %i3 = zext i8 %i2 to i16
  %i4 = shl nuw i16 %i3, 8
  %i5 = zext i8 %i to i16
  %i6 = or i16 %i4, %i5
  ret i16 %i6
}
# *** IR Dump After X86 DAG->DAG Instruction Selection (amdgpu-isel) ***:
# Machine code for function _ZN7example3foo17h89a63081efa2868dE: IsSSA, TracksLiveness
Function Live Ins: $rdi in %0

bb.0 (%ir-block.0):
  liveins: $rdi
  %0:gr64 = COPY $rdi
  %1:gr16 = MOV16rm %0:gr64, 1, $noreg, 0, $noreg :: (load (s16) from %ir.arg, align 1)
  $ax = COPY %1:gr16
  RET 0, $ax

# End machine code for function _ZN7example3foo17h89a63081efa2868dE.

You can probably get the same result without unsafe with

let lol = (x.first_item as u16) | ((x.second_item as u16) << 8);

tho there are a bunch of caveats about lack of struct layout guarantees and the remote possibility of running on a bigendian machine.
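
A minimal sketch of that safe version as a complete function, assuming the struct from the post. The name `combined` is made up here. The widening casts are needed because shifting a `u8` left by 8 overflows.

```rust
#[repr(C)]
pub struct Instruction {
    first_item: u8,
    second_item: u8,
}

// shift-or version of the suggestion above: first_item becomes the low
// byte and second_item the high byte, on any target
pub fn combined(x: &Instruction) -> u16 {
    (x.first_item as u16) | ((x.second_item as u16) << 8)
}

fn main() {
    let x = Instruction { first_item: 0x80, second_item: 0x01 };
    assert_eq!(combined(&x), 0x0180);
    // same value as a little-endian read of the two bytes
    assert_eq!(combined(&x), u16::from_le_bytes([x.first_item, x.second_item]));
}
```

The value is the same on every target. Whether it compiles down to a single 16-bit load is up to the backend: the dump above shows it happening on x86, and other targets may differ.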