iximeow
@iximeow

"disassembly" is a procedure by which you can turn an unambiguous sequence of bytes that have one interpretation into a mushy structure that takes extra work to figure out how to execute. i do not like the wrinkle of being able to disassemble 3300 into (xor, eax, [rax]) because that representation also suggests your disassembler could one day say (xor, [rax], [rax]). but that will never happen! so you develop checks and edge cases for things that were literally impossible without this mushy intermediate representation.

i don't know what to do with this. i'm also not sure it would actually be better for anyone if the computer could inline exactly the logic needed to handle some decoded instruction and not a word more. maybe the explosion in generated code is worse overall. buh.
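a minimal sketch of that wrinkle, assuming 64-bit mode, no prefixes, and only the simplest ModRM forms. `Operand`, `XorR32Rm32` and `decode_xor_33` are made-up names for illustration and not from any real disassembler. the generic `Operand` enum is the mushy part: nothing in the type stops two `Mem` operands, even though x86 has no encoding for xor with two memory operands. the opcode-specific struct can only express what the bytes can express.

```rust
/// Split an x86 ModRM byte into its (mod, reg, rm) fields.
fn modrm_fields(modrm: u8) -> (u8, u8, u8) {
    (modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111)
}

/// The generic ("mushy") operand: any slot may hold a register or memory.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operand {
    Reg(u8),
    Mem { base: u8 },
}

/// Hypothetical tight representation for opcode 0x33 (XOR r32, r/m32).
/// The destination is always a register, so only the source can vary.
#[derive(Debug, PartialEq)]
struct XorR32Rm32 {
    dest_reg: u8,
    src: Operand,
}

/// Decode `33 /r` for register-direct and plain `[reg]` sources only.
/// SIB, displacement and RIP-relative forms are deliberately left out.
fn decode_xor_33(bytes: &[u8]) -> Option<XorR32Rm32> {
    match bytes {
        [0x33, modrm, ..] => {
            let (mode, reg, rm) = modrm_fields(*modrm);
            let src = match mode {
                0b11 => Operand::Reg(rm),
                0b00 if rm != 0b100 && rm != 0b101 => Operand::Mem { base: rm },
                _ => return None,
            };
            Some(XorR32Rm32 { dest_reg: reg, src })
        }
        _ => None,
    }
}
```

with register number 0 standing for eax/rax, `decode_xor_33(&[0x33, 0x00])` comes back as "dest register 0, source memory at base register 0", which is (xor, eax, [rax]) with no room for a second memory operand.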


iximeow
@iximeow

i've been marinating over @dougall's comment here and continuing to stew over "using a disassembler makes my gbc emulator extremely slow". the real problem i'm dancing around in OP is "uses of a disassembler are usually specialized". "print this instruction as text" counts as specialized, for example. SO FAR AS I CAN IMAGINE, really generic "i'd love to have an IR describing this instruction" stuff only becomes useful when you're doing code analysis, and even then at the level of a disassembler it's still "specialized" in that it's just "for this instruction, produce that IR. then sometimes over there print an instruction's text as well".

generally, "a disassembler" would need to be split into "decode these bytes" and "do something with the decoded bytes" phases. but because many details about an instruction are found piecemeal as you decode bytes (e.g. you often figure out an opcode far before operands), it looks a lot like callbacks as part of a parser - because it is a parser, i guess...

so then you would have a function to decode bytes, parameterized on something providing a bunch of handlers, and "by default" that might be "put everything into a generic Instruction struct". then for specialized cases ("i'm scanning only for jumps, calls, rets"), you could leave every handler other than on_opcode_determined as a no-op.
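a rough sketch of that shape, over a made-up four-opcode toy ISA rather than any real architecture. only `on_opcode_determined` comes from the text above. `Handler`, `on_operand_decoded`, `ControlFlowOnly` and the opcode numbering are assumptions for illustration and are not the yaxpeax API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Opcode { Nop, Load, Jump, Ret }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Operand { Imm8(u8), Addr16(u16) }

/// Every callback defaults to a no-op, so a handler only pays for what it overrides.
trait Handler {
    fn on_opcode_determined(&mut self, _op: Opcode) {}
    fn on_operand_decoded(&mut self, _operand: Operand) {}
}

/// Toy decoder: 00 = nop, 01 ib = load, 02 lo hi = jump, 03 = ret.
/// Returns the instruction length, or None on bad or truncated input.
fn decode<H: Handler>(bytes: &[u8], handler: &mut H) -> Option<usize> {
    let (&first, rest) = bytes.split_first()?;
    match first {
        0x00 => { handler.on_opcode_determined(Opcode::Nop); Some(1) }
        0x01 => {
            handler.on_opcode_determined(Opcode::Load);
            let imm = *rest.first()?;
            handler.on_operand_decoded(Operand::Imm8(imm));
            Some(2)
        }
        0x02 => {
            handler.on_opcode_determined(Opcode::Jump);
            let lo = *rest.first()?;
            let hi = *rest.get(1)?;
            handler.on_operand_decoded(Operand::Addr16(u16::from_le_bytes([lo, hi])));
            Some(3)
        }
        0x03 => { handler.on_opcode_determined(Opcode::Ret); Some(1) }
        _ => None,
    }
}

/// The "by default" handler: put everything into a generic struct.
#[derive(Debug, Default, PartialEq)]
struct Instruction { opcode: Option<Opcode>, operand: Option<Operand> }

impl Handler for Instruction {
    fn on_opcode_determined(&mut self, op: Opcode) { self.opcode = Some(op); }
    fn on_operand_decoded(&mut self, operand: Operand) { self.operand = Some(operand); }
}

/// A specialized handler: only cares whether this is a jump or a ret.
#[derive(Default)]
struct ControlFlowOnly { is_control_flow: bool }

impl Handler for ControlFlowOnly {
    fn on_opcode_determined(&mut self, op: Opcode) {
        self.is_control_flow = matches!(op, Opcode::Jump | Opcode::Ret);
    }
}
```

because `decode` is generic over `H`, each handler gets its own monomorphized copy, and for `ControlFlowOnly` the operand callbacks are empty bodies the compiler is free to drop.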

if this all works out, it does address a wrinkle i've always been disappointed by w/ yaxpeax-x86: it's a horrible length decoder. rustc isn't smart enough (for good reason) to know that if a user only inspects the length of an instruction, it doesn't need to codegen most of the "save results" part of decoding. but in theory this would leave lots of branch arms entirely empty and ripe for the dead code eliminating.

how horrifying. i guess i'll try it out on yaxpeax-sm83.


iximeow
@iximeow

this is most of the sm83 decoder. writing this against a Handler-Based thingy involved adding a fn on_word_read(&mut self, word), to increment a length whenever a new byte is read for an instruction. so a neat test for "is this useful" is "can i finally coax a useful length decoder out of a yaxpeax crate"

one

impl DecoderHandler for &mut u8 {
    fn on_decode_start(&mut self) {
        **self = 0;
    }

    fn on_word_read(&mut self, _word: <SM83 as Arch>::Word) {
        **self += 1;
    }
}

later and i can compare the "full disassembler" vs the "length-only disassembler", used like so:

#[inline(never)]
#[no_mangle]
pub fn test_length_decode(decoder: &InstDecoder, data: &[u8], mut length: &mut u8) {
    let mut reader = U8Reader::new(data);
    yaxpeax_sm83::decode_inst(decoder, &mut length, &mut reader).unwrap()
}

#[test]
fn test_length_decode_works() {
    // test_display(&[0xea, 0x34, 0x12], "ld [$1234], a");
    test_length_decode(&InstDecoder::default(), &[0xea, 0x34, 0x12], &mut 0);
}

test_length_decode compiles to 167 instructions, many of which relate to code that's just a hack for this morning. the full disassembler is (currently) 676 instructions. (interpret_operands doesn't quite melt away but i think i can fix that this evening)

so, sick. not being able to use yaxpeax-x86 as a length decoder has been really irritating for a long time. finally have a way to fix that.


iximeow
@iximeow

here is a length-only decoder for SM83 instructions:

#[inline(never)]
#[no_mangle]
pub fn test_length_decode(decoder: &InstDecoder, data: &[u8], mut length: &mut u8) -> bool {
    let mut reader = U8Reader::new(data);
    yaxpeax_sm83::decode_inst(decoder, &mut length, &mut reader).is_ok()
}

and here is a full Instruction decoder for SM83 instructions:

#[inline(never)]
#[no_mangle]
pub fn test_inst_decode(decoder: &InstDecoder, data: &[u8]) -> Option<Instruction> {
    let mut inst = Instruction::default();
    let mut reader = U8Reader::new(data);
    if yaxpeax_sm83::decode_inst(decoder, &mut inst, &mut reader).is_ok() {
        Some(inst)
    } else {
        None
    }
}

the goodfile for yaxpeax-sm83 now records how many bytes these functions add to an empty binary: ... drumroll ...

length decode size (bytes, upper bound)	 616
inst decode size (bytes, upper bound)	 3352

finally, if you were to, say, implement an emulator as an impl DecodeHandler for EmulationEnvironment... it should be a good deal faster than decoding a struct Instruction and matching on it. rustc can do a really good job of specializing the decode function now!!!
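a minimal sketch of what "the emulator is the handler" could look like, again over a made-up toy ISA. `Handler`, `Emu`, `fetch` and every callback name except `on_word_read` are assumptions for illustration, and none of this is the yaxpeax-sm83 API. the point is only that no Instruction value is ever built: each callback mutates machine state directly.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Opcode { Nop, LoadA, AddA, Jump }

trait Handler {
    fn on_word_read(&mut self, _word: u8) {}
    fn on_opcode(&mut self, _op: Opcode) {}
    fn on_imm8(&mut self, _imm: u8) {}
    fn on_addr16(&mut self, _addr: u16) {}
}

/// Read one byte at `cursor`, advance it, and tell the handler about it.
fn fetch<H: Handler>(mem: &[u8], cursor: &mut usize, h: &mut H) -> Option<u8> {
    let byte = *mem.get(*cursor)?;
    *cursor += 1;
    h.on_word_read(byte);
    Some(byte)
}

/// Toy decoder: 00 = nop, 01 ib = load a, 02 ib = add a, 03 lo hi = jump.
fn decode<H: Handler>(mem: &[u8], start: usize, h: &mut H) -> Option<()> {
    let mut cursor = start;
    match fetch(mem, &mut cursor, h)? {
        0x00 => h.on_opcode(Opcode::Nop),
        0x01 => {
            h.on_opcode(Opcode::LoadA);
            let imm = fetch(mem, &mut cursor, h)?;
            h.on_imm8(imm);
        }
        0x02 => {
            h.on_opcode(Opcode::AddA);
            let imm = fetch(mem, &mut cursor, h)?;
            h.on_imm8(imm);
        }
        0x03 => {
            h.on_opcode(Opcode::Jump);
            let lo = fetch(mem, &mut cursor, h)?;
            let hi = fetch(mem, &mut cursor, h)?;
            h.on_addr16(u16::from_le_bytes([lo, hi]));
        }
        _ => return None,
    }
    Some(())
}

/// Hypothetical emulation environment: executes as the decoder reports things.
struct Emu { a: u8, pc: u16, current: Opcode }

impl Handler for Emu {
    // every byte consumed advances the program counter
    fn on_word_read(&mut self, _word: u8) { self.pc = self.pc.wrapping_add(1); }
    fn on_opcode(&mut self, op: Opcode) { self.current = op; }
    fn on_imm8(&mut self, imm: u8) {
        match self.current {
            Opcode::LoadA => self.a = imm,
            Opcode::AddA => self.a = self.a.wrapping_add(imm),
            _ => {}
        }
    }
    fn on_addr16(&mut self, addr: u16) {
        if self.current == Opcode::Jump { self.pc = addr; }
    }
}

impl Emu {
    fn new() -> Self { Emu { a: 0, pc: 0, current: Opcode::Nop } }
    /// Decode and execute one instruction at the current pc.
    fn step(&mut self, mem: &[u8]) -> Option<()> {
        let start = self.pc as usize;
        decode(mem, start, self)
    }
}
```

whether this beats "decode to a struct, then match" on real hardware is something to measure rather than assume.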



in reply to @iximeow's post:

"unambiguous" seems a bit optimistic.

but i like the idea... i've had a similar thought that a disassembler could return a mnemonic id and a "shape" structure with just enough information for getter functions to quickly extract the actual operands from the original bytes. no idea if it's a good idea, but it may have potential as a way to confuse (or gain an 'unfair' advantage in) benchmarks
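one way that "mnemonic id plus shape" idea might look, sketched for three SM83 opcodes. `Mnemonic`, `Shape`, `Decoded` and the getters are hypothetical names, and this is not how any yaxpeax crate works. the decoder records only where an operand lives, and the getters read it back out of the original bytes on demand.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mnemonic { Nop, LdAImm8, LdAddrA }

/// Just enough information to find the operand in the original bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    NoOperands,
    Imm8At(u8),  // byte offset of an 8-bit immediate
    Imm16At(u8), // byte offset of a little-endian 16-bit immediate
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Decoded { mnemonic: Mnemonic, shape: Shape, length: u8 }

/// Handles only 00 (nop), 3e ib (ld a, d8) and ea lo hi (ld [a16], a).
fn decode(bytes: &[u8]) -> Option<Decoded> {
    let d = match *bytes.first()? {
        0x00 => Decoded { mnemonic: Mnemonic::Nop, shape: Shape::NoOperands, length: 1 },
        0x3e => Decoded { mnemonic: Mnemonic::LdAImm8, shape: Shape::Imm8At(1), length: 2 },
        0xea => Decoded { mnemonic: Mnemonic::LdAddrA, shape: Shape::Imm16At(1), length: 3 },
        _ => return None,
    };
    // reject truncated instructions up front
    if bytes.len() < d.length as usize { return None; }
    Some(d)
}

impl Decoded {
    /// Extract the 8-bit immediate from the original bytes, if there is one.
    fn imm8(&self, bytes: &[u8]) -> Option<u8> {
        match self.shape {
            Shape::Imm8At(off) => bytes.get(off as usize).copied(),
            _ => None,
        }
    }

    /// Extract the 16-bit immediate from the original bytes, if there is one.
    fn imm16(&self, bytes: &[u8]) -> Option<u16> {
        match self.shape {
            Shape::Imm16At(off) => {
                let o = off as usize;
                Some(u16::from_le_bytes([*bytes.get(o)?, *bytes.get(o + 1)?]))
            }
            _ => None,
        }
    }
}
```

for the `ea 34 12` bytes used in the post's test, `decode` reports length 3 and `imm16` reads back `0x1234`.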

in reply to @iximeow's post:

In the specific case of GB code, since the initial byte always determines the size of the full instruction (when it's well formed), can't you just read a byte, index into an array of how many "extra" bytes to skip, and then loop? It seems basically similar to utf8.
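A sketch of that table-driven loop. The lengths are transcribed from one reading of the SM83 opcode map, so treat them as an assumption and check them against a reference before relying on them. Undefined opcodes are counted as 1 byte and STOP (0x10) as 2, which is a choice and not a fact. `length_of`, `LENGTHS` and `instruction_starts` are made-up names.

```rust
/// Total length in bytes of the instruction starting with `op`.
const fn length_of(op: u8) -> u8 {
    match op {
        // 16-bit immediates and absolute addresses
        0x01 | 0x11 | 0x21 | 0x31 | 0x08
        | 0xC2 | 0xC3 | 0xCA | 0xD2 | 0xDA
        | 0xC4 | 0xCC | 0xCD | 0xD4 | 0xDC
        | 0xEA | 0xFA => 3,
        // 8-bit immediates, relative jumps, high-page loads, the CB prefix
        0x06 | 0x0E | 0x16 | 0x1E | 0x26 | 0x2E | 0x36 | 0x3E
        | 0x10 | 0x18 | 0x20 | 0x28 | 0x30 | 0x38
        | 0xC6 | 0xCE | 0xD6 | 0xDE | 0xE6 | 0xEE | 0xF6 | 0xFE
        | 0xE0 | 0xF0 | 0xE8 | 0xF8 | 0xCB => 2,
        _ => 1,
    }
}

/// Build the 256-entry lookup table at compile time.
const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = length_of(i as u8);
        i += 1;
    }
    table
}

static LENGTHS: [u8; 256] = build_table();

/// Offsets at which each instruction in `code` begins.
fn instruction_starts(code: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        starts.push(pc);
        pc += LENGTHS[code[pc] as usize] as usize;
    }
    starts
}
```

The loop never inspects operand bytes, which is the same property that lets a UTF-8 scanner skip continuation bytes based on the lead byte alone.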

in reply to @iximeow's post:

very cool and as someone who has quite recently been haranguing no one in particular about the state of gb reversing tools, can i just say thank you for documenting the journey from "this is fuzzy and i'm frustrated" to "hey i made a small corner of some stuff wildly better!". feel like i don't see enough of this around RE stuff

i'm actually still not sure i've made anything really better!! the gb cpu is so simple that i'm pretty sure this is still probably worse in all ways than just "put the decoder in the emulator". for example, a length decoder could - apparently - be just 256 bytes (plus a few for instructions to index): https://github.com/nekronos/gbc_rs/blob/37146d6d1ebd8b14390284ac44d3f355d0e4938a/src/gbc/opcode.rs#L37-L46

BUT for more complex architectures (x86) this might be more useful in that you could generate IR for some Dastardly Purposes directly from a decoder. and in those cases it still wouldn't be as perfect as if you wrote the thing by hand. i'll still have to settle for "reasonably close to ideal" here, which is fine when the library is more generic.

eventually i'll start actually doing the rest of the emulator :')

i meant documenting and including your thought process/emotional rollercoaster is v underrated so i had an excited

but also i mean yeah, a simple solution is probably quicker in-emulation like that, but idk, something speaks to me about building the IR, like it's almost picking at the intent of the instruction set. maybe that is only useful for analysis and weird hooking or w/e on other architectures. i've been reading art papers tonight so maybe i'm just on one