cr1901
@cr1901

As will become clear, what I like about each may differ very drastically from what is desirable in a computer architecture today :D. This is also a list of "design decisions I respect".

  • 6502
    • Easy to remember instructions.
    • Bus interface is simple, so it's fun to use to make new custom retro computers.
  • 65816
    • Built-in bank switching.
    • Easy to remember instructions.
    • Multiprecision math and stack manipulation are less painful than on 6502.
    • Actually supports virtual memory via an ABORT pin. However, I've only ever seen this used once with a custom CPLD program. I asked for/have the source somewhere...
  • Acorn ARM
    • 26-bit program counter; the remaining 6 bits are reused for the flags register.
    • The 3-stage pipeline doesn't bother propagating the PC register to save space. The EX stage just uses PC + 8 instead for PC-relative calculations.
    • No delay slots.
    • Can encode 4096 immediate constants (not all of them distinct) in a single 32-bit instruction using the creative formula:
      • A ROR (B * 2), i.e. A rotated right by twice B, where A is 8 bits and B is 4 bits.
  • ARM Thumb
    • Setting the lowest address bit to indicate Thumb instructions is creative.
    • The ISA isn't a compressed wrapper over 32-bit ARM.
  • AArch64
    • AIUI, the ISA doesn't bother with back compat with 32-bit ARM. A clean slate sounds like a good idea. But I'm not all that familiar w/ AArch64 sadly.
  • AT&T Hobbit
    • A stack machine optimized for running the C programming language. That sounds pretty cool in and of itself.
  • DEC Alpha
    • Even the original in-order version (Alpha 21064) has a very weak memory model.
    • The original versions also could only load/store 64-bits at a time. I respect the hell out of this, even if it didn't work out (C++11 atomics essentially mandate byte load/stores).
  • Hitachi SuperH
    • 32-bit CPU with only 16-bit instructions :D! Only the constants -128 to 127 can be loaded in a single instruction. Very bold!
  • Intel 80286
    • Segments are free position-independent code, and source compat with Intel's previous 8-bits :P.
    • Give 16-bit Protected Mode a chance, the poor 286 is trying its best. Actually, fun fact, Real Mode on the 286/386 utilizes much of the same circuitry as 16-bit/32-bit (?) Protected Mode.
  • Intel 8051
    • No matter how many times you kill it, it won't die :). 8051 will be here after we're long gone.
  • Intel iAPX 432
    • Like nothing you've ever seen before or since. Okay, this is a cop out. At some point, I'll give it the time and attention it deserves :).
  • Intel Itanium
    • Okay, so in truth, I don't remember much about Itanium. It's a form of VLIW arch, right? That's pretty interesting, and DSPs do VLIW just fine. Experimenting is good :D!
    • Most of what I learned about Itanium came from a convo on hellsite a while ago. Unfortunately, I don't remember most of it. However, I do remember thinking: "Wow, Itanium had some neat ideas. It's a damn shame it failed, and the good parts seemingly weren't utilized in the future."
  • Lattice LM32
    • 6-stage pipeline instead of the usual 5 (Fetch is split into Address Calculation and Fetch).
    • Relatively easy-to-follow Verilog.
    • Big-endian in a little-endian world.
  • Microchip PIC10
    • It's adorable :D :D! Only GPIO pins and timer peripheral. 256 words. Registers are RAM (16 or 32 bytes?). Limited stack depth. Functions use static storage for locals (and thus re-entrant functions get duplicated).
    • I'd love to try a "high" level language on it. Limitations breed creativity.
  • Mill
    • This section intentionally left blank. Maybe if it ever comes out, I'll have more to say.
  • MIPS
    • The original version had not only branch delay slots, but also load delay slots. How bold!
  • Motorola 68k
    • The ultimate assembly language. Lots of addressing modes so even your assembly code "gets to the point". A memcpy loop is like 3 instructions (don't ask me to implement one :P).
  • NEC V810 (My favorite RISC. Pity it's not used much outside of Virtual Boy and PC-FX.)
    • No delay slots.
    • Defaults to 16-bit instruction width, with 32-bit instructions as needed to avoid the need for constant pool. Fixed width is overrated anyway.
    • A unique set of bit-string instructions that uses r27 to r31 IIRC. You can implement bit search, copy, set, etc. And the state of the instruction is kept between interrupts!
  • National Semiconductor NS32000
    • The first (fully) 32-bit CPU. That's neat in and of itself. There used to be a NetBSD port, which implies that it supported virtual memory (unlike Linux, NetBSD requires an MMU).
    • Seems like a template for many other CPUs to follow and refine (read: CISC => RISC). I really need to play with it at some point.
  • Rekursiv
    • A heavily microcoded CPU that allows the instruction set to be changed by the end user by switching out the microcode program. I'm not certain that working hardware still exists.
  • Reduceron
    • A CPU optimized for running Haskell! Unfortunately, I don't understand how it works well (I wish I did). But it's something different, so I embrace it.
  • RISCV
    • The base I and E specs are unapologetically minimal, even forgoing multiplication and possibly config registers. I unironically love this.
    • The base instruction encoding is pretty nifty, and tries to reuse the same bit positions for as many instructions as it can.
    • The L standard (decimal floating-point arithmetic) could be very interesting.
  • Sun SPARC
    • Register windows are cool. Although, repeatedly calling and returning from a function which causes all the registers to be spilled/restored seems less so :P.
  • TI MSP430
    • It's a PDP-11 in microcontroller form. What's not to love?
    • Only 4 addressing modes built-in, but creative use of the opcode encoding of read-only registers gets 3 more addressing modes by writing to said registers.
    • Actually, programming in MSP430 assembly would kinda bite without all the emulated instructions :P.
  • Zilog Z80
    • Having a DRAM refresh counter built-in is pretty cool.
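
To make the Acorn ARM constant trick from the list above a bit more concrete, here's a rough Python sketch. As I understand the encoding, the 8-bit value is rotated right (not shifted) by twice the 4-bit field; `ror32` and `arm_immediate` are names I made up for illustration, not anything from ARM's documentation.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_immediate(a, b):
    """Decode an 8-bit value `a` and a 4-bit rotation field `b` (hypothetical helper)."""
    assert 0 <= a < 256 and 0 <= b < 16
    return ror32(a, 2 * b)


# Every (a, b) pair is one of the 4096 encodings.
encodings = [(a, b) for a in range(256) for b in range(16)]

# Several encodings decode to the same constant (zero alone has 16 of them),
# so the set of distinct values is smaller than the set of encodings.
distinct = {arm_immediate(a, b) for a, b in encodings}

print(hex(arm_immediate(0xFF, 4)))   # 0xFF rotated right by 8 bits
print(len(encodings), len(distinct))
```

The even-only rotation is the price of squeezing the rotate amount into 4 bits: you can reach any byte-sized pattern at any even bit offset, including ones that wrap around the top of the word.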

That's all I can think of for right now. Comment if you want more, and I might actually remember more archs I've played with.



in reply to @cr1901's post:

mine's microcoded too! and it's not pipelined, probably not going to meet all of the spec. i may even patch the kernel so i can avoid atomics too with the only exception being lr and sc (but there'll only be one reservation slot, the entire memory space, so if an interrupt or exception happens while a reservation set is active it'll go away and the operation will have to be retried)

EFFICIENCY!

but this is what i get for choosing to implement in chips XD

A Minddump Follows

I should probably make this into a post, sorry :P.

Single Core Atomics

I hope the A-spec allows suitable-for-single-core-only impls. For single-core only, even "LL/SC fails only on exception/interrupt" provides a user-insn-set-level primitive to create critical sections, high-level atomics, mutexes, etc. I even know some RISCV enthusiasts who are upset that LL/SC isn't in the base I-spec under the assumption that "LL/SC fails only on exception/interrupt" is valid.
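
As a rough illustration of the single-core case, here's a toy Python model (not real hardware, and not the wording of any spec). It assumes a single reservation covering the entire memory space that any interrupt or exception clears, like the design described in the parent comment. `SingleCoreLLSC`, `atomic_add`, and the `interrupt_on` hook are all names I invented for the sketch.

```python
class SingleCoreLLSC:
    """Toy single-core memory with one whole-memory reservation slot."""

    def __init__(self):
        self.mem = {}
        self.reserved = False

    def ll(self, addr):
        """Load-linked: read the word and take the (only) reservation."""
        self.reserved = True
        return self.mem.get(addr, 0)

    def sc(self, addr, value):
        """Store-conditional: succeeds only if the reservation survived."""
        if not self.reserved:
            return False
        self.mem[addr] = value
        self.reserved = False
        return True

    def interrupt(self):
        """Assumption: any interrupt or exception drops the reservation."""
        self.reserved = False


def atomic_add(core, addr, delta, interrupt_on=()):
    """Retry loop built on LL/SC. `interrupt_on` lists the attempt indices
    (0-based) where a simulated interrupt lands between the LL and the SC."""
    attempt = 0
    while True:
        old = core.ll(addr)
        if attempt in interrupt_on:
            core.interrupt()
        attempt += 1
        if core.sc(addr, old + delta):
            return old, attempt


core = SingleCoreLLSC()
print(atomic_add(core, 0x100, 5))                       # no interrupts
print(atomic_add(core, 0x100, 2, interrupt_on={0, 1}))  # two failed attempts first
```

The same retry shape gives you compare-and-swap, mutexes, and so on; the only hardware state needed is one "reservation still valid" bit.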

Interrupt Woes

The below problem isn't unique to a single-core impl of the A-spec, but I wonder: can forward progress be prevented if an interrupt fires fast enough to always cause the LL/SC to fail? ARM does not appear to fail a LL/SC automatically on interrupt; see Resetting Monitors. Does RISCV auto-fail it? I thought it did, but I'll have to reread the A-spec.
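
To see how that starvation worry could play out, here's a toy Python model. It assumes one instruction per tick, an interrupt on every tick that is a multiple of `period`, a handler that costs zero ticks, and that any interrupt landing after the LL and up to the SC kills the reservation. Those are all simplifying assumptions of mine, and `attempts_until_success` is a made-up name.

```python
def attempts_until_success(seq_len, period, start=0, max_attempts=1000):
    """Count LL..SC attempts until one completes without an interrupt.

    seq_len: instructions in the sequence, LL and SC included.
    period:  an interrupt fires on every tick divisible by `period`.
    Returns the attempt number that succeeded, or None if none did.
    """
    tick = start  # tick on which the LL executes
    for attempt in range(1, max_attempts + 1):
        # Ticks after the LL, up to and including the SC.
        window = range(tick + 1, tick + seq_len)
        if not any(t % period == 0 for t in window):
            return attempt
        tick += seq_len  # the retry's LL runs right after the failed SC
    return None


print(attempts_until_success(seq_len=4, period=10))          # quiet system
print(attempts_until_success(seq_len=4, period=5, start=3))  # a few retries
print(attempts_until_success(seq_len=4, period=3))           # starved
```

In this model, once the interrupt period is shorter than the LL/SC sequence, every window contains an interrupt and the loop never gets through.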

One Solution To The Forward Progress Guarantee- Multi-Core Edition (Yes, I Have This Bookmarked):

One standards-compliant way I'm aware of to guarantee forward progress with respect to another processor is:

  • No more than 16 insns between LL/SC.
  • Insns must be "simple" (no mul/load/store/etc).
  • Insns must be all in same cache line.
  • If a cache eviction request happens, ignore it for 16 clock cycles or until a "non-simple insn" is encountered, whichever comes first. Any other CPUs trying to get to that atomic var will be blocked by the cache coherency protocol.
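
The first three rules are easy to check mechanically. Here's a minimal Python sketch, assuming a 64-byte cache line and a hand-picked set of "non-simple" mnemonics (both are my assumptions, not taken from any spec). `NON_SIMPLE` and `qualifies_for_guarantee` are hypothetical names.

```python
# Hypothetical classification; a real core would define its own list.
NON_SIMPLE = {"mul", "div", "rem", "lw", "sw", "ecall"}


def qualifies_for_guarantee(body, line_size=64, max_insns=16):
    """Check the insns between an LL and its SC against the rules above.

    body: list of (address, mnemonic) pairs, LL and SC themselves excluded.
    """
    if len(body) > max_insns:
        return False  # too many insns between LL and SC
    if any(mnemonic in NON_SIMPLE for _, mnemonic in body):
        return False  # contains a "non-simple" insn
    lines = {addr // line_size for addr, _ in body}
    return len(lines) <= 1  # everything must sit in one cache line


loop_body = [(0x1004, "addi"), (0x1008, "bne")]
print(qualifies_for_guarantee(loop_body))
print(qualifies_for_guarantee(loop_body + [(0x100C, "mul")]))
```

A check like this could live in an assembler or linter; the fourth rule (holding off eviction requests) is the part that needs hardware.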

Possible Solution To The Forward Progress Guarantee- Single Core Edition

Perhaps this can be adapted to interrupts (in other words, guaranteeing forward progress of a CPU with respect to itself :P). I don't like the idea of delaying responding to an interrupt for up to 16 instructions if a LL reservation was taken, but I can't think of a better option at 12:42 in the morning :).
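
For what it's worth, here's how I'd picture that deferral in a toy Python model. It assumes one instruction per tick, an interrupt raised on every tick divisible by `period`, and a core that holds a pending interrupt for at most `defer_limit` instructions while a reservation is live. The function name and all parameters are hypothetical.

```python
def attempt_with_deferral(start, seq_len, period, defer_limit=16):
    """Simulate one LL..SC attempt with interrupt deferral.

    The LL executes on tick `start`; the SC is the last of `seq_len` insns.
    Returns (sc_succeeded, insns_the_interrupt_was_held_off).
    """
    pending = False
    deferred = 0
    for tick in range(start + 1, start + seq_len):
        if tick % period == 0:
            pending = True  # interrupt raised, but not taken yet
        if pending:
            if deferred >= defer_limit:
                # Out of patience: take the interrupt, reservation is lost.
                return False, deferred
            deferred += 1  # execute one more insn with the interrupt held
    return True, deferred


print(attempt_with_deferral(start=0, seq_len=4, period=3))   # short sequence
print(attempt_with_deferral(start=0, seq_len=20, period=2))  # too long to protect
```

In this model a short sequence always completes, at the cost of up to `defer_limit` instructions of extra interrupt latency, which is exactly the trade-off I'm uneasy about.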