icculus
@icculus

So I tweeted this thing and invited people to figure out wtf.

Example: this is easy to translate to C, but can you figure out why they shifted these values by 10 and 6 bits before packing them into %esi?

And people had some guesses, and a few people hit it, which impressed me to no end.

Here's what this is doing:

It's pretty clear that this is trying to pack two variables into a single register, one in the top 16 bits and one in the bottom: hence the AND instructions masking them as such, followed by the OR that mushes them together. But what's up with the bitshifts taking nonsense pieces of each value?

Turns out these variables don't hold integers, they hold fixed point values.

Fixed point? Is that like...floating point?

Yes, it is! Sort of.

This game comes from an era where you might have had a desktop machine that didn't have a floating point unit, which is to say that if the game wanted to deal with fractional values, it had three options:

  • Require an FPU (so a Pentium or later, or a 386/486 with optional x87 hardware installed).
  • Fake floating point in software, which is what everything that ever used a "float" variable did before FPUs were available. So a simple addition operation (like x + y) might end up being dozens of instructions.
  • Do their work in fixed point.

This game opted for the third option. Fixed point is pretty easy to explain, so here goes. There are different ways you can do this, but most things went like this:

  • You take a 32-bit integer.
  • The top 16 bits are your whole number.
  • The bottom 16 bits are the fractional portion.
  • The fractional portion just halves on each bit, the same way that integers double on each bit.

So if you wanted a fixed point version of 5.75, it would look like this in bits:

00000000 00000101 11000000 00000000

The top 16 bits, if you look at just those, are the "5", and this works just like binary integers always do:

VALUE:    (etc)    128 64  32  16  8  4  2  1
 BITS: 00000000      0  0   0   0  0  1  0  1

(The first and third bits are set, one plus four is five, tada, we're counting in binary.)

Binary integers are powers of 2, so the next bit doubles the previous one: 1, then 2, then 4, then 8, etc.

The other half of these bits are the fractional part, and here, each bit halves the previous one, starting from the other side of the value:

VALUE: 1/2  1/4  1/8  1/16  1/32  1/64  1/128  1/256       (etc)
 BITS:   1    1    0     0     0     0      0      0       00000000

One half plus one quarter (0.5 + 0.25) is 0.75, so there's your fraction.

More or less, that's it.
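
If you'd rather see that in code, here's a minimal C sketch of a 16.16 fixed point type. The fixed_t name and the helpers are mine for illustration, not anything from the game:

#include <stdint.h>
#include <stdio.h>

/* 16.16 fixed point: top 16 bits are the whole number,
   bottom 16 bits are the fraction. */
typedef int32_t fixed_t;
#define FIX_ONE (1 << 16)   /* 1.0 in fixed point */

static fixed_t fix_from_double(double d) { return (fixed_t)(d * FIX_ONE); }
static double fix_to_double(fixed_t f) { return (double)f / FIX_ONE; }

int main(void) {
    fixed_t f = fix_from_double(5.75);
    printf("5.75 as 16.16: 0x%08x (%d)\n", (unsigned)f, (int)f);  /* 0x0005c000 (376832) */
    printf("and back: %g\n", fix_to_double(f));                   /* 5.75 */
    return 0;
}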

It's pretty simple to represent, but it also has a neat little feature: when you add or subtract two fixed point numbers, plain integer operations just work:

So 5.75 + 0.5, when you add those two binary blobs as if they were integers, will give you:

  00000000 00000101 11000000 00000000   // 5.75
+ 00000000 00000000 10000000 00000000   // 0.5
=====================================
  00000000 00000110 01000000 00000000   // 6.25

Since the fractional bits carry into the whole number, the sum comes out right, even when the result changes the integer portion.

So you can do fractional math using only integers, by just cheating about what the bits represent!
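
In C terms, that addition is literally just the integer + operator. Here's a tiny sketch reusing the fixed_t idea from above; multiplication doesn't come up in this post, but I've included it since it's the one operation that needs extra care:

#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16, same as above */

/* 5.75 + 0.5: the carry out of the fraction bits lands in the
   whole-number bits automatically. */
fixed_t demo_add(void) {
    fixed_t a = 0x0005c000;   /* 5.75 */
    fixed_t b = 0x00008000;   /* 0.5  */
    return a + b;             /* 0x00064000 == 6.25 */
}

/* Multiplication needs a fixup, because the scale factors multiply too:
   (x * 2^16) * (y * 2^16) == (x * y) * 2^32, so widen and shift back. */
fixed_t fix_mul(fixed_t x, fixed_t y) {
    return (fixed_t)(((int64_t)x * y) >> 16);
}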

The benefits of fixed point, in an era where you couldn't count on floating point support in the CPU:

  • It's pretty simple to explain and comprehend.
  • It only needs integer math, which means it works (and works quickly) without FPU support.
  • It's Good Enough precision for a lot of things you want to do, especially in this era of video games.

There are downsides:

  • You probably can't look at this in a debugger without knowledge of the datatype; it'll likely show you "376832" instead of "5.75".
  • Precision is not great; you're limited to 16-bit whole numbers and 16 bits of fractional precision. Part of what's nice about floating point is that the radix point floats, so you can have larger whole numbers at the cost of less precision and vice-versa. Fixed point is, well, fixed; you can't change the basic rules. Feel free to read the really dense Wikipedia page on floating point for more information, if that's your kink.

Anyhow, for a game that targeted systems that might not have an FPU, had to write parts in assembly language for speed (so it couldn't have a compiler slot in FPU or fallback code trivially at runtime), and didn't need particularly complex values, using fixed point was an easy choice here.

Ok, great, so what's the deal with the weirdo bitshifts already?

In this particular subroutine, none of the values they expect have large whole number ranges, and while they want a fractional piece, it doesn't have to be super-precise. In fact, this is part of the game's texture mapping code, back when a 32x32 texture was an embarrassment of riches.

So they pull the fixed-point global variables into registers...

mov esi, _htexxstart
shl esi, 10
and esi, 0ffff0000h
mov eax, _htexystart
shr eax, 6
and eax, 0ffffh
or  esi, eax

...and then do the same for the "step" variables (how much it will increment the texture coordinates between samples), which they mush into %edi in the same way.
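
If the assembly is hard on the eyes, here's my rough C equivalent of that packing. The global names come from the disassembly (minus the leading underscore); the function wrapper is mine:

#include <stdint.h>

extern int32_t htexxstart, htexystart;   /* 16.16 fixed point globals */

uint32_t pack_texcoords(void) {
    uint32_t esi = (uint32_t)htexxstart;
    esi <<= 10;                  /* shl esi, 10 */
    esi &= 0xffff0000u;          /* and esi, 0ffff0000h */
    uint32_t eax = (uint32_t)htexystart;
    eax >>= 6;                   /* shr eax, 6 */
    eax &= 0x0000ffffu;          /* and eax, 0ffffh */
    return esi | eax;            /* or esi, eax */
}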

So let's say you have _htexxstart and _htexystart, and the bits look like this...

AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

...after you shift _htexxstart's value to the left 10 bits (in these diagrams, the Xs mark the zero bits the shift drags in)...

BBBBBBCC CCCCCCDD DDDDDDXX XXXXXXXX

...and AND it against 0xFFFF0000...

BBBBBBCC CCCCCCDD 00000000 00000000

Now let's shift _htexystart's value to the right six bits...

XXXXXXAA AAAAAABB BBBBBBCC CCCCCCDD

...and AND it against 0x0000FFFF...

00000000 00000000 BBBBBBCC CCCCCCDD

... and then OR our two shifted/ANDed numbers together...

BBBBBBCC CCCCCCDD BBBBBBCC CCCCCCDD

Look at that: the same bit pattern, twice.

What they've done is cut off the top bits from the whole number part (so you can only represent a value between 0 and 63 with the six bits they kept), and the bottom bits from the fractional part (so you get a less precise equivalent of the same fraction!).

But now you have two variables in a single register and never need to go back to memory to recover either of them again during the texture mapping loop.
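
Put another way, each 16-bit half of the register is now its own smaller fixed point number: 6 whole bits, 10 fraction bits. A quick sketch of how you'd pull either one apart, with names that are mine, not the game's:

#include <stdint.h>

/* Each 16-bit half of the packed register is a 6.10 fixed point value. */
static uint32_t whole_part(uint16_t v) { return v >> 10; }    /* 0..63 */
static uint32_t frac_part(uint16_t v)  { return v & 0x3ff; }  /* 10 bits of fraction */

static void unpack_texcoords(uint32_t packed, uint16_t *x, uint16_t *y) {
    *x = (uint16_t)(packed >> 16);     /* top half: X in 6.10 */
    *y = (uint16_t)(packed & 0xffff);  /* bottom half: Y in 6.10 */
}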

Here's the core of that loop:

mov al, BYTE PTR [ebp+ebx]  ; 1
shld ebx, esi, 22           ; 2
shld ebx, esi, 6            ; 3
mov al, [eax]               ; 4
and ebx, ecx                ; 5
add esi, edi                ; 6
mov BYTE PTR [edx], al      ; 7

(SHLD is "shift left double", which in this case shifts %ebx:%esi as if they were a single 64-bit value, storing the top 32 bits of the result in %ebx and leaving %esi untouched, fwiw.)

And what it does:

  1. Get the texture pixel from the texture into register %al. %ebp is the texture's base address, %ebx is the current offset into the texture. Pixels are 8 bits each.
  2. Fill %ebx with 10 bits of garbage, and 22 bits from %esi.
  3. Slide those 22 bits over in %ebx and slide in 6 more bits from %esi. The end result is the bottom 12 bits are now the two whole number portions of our fixed numbers. This clamps out the fractional part and lays out %ebx so you have an offset into the texture data that represents an X,Y coordinate.
  4. Do a palette lookup for the texture pixel. I love this trick: %eax already has a pointer to a 256-byte table that starts on a 256-byte-aligned address, so when the first instruction overwrites %al with a pixel value, %eax now points to the exact address in the table for the palette entry it wants.
  5. Make sure %ebx doesn't overflow the texture by ANDing it against the mask (the texture size in bytes minus 1, in this case). This makes the texture repeat if necessary.
  6. Here's the magic. Remember how you can add fixed point values like integers and it'll work out? That works when you have two of them at 16-bit precision, mushed into the two halves of the same register, too! %edi has the step values I mentioned, so this one ADD instruction updates two fixed point numbers in a single opcode. Since they share a 32-bit register, when the lower value overflows, it will bias the fractional part of the upper fixed-point number. I'm not sure if this just turned out to be no big deal, since we're talking about fairly low fidelity graphics here, or if it actually helps with texture mapping at various angles.
  7. Write the final pixel value to the framebuffer. [edx] was more complicated than this, but I chopped out some stuff to make this more readable.
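
For reference, here's my best-effort C sketch of what one trip through that loop computes; the names and signature are mine, and the real code interleaves these steps to keep the CPU busy:

#include <stdint.h>

static void texture_span(uint8_t *dst, int count,
                         const uint8_t *texture, const uint8_t *palette,
                         uint32_t coords,   /* %esi: X and Y, both 6.10 */
                         uint32_t steps,    /* %edi: both step values   */
                         uint32_t mask)     /* %ecx: texture bytes - 1  */
{
    while (count--) {
        /* Net effect of the two SHLDs plus the AND: bits 11..6 of the
           offset are Y's whole part, bits 5..0 are X's whole part, and
           everything else is masked away so the texture wraps. */
        uint32_t offset = (((coords >> 10) << 6) | (coords >> 26)) & mask;
        uint8_t texel = texture[offset];   /* mov al, [ebp+ebx]         */
        *dst++ = palette[texel];           /* mov al, [eax] + the store */
        coords += steps;                   /* add esi, edi: one add, two
                                              fixed point updates       */
    }
}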

I guess what I want you to take away from this is that everyone thinks assembly code is really verbose, and sure, yeah...but you can also cram so many side effects and darn-near-magic into a handful of instructions. It's pretty mind-blowing.



in reply to @icculus's post:

One more benefit: fixed point math is deterministic across all hardware (assuming identical 32/64 bit sizes), meaning that given the same inputs, you will get the same outputs every time. This can be a huge advantage when developing multiplayer games, or in situations where you want to reproduce the same output game state given identical inputs.

Unlike floating point, which, depending on hardware and runtime configuration, can produce small errors that accumulate over time, leading to divergent, non-deterministic simulations.

Fixed point representation saw a full revival with the older GLES 1.1 embedded hardware in the era before and around the first iPhone, before being all but wiped out by GLES 2.0. The specification explicitly allowed 16.16 fixed point representations to work with cruddy CPU cores that didn't have an FPU (or had a virtual FPU unit) but somehow still had a GPU core. Of course, the secret was that client code did all the work to turn floating point numbers into fixed point representations at the API boundary, then sent them to the driver, which internally converted them back to floating point.
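
For the curious, that API-boundary conversion is just a multiply or divide by 65536. A sketch (GLfixed is the real GLES type; the macro names are mine):

#include <stdint.h>

typedef int32_t GLfixed;                     /* GLES 1.x's 16.16 type */
#define F2X(f) ((GLfixed)((f) * 65536.0f))   /* float -> 16.16 */
#define X2F(x) ((float)(x) / 65536.0f)       /* 16.16 -> float */

/* e.g. call glTranslatex(F2X(1.5f), 0, 0) where you'd otherwise
   call glTranslatef(1.5f, 0.0f, 0.0f). */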