I'm so proud of the title of this article because the staircase objects in Super Mario Bros. are complicated, and this can happen if there are two near each other in a level's data:
The Super Mario Bros. convolution function. Convolution is a mathematical operation on two staircases (f and g) that produces a third staircase (f ✱ g) that expresses how the shape of one is modified by the other.
Staircases were only ever meant to be as big as the ones you see at the end of most levels, which are nine blocks wide and eight blocks high. This corresponds to an internal length of 8. An object's length is defined by a 4-bit number, which can take on any value from 0 to 15. So larger staircases are possible, but they glitch out a bit. When staircases are near each other, they glitch out as well.
By the way, this program is SMB Utility, a level editor for Super Mario Bros. It actually emulates the game in order to render its display, so it's great at rendering the bugs too!
It turns out staircases are a pretty unusual object in Super Mario Bros.' object list, so let's take a deeper look!
In my video about the Super Mario Bros. level format, I explained how the tiles that make up the level are built on the fly as Mario makes his way through the level. This is opposed to building the entire level from the start during a loading period at the beginning. The NES doesn't have enough memory to hold entire levels at once, and future games would need to include extra memory in the game's cartridge to do this. You might want to watch that video first if you haven't already to get a better understanding of what's going on.
Click this image to watch that video! I'll still be here when you get back.
Since each column of tiles is built on the fly, objects that take up more than one column (a pipe, for example, takes up 2 columns) need to be held in memory for a bit until they're done processing. As shown in the video, the game does this by creating a queue that can hold up to 3 objects at a time. The queue also holds how many more columns are remaining for each object. This is how each object knows which column to display.
For example, if the game is drawing a vertical pipe, and sees that it has 1 column left to go, this must be the left side of the pipe. If there are zero columns left to go, this is the last column for the pipe, so this must be the right side.
Every kind of object keeps track of this value.
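The pipe example can be sketched in a few lines of Python (used here only as a modelling language, since the game itself is 6502 assembly). The function name `pipe_column_sides` and the string labels are made up for illustration and don't come from the game.

```python
def pipe_column_sides(length=1):
    """Label each column of a vertical pipe by its columns-remaining value.

    `remaining` plays the role of the queue's "columns remaining" entry:
    it starts at the object's length and counts down to 0.
    """
    sides = []
    remaining = length
    while remaining >= 0:
        # 0 columns remaining means this is the last column: the right side.
        side = "right" if remaining == 0 else "left"
        sides.append((remaining, side))
        remaining -= 1
    return sides


print(pipe_column_sides())  # [(1, 'left'), (0, 'right')]
```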
This effectively makes this value a column index, but counted from the right side instead of the left (which would feel more natural). Objects that can have a variable length use that length to initialize this value, so that the proper number of tiles is drawn. Notice that since it's the object's length that determines this index, the index of the left-most column differs from one object to the next.
Drawing a row of bricks only requires knowing the object's Y position, and not necessarily each individual brick's index.
This doesn't matter so much since, for almost all objects that can be a variable width, all of the columns are drawn identically. There are three exceptions: the green tree, the orange mushroom, and the staircase.
Notice the index of the left-most tile of these objects changes depending on their lengths.
The green tree and orange mushroom get off scot-free, however, since they can take advantage of a flag that is always set when an object's first column is processed. These objects' columns are drawn almost identically; only their left-most and right-most columns are different.* The right column is easy to check for, since that is when the column index is zero. The left column is trickier, since its index depends on the length. Fortunately, when an object is first added into the queue, the carry flag is set. This tells these objects that they should draw their left-most column.
; $9BAC: get an object's length and set values appropriately
; X = object index in queue
CheckLargeObjectLength:
JSR GetLargeObjectAttributes
; $9BAF: set queue remaining size to object length
; X = object index, Y = object length
CheckLargeObjectFixedLength:
LDA ObjectLength,X
CLC ; by default, clear carry
BPL .alreadySet
TYA
STA ObjectLength,X
SEC ; set carry if we are initializing
.alreadySet:
RTS
; $9BBB: get an object's Y position and length
; X = object index in queue
; $07 <- object's Y position
; Y <- object's length
GetLargeObjectAttributes:
LDY ObjectQueue,X
LDA (LevelTileData),Y
AND #%00001111 ; last 4 bits of first byte
STA $07 ; are the Y position
INY
LDA (LevelTileData),Y
AND #%00001111 ; last 4 bits of second byte
TAY ; are the object's length
RTS
You can find these routines in the SMB1 disassembly by doppelganger here.
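Here is a rough Python model of that first-column check, following the description above rather than the exact instructions. The names `check_large_object_length` and `tree_columns` are made up, `-1` stands in for "negative, not initialized yet", and the decrement at the end of the loop is an assumption about what happens to the queue entry after each column.

```python
def check_large_object_length(queue_remaining, slot, length):
    """Model of CheckLargeObjectFixedLength; returns the carry flag."""
    if queue_remaining[slot] >= 0:
        return False               # CLC: the entry was already initialized
    queue_remaining[slot] = length
    return True                    # SEC: this is the object's first column


def tree_columns(length):
    """Classify each column of a tree-like platform, left to right."""
    queue_remaining = [-1, -1, -1]     # three empty queue slots
    slot, parts = 0, []
    while True:
        carry = check_large_object_length(queue_remaining, slot, length)
        if carry:
            parts.append("left")       # carry set: first column
        elif queue_remaining[slot] == 0:
            parts.append("right")      # zero columns remaining
        else:
            parts.append("middle")
        if queue_remaining[slot] == 0:
            break
        queue_remaining[slot] -= 1     # assumed: one fewer column to go
    return parts


print(tree_columns(3))  # ['left', 'middle', 'middle', 'right']
```

The point of the carry flag is visible in `tree_columns`: the left edge is recognized without knowing the length, while the right edge is recognized by the counter reaching zero.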
However, the staircase object does not get this luxury, since it's not just the left-most column that is different, it is all columns. If only we had an index starting from the left side, this would be easy, because we could use this index to determine how tall this stack of blocks should be.
The number of blocks to draw would be this new index plus 1. Easy! Well, except for that last column...
So instead, the staircase object creates its own counter that it initializes when it is first loaded, and increments it by one each time a column is processed. It uses this counter directly as an index into a table in the ROM that defines how tall each step in the staircase is.
; $9AA5: the number of blocks in each staircase step
StaircaseStepHeight:
db 7, 7, 6, 5, 4, 3, 2, 1, 0
; $9AAE: the Y position of the first block in each step
StaircaseStepPosition:
db 3, 3, 4, 5, 6, 7, 8, 9, 10
Link to these tables here.
The first table defines the number of square blocks to draw in this column, minus one. The second table is the number of tiles from the top of the screen to the highest block in this step, which is effectively this step's Y position. Oh yeah, it's also backwards, so I lied earlier. The staircase counter is actually initialized to 9, and decremented once before each column is processed. That means the first step in a staircase always has index 8, and it counts down to 0 for the last step in the biggest staircase.
For the biggest intended staircase, the counter lines up exactly with the column index.
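To see the two tables and the backwards counter working together, here is a small Python model. The table contents are copied from the listing above; the function name `staircase_columns` is made up, and the model deliberately refuses lengths above 8 because those index past the tables.

```python
STAIRCASE_STEP_HEIGHT = [7, 7, 6, 5, 4, 3, 2, 1, 0]      # blocks minus one
STAIRCASE_STEP_POSITION = [3, 3, 4, 5, 6, 7, 8, 9, 10]   # row of top block


def staircase_columns(length):
    """Return (counter, y_position, blocks_drawn) per column, left to right."""
    if not 0 <= length <= 8:
        raise ValueError("lengths above 8 read past the end of the tables")
    counter = 9                        # set when the first column is loaded
    columns = []
    for _ in range(length + 1):        # a length of N spans N + 1 columns
        counter -= 1                   # decremented before each column
        y = STAIRCASE_STEP_POSITION[counter]
        blocks = STAIRCASE_STEP_HEIGHT[counter] + 1
        columns.append((counter, y, blocks))
    return columns


for counter, y, blocks in staircase_columns(8):
    print(counter, y, blocks)
```

For a length of 8 the block counts come out as 1 through 8 followed by a second 8, and in every column the Y position plus the block count lands the bottom block on row 10.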
So it's quite easy to see what will happen if we make our staircase wider than 9 columns. Our counter underflows from 0 and wraps around to 255, and we start reading junk data past the end of those two tables.
Hello random block out of nowhere!
We can see exactly what junk data we end up reading and what will be output to the screen.
| Index | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | 255 | 254 | 253 | 252 | 251 | 250 | 249 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Height Table Value (# of Blocks - 1) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 7 | 6 | -80 | 13 | -32 | -24 | 6 | -95 |
| Y Position | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 3 | 187 | 32 | 96 | 210 | 16 | 136 | 7 |
| Blocks Drawn | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 8 | 7 | 1 | 14 | 1 | 1 | 7 | 1 |
Most of these Y positions end up with steps that don't start until below the screen, so they just don't show up at all. The 7th out-of-bounds entry just barely sneaks in though. Also, since the routine that draws the tiles always draws one block before checking if the value is less than zero (the highest bit is set), one block is drawn in this case despite this step's size being negative. This also means that for those steps where the Y position is very large (greater than or equal to 13), the buffer that holds the column of tiles currently being processed gets indexed out of bounds and a whole variety of things can happen.
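The "Blocks Drawn" row can be reproduced from the other two rows with a short Python sketch. The junk values are copied from the table above, not read from a ROM; the helper names are made up, and the 13-row column buffer is an assumption taken from the "greater than or equal to 13" remark.

```python
COLUMN_BUFFER_ROWS = 13   # assumption: rows 0..12 fit in the column buffer

JUNK_HEIGHTS = [6, -80, 13, -32, -24, 6, -95]      # indices 255 down to 249
JUNK_POSITIONS = [187, 32, 96, 210, 16, 136, 7]


def wrap_decrement(counter):
    """Decrement an 8-bit counter, wrapping from 0 to 255."""
    return (counter - 1) & 0xFF


def to_signed(byte):
    """Interpret a byte (0..255) as a signed value (-128..127)."""
    return byte - 256 if byte & 0x80 else byte


def blocks_drawn(height_byte):
    """One block is always drawn before the sign check happens."""
    value = to_signed(height_byte)
    return 1 if value < 0 else value + 1


def junk_report():
    """Return (index, blocks_drawn, starts_inside_buffer) per junk entry."""
    report, index = [], 0
    for height, y in zip(JUNK_HEIGHTS, JUNK_POSITIONS):
        index = wrap_decrement(index)
        report.append((index, blocks_drawn(height & 0xFF), y < COLUMN_BUFFER_ROWS))
    return report


for row in junk_report():
    print(row)
```

Only the last entry (index 249, Y position 7) starts inside the buffer, which matches the 7th out-of-bounds entry being the one that shows up.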
Let's add another staircase. Here are two staircases with a length of 5 at the same X coordinate in the level.
A Dutch staircase.
That sure is a staircase. So while that queue allocates a byte each to store the number of columns remaining for each object in it, the staircases only ever reference a single memory location for that special counter. Normally, there would never be two staircases overlapping each other anyway! So when they do overlap, they both use the same counter and end up clashing, resulting in this staircase that goes up two blocks at a time.
; $9AB7: process a staircase object
ProcessStaircaseObject:
JSR CheckLargeObjectLength ; get size of the staircase
BCC .notFirstColumn
LDA #9 ; set the counter to 9 if
STA StaircaseCounter ; we're on the first column
.notFirstColumn:
DEC StaircaseCounter ; decrement counter
LDY StaircaseCounter ; use the counter as
LDX StaircaseStepPosition,Y ; an index into the position
LDA StaircaseStepHeight,Y ; and size tables
TAY
LDA #$61 ; square block metatile
JMP RenderUnderPart ; draw the tiles
RenderUnderPart will be discussed another time. It is also interesting!
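The clash is easy to reproduce in a Python model. This sketch assumes every staircase in the queue runs the routine once per column, and that the taller of two overlapping stacks is what ends up visible; `shared_counter_heights` is a made-up name. Columns where every read falls outside the 9-entry table are reported as `None` instead of guessing at the junk.

```python
STAIRCASE_STEP_HEIGHT = [7, 7, 6, 5, 4, 3, 2, 1, 0]   # blocks minus one


def shared_counter_heights(lengths):
    """Visible stack height per column for staircases loaded at the same X."""
    remaining = [None] * len(lengths)   # None: queue entry not initialized
    counter = 0                         # the single shared staircase counter
    heights = []
    for _ in range(max(lengths) + 1):
        tallest = None
        for slot, length in enumerate(lengths):
            if remaining[slot] is None:
                remaining[slot] = length    # first column: carry is set...
                counter = 9                 # ...so the counter resets to 9
            elif remaining[slot] < 0:
                continue                    # this staircase has finished
            counter = (counter - 1) & 0xFF  # DEC on the shared counter
            if counter < len(STAIRCASE_STEP_HEIGHT):
                blocks = STAIRCASE_STEP_HEIGHT[counter] + 1
                tallest = max(tallest or 0, blocks)
            remaining[slot] -= 1
        heights.append(tallest)
    return heights


print(shared_counter_heights([5, 5]))  # [1, 3, 5, 7, 8, None]
print(shared_counter_heights([8]))     # [1, 2, 3, 4, 5, 6, 7, 8, 8]
```

With two staircases the counter is decremented twice per column, so the visible height climbs two blocks at a time until the table runs out.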
In general, this can result in lots of mutant staircases, as once a new staircase object is loaded, the counter restarts at 9 and counts down again. This results in the number of blocks resetting down to 1 and counting up again.
Looks like some old ruins or something.
Using multiple staircases, we can see even more out of bounds indices for our staircase tables, which allow for some interesting block structures. Using three staircase objects at the same X position with the highest length of 15, we can see 37 out of bounds entries in addition to the normal 9 entries in these tables.
The maximum staircase. Due to the column buffer being indexed out of bounds, this staircase will also turn the scenery into clouds, and enable the second quest mid level.
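The count of 37 can be checked with a little Python bookkeeping. As before, this is a model under the assumption that each staircase decrements the shared counter once per column and that all of them reset it to 9 on the first column; `table_indices_read` is a made-up name.

```python
def table_indices_read(num_staircases, length):
    """Count distinct in-bounds and out-of-bounds table indices read."""
    counter = 0
    seen = []
    for column in range(length + 1):        # a length of N spans N + 1 columns
        for _ in range(num_staircases):
            if column == 0:
                counter = 9                 # every staircase resets the counter
            counter = (counter - 1) & 0xFF  # 8-bit decrement with wrap-around
            if counter not in seen:
                seen.append(counter)
    in_bounds = sum(1 for i in seen if i <= 8)
    out_of_bounds = sum(1 for i in seen if i > 8)
    return in_bounds, out_of_bounds


print(table_indices_read(3, 15))  # (9, 37)
print(table_indices_read(1, 15))  # (9, 7)
```

Three staircases of length 15 make 48 reads in total. The three reads in the first column all hit index 8, and the remaining 45 walk down through indices 7 to 0 and then 37 wrapped-around values.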
* Nope! The mushroom's stem is also different. The game somehow has to know where the midpoint of the mushroom platform is. And unlike the staircase, there can definitely be more than one mushroom loaded at once (see 4-3). To solve this, the game stores the half-length of the mushroom platform in a separate table in memory, essentially saving it in the queue as well. It is odd that the mushroom platform got this special treatment but the staircase didn't. It is also weird that the staircase and mushroom platforms don't share this table for both purposes--it would have totally worked!
