Super Merryo Trolls

Part Two

  1. Introduction
  2. The Challenge
  3. Crash Course In Assembler
  4. The PEA Field
  5. Stupid PEA Tricks
  1. Further Graphic Barbarity
  2. The Blockset
  3. Drawing Sprites
  4. Message Blocks
  5. The Level Maps
  1. The Kings Quest Level
  2. Dungeon God
  3. Infrastructure
  4. Support Programs
  5. The Game

Further Graphic Barbarity

To round out the tour of cheap visual effects, we'll discuss the "fog effect" of Merryo Trolls World 4: Gothic Land.

In a modern 3D engine, the name "fog effect" would describe a situation where shapes on the screen get progressively more colored as they move away from the viewer in 3D space. In the more primitive world of 2D graphics, however, a "fog effect" is usually a flat tinted overlay that moves around on top of a bunch of other layers, giving the impression that they're behind a cloud bank. The popular Super Nintendo game Castlevania IV used this effect generously, and it looked pretty good since the Super Nintendo had a special graphics mode to facilitate it, and could display thousands of different colors on the screen at a time.

Compare that with the situation inside the Apple IIgs: No special graphics mode, and unless you worked within various restrictions, only 16 colors on the screen at once. In Gothic Land, the sixteen colors we used were these:

Doing any kind of graphics with only 16 colors is an invitation for hilarity, but ... a "fog effect"? In color? You may ask why we even bothered. But we were teenagers, and this was a new frontier and our obsession. We were no different from the kids who hung around the school's auto shop, installing neon tubing under their cars and and welding ugly metal "spoilers" onto them. People laughed and called it pathetic, but we thought it was terrific and had a great time.

... Of course, that wasn't how we saw it during High School. We considered ourselves the intellectual elite, and mocked the auto shop guys along with everyone else. High School truly is a terrible time.

Anyway, here's how the "fog effect" worked:

At the simplest level, transparency is acheived by taking two pixels and averaging them mathematically into one. Do this for two whole images, and the images appear superimposed. (You can't really tell which one is supposed to be "on top" without some other visual cue, like one image being of bricks and the other of clouds, or one image scrolling faster than the other.)

An even cheaper way to convey the illusion of "fog" is to swap dark colors on a source image for light colors, to create fingers of mist like in the screenshot above. This isn't real transparency because the colors always brighten instead of averaging, but it's easier to apply and allows for easier re-use of colors. For example, in the Gothic Land palette, we replace color 1 with color 2, turning blue into lighter blue. We replace color 0 with color B, turning black into the darkest available grey. The "foggy" version of all the lightest colors ends up being white, hence colors 2, 6, and A all turn into F. We picked a similar transition for every color in the palette:

There's no easy mathematical algorithm to describe these transitions, but if we just create an array of the lightened colors, then picking the right color becomes trivial:

Using an array, the lightened version of color 0 is at array position 0, the lightened version of color 1 is at array position 1, and so on. So all we need to do is load single pixel values from the PEA Field, use those to grab replacement pixels from the array, and then write the new pixels back over the old ones. Instant color lightening.

Doing it one pixel at a time is a valid method, but it's disturbingly slow. (Reminder: Everything on the Apple IIgs is slow. It's disturbingly slow that you must worry about.) The problem is, each pixel is a four-bit value (from 0 to F), so each byte of memory covers two pixels. The byte $7D, for example, is a pixel of color 7 next to a pixel of color D. The smallest chunk of information that the Apple IIgs is able to work with efficiently is one byte, so working with one pixel at a time requires a lot of hairy sub-byte operations ... ANDs and EORs and ASLs and lots and lots of CPU cycles; yeech. Fortunately, another trivial workaround is available: Just use a bigger array.

If we can make an array for lightening one pixel, then we can just as easily make an array for ligtening two pixels at a time. In fact, it's much easier because the two adjacent pixels stored inside each byte already form one convenient index value, which can point directly to their replacement byte.

Let's consider some examples:

The rest of the array can be built along the same lines, covering all the possible combinations of two colors. Here's a visual representation of the completed array used in Gothic Land:

(Building such a data table by hand would have been tedious. Instead, we wrote a program in BASIC that coughed it out.)

So with a little 256 byte table, color lightening can happen a byte at a time. Our three-step sequence of "load old color, load replacement color, store over old color" becomes "load old byte, load replacement byte, store over old byte". And that can translate cleanly into three Assembly Language commands:

   LDY $pea_field
   LDA $color_table,y
   STA $pea_field

Looks great. If we put the Apple IIgs CPU registers into "8-bit" mode, so they only load and store a single byte at a time, then these commands will work perfectly with the 256-byte array. There's only one catch: The fog needs to move. In order to draw your fog at different offsets in the PEA Field and achieve animation, you need to involve a variable in all your loads and stores, so you can change it each time you run through the code.

Fortunately, the Apple IIgs CPU has three registers: A, X, and Y. We're already using A and Y, but we can use X as an index to our loads and stores in the PEA Field:

   LDY $pea_field,x
   LDA $color_table,y
   STA $pea_field,x

In theory, this is everything we need to make cheap animated fog on the Apple IIgs. However, our final step - adding the X register - has actually introduced a huge problem.

Follow the link for all the gory details ... But to make a long story short, because of this huge problem, we used an even bigger array: An array built to draw the fog four pixels at a time. Almost. The largest array that the Apple IIgs can index with one register is 65536 bytes, or just enough for a grid of 256x256 bytes. But what we really wanted was a grid of 256x256 pairs of bytes, and that was impossible ... so we compromised and skipped every other value, making a 128x256 array. We could only substitute even values with it - $2A02 and $2A04 for example, but not $2A05 - which messed with the quality of our art - but we weren't out to win awards for artistic design anyway.

So we built a 65536-byte lookup table, consisting of 128x256 pairs of 16-bit fog values. (Actually we generated it dynamically from the 8-bit table when the game started.) We used the blockset to draw the PEA Field, insuring that all the 16-bit values in the PEA Field were even numbers. We had our background, and we had our table. Everything was set up for drawing fog. So what did we draw, and how did we make it move?

This is the fog overlay. It's a screenful of mist, drawn in Deluxepaint, carefully designed to loop horizontally. It's made of four-pixel chunks, so the edges of the mist always occur on handy four-pixel intervals. Since any part of it can be drawn four pixels at a time, we can use our huge 128x256 array to draw it.

Since Guido would be scrolling left and right on the level, we had to be able to draw the fog in small vertical stripes. That way we could just draw the stripes of fog that were new to the screen with each update, instead of updating the whole thing. But drawing new fog was only half the problem: We also had to make it move relative to the background.

With the fog already divided into vertical stripes, it was easy to see the most efficient method of animating it: We didn't have to redraw each entire stripe with every update. Instead, we just had to draw and erase the little bits of each stripe that made it different from it's neighbor. For example, if we had drawn the sequence D E F G H, we just needed to alter stripe D until it turned into stripe E, and alter stripe E until it became stripe F, and so on, resulting in the sequence E F G H I. The end result is that all the stripes appear to move one slot to the left, and animation occurs.

This animation trick is an old one. Older than dirt. What's unique here is our method of implementing it:

Write a program to:

  1. Isolate each vertical stripe in our fog template image (shown above).
  2. Locate the stripe to it's left. (If we're on the first stripe, we wrap around to the very last stripe and use that.)
  3. Compare the left stripe with the current stripe, one four-pixel chunk at a time.
  4. If the chunk is only present in the current stripe, spit out Assembly Language commands that draw that chunk, suitable for embedding in the Merryo Trolls source code.
  5. If the chunk is only present in the left stripe, spit out Assembly Language commands that erase that chunk, by loading it from a fresh (un-fogged) copy of the background.
  6. Gather all of these bits of code together into a collection of subroutines, one per stripe. In each stripe, group all the drawing commands before the erasing commands.

Here's a sample of the source code that program generated. The first three lines draw one chunk of fog. A typical stripe of fog would require about a dozen of these. Then comes a single TYX command, which moves our pointer from the Y register to the X register, and we're ready to erase chunks. The remaining two lines are an example of one erase operation: A "LDAL" from a pristine copy of the background, and a "STA" into the foggy PEA Field, effectively de-fogging that chunk. After about a dozen of those, the sequence is complete, and one stripe has been converted into it's neighbor.

  ...
   LDX ForeFld+43200,y   ; draw a chunk
   LDA Fogtable,x
   STA ForeFld+43200,y

   TYX

   LDAL BackFld+40800,x  ; erase a bit of the previous stripe
   STA ForeFld+40800,x
  ...

You astute members of the audience may notice that the "LDA" command in the drawing code, which gets the replacement pixels from the fog table, appears to be loading from the same memory bank as the PEA Field. A memory bank is only 64k, and the fog table alone is 64k ... so wouldn't there be no room for it and the PEA Field in one bank?

Well, the beginning of the Trolls source code contains these two definitions:

   ForeFld = $040000   ; Foreground feild bank
   FogTable = $04FFFF  ; (claims all of 5)

So in effect, our "LDA" command is starting way up at the very last byte in memory bank $04, and adding the X register to it, which skips it over into bank $05. Why in the world do we do this?! So we can use a "LDA" instruction to refer to data in the next bank, instead of doing the proper thing and using the "LDAL" to refer to the bank explicitly. By doing this, we save one cycle per draw. Ha ha ho!

Anyway, yes yes, we're all very clever. We also optimized the drawing in other ways. At the end of the code for each stripe, we inserted the following instructions:

  ...
   INY
   INY
   INY
   NOP

Then we immediately began drawing the next stripe. At the end of all the stripes, we put a "JMP" instruction that jumped straight up to the beginning of the whole set. So, left unaltered, if we jumped anywhere into this code and began drawing stripes, the code would loop around forever and hang the machine, drawing all the stripes over and over. Why did we set it up this way?

Because our code was self-modifying. See that "NOP" instruction? Remember how it does nothing? Well, while Merryo Trolls was running and you were actually playing the game, the program would locate one of those "NOP" instructions inside itself and replace it with a "RTS" instruction. It would then jump into the (formerly infinite) loop just after the spot where it put the "RTS". The code would execute from there, starting with that stripe, altering all the stripes in the PEA Field, one after the other, incrementing the Y register appropriately along the way (all those INY commands after each stripe) ... and then hit the "RTS" instruction and exit the loop, returning to wherever it came from.

So we looped through all the stripes exactly once, in exactly the order we needed each time, yet we never had to use any code to keep track of the loop. It just executed that way. Unrolling loops is an old trick that your modern compiler now does for you, when appropriate. Back in the day, we made these choices by hand. Every cycle counts!

The only detail left to address is the scrolling that happens as Guido runs around the level. The scheme described above is for animating fog that's already on the screen - making it move over the background - by altering the edges of the clouds. We save a lot of CPU time by just altering those edges. However, when Guido walks, and a new stripe of block data scrolls onto the screen, we have to draw a whole stripe of fog all at once, on top of that block data. We pre-compiled each full stripe into code, of course, but the code was much longer than the code for "animating" the stripes. ... So large that it took more than one bank of Apple IIgs memory to hold it all. A price we gladly paid for the speed increase, of course.

So that's how fog worked. Probably not the way you'd expect. Any sane person these days would use a couple of clever OpenGL calls to make something twice as good, and the impact on the CPU would be next to nothing. The details of implementation have changed drastically with 3D hardware. Still, the logic that informs our choices is the same.

The Blockset

Remember that film "Contact", based on the Carl Sagan novel? A character in it describes an encoded radio message from aliens with the phrase "Efficiency functioning on multiple levels and in multiple dimensions." That phrase is also an excellent description of the way good programmers like to work. Unfortunately, there is a phenomenon directly opposed this effort, and we all become aware of it the day we start college or begin our first job: A phenomenon known as a ship date.

The programming described in these pages was done with no deadline whatsoever, and it shows. That's not to say it's sloppy code -- quite the opposite, in fact. Given free reign, we optimized the hell out of everything. Even if it gave us a speed increase so small that we couldn't detect it, we optimized on general principle. Optimization was practically an end unto itself. It was probably more fun for us to optimize the code than to play the game we were coding for ... which was not a good sign if we'd ever decided to publish it.

Anyway, this is the blockset we designed for World 1 of Merryo Trolls:

We developed the blocks using paint programs whose canvas was screen-shaped. One full screen of data in "Super Hi-Res Mode" was about 32k, and when we were done editing the picture we could save it to disk in a simple uncompressed format, and then extract the raw block data from that format with a little BASIC program. The program would read the blocks out one row at a time, breaking each block into tiny horizontal strips, and append all the strips one after the other in a tall column that ended up being 30k in size. We made eight screens of blocks, for Worlds 1 - 8, and created eight columns of block data on disk.

But that was just the beginning. As we designed World 1, we came up with ideas for stuff that we wanted on the screen - like grass, sandy hills, various pipes, et cetera - so we broke these items down into blocks and added them to the blockset. As we added them, we realized that there was a lot of redundancy. The blocks describing the green hills, for example, were almost identical to each other. When we looked at the column of block data we were extracting from each screen, we discovered that because of all the similarities, some of the blocks we needed could actually be drawn by combining the lower half of one block in the column with the upper half of the next block.

So as we added to the blockset, we carefully rearranged the blocks on the screen to make certain blocks adjacent, so they could share parts in the column. Then we wrote an editor in BASIC and Assembly Language that allowed us to take each column of blocks and design a big table of pointers for it. And that table became the real blockset for each World.

The following animation shows the process step-by-step:

Desktop computers have changed so much that you might wonder: Why did we write so much code just to lower redundancy in a 32k blockset? Wouldn't the size of the code cancel out the savings in data? It probably would have, if we wrote it in C++ and had all the variables padded out to 32 bits. But in Assembly Language, all the logic to draw the blocks via the table, including an unwound loop for added speed, took less than 100 bytes of space.

Even so, why bother with a specific solution, when we could have used a general one? Couldn't we have just made a huge blockset, and then dumped the whole thing into a zipfile, and let the zip compression algorithm deal with it? Well yes, and in fact we did use compression on disk - with source code borrowed from the FTA (a notorious group of French Apple IIgs hackers) - but the algorithm wasn't that great, and the tools to manipulate the compressed files were very crude. It helped to reduce our disk load, but it couldn't address our bigger concern, which was memory usage.

We'd set a hard limit of 30k for the block data in a given World, and as long as we didn't go over that limit, we could fit all the block data on one screen and edit it all at once, which was hugely convenient. Using the tables of pointers, we could also define many redundant blocks without a high memory cost. Like ten blocks of green grass, for example, each defined at a slightly different offset in the column, making grass grow at ten different heights on the map. We used this effect in Gothic Land to smooth out the interior of the rolling hills in Area 1:

That looked nice, but as we developed more levels, we found ourselves almost never using the technique. Also, our obsession with speed was overriding. Drawing a strip of block data into the PEA field (and two strips if Guido was running) often involved a lot of redundant re-loading of registers, since many of the graphics had repeating patterns in them, or open spaces that were all one color. We were using an unwound loop for speed, and that code made the redundancy very obvious.

The unwound loop drew a vertical stripe of data for a single block all at once, and looked like this:

   LDA Blok+$0000,y        ; Block chunk 0
   STAL ForeFld+$0E11,x    ; Storing in reverse order, since the PEA field
   STAL BackFld+$0E11,x    ; draws upside down
   LDA Blok+$000A,y        ; Block chunk 1
   STAL ForeFld+$0D21,x    
   STAL BackFld+$0D21,x
   LDA Blok+$0014,y        ; Block chunk 2
   STAL ForeFld+$0C31,x
   STAL BackFld+$0C31,x
   LDA Blok+$001E,y        ; Block chunk 3
   STAL ForeFld+$0B41,x
   STAL BackFld+$0B41,x
   LDA Blok+$0028,y        ; Block chunk 4
   STAL ForeFld+$0A51,x
   STAL BackFld+$0A51,x
   LDA Blok+$0032,y        ; Block chunk 5
   STAL ForeFld+$0961,x
   STAL BackFld+$0961,x
   LDA Blok+$003C,y        ; Block chunk 6
   STAL ForeFld+$0871,x
   STAL BackFld+$0871,x
   LDA Blok+$0046,y        ; Block chunk 7
   STAL ForeFld+$0781,x
   STAL BackFld+$0781,x
   LDA Blok+$0050,y        ; Block chunk 8
   STAL ForeFld+$0691,x
   STAL BackFld+$0691,x
   LDA Blok+$005A,y        ; Block chunk 9
   STAL ForeFld+$05A1,x
   STAL BackFld+$05A1,x
   LDA Blok+$0064,y        ; Block chunk 10
   STAL ForeFld+$04B1,x
   STAL BackFld+$04B1,x
   LDA Blok+$006E,y        ; Block chunk 11
   STAL ForeFld+$03C1,x
   STAL BackFld+$03C1,x
   LDA Blok+$0078,y        ; Block chunk 12
   STAL ForeFld+$02D1,x
   STAL BackFld+$02D1,x
   LDA Blok+$0082,y        ; Block chunk 13
   STAL ForeFld+$01E1,x
   STAL BackFld+$01E1,x
   LDA Blok+$008C,y        ; Block chunk 14
   STAL ForeFld+$00F1,x
   STAL BackFld+$00F1,x
   LDA Blok+$0096,y        ; Block chunk 15
   STAL ForeFld+$0001,x
   STAL BackFld+$0001,x    ; One extra byte over, to be in front of the PEA

That's 16 loads and 32 stores. Each "LDA" grabs one 16-bit chunk of block data, which is written to the foreground and background PEA Fields with a pair of "STAL" commands. (We used that background PEA Field for erasing sprites.)

So what if we used this routine to draw, for example, a stripe of the "blue sky" block? Each "LDA" command would be loading the same value (#$DDDD) into the A register every time. That's 15 redundant loads! And what if Guido is outdoors, running across a platform that's only two blocks tall, and the remaining ten blocks are empty sky? That's 15 * (10 blocks) * (2 stripes for running), making 300 redundant loads! That particular kind of "LDA" takes 5 cycles, so we're wasting 1500 cycles! Aiiiiighh, the pain! The PAAAAAIIINN!

So what did we do? We wrote code that would analyze the blockset, and convert it into optimized Assembly Language. Each stripe of each block became its own little subroutine, referenced by a big jump table. When we found a duplicate stripe, we just referenced the same routine twice. Because of that reduntant stripe elimination, the "compiled" blockset actually took up less space in memory than the original data.

The new approach eliminated all 1500 redundant cycles from the example above, and saved an additonal 150 because now one of the "STAL" commands didn't have to be a long bank-crossing store. (We could set the bank register to a bank with one of the PEA Fields in it, rather than setting it to the bank we loaded block data from.)

We ran the build at the beginning of each World, and we did it for the sprites as well as the blockset:

If we'd stored the blockset on disk in compiled form, we wouldn't have made the player wait for it to convert. ... I don't know why we didn't do that. If there was a reason, I no longer remember what it was! Faster development time maybe?

Drawing Sprites

So by now you've probably noticed the pattern: The biggest tool in our optimization toolbox was the ability to take things that were typically data structures, and convert them into code structures. The offscreen drawing buffer was turned upside-down and made into the PEA Field. The fog overlay was cut up and converted into sequences that drew and erased their neighbors. The blockset was analyzed and boiled down into a collection of subroutines and a big jump table. Same for the sprites, and various other visual effects.

(To summarize: We wrote code that generated a field of code, and code that converted data into other code, some of which was self-modifying code, and that code continuously modified the field of code, which we finally executed, drawing our graphics to the screen. Sounds great. Ship it!)

Of all these conversions, the sprites were the most complicated, for three reasons. First, we had to worry about erasing them between frames of animation. Second, many of our sprites - especially Guido himself - were made up of moving parts that had to combine in different ways. And third, almost all of them were irregularly shaped, so we sometimes had to perform hairy sub-byte operations to draw single pixels.

This is the sprite set for Guido in World 1, as it looked in our editor, blown up to 2x size so you can make out the messy details:

If you divide the set down the middle, you come up with two regions of sprites that are mirror images of each other. The region on the left is used when Gudio is facing to the right, and the region on the right is used when Guido is facing to the left. The handful of sprites at the bottom is for use when Guido is facing forward, which only happens when he goes down a pipe or through a door.

Using this collection of parts, we drew Guido in various poses and with various powerups. The majority of the parts are for his lower half, which had an eight frame running animation. We tried very hard to keep the borders of the pieces on two-pixel increments, so that drawing them wouldn't require masking bits out of individual bytes in the PEA Field, but in some cases it was unavoidable.

Stage 1
To convert one of these sprites into code, the first thing we did was find all the repeated sequences of four pixels, and group them together. We drew each group by loading the A register once and then writing to the PEA Field with "STA" multiple times. (This is the same optimization as we did for the blockset code.) Once we handled the repeating sequences, we drew the leftovers one chunk at a time with individual loads and stores, until all the four-pixel chunks were drawn. That usually eliminated of most of the sprite.

Stage 2
Next, while we still had the CPU in 16-bit mode, we handled the chunks of three pixels at the edges of the sprite. These chunks couldn't be drawn with a simple write operation because a "STA" can only write 2 or 4 pixels at a time. If we'd just dropped all 16 bits of the A register into the PEA Field with a "STA", we would have messed up a pixel beyond the edge of the sprite.

To preserve that background pixel, we actually loaded it in from the PEA Field with a "LDA", and performed a masking operation on the pixels around it to replace them with part of the sprite. Then we'd "STA" the modified value back to the PEA Field. In total, it took four operations to deal with each of these chunks.

For example: "Bouffant Guido", pictured on the right, has only two chunks that are three pixels wide. The code we generated to draw these two chunks to the PEA Field was:

   LDA ForeFld+$00FA,x    ; First chunk
   AND #$00F0             ; Set all but four bits to 0, preserving one pixel
   ORA #$0008             ; Add in a brown pixel; the other two stay black
   STA ForeFld+$00FA,x    ; Store it back into the PEA Field
   LDA ForeFld+$00F4,x    ; Second chunk
   AND #$0F00             ; Same thing - but keeping a different pixel
   ORA #$8000             ; Add in a brown pixel; the other two stay black
   STA ForeFld+$00F4,x    ; Back it goes

(Old-skoolers may perk up and declare that we should have used the TSB and TRB operators here. But note that those operators do not have indexed addressing modes.)

This looks like a lot of work to draw three pixels at a time ... but without pre-compiling the sprites this way, so we know which pixels to mask in advance, it would be even slower. Think of all the time we'd waste if we tested every four bits of a 320-byte sprite for transparency every time we drew it. The idea just makes my skin crawl. Huurgh.

Stage 3
Anyway, with all the continuous two-byte regions of the sprite taken care of, we issued a "SEP #$20" command and put the CPU into 8-bit mode. Now we could write data one byte at a time and draw all of those two-pixel pairs along the edges without doing any masking. Of course, we grouped the duplicate values first, issuing multiple "STA" commands to save time re-loading the A register.

Stage 4
All we had left were the single-pixel edges of the sprite. These required masking operations just like before, but with the CPU in 8-bit mode this time, the code took up less space and ran slightly faster. For example, to draw that pixel on the upper right edge of "Bouffant Guido", we generated this code:

   LDA ForeFld+$0C31,x    ; Load byte (two pixels)
   AND #$F0               ; Set one pixel to black, keep the other
   STA ForeFld+$0C31,x

Notice how the "ORA" instruction is missing. That's because the pixel we wanted to draw was black. Usually the "AND" sets pixels to black and the "ORA" changes them to something else by setting various bits, but if black is all we need, the "ORA" can be chucked out. Optimization rocks!!

We did this four-stage Just-In-Time compiling procedure for every sprite, streaming the code into memory and adding each starting point to a jump table. Overall it tended to occupy about the same space as the original sprite graphics, since many of the sprites were a lot smaller than their delimiting block.

At the same time we accumulated this drawing code, we also generated erasing code for each sprite, which had a much simpler structure. We ignored any single-pixel-masking and just generated "LDA" and "STA" commands to load values from a backup copy of the PEA Field and store them to the the same location in the "working" PEA Field, drawing back over the sprite. We erased the majority of the sprite with 16-bit "LDA"s and "STA"s, then switched the CPU to 8-bit mode to get the edges and save a few CPU cycles.

Of course, there's more to sprites than just drawing them on the screen - they have to animate, collide, and exhibit other sophisticated behaviors - but the way we addressed those problems was not nearly as unique as our drawing method. In the shortest possible terms, here's what we used: A set of doubly-linked lists populated by a sorted array of map triggers which were activated either by scrolling or by Guido's actions. Each sprite in the list was handled by various "sprite handler" subroutines, which sometimes cascaded into each other depending on how complicated the sprite behavior was. The end.

Message Blocks

Sprites and blocks are good for a simple game, but we also wanted a more literal way to communicate our revolting sense of humor to the outside world. Everyone was familiar with the "question mark" block: You hit it, and coins popped out, or other goodies. In "Super Mario World" for the Super Nintendo, players also found blocks with short text messages stuck inside them. Hit the block with Mario's head, and it would spew out playing instructions, a tip on where to find a hidden room, or a bit of plot development. For Super Merryo Trolls we made three kinds of message blocks: A "Tips" block for play instructions, a "Hints" block for finding secrets, and a "Stuff" block for non-seqitors, rude comments, programming rants, and the like. So we drew a block with a glowing T, one with a glowing H, and one with a glowing S.

Well, actually, the only reason we drew three message blocks intead of one was so that we could line them up in a row with the "question mark" block somewhere on the map, and spell "SH?T".

Oh, don't look so surprised.

The graphics hardware in the Apple IIgs has a lot of screwy traits. One of them is that, in "Super Hi-Res" mode, you can switch between two different screen resolutions on a per-scanline basis. For example, you can have eight rows of 320x200-style graphics, and then suddenly have five rows of 640x200-style graphics just below that. The 320x200 mode allows 16 colors per scanline, and the 640x200 mode allows four. Whichever mode you set, the same amount of memory is used to display the scanline, since 320 pixels of 4-bit color uses the same amount of memory as 640 pixels of 2-bit color.

The upshot of this is, you can put a horizontal band of graphics somewhere on the screen that has a finer resolution, but fewer colors ... perfect for drawing smooth readable text. For example, you can set the first 8 scanlines of the display to 640-mode and put a score counter up there. You can also draw a big window across the middle of the screen, set those lines to 640 mode, and display a cheerful little message. That's what we did, and it looked like this:

To draw text you need a font, and we rolled our own font as well, the same way we created the blocksets. Each block was 8x16 pixels, and we drew a lot of them like puzzle-pieces to combine and make larger characters, like the two-block italic typeface shown above, and a three-block "extra large" typeface. The font data was small, and we wanted it to be in memory all the time, so after we drew it we ran it through a BASIC program that turned it into Assembly Language source code, as big chunks of predefined arrays. We stuck that code in a file called "FONT.S" and included it when we compiled the game.

We did some optimization here, but not as much as we could have ... there was font code in the Amiga demo scene that beat the pants off ours. But that's par for the course, really - no matter what code you're writing, be assured that someone in the Amiga demo scene has written it first, and theirs ran faster.

To get our messages into the game, we first wrote them in a text editor (ProTerm usually) with little bits of pseudo-code scattered around. The codes were used to position the cursor on the screen, draw the window boxes, select the font style, change colors, et cetera. We fed the text files into a BASIC program that would interpret some of the code commands, and convert the rest into another format, exiting with the results in memory. Before conversion, a typical message would look like this:

~B0005791300~B0003790500~P0203~I02
_1Welcome to _4SUPER _8MERRYO TROLLS!_1 (c) 1993 Garote, Ace, Zog.
The object of this game is to travel through the eight lands to rescue the
princess from the clutches of the ultimate evil.
You control Guido (a troll) who can run various directions and jump fairly
high. Collect coins for points, and collect food to keep your "calorie
counter' from reaching zero. You can pick up turtle shells by holding the
second button and running into them. Release the button to throw the shell.
Squash enemies! Break bricks! Gather mushrooms! _3HURRY UP, GUIDO_1!!!

We would convert all the messages at once for each World, and graft the results onto the end of each World data file. Compare this to the scenario for a modern project: We would build our messages in an XML file and drop them into a resource fork, and do any and all conversion internally, while the game was running. The data would be platform independent, easier to change and build, and easier to internationalize. Yes, we had fun writing this game, but frankly I do not miss this era of stone tools and mammoth hides.

The Level Maps

Every once in a while, some n00b programmer logs on to a game developer forum and posts a message that goes: "HI, IM MAKING A 2D PLATFORM GAME. I CAN DRAW BLOCKS & SPRITES, & MAKE THEM SCROLL BY DRAWING THOUSANDS OF QUADS IN IMMEDIATE MODE. BUT HOW DO I STORE ALL THE MAP DATA FOR A LEVEL? PLZ POST CODE SAPMLES. KTHX BYE."

Well, in the interest of fairness, they usually know about the caps-lock key. But the question they ask is often the same. Why do they ask this question? Because even though 2D platform games are all over the internet, it's surprisingly hard to find a tutorial on how to build one. As a service to those forum visitors, this section will explain the map data format we used for Super Merryo Trolls. Those of you who aren't writing a platform game will find this section rather dry, so feel free to skip it.

It may seem against the grain of this essay to discuss an implementation that hasn't changed appreciably in twenty years (Wow, has it really been that long?), but keep in mind that the 2D platform game is itself an anachronism, and anyone who ditches the 3D frontier in favor of 2D, or even uses 3D graphics to implement a side-scroller, has embarked on a programming safari into the past. Techniques that don't make sense any more can become annoyingly relevant.

So. Here's how we did it. (Take all your notions of Object-Orientedness and throw them out the window.) In the Super Merryo Trolls implementation, all the map data is packed together and loaded from disk as a single chunk. We keep four of these chunks - four complete levels - in memory at a time. When Guido enters a level, we make a copy of that particular chunk so we can alter it as Guido runs along smashing bricks and killing things. When he dies or passes the level, we throw the copy away.

The data in each chunk has the following sections:

Sections 1 and 2

The most important sections are 1 and 2. Together, these describe the blocks in the level.

With the behavior of a block described separately from its appearance, the game never has to do any guesswork about how Guido and the other sprites will interact with it. As a bonus, it's trivial to create hidden coins, secret doors, invisible platforms, et cetera.

Section 3

Next in the rank of importance is section 3, the sprite list. This section contains a doubly-linked list of items that vary in size from five to about 15 bytes each. The content of each item varies, but the first five bytes always serve the same purpose. Those first five bytes are:

  1. Offset to the previous item in the list
  2. Offset to the next item in the list
  3. Horizontal location (column number) of the sprite on the map
  4. Unique Identification value (UID) for this sprite
  5. Type of sprite

If any custom data is needed by the particular sprite type, that data begins at byte 6, and keeps going for as long as necessary.

All these sprites are sorted according to their horizontal location (byte 3). As the play window (the visible portion of the level) scrolls to the left or right, the game traces back and forth along this linked list by comparing the edges of the play window with the value in byte 3. Any sprites that pass within a few blocks of the window are copied out of the list (the first two bytes are dropped) and grafted into the "active" list, elsewhere in memory.

This procedure for activating sprites has a serious deficiency though: What if the window activates a turtle sprite, and the turtle walks out onto the screen, and then Guido scrolls back the other way, then scrolls forward again? Will a duplicate of the turtle walk out onto the screen, making two?

To fix that bug, we use byte 4 of the sprite data. Before the sprite is put on the "active" list, its UID value is compared with all the sprites there, and if there's a match, the sprite is not added. Thus, the turtle will not be re-triggered if it's already walking around, but as soon as it falls down a hole (removing it from the list), it will be ready to appear at its starting place again.

Byte 5 is self-explanatory. However it's worth mentioning that the concept of a "sprite" can be used very loosely. For example, you can (and we did) use the sprite list to trigger visual effects, like fading the sky from blue to black, or altering the intensity of the "heat wave" effect. Or you can activate sprites that modify the terrain, causing holes to open, water to rise, doors to slam shut, et cetera.

Section 4

Section 4 is a table we call the "door connections". It's a table of 32 items, 8 bytes per item. The first two bytes of each item identify a door on the map, and the remaining six bytes describes where that door goes. (Pipes that Guido walks or slides into are considered doors.) Byte by byte, it goes:

  1. X coordinate of door block
  2. Y coordinate of door block
  3. X coordinate of destination location
  4. Y coordinate of destination location
  5. X coordinate of the left edge of the play window where Guido arrives
  6. Horizontal offset within the block where Guido arrives
  7. Extra data (Acid Land heat wave intensity, or Gothic Land fog status, for example)
  8. First four bits: Destination type
    Last four bits: Destination number

Byte 8 requires additional explanation. The first half of the byte can be a value from 0 to 3.

Section 5

This section is an array of 256 bytes, one byte per column of blocks on the map. Each byte contains a set of one-bit flags that describe various traits of a column. Even though we had 8 bits in each byte to work with, we ended up only using three.

Section 6

ClmInfo = MapBl+$2900 ; Map column flags bit 0 1 2 3 4 5 6 7 bit 5: solid above bit 6: no-scroll right flag bit 7: no-scroll left flag MapBl = $028000 ; Map MapSpr = MapBl+$C00 ; Map sprite list Sprites: value 1: pointer to previous value 2: pointer to next srite data: value 3: horizontal location of sprite (lookahead added them while several blocks offscreen) value 4: sprite unique ID value 5: sprite type values 6 ... ?: sprite data When activated, all values from 3 on up were spliced into the sprite array, with extra padding added depending on the sprite type. The extra padding was used for any temporary variables that the sprite needed. Even if I were implementing this today, I wouldn't use real dynamic memory for the linked list, but would do the same thing I did this time: I'd stack newly active sprites into the open space of a large pre-allocated block, and if that ever got full before the level ended, I would trace up through the active sprites a hundred at a time and compress them down over the gaps left by "deleted" sprites to free up space. Why invoke the overhead of dynamic memory handling for your linked list, when you're consistently managing pety amounts of RAM? MapVl = MapBl+$1000 ; Map call #'s Connects = MapBl+$2800 ; Map door connects GenInfo = MapBl+$2A00 ; Map general info- time, name, etc. name = geninfo ... 32 time name ... 2 LevPal = MapBl+$2d00 ; Map initial SCB arrangement and palette MsgList = MapBl+$3000 ; Map message list Making data accessible, with format standards and delayed processing, is a practice that has done wonders for software development. color fade, other palette tricks rez-out credits scrolling the Kings Quest level sprite handling: self-sorting array sprite handlers based on macros example: mushroom turtle television message blocks seven fonts - small regular, small italic, small bold, medium regular, medium italic, medium bold, extra large infrastructure: Prodos 8 loader decompressor code borrowed from FTA, essentially implementation of ZIP custom interrupt handler from Ace tracker code from Acetracker custom mouse routine from Acetracker support programs: zog's text-editor mapmaking program keyboard-driven map editor font editor and format-converter demo-recording routine - recorded state changes in mario with a timer, for game demos later on: Windows-based message embedder program Windows-based mouse-driven map editor the game the beginning world themes the ending code package additional PEI PEA PER info: http://bobsc5.home.comcast.net/aal/1986/aal8608.html#a8

--G