Super Merryo Trolls

Part Three

  1. Introduction
  2. The Challenge
  3. Crash Course In Assembler
  4. The PEA Field
  5. Stupid PEA Tricks
  1. Further Graphic Barbarity
  2. The Blockset
  3. Drawing Sprites
  4. Message Blocks
  5. The Level Maps
  1. The Kings Quest Thing
  2. Dungeon Faster
  3. A Gallery Of Other Weirdness
  4. Support Programs
  5. Resurrection And Code

The Kings Quest Thing

These days, computers don't often have to be restarted. But in the 1990's we regularly restarted them just to switch tasks, or bring them back to life, or because they got glitchy and needed what my friends and I called a "courtesy flush".

It may horrify you to learn this, but when you restarted a really old computer like the Apple IIgs, almost all the contents of memory were left intact. Any document you were writing, or picture you were editing, was still sitting there and could read by the next program to be launched, or the next person to use the computer. In practice this didn't matter because the next person using the computer was always you, but it had an interesting consequence: You could jam down the reset keys in the middle of any running program, including your favorite game, and then when the machine started back up, go poking around in memory to see what that program was doing.

Programmers could keep few secrets. Using this one weird technique with a few hand-rolled programs, I discovered stuff like this:

The picture on the left is a chunk the game "King's Quest" left behind when I murdered it with the reset key. The picture on the right is from "Dungeon Master".

King's Quest was one of my favorite games as a teenager, but I was always frustrated by how slow it ran. The main character strolled around through these cool fantasy landscapes, passing in front of and behind objects, but it took an entire minute for him to get across one screen. I didn't know how most of the game logic worked, and it would have been painful to reverse-engineer it, but one thing I did know was graphics hackery, and I was certain I could make Guido stroll around the screen the same way the guy in King's Quest did, except Guido would go ten times faster.

The image I recovered from the corpse of King's Quest told me all I needed to know: The way the programmers made the character interact with a scene was by drawing two versions of it - one regular, and one in silhouette - and using the second version as a key.

So I declared World 8 would have an homage to Kings Quest, and set about building one that was exactly two screens in size: Just large enough for Guido to appear next to one side of a castle, and then walk over to the other side and go in the front door. At the bottom I could put a little text entry box, like the original game, but instead of offering any meaningful gameplay I could just make the computer jabber nonsense and make jokes at the player no matter what they typed in. So, basically like meeting me in real life.

The world of King's Quest doesn't scroll. Instead of moving the whole screen around, all I needed to do was redraw Guido and a little bit of the area around him to cover up his previous position. The only crafty part was knowing whether he should be obscured by various pieces of scenery, and for that I used the key.

It's dead simple. In the key, objects from the scene are represented as big colored blobs of the same shape. 16 pixel colors to choose from means 16 shapes. I assigned each color a number between 0 and 160, making sure that the number I chose matched the vertical location of the base of an object on the screen. For example, the color for the tree got the number 130, since the base of the tree is 130 pixels down. Then during the game I just took Guido's vertical position, and if it was less than 130, I didn't draw any part of him that intersected that color in the key. If he walked forward (down the screen), his vertical position would increase, and eventually it would reach 131 and he'd appear in front of the tree.

There are lots of creative ways to speed this up. The only thing I did was make a second smaller key, with 16 masking bytes in it. Every time Guido changed vertical position I recalculated the masking bytes. So if Guido was at location 131 for example, the color for the tree blob would get a masking byte of $FF. As I drew Guido, I loaded values from the big key, and then used those values to assemble a mask from the smaller key one pixel at a time. I used that on Guido, then inverted all the bits and used it on the background, then wrote the result to the screen. The point was, I didn't have to waste time looking at every pixel and asking "Is Guido supposed to be in front of this color at 131 or not?" because that question already had a yes/no answer in the masking bytes, in the form of an $F or a $0.

All I needed then was to make sure he couldn't walk through objects. King's Quest did that by drawing white lines on the key, and not allowing the player to cross them. So I did the same thing.

Dungeon Faster

Another of my all-time favorites for the Apple IIgs was a game called "Dungeon Master". Again, it was an immersive, clever fantasy adventure game with a semi-3D graphics style ... and it ran in perpetual slow-motion.

This is the game running in Crossrunner. That's cycle-accurate timing! I'm pressing keys as fast as the game will read them. I honestly have no idea how a kid with undiagnosed attention deficit issues could handle walking through a dungeon this slowly. It must have hypnotized me.

(Hey, how can you tell if someone grew up playing adventure games in the 80's? You say the word "dungeon", and they imagine something like the basement of a U-Haul storage center.)

When I knocked Dungeon Master unconscious and fired up the forensics tools, I discovered big rectangles of game art, mostly walls and floors, which the program composited together one pixel at a time to create a view from anywhere in the dungeon. It was a generic approach, requiring no detailed knowledge of the hardware. I should have expected that, since it was ported from other platforms and written in C. They probably dumped the game code into a compiler and tweaked it just enough to run, then kicked it out the door. Acceptable for a small company shipping a game on unpopular hardware. I could do better.

I wrote a program that compiled all the art into PEA fields, then used pixel-masking only for the edges. When you've got a winning strategy, why not use it? I ran the PEA code to layer all the walls to a different chunk of memory off-screen, then used a loop of PEI instructions to shove that whole thing onto the play window. Using this pipeline, you could sprint around the dungeon so fast that the only thing limiting you was the repeat speed of the key when you held it down.

It's tempting to think that if I'd been older and connected to the industry, I could have approached FTL (the publishers) and pitched them on releasing the sequel for the Apple IIgs using this improved engine. But I was in no state to impress anyone: My High School GPA was barely enough to graduate, and my version of "dressing up" was wearing the T-shirt without the holes. Besides, when I made the prototype in 1992 the entire Apple II line was already dead as a gaming platform, and everyone was moving on. The only people left to impress were us hardcore weirdos.

So my plan was to start up this interface as soon as Guido entered the castle from the King's Quest scene, and make him the leader of a four-person party including me, Ace, and Zog. He could roam around a castle dungeon with the co-developers of his own game, who would all have terrible stats and get killed immediately.

A Gallery Of Other Weirdness

Ya Gotta Have Credits

While messing around with the PEA and PEI instructions, Ace and I came up with a bunch of wacky variations that didn't fit in the game. Since the code was already written, we decided to string them all together with a blocky font and make a credits crawl.

These modes are all variations of "do math on the stack pointer" and "skip over data in your PEI sequence".

You may notice that the credits don't go all the way to the sides of the screen: Each line is 256 pixels wide at the most. That's because we made the graphics smoother by drawing all the lines twice to an area offscreen, the second time with a one-pixel offset.

To do the squeezing effect for example, we would make a PEI sequence that switched between the offsets at regular intervals, allowing us to skip single pixels as each line got pushed to the screen. The more times we switched, the more compressed the line got. Unfortunately PEI instructions can only address a range from $00 to $FF, which means that in any sequence of PEIs, you better have everything you need within a 256-byte region or you're going to have to move the Direct Page pointer around in the middle of your sequence. That would have been messy, so instead we restricted each line to 256 pixels, which was 128 bytes, so both versions of each line could fit in the Direct Page side-by-side.

The Joystick: How Hard Can It Be?

If you're making an action game, you want players to use a joystick. But how do you read the position of the joystick in the Apple IIgs? It's like this: Throw the joystick up into the air. Now start a timer. Stop it when the joystick shatters on the ground. The number of seconds that took is the position of one axis of the joystick, X or Y.

Okay, that's just an analogy, but it captures the absurdity of the process: What you really do is poke a memory location, and then poke a different one until it changes, and count how long that took. There's no hardware to do the counting for you. Also, your counting code better take exactly the same number of cycles per tick or your result will be gibberish, and to do that you have to stop everything else you're doing and stand there counting milliseconds, like a chump, in a computer that is already as slow as a tree sloth filing papers at the DMV.

Making this worse, there are two axes on a joystick, and potentially two joysticks to read. Unless you're clever, you have to do this poke-and-wait routine four times to get the numbers you need. Oh, and if both joysticks are in the "upper left" position, all four axes will be near zero, which will make all the reads finish quickly... And your player will go in that direction faster, which is disorienting!

Now that you understand the lunacy of the problem, here's some code to consider:

         SEP   $30
         LDAL  $E0C070
         LDY   #$60
         STZ   JoystkReadX
         STZ   JoystkReadY
Loop     LDAL  $E0C064
         BMI   XAOn
         NOP
         NOP
         BRA   XAOff
XAOn     INC   JoystkReadX
XAOff    LDAL  $E0C065
         BMI   YAOn
         NOP
         NOP
         BRA   YAOff
YAOn     INC   JoystkReadY
YAOff    LDAL  $E0C066
         BMI   XBOn
         NOP
         NOP
         BRA   XBOff
XBOn     NOP
         NOP
         NOP
XBOff    LDAL  $E0C067
         BMI
         NOP
         NOP
         BRA   YBOff
YBOn     NOP
         NOP
         NOP
YBOff    BRA   Extra
Extra    DEY
         BPL   Loop

I'm not going to torture you with a line-by-line interpretation. I'll just say that this code - cribbed and then modified from those French hackers I mentioned a while back - is the only elegant way to solve a really gross problem.

What it does, is start four counts at once, and then it runs in a loop, checking all four axes in a row for a fixed amount of time. If a given axis reports "I'm done" (if the BMI or "branch on minus" instruction doesn't branch), it skips incrementing the counter for that axis. The really clever part is those NOP instructions: They waste just enough time in the CPU of the computer to be the equivalent of running an INC instruction, so the length of time spent in each iteration of the loop is exactly the same whether the computer incremented four counters, or two, or none, and the counts don't interfere with each other.

With this code, it costs the lowest possible fixed amount of time to read the joystick no matter what the position is. But that fixed amount is like a tax you pay in the CPU for using the joystick, and the CPU is a desperately limited resource. The hardware design that led to this software hack was probably the best they could do in 1986, especially to keep backward compatibility with games from earlier Apple II machines, but can you imagine doing this now? No. The operating system handles this sort of thing.

Today's approach is much simpler:

  1. Put "how to read joystick xinput dotnet" into a search engine.
  2. ?????????
  3. Done!

The Rez-Out Effect

Here's a slowed-down video of the (already slow) effect in Super Merryo Trolls:

This is our attempt to copy an effect in Super Mario World. The game console does it at 30 frames a second of course, but we managed to get the idea across.

It's relatively simple for how cool it looks. First it makes a backup copy of everything on the screen. Then it runs a loop of code that makes another screen-sized copy, this time replacing every odd pixel with the even one after it, using mask-and-shift code like we use for sprite drawing. Then it does the same thing for three-pixel intervals.

Those redraws are very CPU intense, which is why they're prepared in advance. From there, we run this code:

* Display 2x version
     LDX   #BackgPeaField+$8000
     LDY   #SHR
     LDA   #$77FE
     MVN   BackgPeaField+$8000,SHR

* Display 3x version
     JSR   Fade
     LDX   #ForegPeaField
     LDY   #SHR
     LDA   #$77FE
     MVN   ForegPeaField,SHR

* Make and display 4x version
     JSR   Fade
     LDX   #0
]lp2 LDY   #80
]lp  LDAL  BackgPeaField+$a1,x
     AND   #$000f
     STA   Tmp
     STA   Tmp+1
     ORA   Tmp
     ASLS  4
     ORA   Tmp        ; No need for XBA this way
     STA   SHR,X
     STA   SHR+$A0,X
     STA   SHR+$140,X
     INX
     INX
     DEY
     BNE   ]lp
     TXA
     CLC
     ADC   #$A0*2
     TAX
     CMP   #$A0*192
     BCC   ]lp2

* Make and display 5x version
     JSR   Fade
     LDX   #0
]lp2 LDY   #32
]lp  LDAl  2*$a0+BackgPeaField,X
     AND   #$F000
     STA   Tmp+1
     ...

This moves the 2x and the 3x-zoom versions of the image to the screen, calling the "Fade" subroutine each time, then constructs and moves a 4x-zoom version, then 5x, and so on. The subroutine moves the palette one step closer to all zeros, so the screen fades a little bit with each zoom level.

We keep going like this for 6x, 7x, 8x, 10x, 12x, 16x, 20x, 24x, 32x, and finally 40x, then just fade the rest of the way to black.

The only tricky part is, we don't pre-calculate any of the zoom levels beyond 2x and 3x, so they need to be relatively efficient. That means writing new code for each zoom level, and that means writing code that does the calculation in chunks that fall on 2 or 4 byte intervals so it can be repeated. For example, the "7x zoom" code had to handle 7-byte intervals, 14 pixels at a time, since going 7 pixels at a time falls halfway across a byte.

The Autopilot Sprite

Back In My Day™, games usually had an "attract mode" showing a few seconds of a game playing itself. That's very optional now, since people learn about games differently, but in 1994 it was standard, so Merryo Trolls wouldn't be legit unless it did the same thing. What was the cheapest way we could do it?

Well, the Apple IIgs is the opposite of a multi-tasking machine. If you feed it the same inputs with the same timing, it will always respond the same way. All the inputs needed to play the game were on the joystick: Two axes and two buttons. Technically you could finish level 1 without using the second button or moving the joystick up or down, so all we really needed was the X-axis and one button. The state of those could be crammed into a single byte.

We needed a delivery mechanism for this input data. Something we could start and stop, like the cruise control on a car, because we also needed to take over the controls after Guido slid down the flagpole at the end of a level. We already wrote code to manage sprites; why not use that? Anything could be a sprite...

So we created a "recorder" sprite, and added a few lines of code that checked the keyboard for the "r" key at the beginning of each game cycle. If it was pressed we created an instance of the sprite, with a counter set to zero. Every cycle through the game code after that, when the sprite handler was executed, we looked at the X-axis and button data. If it was different from the previous value, we noted down the change and the number in the counter. Then we increased the counter by one, and went back around again.

The status of the inputs could potentially be different on every cycle of the game, but in practice the joystick got held in one position for many cycles in a row, and we discarded all the repeats. The result was a pretty small chunk of data for a relatively long recording. Once the data was there, how did we extract it? Easy: Reset the computer.

As I mentioned before, the corpse of any previous program lingers in the Apple IIgs memory after a reset. So we just ran a BASIC program to read the recording out of memory and convert it into data statements that we could dump into the Merlin Assembler document alongside the rest of our Merryo Trolls code:

Demo
HEX 0206040484090405020C040C84070400
HEX 03000207040184018200020404058201
HEX 0200820102080300040A020182038401
HEX 04020300020F03158304030102030300
HEX 04030300020E82038300840404168407
HEX 04080206820283008401040102170403
HEX 84038200020382040214030004008401
HEX 040F8408040102058407040102060401
HEX 030002050302020D030102100006800D
HEX 0003010002036969

Once we had the data, we wrote a "player" sprite that did the inverse of the "recorder" sprite. To make the game play itself we just spawned that sprite at the start of the level. Done!

You've probably noticed by now that Assembly Language on home computers in the 80's basically had no rules. There was no security at the hardware level. If you wanted to write a sprite routine that reached over into your joystick code and plugged in fake numbers, that was totally your deal. If screwy code design brought the whole machine down in flames, that was also your deal. Just jab that restart button...

Support Programs

It takes a lot of support programs to make a game. In 30 years this hasn't changed at all. A surprising amount of developer time gets poured into software that never reaches the device of any customer, but goes to art departments, sound departments, level designers, QA personnel, et cetera. Most of the programs we wrote for Merryo Trolls have been mentioned already, but there are a few more worth a look.

In 1992, the internet was still dial-up modems and email, except for universities and a handful of fancy people. Late in the year Zog went away to college, but he wanted to keep developing maps for the game. He had access to a computer lab full of UNIX machines, but none of us knew how to write software for them, let alone some kind of networked software that would allow him to edit maps remotely. What to do?

In a surge of caffeine and industrial music, he invented a solution:

It's map design over email!

The above is a draft of World 1-3, assembled by Zog in the computer lab and sent through the tubes in 1992. When Ace and I received it, we constructed a translation key and a BASIC program so we could dump the email straight into a level file for playtesting. It was tedious work for Zog because there was no software on his end to help out. He made drafts on notebook paper, then used vim to build maps in the lab and then pipe them into an email. It was painful so he didn't do it often, but it kept development going.

Now let's make a comparison by skipping ahead, past 1994 when this Apple IIgs code last saw any attention, to 1996 when I was in college. Everything had changed. The death of the Apple II platform had pushed me reluctantly onto a Windows box, and the new hotness in home computer graphics was DirectX, which I wanted to learn.

Why not make Merryo Trolls a Windows application? It would have to be a total rewrite, from Assembly Language to C, with different file formats and updated art, and a much more by-the-book graphics system. I did manage to cough out a working prototype but the whole exercise felt silly, because the Nintendo 64 had appeared a year earlier and everyone was declaring 2D action games to be dead, dead, dead.

The source code exists but there's nothing interesting about it. It's amusing to see the editor, though, in all its Windows '95-era hideousness:

From editing maps over email, to this sort of thing, in four years. The computer industry has always been nuts.

On a personal note, writing this has led me to confront the question - for about the hundredth time - why in the world did I find this stuff so fascinating as a child? The machines were obtuse and uncooperative. Assembly Language programming is incredibly abstract, and uses huge amounts of discrete math and algebra. I didn't start with a "natural talent" for those things, I stubbornly hammered them into myself over a period of years.

By contrast I loathed doing math homework. Doing long division with pencil and paper made me want to die. And then I'd sit down in front of this machine and lose entire days poking numbers into it, with the only concrete reward being a beep or some flashing lights. Why?

Don't get me wrong: I love what I do now. But what I do now is much more diverse. What about Assembly Language was appealing to a 14-year-old?

Resurrection And Code

And now, we return from our safari 30 years into the past, to the year 2024. That was weird, right? Doing graphics by pushing numbers onto the stack with the CPU is hopefully not a strategy we will ever see again. You may have noticed it has a lot of intriguing variations, though - as many as there are opcodes for the CPU - and the self-contained little world of the Apple IIgs is a fun place to explore them. For example, over the last few years the stalwart engineer Lucas Scharenbroich has developed an Apple IIgs game engine that uses multiple versions of the stack technique, and presented it for Kansasfest in 2022.

This is an interesting year for the Apple IIgs. There's a brand new emulator called Crossrunner, and it's the very first cycle accurate emulator for this ancient system. Until now, Super Merryo Trolls either didn't launch at all in emulators, or was unplayably fast. Thanks to Ian Brumby you can finally see this game in its original screen-tearing single-digit frames-per-second glory.

And just as remarkably, you can make your own version, because the source code for Super Merryo Trolls has survived all this time, carried across four platforms and two dozen OS upgrades, lo these thirty years, and is now uploaded to Github.

You can edit and build it in a modern environment, thanks to the Merlin32 assembler and the CiderPress II disk image utility. For example, in the making of this document I've been adding commentary to the ancient code and cleaning it up a bit, using Visual Studio Code, which even has a Merlin Assembler syntax highlighting plugin.

To accompany the code on github I've put together a small set of editing tools in the modern style, including a browser-based level editor that works directly with the original file formats of the game.

If Merryo Trolls inspires you to revisit the Apple IIgs, or make something just as weird for another platform, let me know so I can tell Zog and Ace about it.

Have fun!

--G