Case Study in Binary Data: The Vocabulary of Daventry

I wax nostalgic about old adventure games and then start ripping them apart byte by byte.

Rumpelstiltskin. Rumplestiltskin? Nikstlitslepmur?

That guy!

The AGI (Adventure Game Interpreter) engine was developed by Sierra On-Line in 1982 for the initial release of King’s Quest for the IBM PCjr. In a move that was brilliantly innovative for the time, Sierra didn’t just write the game for IBM’s new platform. Instead, they wrote an engine that compiled the game into a form that a generic interpreter could then play back for the end user.

This design allowed the company to easily release their games on multiple platforms. By developing a reusable engine and porting the interpreter to the majority of personal computing platforms, they were able to focus on narrative and game design within their engine and easily release the games to their fans. Ultimately 14 different games were released for eight competing platforms between 1985 and 1989, before advances in hardware demanded a more capable engine.

These games were a fundamental part of my childhood and it turns out I was not alone! Fast-forward about 30 years and the internet has long since reverse engineered the platform and torn it apart. As an exercise for myself, I started reading up on how it was implemented. The assets for each game were compiled into data that the interpreter processes to present the game to the player and respond to their input.

The graphics were stored in a vector format: bytecodes described how to draw the primitives, and the interpreter rendered the display by writing to the video buffer. Sounds varied wildly depending on the capabilities of the end platform, from bytecodes describing the frequency and attenuation for voice channels on the IBM PCjr to straight-up MIDI wrappers as the hardware on the market became more capable. A proprietary scripting language called LOGIC was compiled to bytecode for puzzles, character behavior, and other internal game logic.

I decided to start by exploring the games’ vocabulary data, a snack-sized puzzle to solve one Saturday morning. The player UI was almost entirely text based, with the arrow keys used for character movement, a minor step up from the text adventures that paved the way for the graphical adventure genre. Rather than being stored as straight ASCII, the game’s vocabulary is run through a very simple encryption and a form of compression that does almost nothing to reduce file size. I speculate that it was used more to obfuscate the vocabulary in an effort to prevent the player from cheating.

The file begins with 26 two-byte values, one per letter of the alphabet. Each is an offset you can seek to in order to find the vocabulary words that start with that letter. The entire file’s vocabulary is in alphabetical order, so this must have been a timesaving measure for the slower I/O speeds of the era.
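Here is a minimal sketch of reading that index, not the script from my repository. The function name read_letter_offsets is made up for illustration, and two details are assumptions rather than anything stated above: that each offset is stored big-endian, and that an offset of zero means no words start with that letter.

```python
import string
import struct


def read_letter_offsets(data: bytes) -> dict[str, int]:
    """Map each letter to the file offset of its first vocabulary word."""
    # Assumption: 26 unsigned 16-bit values, big-endian (52 bytes in total).
    offsets = struct.unpack(">26H", data[:52])
    # Assumption: an offset of 0 means no words begin with that letter,
    # so those letters are left out of the result.
    return {
        letter: offset
        for letter, offset in zip(string.ascii_lowercase, offsets)
        if offset != 0
    }
```

With the whole file loaded into memory as bytes, looking up a letter’s section is then just a dictionary access followed by slicing from that offset.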

Once at a particular letter’s offset, there is some very odd encoding of the ASCII data. Each byte is interpreted as follows:

  • The first byte of every vocabulary word is actually an unsigned integer, telling the interpreter how many letters from the start of the previous vocabulary word are reused at the start of the current word. (That is, if the word is “battle” and the previous word is “ball”, it starts with the byte 2, for repetition of the first two letters of the previous word.)
  • Every following byte is one letter of the rest of the word, stored as its ASCII code XORed with 0x7F.
  • The last letter of each word additionally has 0x80 added to it, marking the end of the word.
  • Lastly there is a two-byte integer that is an internal “word number” that is used by the game’s logic scripting to translate words to logical in-game actions. Words that share the same word number are considered synonyms that refer to the same thing.
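The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script from my repository: the name decode_words and the sample word numbers 4 and 5 are invented, the word number is assumed to be big-endian, and the input is assumed to be well formed. Because an XORed ASCII letter is always below 0x80, “offset by 0x80” is treated here as “the high bit is set”.

```python
def decode_words(data: bytes, start: int = 0):
    """Yield (word, word_number) pairs from the encoded vocabulary data."""
    pos = start
    previous = ""
    # The smallest entry is 4 bytes: prefix count, one letter, word number.
    while pos + 4 <= len(data):
        prefix_len = data[pos]  # letters reused from the previous word
        pos += 1
        letters = []
        while True:
            byte = data[pos]
            pos += 1
            # Drop the end-of-word bit, then undo the XOR with 0x7F.
            letters.append(chr((byte & 0x7F) ^ 0x7F))
            if byte & 0x80:  # high bit set: this was the last letter
                break
        word = previous[:prefix_len] + "".join(letters)
        # Assumption: the word number is stored big-endian.
        number = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        yield word, number
        previous = word


# Hand-encoded sample: "ball" followed by "battle" (made-up word numbers).
SAMPLE = bytes([
    0x00, 0x1D, 0x1E, 0x13, 0x93, 0x00, 0x04,  # 0 reused, "ball", number 4
    0x02, 0x0B, 0x0B, 0x13, 0x9A, 0x00, 0x05,  # 2 reused, "ttle", number 5
])

print(list(decode_words(SAMPLE)))  # [('ball', 4), ('battle', 5)]
```

In the sample, “battle” costs seven bytes instead of the nine it would need with no prefix sharing, which gives a feel for how little space the scheme actually saves.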

Armed with this knowledge, I threw together a short Python script that implements the algorithm to build the in-game vocabulary. The code for the script is located in a public repository on my GitHub account along with the vocabularies from Sierra On-Line classics King’s Quest 1, Space Quest 1, and Police Quest 1.

That troublesome gnome’s name by the way?


Author: Mike Gallagher

Technology enthusiast and expert on geek culture
