Case Study in Binary Data: The Vocabulary of Daventry

I wax nostalgic about old adventure games and then start ripping them apart byte by byte.

Rumpelstiltskin. Rumplestiltskin? Nikstlitslepmur?

That guy!

The Adventure Game Interpreter (AGI) engine was developed by Sierra On-Line for the initial 1984 release of King’s Quest on the IBM PCjr. In a move that was brilliantly innovative for the time, Sierra didn’t just write the game for IBM’s new platform, but rather wrote an engine that compiled the game into a form that a generic interpreter could then play back for the end user.

This design allowed the company to easily release their games on multiple platforms. By developing a reusable engine and porting the interpreter to the majority of personal computing platforms, they were able to focus on narrative and game design within their engine and easily release the games to their fans. Ultimately 14 different games were released for eight competing platforms between 1985 and 1989 before advances in hardware called for a more capable engine.

These games were a fundamental part of my childhood and it turns out I was not alone! Flash forward about 30 years and the internet has long since reverse engineered the platform and torn it apart. As an exercise for myself, I started reading up on how it was implemented. The assets for each game were compiled into data that the interpreter processes in order to present the game to the player and respond to their input.

The graphics were stored in a vector format: bytecodes described how to draw the primitives, and the interpreter rendered the display by writing to the video buffer. Sound varied wildly depending on the capabilities of the target platform, from bytecodes specifying the frequency and attenuation of voice channels on the IBM PCjr to straight-up MIDI wrappers as hardware became more capable. A proprietary scripting language called LOGIC was compiled to bytecode for puzzles, character behavior, and other internal game logic.

I decided to start by exploring the games’ vocabulary data, a snack-sized puzzle to solve one Saturday morning. The player UI was almost entirely text based, with the arrow keys used for character movement, a minor step up from the text adventures that paved the way for the graphical adventure genre. Rather than being stored as straight ASCII, the game’s vocabulary is run through a very simple encryption and a form of compression that is almost useless for actually reducing file size. I speculate that it was used more to obfuscate the vocabulary in an effort to prevent the player from cheating.

The file begins with 26 two-byte values, one per letter of the alphabet. Each one is the offset to seek to in order to find the vocabulary words that start with that letter. The entire vocabulary is in alphabetical order, so this was presumably a time-saving measure for the slower I/O speeds of the era.
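As a rough illustration, reading that index might look like the sketch below. It is not the script from my repository, just a minimal example. The function name `read_letter_offsets` is made up for this post, and two details are assumptions rather than things stated above: that each offset is an unsigned big-endian 16-bit integer, and that an offset of 0 means no words start with that letter.

```python
import struct


def read_letter_offsets(data: bytes) -> dict[str, int]:
    """Return {letter: offset} from the 52-byte index at the start of the file."""
    # Assumption: 26 unsigned 16-bit values, stored high byte first (big-endian).
    raw = struct.unpack(">26H", data[:52])
    # Assumption: an offset of 0 means "no words start with this letter".
    return {chr(ord("a") + i): off for i, off in enumerate(raw) if off != 0}


# Hypothetical header: 'a' words start at byte 52, no 'b' words,
# 'c' words start at byte 0x0120, and every other letter is empty.
header = bytes([0x00, 0x34, 0x00, 0x00, 0x01, 0x20]) + bytes(46)
offsets = read_letter_offsets(header)
print(offsets)  # {'a': 52, 'c': 288}
```

If the real files turn out to store the low byte first, swapping `">26H"` for `"<26H"` is the only change needed.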

Once at a particular letter’s offset, there is some very odd encoding of the ASCII data. Each byte is interpreted as follows:

  • The first byte of every vocabulary word is actually an unsigned integer, telling the interpreter how many letters from the start of the previous vocabulary word are repeated at the start of the current word. (That is, if the word is “battle” and the previous word is “ball”, it starts with the byte 2 for repetition of the first two letters of the previous word.)
  • Every following letter of each vocabulary word is stored as its ASCII code XORed with 0x7F.
  • The last letter of each word is additionally offset by 0x80 (that is, its high bit is set), marking the end of the word.
  • Lastly, there is a two-byte integer that is an internal “word number” used by the game’s logic scripting to translate words to logical in-game actions. Words that share the same word number are considered synonyms that refer to the same thing.
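The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the actual script: the name `decode_words` and the sample bytes are invented for the example, and reading the word number as big-endian is an assumption on my part.

```python
from typing import Iterator


def decode_words(data: bytes, start: int = 52) -> Iterator[tuple[str, int]]:
    """Yield (word, word_number) pairs decoded from the vocabulary bytes."""
    pos = start
    previous = ""
    while pos < len(data):
        # Rule 1: how many leading letters to reuse from the previous word.
        prefix_len = data[pos]
        pos += 1
        letters = list(previous[:prefix_len])
        while pos < len(data):
            byte = data[pos]
            pos += 1
            # Rule 2: drop the end marker bit, then undo the XOR with 0x7F.
            letters.append(chr((byte & 0x7F) ^ 0x7F))
            # Rule 3: a set high bit (0x80) marks the last letter.
            if byte & 0x80:
                break
        if pos + 2 > len(data):
            break  # ran out of data before a complete word number
        # Rule 4: two-byte word number (assumed big-endian here).
        number = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        previous = "".join(letters)
        yield previous, number


# Hand-built sample: "ball" (word number 5) followed by "battle" (word number 6).
sample = bytes([
    0x00, 0x1D, 0x1E, 0x13, 0x93, 0x00, 0x05,  # 0 reused, b a l l, number 5
    0x02, 0x0B, 0x0B, 0x13, 0x9A, 0x00, 0x06,  # 2 reused ("ba"), t t l e, number 6
])
words = list(decode_words(sample, start=0))
print(words)  # [('ball', 5), ('battle', 6)]
```

Grouping the results by word number would then give the synonym sets the game logic relies on.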

Armed with this knowledge, I threw together a short Python script that implements the algorithm to build the in-game vocabulary. The code for the script is in a public repository on my GitHub account, along with the vocabularies from Sierra On-Line classics King’s Quest 1, Space Quest 1, and Police Quest 1.

That troublesome gnome’s name by the way?


Some General Advice for Learning

Recently I’ve been looking at Stack Overflow a lot. It’s a website that focuses on answering questions related to programming. I’ve always bumped into this website without fully engaging with it. When you hit one of those small programming problems whose explanation isn’t immediately obvious, the natural instinct for a 21st century being is “Google it!”

Chances are you’re not the first person to have this problem, so Stack Overflow very frequently pops up in your results. You skim the listed answers, find one that fits into your brain to help you grok it, implement the solution or workaround, and move on. It’s always been a passive thing for me, and one I’m entirely grateful for.

But people need to write those answers! What better way to help out someone who is struggling with the puzzles in programming? The site doesn’t discriminate, with filters and categories for questions both big and small.

So where do they get their answers from? Experience is most likely, but so is copying another answer they got from someone else, or a copy-paste when they ran into the problem initially, or documentation, or a YouTube video or a Udemy course. One of those should stand out: documentation.

When a new product is released or a new public API is developed or a new protocol is established, documentation holds the keys to understanding. The internet, despite all of its shortcomings, houses one of the most important and remarkable things in modern society: freely accessible and open information. What better way to learn how a thing works than by reading the specifications written by the people that made it! (Aside from exploring the source code, but that requires a different headspace.)

For an engineer, answering a question isn’t just about the answer; it’s about understanding how to arrive at that answer. Reading isn’t always easy or interesting, and it takes longer than a ready-to-eat solution, but it’s okay to reinvent the wheel when you want to learn how the wheel works.

Some helpful references for developers: