Expanding the Wing Commander Memory Reader – VGA Capture

Demystifying how VGA programming worked and working with its binary data.

A follow-up to the previous post.

Weapon status (left) and Navigation data (right) displayed on the in-game VDUs of a Broadsword fighter-bomber.

Another idea was proposed in the CIC Discord after I developed the prototype. A recurring feature of the Wing Commander series is the in-cockpit VDU (Visual Display Unit). Essentially, VDUs are small computer monitors inside your in-game cockpit that give you vital information to help you navigate the game and quickly access options while in combat. They might contain a systems damage readout, information on an enemy target, navigational data, or a short animation of your wingman’s communication to you in the form of a talking head.

Your wingman in the first mission of WC2, Shadow, tells you that enemy fighters have been sighted.

Obviously such a thing would be a very cool feature for my previously mentioned external cockpit! I knew that the Arduino I used in my previous blog entry was capable of driving a color LCD, so I began exploring how to extract such information from the game. The games run in DOSBox, which emulates the VGA graphics standard that was common in MS-DOS games of their era. Though I didn’t have much experience with this exact kind of programming, I knew the basics of how it worked at a high level: the program writes pixel color data to a reserved portion of memory, and the graphics card reads that same data back to draw the user’s screen.

Things were a lot simpler back then. The CPU would do all the work of rendering a bitmap and then write it to this reserved memory. Though different graphics cards would offer various features, the basics were the same throughout. I knew that DOSBox must provide a similar buffer of memory for its emulation. After all, the games themselves expect there to be a buffer of memory to write to, so it must be there somewhere!
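To make that concrete, here is a minimal Python sketch of the idea, with a plain bytearray standing in for the reserved video memory. It assumes the common 320×200, 256-color layout (one byte per pixel, rows stored one after another), which the post does not state outright; `put_pixel` and `get_pixel` are names invented for this example.

```python
WIDTH, HEIGHT = 320, 200  # assumed resolution: one byte per pixel, 256 colors

# Stand-in for the reserved video memory; every byte is a palette index (0-255).
framebuffer = bytearray(WIDTH * HEIGHT)

def put_pixel(buf, x, y, color_index):
    """Write one palette index at (x, y) in a row-major buffer."""
    buf[y * WIDTH + x] = color_index

def get_pixel(buf, x, y):
    """Read back the palette index stored at (x, y)."""
    return buf[y * WIDTH + x]

# "Drawing" is nothing more than storing a byte at the right offset.
put_pixel(framebuffer, 10, 5, 15)
```

The whole screen is just 64,000 bytes under these assumptions, which is why finding that one block of memory is enough to capture the picture.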

I began by doing some research on how VGA worked so I could find the data that I needed to do this. I found a wonderful website maintained by David Brackeen called 256-Color VGA Programming in C that explained so many things in a wonderful mix of technical and abstract explanation! I genuinely would not have been able to complete this step of my project without his information.

Viewing the VGA buffer at segment 0xA000 in DOSBox’s debugger.

I used DOSBox’s debugger to view the VGA buffer (known to start at segment 0xA000 of the emulated machine, which is linear address 0xA0000) on an in-game screen that I knew would remain static. I was then able to return to this screen in the regular non-debugger build of DOSBox and use Cheat Engine to locate the same binary data in the (non-emulated) memory of the DOSBox process itself!

I was able to narrow this down to a consistent memory location after a few consecutive tests.

The next step was to determine the custom palette that Wing Commander 2 uses. Again, David Brackeen’s site gave me a lot of information on how palettes are programmed. Unfortunately, I wasn’t able to easily determine where in memory DOSBox stores this information (although I’m sure it’s there somewhere, and it is a puzzle I am determined to solve!). I did come up with an alternate solution to the problem: I took a screen capture in DOSBox to a PNG file. Because a palette-based PNG stores its palette in the file itself, I could easily process this file using some convenient .NET classes to generate a reusable palette.
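My tool does this with .NET classes, but the same idea can be sketched in a few lines of Python by walking the PNG chunks and pulling out the PLTE chunk. This is an illustration rather than the code from the project: `read_png_palette` is a made-up name, the file name in the comment is hypothetical, and the sketch skips CRC checks.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_palette(data):
    """Return the PLTE chunk of a palette-based PNG as a list of (r, g, b) tuples."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        if chunk_type == b"PLTE":
            # The payload is a flat run of R, G, B bytes, three per entry.
            return [tuple(payload[i:i + 3]) for i in range(0, length, 3)]
        pos += 12 + length  # skip header (8) + payload + CRC (4)
    raise ValueError("no PLTE chunk found (image is not palette-based)")

# Hypothetical usage, assuming a DOSBox screenshot saved as capture.png:
#   with open("capture.png", "rb") as f:
#       palette = read_png_palette(f.read())
```

The result is a list where entry N is the color that palette index N maps to, so it can be reused for every frame as long as the game does not change its palette.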

Armed with the memory address of the VGA frame buffer and the palette the game generated, I could now capture video directly from memory! I spent some time building a GUI in Visual C# to not only display these video captures, but also create a more comfortable user interface for connecting to the Arduino.
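The GUI itself is written in C#, but the conversion step is simple enough to sketch in Python. The example below assumes a 320×200 frame with one palette index per byte and a palette given as a list of (r, g, b) tuples; `crop` and `frame_to_rgb` are names invented here, and any VDU coordinates you pass in would have to be measured per cockpit.

```python
WIDTH, HEIGHT = 320, 200  # assumed frame size, one palette index per byte

def crop(frame, left, top, width, height):
    """Copy a rectangular region (such as one VDU) out of a row-major frame."""
    rows = []
    for y in range(top, top + height):
        start = y * WIDTH + left
        rows.append(frame[start:start + width])
    return b"".join(rows)

def frame_to_rgb(indices, palette):
    """Expand palette indices into a flat bytes object of R, G, B triples."""
    rgb = bytearray()
    for index in indices:
        rgb.extend(palette[index])
    return bytes(rgb)
```

Cropping before converting keeps the work small: only the VDU pixels are expanded to full color, not the whole screen.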

Live capture of game data and VGA frame buffer from DOSBox.

Obviously the next step would be to connect a color LCD to the Arduino and see how it works. However, I haven’t yet acquired one to test. In theory, it should work, though perhaps a bit slowly. The Arduino communicates with the PC over a serial connection, which is fairly slow compared to more modern technology. After doing some Googling, I found that the Arduino should be capable of at least 1 Mbps, or approximately 125 kilobytes of data per second.

The VDUs vary in size depending on the ship you’re flying, but each is at most 75×75 pixels, or 5.625 kilobytes at one byte per pixel. With two VDUs, we’re looking at 11.25 kilobytes per frame. The frame rate at 1 Mbps would therefore be around 10 frames per second for a live feed in a perfect world. In my research, I saw claims of up to 2 Mbps for the Arduino, but I’d have to get my hands on some hardware and do some more research to find out the real capabilities.
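The arithmetic behind that estimate can be checked with a short Python sketch. The second figure is my own addition: it assumes the usual serial framing of one start bit and one stop bit around each byte (10 bits on the wire per byte), which the estimate above does not account for.

```python
def frames_per_second(link_bits_per_second, bytes_per_frame, bits_per_byte=8):
    """Upper bound on frame rate for a link that carries nothing but pixels."""
    return link_bits_per_second / (bytes_per_frame * bits_per_byte)

vdu_bytes = 75 * 75        # one byte per pixel -> 5625 bytes
both_vdus = 2 * vdu_bytes  # 11250 bytes per frame

ideal = frames_per_second(1_000_000, both_vdus)       # about 11.1 fps
framed = frames_per_second(1_000_000, both_vdus, 10)  # about 8.9 fps with start/stop bits

print(f"ideal: {ideal:.1f} fps, with serial framing: {framed:.1f} fps")
```

Either way the result lands near the 10 frames per second mark, and doubling the link speed to 2 Mbps would roughly double it.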

I can’t thank David Brackeen enough for keeping this information available on the internet. MS-DOS programming is a bit of a historical artifact at this point but it’s great knowing that there are people out there keeping it alive for the rest of us.

The current source code for the Memory Reader GUI is available on my GitHub profile. As of this writing, I haven’t yet added the Arduino functionality to it, but it will be a quick addition in a commit within a day or two.

Case Study in Binary Data: The Vocabulary of Daventry

I wax nostalgic about old adventure games and then start ripping them apart byte by byte.

Rumpelstiltskin. Rumplestiltskin? Nikstlitslepmur?

That guy!

The AGI engine was developed by Sierra On-Line in 1982 for the initial release of King’s Quest for the IBM PCjr. In a move that was brilliantly innovative for the time, Sierra didn’t just write the game for IBM’s new platform, but rather wrote an engine that compiled the game into a form that a generic interpreter could then play back for the end user.

This design allowed the company to easily release their games on multiple platforms. By developing a reusable engine and porting the interpreter to the majority of personal computing platforms, they were able to focus on narrative and game design within their engine and easily release the games to their fans. Ultimately 14 different games were released for eight competing platforms between 1985 and 1989 before advances in hardware demanded a more capable engine.

These games were a fundamental part of my childhood and it turns out I was not alone! Flash forward about 30 years and the internet has long since reverse engineered the platform and torn it apart. As an exercise for myself, I started reading how the thing works to see how it was implemented. The assets for the games were compiled into data that is processed by the interpreter to present to the player and respond to their input.

The graphics used a vector format: bytecodes described how to draw the primitives, and the interpreter rendered the display by writing to the video buffer. Sound varied wildly depending on the capabilities of the end platform, from bytecodes giving the frequency and attenuation for voice channels on the IBM PCjr to straight up MIDI wrappers as the market became more capable. A proprietary scripting language called LOGIC was compiled to bytecode for puzzles, character behavior, and other internal game logic.

I decided to start by exploring the games’ vocabulary data, a snack-sized puzzle to solve one Saturday morning. The player UI was almost entirely text based, with the arrow keys used for character movement, a minor step up from the text adventures that paved the way for the graphical adventure genre. Rather than straight ASCII data, the game’s vocabulary is run through a very simple encryption and a form of compression that is almost useless for actually saving file size. I speculate that it was used more to obfuscate the vocabulary in an effort to prevent the player from cheating.

The first 26 two-byte values in the file form an index: one offset per letter of the alphabet, pointing to where the vocabulary words that start with that letter begin. The entire file’s vocabulary is in alphabetical order, so this must have been a timesaving measure for the slower I/O speeds of the era.
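Reading that index takes only a couple of lines of Python. This is a sketch, not the script from my repository: `read_letter_offsets` is a name made up for the example, and the big-endian byte order is an assumption on my part, so swap `>` for `<` in the format string if the offsets come out looking wrong.

```python
import struct

def read_letter_offsets(data):
    """Map 'a'..'z' to the file offset where that letter's words begin."""
    # ">26H" = 26 unsigned 16-bit integers, big-endian (assumed byte order).
    values = struct.unpack(">26H", data[:52])
    return {chr(ord("a") + i): offset for i, offset in enumerate(values)}
```

With the index in hand, looking up a typed word only requires seeking to the offset for its first letter instead of scanning the whole file.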

Once at a particular letter’s offset, there is some very odd encoding of the ASCII data. Each byte is interpreted as follows:

  • The first byte of every vocabulary word is actually an unsigned integer, telling the interpreter how many letters from the start of the previous vocabulary word are reused at the start of this word. (That is, if the word is “battle” and the previous word is “ball” it starts with the byte 2 for repetition of the first two letters of the previous word.)
  • Every following byte of each vocabulary word is a letter’s ASCII code XORed with 0x7F.
  • The last letter of each word has 0x80 added to it (setting the high bit), marking the end of the word.
  • Lastly there is a two-byte integer that is an internal “word number” that is used by the game’s logic scripting to translate words to logical in-game actions. Words that share the same word number are considered synonyms that refer to the same thing.
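Put together, those rules fit in a short decoder. The following is a minimal Python sketch of the rules as listed, not the exact script from my repository: `decode_words` is an illustrative name, `start` is expected to be an offset taken from the index at the top of the file, the word number is assumed to be stored big-endian, and any trailing padding in a real file is simply ignored.

```python
def decode_words(data, start):
    """Decode (word, word_number) pairs from vocabulary bytes beginning at start."""
    words = []
    previous = ""
    pos = start
    while pos < len(data):
        prefix_len = data[pos]  # letters reused from the previous word
        pos += 1
        letters = []
        while pos < len(data):
            byte = data[pos]
            pos += 1
            # Drop the end-of-word flag, then undo the XOR with 0x7F.
            letters.append(chr((byte & 0x7F) ^ 0x7F))
            if byte & 0x80:  # high bit set: this was the last letter
                break
        else:
            break  # ran out of data in the middle of a word
        if pos + 2 > len(data):
            break  # no room left for the word number
        word_number = (data[pos] << 8) | data[pos + 1]  # assumed big-endian
        pos += 2
        previous = previous[:prefix_len] + "".join(letters)
        words.append((previous, word_number))
    return words

# Hand-encoded sample: "ball" (word 5) followed by "battle" (word 6, reusing "ba").
sample = bytes([0x00, 0x1D, 0x1E, 0x13, 0x93, 0x00, 0x05,
                0x02, 0x0B, 0x0B, 0x13, 0x9A, 0x00, 0x06])
print(decode_words(sample, 0))  # [('ball', 5), ('battle', 6)]
```

Grouping the decoded pairs by word number then gives the synonym sets that the game’s logic scripts work with.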

Armed with this knowledge, I threw together a short Python script that implements the algorithm to rebuild the in-game vocabulary. The code for the script is in a public repository on my GitHub account, along with the vocabularies from Sierra On-Line classics King’s Quest 1, Space Quest 1, and Police Quest 1.

That troublesome gnome’s name by the way?