doomfs

DOOM From Scratch!

WAD File Parsing
References

WAD File Parsing

WAD files are the bread and butter of DOOM. They contain all the in-game data for the game. Yes, ALL the in-game data. WAD stands for Where’s All the Data? [3] so a single file contains all the assets of the game. Being able to read and parse them is crucial for creating DOOM from scratch. This is why, the first chapter of this project is devoted to properly parsing and interpreting all the content from WAD files. To that end, we have defined a WAD class in include/wad.hpp which will be responsible for holding (in an structured manner) all the data from the WAD file and also provide methods to load and parse such files. Everything that we mention in this chapter will be implemented on that class unless otherwise specified.

WAD File Format and Data Types

DOOM WADs are binary files that are stored in Big Endian order, i.e., the most significant bytes are stored before lesser significant ones. In other words, the most significant byte is stored at the lowest memory address. This is important because modern computers do not use this kind of endianness but rather use Little Endian. This means that we must be careful when reading multibyte data from the WAD file so we need to reverse the byte order of each variable.

The DOOM WADs may contain only five different datatypes [1] which are summarized in the following table.

Type	Bytes	Range
Unsigned Integer	4	[0, 4294967295]
Signed Integer	4	[-2147483648, 2147483647]
Unsigned Short	2	[0, 65535]
Signed Short	2	[-32768, 32767]
ASCII Character	1	[0, 255]

Loading the WAD File

The routine our WAD class constructor will invoke is the WAD loading one void load_wad(). This procedure will open an specified WAD file to be read in binary mode. The whole file will be read and stored in a class member std::unique_ptr<uint8_t[]> m_wad_data, i.e., a unique pointer to an array of bytes. Two design decisions were taken here: (1) we decided to use the uint8_t type, contained in the cstdint header due to its portability and implementation independence, and (2) we are using smart std::unique_ptr pointers to hold WAD data to make this data handling process less error prone providing that we will have an unique owner for the WAD data (the WAD class itself). We will use a class member unsigned int m_offset to specify the current byte offset to the file and keep track of it.

void load_wad(const std::string & filename)
{
	std::cout << "Reading WAD " << filename << "\n";
	std::ifstream wad_file_(filename, std::ios::binary | std::ios::ate);

	if (!wad_file_)
		throw std::runtime_error("Could not open file " + filename);

	std::streamsize wad_size_ = wad_file_.tellg();
	std::cout << "WAD file size is " << wad_size_ << "\n";

	m_wad_data = std::make_unique<uint8_t[]>((unsigned int)wad_size_);

	wad_file_.seekg(0, std::ios::beg);
	wad_file_.read((char*)m_wad_data.get(), wad_size_);
	wad_file_.close();

	std::cout << "WAD read successfully!\n";

	m_offset = 0;
}

Reader Functions for WAD Files

We need five basic functions to read all the data types from the WAD files. These functions will get the appropriate bytes from the WAD file data (the uint8_t array) and then change their order to generate a little endian representation. The following functions are the originally ones coded by movax13h [1] but adapted to use the modern C++ pointers std::unique:ptr<uint8_t[]> instead of just plain pointers uint8_t *.

Shorts are read using the short read_short() and unsigned short read_ushort() functions. Both of them apply the same logic: they get two consecutive bytes from the WAD data pointer at the specified location, e.g., 0xAABB in Big Endian, and then shift the bytes to generate a 0xBBAA Little Endian representation for the short which is returned by the function.

short read_short(const std::unique_ptr<uint8_t[]> & rpData, unsigned int & offset)
{
	short tmp_ = (rpData[offset + 1] << 8) | rpData[offset];
  offset += 2;
	return tmp_;
}

unsigned short read_ushort(const std::unique_ptr<uint8_t[]> & rpData, unsigned int & offset)
{
	unsigned short tmp_ = (rpData[offset + 1] << 8) | rpData[offset];
  offset += 2;
	return tmp_;
}

The same logic but extended to four bytes instead of two is applied by the integer reading functions int read_int() and unsigned int read_uint():

int read_int(const std::unique_ptr<uint8_t[]> & rpData, unsigned int & offset)
{
	int tmp_ = (rpData[offset + 3] << 24) |
		(rpData[offset + 2] << 16)	|
		(rpData[offset + 1] << 8) |
		rpData[offset];
  offset += 2;
	return tmp_;
}

unsigned int read_uint(const std::unique_ptr<uint8_t[]> & rpData, unsigned int & offset)
{
	unsigned int tmp_ = (rpData[offset + 3] << 24) |
		(rpData[offset + 2] << 16) |
		(rpData[offset + 1] << 8) |
		rpData[offset];
  offset += 2;
	return tmp_;
}

The last helper function to read data from the WAD data pointer is the one to read non-null-terminated strings of ASCII characters. This void copy_and_capitalize_buffer() function reads bytes from the data pointer one by one, capitalizes each single byte character, and appends it to the provided reference to an std::string (which is previously emptied). Since the ASCII strings are not terminated by a null character, we must provide the number of characters we need to read to the function and it will stop at that point or if the data pointer comes to an end.

void copy_and_capitalize_buffer(
	std::string & rDst,
	const std::unique_ptr<uint8_t[]> & rpSrc,
	unsigned int & offset,
	unsigned int srcLength)
{
	rDst = "";

	for (unsigned int i = 0; i < srcLength && rpSrc[offset + i] != 0; ++i)
		rDst += toupper(rpSrc[offset + i]);

  offset += srcLength;
}

All those methods are implemented in the include/readers.hpp header. Using those methods, we will be able to parse the whole WAD file and extract its contents.

The WAD header is the first thing that appears at the beginning of a WAD file. It consist of a sequence of 12 bytes divided into three 4-byte parts. Such parts are the following ones:

(4 bytes) a not null-terminated ASCII string which indicates the type of the WAD file: either “IWAD” or “PWAD”. The first one is an original file while the latter is a patched WAD file.
(4 bytes) an unsigned integer that specifies the number of directory ENTRIES (total number of LUMPS) contained in the file.
(4 bytes) an unsigned integer which holds the exact byte offset from the beginning of the WAD file where the DIRECTORY begins.

To store the WAD header, we declared a struct, namely WADHeader:

struct WADHeader
{
	std::string type;
	unsigned int lump_count;
	unsigned int directory_offset;
};

Then we added a member to our WAD class WADHeader m_wad_header; which we can fill with the header information coming from our WAD file thanks to the read_header() method. This procedure reads a 4-byte ASCII string into m_wad_header.type, a 4-byte unsigned integer into m_wad_header.lump_count, and another 4-byte unsigned integer into m_wad_header.directory_offset.

#define WAD_HEADER_TYPE_LENGTH 4
#define WAD_HEADER_LUMPCOUNT_LENGTH 4
#define WAD_HEADER_OFFSET_LENGTH 4

[...]

void read_header()
{
	assert(m_wad_data);

	copy_and_capitalize_buffer(m_wad_header.type, m_wad_data, m_offset, WAD_HEADER_TYPE_LENGTH);
	m_wad_header.lump_count = read_uint(m_wad_data, m_offset);
	m_wad_header.directory_offset = read_uint(m_wad_data, m_offset);
}

After reading the sample doom1.wad file, we can print the following header information:

WAD file
Type: IWAD
Lump Count: 1264
Directory Offset: 4175796

Palettes

Palettes are stored in the PLAYPAL LUMP. Such LUMP consists of fourteen 256-color palette. Since each RGB triplet takes three bytes (one for each channel which can be in the range [0, 255]), each palette takes 768 bytes in the LUMP. This means that the PLAYPAL LUMP takes 10752 bytes in total. For more information see [2] chapter 8-1.

Palettes will be stored in the class member std::vector<std::vector<WADPaletteColor>> m_palettes. Each one of them is represented as a vector of WADPaletteColor, which is a struct that contains three fields representing an RGB triplet:

struct WADPaletteColor
{
	uint8_t r;
	uint8_t g;
	uint8_t b;
};

To read the palettes, we can just look for the PLAYPAL LUMP in our map and then get the corresponding byte offset to its start from the DICTIONARY. Then we will read chunks of 256 RGB colors (three bytes each one) and create a new palette for each chunk that will be stored in our palette vector. We will keep reading until we get to the end of the LUMP using our common byte offset (this decision was taken in order not to hardcode the number of palettes to 14 so that if any patch is applied we can read as many palettes as we want). The palette reading routine is void read_palettes().

void read_palettes()
{
  assert(m_wad_data);
  assert(m_lump_map.find("PLAYPAL") != m_lump_map.end());

  WADEntry palettes_ = m_directory[m_lump_map["PLAYPAL"]];

  m_offset = palettes_.offset;
  while (m_offset < palettes_.offset + palettes_.size)
  {
    std::vector<WADPaletteColor> palette_(256);

    for (unsigned int i = 0; i < 256; ++i)
    {
      WADPaletteColor color_;
      color_.r = m_wad_data[m_offset++];
      color_.g = m_wad_data[m_offset++];
      color_.b = m_wad_data[m_offset++];
      palette_[i] = color_;
    }

    m_palettes.push_back(palette_);
  }
}

We also coded a brief procedure, namely void write_palettes(), which takes each palette in the palette vector and produces a PPM image for each one of them so that we can visualize them. Since palettes are written as small 16 x 16 images, we also scaled them to 1600% the original size and converted them to PNG using mogrify -scale 1600% palette* and mogrify -format png *.ppm respectively. The resulting palettes are shown below:

Colormaps

The COLORMAP LUMP contains 34 color maps that map colors from palettes down in brightness. Each one of them is 256 bytes long so that each byte indexes pixels whithin a palette, i.e., the second byte of the first color map applied to the third palette indicates which pixel within that palette corresponds to the second position of the very same palette. Color maps were mainly used in DOOM for sector brightness (being color map 0 the brightest one and 31 the darkest). See [4] for a more detailed explanation of how color maps work.

Color maps will be stored in our WAD class in the std::vector<std::vector<uint8_t>> m_colormaps member. Each color map is represented as a vector of uint8_t since we are dealing with just pixel indices. To read the color maps, we just look for the COLORMAP LUMP in our map and then get the corresponding offset to the its start from the DICTIONARY. Then we read chunks of 256 bytes and create a new color map for each one of them. Those color maps are inserted in the aforementioned class member which was previously preallocated to 34 items. The color map reading routine is void read_colormaps().

void read_colormaps()
{
  assert(m_wad_data);
  assert(m_lump_map.find("COLORMAP") != m_lump_map.end());

  WADEntry colormaps_ = m_directory[m_lump_map["COLORMAP"]];

  m_offset = colormaps_.offset;
  while (m_offset < colormaps_.offset + colormaps_.size)
  {
    std::vector<uint8_t> colormap_(256);

    for (unsigned int i = 0; i < 256; ++i)
      colormap_[i] = m_wad_data[m_offset++];

    m_colormaps.push_back(colormap_);
  }
}

We also coded a brief procedure, namely void write_colormaps(), which applies all the colormaps to each palette and produces a PPM image that combines all of them. Each row is a palette after applying a color map. For instance, the first 34 rows represent the 34 color maps applied to the first palette. We also scaled the output to 200% the original size and converted it to PNG using mogrify -scale 200% colormaps.ppm and convert colormaps.ppm colormaps.png respectively. The resulting image is shown below:

colormaps

Graphics

struct WADSprite
{
  unsigned int width;
  unsigned int height;
  unsigned int left_offset;
  unsigned int top_offset;
  std::vector<WADSpritePost> posts;
};

struct WADSpritePost
{
  uint8_t col;
  uint8_t row;
  uint8_t size;
  std::vector<uint8_t> pixels;
};

void read_sprites()
{
  assert(m_wad_data);

  std::vector<std::string> sprite_names_ { "SUITA0", "TROOA1", "BKEYA0" };

  for (std::string s : sprite_names_)
  {
    assert(m_lump_map.find(s) != m_lump_map.end());
    WADEntry sprite_lump_ = m_directory[m_lump_map[s]];
    m_offset = sprite_lump_.offset;

    WADSprite sprite_;

    // Each picture starts with an 8-byte header of four shorts. Those four fields are the following:
    //  (a) The width of the picture (number of columns of pixels)
    //  (b) The height of the picture (number of rows of pixels)
    //  (c) The left offset (number of pixels to the left of the center where the first column is drawn)
    //  (d) The top offset (number of pixels to the top of the center where the top row is drawn).

    sprite_.width = read_ushort(m_wad_data, m_offset);
    sprite_.height = read_ushort(m_wad_data, m_offset);
    sprite_.left_offset = read_ushort(m_wad_data, m_offset);
    sprite_.top_offset = read_ushort(m_wad_data, m_offset);

    // After the header, there are as many 4-byte integers as columns in the picture. Each one of them
    // is a pointer to the data start for each column (an offset from the first byte of the LUMP)

    std::vector<unsigned int> column_offsets_(sprite_.width);
    for (int i = 0; i < sprite_.width; ++i)
      column_offsets_[i] = read_uint(m_wad_data, m_offset);

    // Each column data is an array of bytes arranged in another structure named POSTS. Each POST has
    // the following structure:
    //  (a) The first byte is the row to start drawing
    //  (b) The second byte is the size of the post (the amount of pixels to draw downwards)
    //  (c) As many bytes as pixels in the post + 2 additional bytes. Each byte defines the color index
    //      in the current game palette that the pixel uses. The first and last bytes of this arrangement
    //      are TO BE IGNORED, THEY ARE NOT DRAWN
    //
    // After the last byte of the POST, there might be another POST with the same structure as before
    // or the column might end. A 255 (0xFF) value after a post indicates that the column ends and the
    // following pixels are transparent. Note that a column may immediately begin with 0xFF and no post
    // at all. In such case, the whole column is transparent.

    for (int i = 0; i < sprite_.width; ++i)
    {
      m_offset = sprite_lump_.offset + column_offsets_[i];

      while (m_wad_data[m_offset] != 0xFF)
      {
        WADSpritePost post_;
        post_.col = i;
        post_.row = m_wad_data[m_offset++];
        post_.size = m_wad_data[m_offset++];

        // Skip the first unused pixel
        m_offset++;

        for (uint8_t k = 0; k < post_.size; ++k)
          post_.pixels.push_back(m_wad_data[m_offset++]);

        // Skip the last unused pixel
        m_offset++;

        sprite_.posts.push_back(post_);
      }
    }

    m_sprites.insert(std::pair<std::string, WADSprite>(s, sprite_));
  }
}

trooa0 trooa0 trooa0