Blog

Atom feed is here.


2025-189

As an unintentional follow-up to yesterday's post, I ended up writing a simple bump allocator. The motivation was that I wanted to build a routine to convert relative paths to absolute paths. For this, I need a place to store the absolute path. The relative paths are relative to the folder the game binary sits in, so I can't predict the full string length of the absolute path, meaning I need to dynamically allocate.

Here is the API for the bump allocator; no surprises if you've seen one before.

struct Allocator {
  void* base = nullptr;
  size_t size = 0;
  size_t used = 0;
};

/**
 * Set up an allocator. The allocator's memory is guaranteed to be zeroed.
 *
 * @param a A pointer to an allocator to set up; cannot be `NULL`.
 * @param size The number of bytes to allocate. Must be greater than 0.
 * @param addr The beginning address to allocate from. Pass `NULL` to defer base
 *             address selection to the operating system. Note that any other
 *             value greatly reduces portability.
 * @return `false` if setting up the allocator failed for any reason; `true`
 *         otherwise.
 */
bool allocator_init(Allocator* a, size_t size, void* addr);

/**
 * Allocate memory from the allocator.
 *
 * @param a The allocator to allocate from; cannot be `NULL`.
 * @param size The number of bytes to allocate. Must be greater than 0.
 * @return If successful, returns a pointer to the beginning of the allocated
 *         region. If unsuccessful for whatever reason, returns `NULL`.
 */
void* allocator_alloc(Allocator* a, size_t size);

/**
 * Reset an allocator, meaning that its memory pool is considered entirely
 * unused and the next attempt to allocate will start at the beginning of the
 * backing memory pool. This does not deallocate anything from the operating
 * system's perspective, but anything previously allocated in the allocator's
 * memory pool should be considered invalid after calling this.
 *
 * @param a The allocator to reset; cannot be `NULL`.
 */
void allocator_reset(Allocator* a);

/**
 * Destroy an allocator.
 *
 * @param a The allocator to destroy; cannot be `NULL`.
 * @return `false` if destroying the allocator failed for any reason.
 */
bool allocator_destroy(Allocator* a);

I give the user the option to specify the base address of the memory pool or let the operating system pick it. In production, I'll let the operating system pick it, but in development, I'll control the base address so I can predict the general location of things in memory across program runs.

In the implementation of allocator_init, I use mmap instead of malloc, like so:

#include <cassert>
#include <sys/mman.h>

bool allocator_init(Allocator* a, size_t size, void* addr) {
  assert(a);
  assert(size);
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
  if (addr) {
    flags |= MAP_FIXED_NOREPLACE;
  }

  a->base = mmap(addr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
  if (a->base == MAP_FAILED) {
    // mmap signals failure with MAP_FAILED, not NULL.
    a->base = nullptr;
    return false;
  }

  a->size = size;
  a->used = 0;
  return true;
}

In short, we permit reading and writing on the allocated region, the memory is private to our process (MAP_PRIVATE), we do not use file-backed memory (MAP_ANONYMOUS and -1 for the file descriptor parameter), and should the caller request a specific base address, we tell the operating system to use that fixed address and to fail if the requested range overlaps an existing mapping in our process (MAP_FIXED_NOREPLACE).
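For completeness, here is a sketch of how the remaining three routines could look under the same assumptions. This is illustrative only, and it ignores alignment, which a production allocator should handle.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

struct Allocator {
  void* base = nullptr;
  size_t size = 0;
  size_t used = 0;
};

void* allocator_alloc(Allocator* a, size_t size) {
  assert(a);
  assert(size);
  if (a->size - a->used < size) {
    return nullptr;  // Pool exhausted.
  }
  void* p = (char*)a->base + a->used;
  a->used += size;
  return p;
}

void allocator_reset(Allocator* a) {
  assert(a);
  a->used = 0;  // Memory stays mapped; prior allocations become invalid.
}

bool allocator_destroy(Allocator* a) {
  assert(a);
  bool ok = munmap(a->base, a->size) == 0;
  *a = {};
  return ok;
}
```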

I'm using this as a frame allocator, meaning that at the beginning of each frame (at the beginning of each iteration through the game loop), I call allocator_reset. I set up the allocator right after initializing SDL near the top of main, outside of the game loop. This allows me to use the allocator before entering the game loop for any one-off scratch work I need to do; then, upon entering the game loop (right after capturing the frame start timestamp), the allocator is immediately reset.

Before entering the game loop, I use my relative-path-to-absolute-path utility to build the full path to the game logic library so that I can check its last modification time via stat. I then periodically re-build the full path in the game loop when I go to re-check the modification time, so in the future, I will likely also introduce separate, persistent storage to cache such things.

Instead of having to implement my own way to get the base path of the game binary across operating systems (using convenience functions in Windows and checking /proc things in Linux, for example), I can thankfully just leverage SDL_GetBasePath.

The interface of my routine to build absolute paths looks like this:

char* abs_path(const char* path, Allocator* a);

It takes an allocator to use to obtain storage for the final string. The Jai language I mentioned in a previous post does something similar for routines that need to dynamically allocate but passes the current allocator to the function implicitly in an argument called the context. I don't have that language feature in C/C++ without approximating via OOP, globals, or function/method partials, so the allocator is an explicit parameter in my case. I don't like this signature much, but it accomplishes what I need. I haven't covered this detail yet, but I have also separated more of the game logic code from the "platform" and exposed a higher-level game_update function from the game logic library to the platform. The game_update function will likely eventually receive a handle to the frame allocator so that it can allocate whatever it needs in a single frame in a manner I can easily control and introspect. That is how game logic code will also be able to use things like abs_path without owning any particular allocator or backing memory pool.
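To make the idea concrete, here is a dependency-free sketch of just the string building. The base path is made an explicit parameter here (my real routine gets it from SDL_GetBasePath, whose result ends in a path separator), and a fixed static pool stands in for the frame allocator; the names and sizes are illustrative, not my actual code.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative stand-in for the frame allocator.
static char pool[4096];
static size_t pool_used = 0;

static char* bump_alloc(size_t n) {
  if (sizeof(pool) - pool_used < n) return nullptr;
  char* p = pool + pool_used;
  pool_used += n;
  return p;
}

// Concatenate a base directory (assumed to end in a path separator)
// with a relative path, storing the result in bump-allocated memory.
char* abs_path(const char* base, const char* rel) {
  size_t base_len = strlen(base);
  size_t rel_len = strlen(rel);
  char* out = bump_alloc(base_len + rel_len + 1);  // +1 for the NUL.
  if (!out) return nullptr;
  memcpy(out, base, base_len);
  memcpy(out + base_len, rel, rel_len);
  out[base_len + rel_len] = '\0';
  return out;
}
```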

Here are a couple other things I learned along the way. First, it is not only Linux that has alloca; Windows has it as well. The main difference besides the literal function name is that Linux's causes undefined behavior on stack overflow, whereas Windows's raises a structured exception. I could use this for building strings, as I don't anticipate the memory required for paths to be so large that it would cause a problem on the stack. The second thing I learned is that man pages can even reference books. I noticed the man page for mmap(2) (Linux man-pages 6.7, 2023-10-31) referenced what appeared to be a book, and upon looking it up, indeed, it is "POSIX.4: Programming for the Real World" by Gallmeister. Interesting; I might want to skim through it.


2025-188

In my game, I want to control the pool of memory from which all allocations are made. This will help with debugging and also with general stability: if I can allocate all the memory I will need up front, I can then parcel that memory out internally on demand. I haven't gone into much detail about why I am not using a game engine, but having more control over allocations, or at least over the memory pool being used, is one of the reasons.

SDL provides SDL_SetMemoryFunctions, which a developer can use to specify the functions to call in place of the typical malloc, calloc, realloc, and free. I have previously implemented my own simple linear allocator, from which all I ever did was "malloc" a few times. SDL is its own animal, though, and until I read its code, I need to treat it as if it could ({de|re})allocate at any time. This is unfortunate since it complicates the problem for me, but it's one price I pay for using a powerful abstraction like SDL.

Although I am compiling SDL3 from source for my game, I have not read into the code yet. To get a rough sense of how many allocations are happening and which memory-related functions are actually called, I wrote shims for the memory functions, registered them, and ran my game. One such shim looks like

#include <stdio.h>
#include <stdlib.h>

static int my_malloc_count = 0;
void* my_malloc(size_t s) {
  printf("my_malloc called; c: %d\n", my_malloc_count++);
  return malloc(s);
}

What I see is that a lot of allocation happens initially, both with calloc and malloc. This is SDL initializing its subsystems. On shutdown, a lot of freeing happens, which is cleanup. On the first pass through the game loop, calloc is called hundreds of times, malloc is called a few times, and realloc is called maybe about ten times--extremely rarely relative to the other calls. Presumably SDL lazy-initializes some things, and the first pass through the game loop is having SDL finish its initialization as I use various subsystems. On all other iterations through the game loop, I see only the following (keep in mind my game loop is relatively sparse right now).

my_calloc called; c: 1160
my_free called; c: 251
my_free called; c: 252
my_malloc called; c: 297

Debugging shows that SDL_RenderPresent is the call I make that in turn calls the other functions responsible for the (de)allocations above. For what it's worth, I am using the GL rendering backend. I switched to the software rendering backend and experienced the same behavior.

In a previous life, the linear allocator I mentioned didn't need to support any efficient freeing because I only ever allocated a fixed number of times to set up buffers I would later reuse. What I see here is that SDL will routinely allocate and free. So should I want to point the memory allocators at my custom memory pool, I need to design the pool and the functions that use it in a way that will efficiently recycle memory.

By the way, I was initially calling SDL_Log in my allocator shims for debugging, but this was a bad idea as SDL_Quit also uses the memory functions after it has turned at least some subsystems off, causing SDL to request re-initialization of the logging subsystem and then hang indefinitely in what appears to be a sleep-and-check loop.

my_free called
^C
Program received signal SIGINT, Interrupt.

(gdb) where
#0 in __GI___clock_nanosleep
#1 in __GI___nanosleep
#2 in SDL_SYS_DelayNS
#3 in SDL_Delay_REAL
#4 in SDL_ShouldInit_REAL
#5 in SDL_InitLog
#6 in SDL_CheckInitLog
#7 in SDL_GetLogPriority_REAL
#8 in SDL_LogMessageV_REAL
#9 in SDL_Log
#10 in my_free
#11 in SDL_free_REAL
#12 in SDL_RemoveHintCallback_REAL
#13 in SDL_QuitLog
#14 in SDL_Quit_REAL
#15 in SDL_Quit
#16 in main
(gdb) quit

I also noticed SDL_aligned_alloc in the SDL documentation. It has a corresponding SDL_aligned_free function as well. The documentation says that SDL_aligned_free must be used to free memory allocated with SDL_aligned_alloc, so I'm not concerned about having to accommodate these (de)allocations in my implementation. But their presence does raise the question of whether they would ever be called in my runtime. If they are called, my hopes of centralizing and controlling memory allocation are dashed. SDL3/SDL_Surface.h defines the surface flag SDL_SURFACE_SIMD_ALIGNED with the documentation

Surface uses pixel memory allocated with SDL_aligned_alloc()

and grep suggests the aligned (de)allocator is generally only used when leveraging SIMD. I also see the aligned memory functions used in implementations for the PS2, PSP, and PS Vita, but these are not targets of mine. I checked whether that surface flag was set on surfaces I am loading, and it is not. So I can probably safely ignore the aligned (de)alloc interface.

So now the question is, how much work is it going to be to implement my own memory pool and allocator to support SDL's behavior? Regardless of the implementation effort, there is also the testing effort, such as unit tests and then fuzzing with asan enabled or something like that.

...

This explanation of slab allocators in older versions of Linux tells me I'm going to be spending a lot of time writing an efficient allocator. The slab allocator is an interesting concept and an appropriate one to use in my case, in my opinion. One feature is that it caches previous allocations of particular sizes, anticipating that such allocations may be requested again. This is exactly the case in my game loop. It also does more advanced things, such as adding "coloring" (padding) between slabs to try to keep different allocated regions in different cache lines, so that re-allocating one region does not evict neighboring regions from the cache.
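The caching idea can be sketched for a single size class: freed blocks go on a free list and are recycled by the next same-sized allocation instead of going back to the system. This is illustrative only (the names are made up), and a real implementation would manage multiple size classes and its own backing pool.

```cpp
#include <cstddef>
#include <cstdlib>

// A freed block is reused to store the free-list link itself, so the
// cache costs no extra memory (block_size must be >= sizeof(FreeNode)).
struct FreeNode { FreeNode* next; };

struct SizeClass {
  size_t block_size = 0;
  FreeNode* free_list = nullptr;
};

void* sc_alloc(SizeClass* sc) {
  if (sc->free_list) {            // Recycle a cached block if we have one.
    FreeNode* n = sc->free_list;
    sc->free_list = n->next;
    return n;
  }
  return malloc(sc->block_size);  // Otherwise fall back to the system.
}

void sc_free(SizeClass* sc, void* p) {
  FreeNode* n = (FreeNode*)p;     // Store the link inside the freed block.
  n->next = sc->free_list;
  sc->free_list = n;
}
```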

Then there is the backing buddy allocator which handles coarser-grained allocations, typically on the order of pages. I don't need to bother much with this, as for my own pool I'd make a single mmap or malloc call to the system to get the memory for my pool.

I would like to leverage something like an arena allocator for per-frame storage and just clear the storage at the top of each iteration of the game loop, but the fact that SDL_RenderPresent calls malloc and doesn't seem to free that allocation until the next call tells me I shouldn't free the underlying memory on a frame boundary. The Jai language has a dedicated temporary storage that is recommended to be used in just this way for applications with something like a game loop in them. (Aside: Jai is my favorite language in the C++-contender space. I am just using my favorite dialect of C++, which is C with function and operator overloading, for early game development work because I want to reduce the number of things I need to get better at all at once.) I'd even accept two memory pools, one persistent across the lifetime of the application for things like SDL subsystems, and the other being that frame-temporary storage. Alas, I don't think I can find the lifetime boundaries of allocations made by dependencies so easily and thus don't know when an arena could be reset.

So I wondered, could I use something like jemalloc and just hand over a memory pool to the library and have it allocate from there? The answer is yes, although this is non-trivial, and it of course brings in another sophisticated dependency. So at the expense of making my game harder to trace and debug, I would have more control over the memory (de)allocations. I'm not ready to make that trade.

Lastly, I am toying with the idea of just letting SDL use the system-provided allocators and introducing a simple bump or arena allocator for per-frame storage. The current conclusion of my investigation is to put this off to a later time. The control/complexity tradeoff is not in my favor yet.


2025-187

I implemented basic code hot reloading. Say in your game, you are in the middle of playtesting some long, complicated level, and you find some undesirable behavior. You want to modify your game code to fix the behavior, but then you have to recompile your game and get back to where you found the issue--that or hack up a minimal reproducible example for this one case--to verify your fix worked.

Another approach is to separate the code for the game logic from the platform code and then dynamically load the game logic into the running platform. This way, you can change the game logic while keeping the game with all of its game state alive and test changes much more quickly. I have to thank Handmade Hero for this idea. I'm currently developing on Ubuntu, so this post will cover how I achieved hot reloading on Linux.

First, a simple demonstration. The video below shows a running game instance where I have moved the player somewhere away from its origin. The tile the player currently occupies is initially red, but then I change the game logic to render it as green, recompile the game logic library, and the game instance detects the library change, reloads the library, and then the tile is rendered as green, all while keeping the game and its state alive.

Until now I kept all the game code in a single file, but I pulled some (rather arbitrary) code out into a separate file to have something to compile into the game logic library while I was standing up hot reloading. Once we do this, we need to add a layer of abstraction between the functions the game logic library provides and the callsites outside the library. Essentially, we replace direct calls into some code with calls through function pointers that we point to implementations of routines we want to call. The implementations exist somewhere in our game logic library which we will now be dynamically loading into our platform.

The gist is like this.

#include <dlfcn.h>

// ...

constexpr int64_t TARGET_FPS = 60;
constexpr int64_t GAME_LIB_CHECK_EVERY_N_FRAMES = TARGET_FPS; // Every second.

typedef int (*FooFn)(Foo*, Bar);

struct stat lib_stat = {0};
const char* lib_pathname = "./path/to/lib.so";
int stat_ret = stat(lib_pathname, &lib_stat);
assert(stat_ret == 0);
uint64_t lib_stat_frame_counter = 0;
void* lib_handle = dlopen(lib_pathname, RTLD_LAZY);
assert(lib_handle);
FooFn foo_fn = (FooFn)dlsym(lib_handle, "foo");
assert(foo_fn);

while (run_game_loop) {
  // ...
  // Maybe reload game logic library.
  // ...

  Foo f{};
  Bar b{};
  foo_fn(&f, b);
}

The typedef establishes an interface between the library to hot reload and the end user. Unlike what I show here, I'd put it in header files corresponding to the library to be reloaded. We then deal with stat(2) and stat(3) which we will be using to get at the file modification timestamp on our game logic library. stat needs to get at inode information and is thus a syscall, so it is "slow," but we also aren't going to be recompiling our library every frame, so we don't need to poll for the modification timestamp on every iteration through the game loop. That's where lib_stat_frame_counter comes in--we'll only poll every so many frames; in this case, once a second.

We next call dlopen, which loads a library and gives us back a handle to it. We pass RTLD_LAZY to defer symbol resolution until a symbol is first used, rather than resolving every symbol our library might contain when I will only use a few of them on the platform side. To load symbols of interest, we call dlsym. We cast its return value to the type of the interface our requested symbol represents.

Not shown in the code above, but in the game loop, every GAME_LIB_CHECK_EVERY_N_FRAMES, we call stat again and check if the modification timestamp on our library has changed. If it has, we dlclose our current handle on our game logic library and then repeat the dlopen and dlsym work to get at the new implementation of our exported routines.
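The modification-time check could be factored into a small helper like the following (the helper name is hypothetical, and this is a sketch rather than my actual code). The game loop would follow a true return with the dlclose/dlopen/dlsym sequence.

```cpp
#include <sys/stat.h>
#include <ctime>

// Returns true (and updates *last_mtime) when the file's modification
// time differs from the cached one. Treats a stat failure as "no change"
// so a transiently missing file doesn't trigger a reload.
bool lib_changed(const char* path, time_t* last_mtime) {
  struct stat st;
  if (stat(path, &st) != 0) return false;
  if (st.st_mtime == *last_mtime) return false;
  *last_mtime = st.st_mtime;
  return true;
}
```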

Finally, the game loop calls our routines through our function pointers; shown above is a call through foo_fn.

In a C++ context, the symbol name of the function foo will be mangled. To prevent this, we wrap our declaration of foo on the game logic side inside extern "C" { }. If you want to see symbol names in your library on Linux, you can do so with nm. Here are actual (truncated) examples from the test code I hoisted into my game logic library, first mangled and then not.

$ nm -D build/Debug/libgame.so 
000000000000120f T _Z17map_pixel_to_tileP7Tilemapm8Vector2f
000000000000130e T _Z24player_render_tile_underP6PlayerP7TilemapP12SDL_Renderer
00000000000011b9 T _Z5round8Vector2f
$ nm -D build/Debug/libgame.so 
000000000000120f T map_pixel_to_tile
000000000000130e T player_render_tile_under
00000000000011b9 T _Z5round8Vector2f

I don't need round on the platform side, so I leave its symbol name mangled. In this test, I do want map_pixel_to_tile and player_render_tile_under, though, so I needed to not mangle their names.
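As a minimal illustration of the linkage difference, a stand-in exported routine might look like this (game_update is a made-up name here, not one of my actual exports):

```cpp
// With C linkage, the symbol keeps its plain name "game_update" (no
// mangling), so dlsym(handle, "game_update") can find it by that name.
extern "C" int game_update(int frame) {
  return frame + 1;  // Placeholder body.
}
```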

Code hot-reloading is for development and debugging only. Release builds will link the game library in statically.


2025-186

I implemented simple collision with the environment. This time, it is probably clearer for me to show it in action first.

There are a few new things going on in this video. The first is that the map looks different from last time. I updated the tilemap to add a few things the player can collide with besides the walls along the edges. I am currently exclusively using Aseprite for tilesets and tilemaps. Unlike map editors such as Tiled, Aseprite does not provide a way to annotate tiles in a tilemap with auxiliary information. In Tiled, you can do so, and one such attribute is drawing invisible bounding boxes which can be used to indicate collision areas.

The way I annotated collision in Aseprite was to use an additional tilemap layer. The tileset currently has only 2 tiles, technically 1 tile and the empty tile which occupies an index in the tileset. I set the collision layer to be higher in the layer stack than the map itself, set the opacity of the collision layer to somewhere between 25 and 50 percent, and then I drew over any tile I wanted to add collision to. I used a pink and black checkered pattern for the tile indicating a collision area in hopes that it would stand out from other tiles I may draw on the map.

This approach is not convenient because I need to manage at least 2 tilemaps for every level now. It's bug prone because I might change the underlying map without updating the collision layer or vice versa, or I may change the index values of tiles in the collision map but forget to do so in the game code. It also forces a particular software design on my game, having to parse a unique collision map for every tilemap I load. With all of that said, it's simple, and it works, so I'm going with it for now.

I could design an alternative tilemap representation toward my game, as I do control the binary format of the exported tilemap. One alternative is to set one of the high bits in the tile IDs in the map that logically have collision, essentially merging my collision map into a bitfield in each corresponding tile in the game map. If I recall correctly, Tiled does something similar with bit-flag attributes on tiles. At this point in time, it's not worth me implementing this myself, though.

The ultimate goal is an in-game level editor, but I'm far from that at this time.

The next thing in the video is the red square that seems to be following the player. This is a visual annotation of the tile the player is considered to occupy. I've added it for debugging. Unlike other kinds of software development, it's not so straightforward or even realistic to write traditional, automated unit tests for implementations of various things in games. Oftentimes we get more information more quickly from visual feedback. Testing this way could also explain how so many bugs slip through QA in games all the time.

Computing the tile the player occupies introduces us to what will become a handful of coordinate systems in the game. Currently, for ease of development and debugging, I have only one coordinate system in the game. (0, 0) is at the upper-left corner of the game window, and it is the origin for both the camera and the map. Furthermore, everything is measured in pixels, for now. Determining the tile coordinate from a pixel coordinate, assuming the pixel and tile coordinates share the same origin, is straightforward. In my case, positions support sub-pixel accuracy and so are floating-point numbers.

Vector2d round(Vector2f v) {
  return {
    .x = (int32_t)(v.x + 0.5f),
    .y = (int32_t)(v.y + 0.5f),
  };
}

Vector2f pixel_xy = // ... ;
Vector2d int_pos = round(pixel_xy);
size_t tile_pos_x = int_pos.x / TILE_WIDTH_PX;
size_t tile_pos_y = int_pos.y / TILE_HEIGHT_PX;

I designed the collision map and game map so that they are the same dimensions and so that the tile width and height are the same across maps. I can therefore compute the index into the collision map tiles and check whether a tile has collision area or not. If the player tries to move into a map tile with backing collision area, I disallow the position update. That's what we see with me running into walls in the first video.
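The index computation amounts to a row-major lookup; a sketch, assuming the collision map is a flat array of tile indices where a nonzero index marks a collision tile:

```cpp
#include <cstddef>
#include <cstdint>

// Look up whether the tile at (tile_x, tile_y) has collision area,
// given a row-major collision map of the stated width.
bool tile_blocked(const uint8_t* collision_tiles, size_t map_width,
                  size_t tile_x, size_t tile_y) {
  return collision_tiles[tile_y * map_width + tile_x] != 0;
}
```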

Now that we color the tile the player is occupying, we can also see that although the player is currently rendered as an entire tile, the player position is represented as a single point, the upper-leftmost point of the player tile. Because of this, the player can somewhat walk into walls to their right and below them. I'll address this later on by changing the position of the player relative to the player's rendered tile or perhaps changing the player's collision point to a collision area.

Lastly, although it is difficult to tell, we stick to walls somewhat. For example, if I press against a wall on my left side and then attempt to move diagonally up-left or down-left, I won't move at all. This is because my proposed position is up-left or down-left of me which enters the collision area on my left, and my entire position update is disallowed. I'll fix this eventually, likely by checking whether we can move horizontally separately from whether we can move vertically.


2025-185

I implemented basic player movement in the game. First, I created a simple tileset for the player and loaded that into the game as a single SDL texture. Each tile in the tileset is 16x16 pixels, and there are 4 tiles. The tiles are blue squares with a white arrow showing which direction the player is facing, so I have one tile per cardinal direction.

I then created a simple Player struct to hold a pointer to the texture and other supporting information such as the cardinal direction the player is facing and the player's position.

On each iteration through the game loop, I poll for and handle keyboard events. In the event handling logic, I save off the new key states into an instance of my own Buttons struct. I don't modify the player state immediately in the event-handling code. The simple Buttons struct looks as follows.

struct Button {
  bool pressed = false;
};

struct Buttons {
  Button up = {};
  Button down = {};
  Button left = {};
  Button right = {};
};

For now, we just keep track of whether the button is down on the current frame. We don't track whether it was pressed this frame or has been held down for some number of frames. With this setup, if we wanted to understand the input delta from the last frame, we could simply create a second instance of Buttons and store the latest frame's inputs in the second instance at the end of the game loop so that it is available on the next iteration through the loop.

Buttons previous_buttons = {};

while (running) {
  Buttons buttons = {};
  while (SDL_PollEvent(&event)) {
    switch (event.type) {
      case SDL_EVENT_KEY_UP:
        // Fall through.
      case SDL_EVENT_KEY_DOWN: {
          if (event.key.key == SDLK_DOWN) {
            buttons.down.pressed = event.key.type == SDL_EVENT_KEY_DOWN;
          } else if (/* ... */) {
            // ...
          }
       }
        break;
      default:
        break;
    }
  }

  // ...
  previous_buttons = buttons;
}
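With previous_buttons carried across frames, per-button edge detection falls out naturally. A sketch (these helpers are illustrative, not code from my game):

```cpp
struct Button {
  bool pressed = false;
};

// "Just pressed" means down this frame but not last frame;
// "just released" is the inverse.
bool just_pressed(Button now, Button before) {
  return now.pressed && !before.pressed;
}

bool just_released(Button now, Button before) {
  return !now.pressed && before.pressed;
}
```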

When it comes time to update the player based on input, I just check whether a certain button is pressed and then update the player's facing direction and a proposed position.

Vector2f proposed_position = player.position;
float dpos = (TARGET_FRAME_TIME_NS / 1000000000.0f) *
             RUNNING_VELOCITY_MS / METERS_PER_PIXEL;

if (buttons.down.pressed) {
  player.facing = Direction::DOWN;
  proposed_position.y += dpos;
} else if (buttons.up.pressed) {
  player.facing = Direction::UP;
  proposed_position.y -= dpos;
}

// Do the same for right, left.

player.position = proposed_position;

Here I assume that I always hit my target frame time, and then I multiply that time converted to seconds with my configured running velocity to get a distance the player would have moved in meters. The screen is drawn in pixels, though, so I convert meters to pixels, and that is my delta position (dpos).
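Plugging in made-up constants (these are not my actual tuning values) makes the unit conversion concrete:

```cpp
#include <cstdint>

constexpr int64_t TARGET_FRAME_TIME_NS = 1000000000 / 60;  // ~16.67 ms at 60 FPS.
constexpr float RUNNING_VELOCITY_MS = 4.0f;    // Hypothetical: meters per second.
constexpr float METERS_PER_PIXEL = 0.02f;      // Hypothetical: world scale.

// seconds/frame * meters/second / meters/pixel = pixels/frame.
constexpr float dpos = (TARGET_FRAME_TIME_NS / 1000000000.0f) *
                       RUNNING_VELOCITY_MS / METERS_PER_PIXEL;
// With these numbers, dpos comes out to about 3.33 pixels per frame.
```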

I also handle down-up and right-left separately. This way the player can move both vertically and horizontally at the same time. I handle down before up, meaning that if the player presses both down and up at the same time, down takes priority. They will face and move downward. (Notice that I add, not subtract, dpos in the down case--SDL's coordinate system places (0, 0) at the top-left of the rendering space.)

Handling right-left separately means that the player will always prioritize facing a right-left cardinal direction instead of a down-up one if either right or left were pressed on a frame.

Although I simply update the player's position with the proposed position here, I am setting myself up for handling collision. It's possible we should not be able to move where we want to move, so I keep the proposed position separate so I can test it before updating the player state.

Here is a video of what I have so far. The grey border is supposed to be walls, and the white is just a placeholder tile for the ground. You can see the blue player moving around with the arrow updating in the direction the player is facing (or the prioritized direction if the player is moving diagonally). The frame rate is low, but you can tell by the diagonal movement that while the map is based on tiles, the movement is based on pixel positions and is not tile-based. Two outstanding issues are the fact that there is no collision yet and that the player is faster if moving diagonally as opposed to just vertically or horizontally.
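One common fix for the diagonal-speed issue is to normalize the input direction before scaling by dpos, so a diagonal move covers the same distance per frame as an axis-aligned one. A sketch with illustrative names (not my current code):

```cpp
#include <cmath>

// Scale the (in_x, in_y) input direction to unit length before applying
// the per-frame distance dpos, so diagonals aren't sqrt(2) times faster.
void apply_move(float in_x, float in_y, float dpos, float* pos_x, float* pos_y) {
  float len = std::sqrt(in_x * in_x + in_y * in_y);
  if (len == 0.0f) return;  // No input this frame.
  *pos_x += in_x / len * dpos;
  *pos_y += in_y / len * dpos;
}
```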


2025-184

When I went to implement basic player movement, I needed to capture keyboard input. This is done in SDL by polling for events and then processing keyboard-related ones.

SDL_Event e;
while (SDL_PollEvent(&e)) {
  switch (e.type) {
    case SDL_EVENT_KEY_UP:
      // Fall through.
    case SDL_EVENT_KEY_DOWN: {
        if (e.key.key == SDLK_DOWN) {
          // ...
        }
      }
      break;
    default:
      break;
  }
}

What I want to focus on in this post is the implementation of SDL_Event. If you look at its documentation, you'll see that it's a union. Its first member is type which encodes the event type--it's what we're switching on in the example above. If you look at the implementation of any of the SDL_* union members, you will see the first field of each of those structs is an instance of SDL_EventType. SDL_EventType is implemented as a C enum, which is represented as an int behind the scenes, which is the same number of bits (although different semantics) as the Uint32 (uint32_t-equivalent) type member in the SDL_Event union on nearly all modern platforms.

For reference, see the implementations of, for example, SDL_ClipboardEvent, SDL_TouchFingerEvent, SDL_MouseButtonEvent, or SDL_QuitEvent.

In other words, no matter which member of the union you work with, the first field will always be something encoding the event type.

This construct is called a discriminated union (or tagged union). Here is an all-in-one example:

enum TypeTag { kFoo, kBar, kNone };
struct Foo {
  TypeTag tag;
  // ...
};
struct Bar {
  TypeTag tag;
  // ...
};
union DiscriminatedUnion {
  TypeTag tag;
  Foo foo;
  Bar bar;
};

With discriminated unions, you can avoid inheritance and still compose one type that behaves as an instance of this or that type depending on some context, and you just check the value of tag to know how to access fields of the union. You lose anything you might have otherwise gained through polymorphism, and the total size of DiscriminatedUnion is the size of its largest member (plus padding, potentially; see the interesting note at the bottom of the SDL_Event implementation). However, the discriminated union construct supports data-oriented design by concentrating data of interest nearby in memory instead of potentially scattered about through a network of pointers (whether these pointers be pointers-to-base-class in some data structure, or members of instances pointing to this and that data, or the CPU figuring out which methods to call via dynamic dispatch).
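Usage then amounts to switching on the tag. A self-contained sketch (the field names are made up):

```cpp
enum TypeTag { kFoo, kBar, kNone };
struct Foo {
  TypeTag tag;
  int foo_data;
};
struct Bar {
  TypeTag tag;
  float bar_data;
};
union DiscriminatedUnion {
  TypeTag tag;
  Foo foo;
  Bar bar;
};

// Reading u.tag is well-defined regardless of which member is live,
// because every member begins with a TypeTag field (the common initial
// sequence rule for standard-layout members of a union).
float extract(const DiscriminatedUnion& u) {
  switch (u.tag) {
    case kFoo: return (float)u.foo.foo_data;
    case kBar: return u.bar.bar_data;
    default:   return 0.0f;
  }
}
```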


2025-183

Today a simple feature I added to the game was a target frame rate. This is typically done in one of three ways.

The first is by using timers in your game code. You capture the start time of when processing begins for a single frame, then you capture the time processing ends for that frame. Subtract the times, and if the time required to prepare the frame is less than your target frame time, sleep the thread for the difference in times.

The major issue with this approach is that the thread you're collecting timing information on can be preempted at any time by the operating system. Depending on which operating system (or language library) features you use to measure time, you may accidentally be ignoring the time your thread spent off the CPU. That time passes in the world outside the computer and so can cause inconsistencies between the time your game thinks elapsed and the wall-clock time that actually elapsed.

The second way to achieve a target frame time is by using performance counters. Performance counters are typically built into modern processors and track some kind of low-level hardware event. In this way, the counter itself is independent of operating system nuances such as scheduling. The operating system provides an API to access the performance counter value. As we don't necessarily know what is being measured by the performance counter, we don't know what the count is relative to. Thus, the counter value itself is not a measure of the passage of time. The operating system also provides an API to query the frequency of increments to the performance counter, and with this information, we can compute the passage of time.

dt_s = (perf_counter_now - perf_counter_then) / perf_counter_frequency

SDL3 provides SDL_GetPerformanceCounter and SDL_GetPerformanceFrequency for this purpose.

The third way is to externalize synchronization, relying on the display hardware to indicate when a new frame can be drawn. This is done via "vertical synchronization," also called VSync. If your GPU supports VSync and you enable VSync in your renderer, your GPU synchronizes drawing the frame buffer with the refresh rate of your display hardware (ex: your monitor). If your monitor supports running at 60 Hz and 144 Hz, and you configure it to run at 144 Hz, then your GPU will send whatever is in your graphics buffer to the display 144 times per second, making your display frame rate 144 Hz.

To use VSync effectively in your game, you likely need to make your game loop frame rate-independent. If your player has a very fast refresh rate on their monitor but your game sometimes takes a while to draw its next frame, you might not hit your frame timing. It's no problem to draw the same thing twice while you're preparing your next frame, but you need to make sure you account for the passage of time independent of refresh rate (performance counters help with this). It helps to do general math in your game in SI units--meters, seconds, etc. This way, regardless of whether you hit your frame time, the movement of things in your game (for example) is natural. Things don't move faster than expected on fast refresh rates and slower than expected on slow refresh rates.

SDL3 provides SDL_SetRenderVsync for this purpose.

Regarding the first and second approaches to hitting a target frame rate, they are complicated by two assumptions.

The first assumption is that your player's display refresh rate is a harmonic of your target frame rate. In other words, if your target frame rate is 60 FPS and your player's monitor refreshes at 60 Hz or perhaps 120 Hz, you most likely will not encounter screen tearing. Your 60 Hz game loop might be out of phase with the display hardware's refresh cycle, but you'll still have your next frame prepared by the time the display hardware draws its next image. Also, your game's cycle time is neither outpacing nor falling behind the cycle time of the display hardware, so you will likely never be partway through preparing a new frame when the display hardware starts reading the frame data to draw the next image. If your cycle timing is not a harmonic of the display hardware's, your game loop will, theoretically, eventually be updating a frame while the display hardware is reading the frame buffer to draw its next image.

The second assumption is that you always hit your frame time. Even if your cycle time is a harmonic of that of your display hardware, if your game loop runs longer than your target frame time, you are at risk of screen tearing.

With all this said, unfortunately none of these three techniques guarantees smooth rendering. See this great article for why.

As for me, for now, I am using the performance counter technique for simplicity.


2025-182

Today I implemented simple tilemap rendering. In my previous post, I discussed exporting a tilemap from an Aseprite file as a binary file. I parsed that binary file in my game and then used SDL3_image to load the tileset image. I was considering using stb image to do the loading, but since I'm committing to SDL, I figured I'd learn SDL3_image. If it feels too heavy, I'll switch to stb.

After building and using SDL3_image, when launching my game, the main window would open, close quickly after, and then open again. The fix for this was calling SDL_CreateWindow with the SDL_WINDOW_HIDDEN flag and then showing the window only after I create the renderer (which I do separately from creating the window).

The gist of accelerated graphics rendering in SDL is to load your media into a surface and then create a texture from that surface. In my case, I am dealing with tilemaps which themselves are built from tilesets. For the uninitiated, I'll give a quick outline.

A tileset consists of sprites. A sprite is essentially an image. Using text as make-believe images, an example tileset might contain these three "sprites": _.=. Using the previous three characters, we might build the map:

========
=_..___=
=___...=
========

Who knows what that represents, but it is a map built from the "sprites" in our tileset. If we were talking about actual computer graphics, our map would be some larger, rendered image. The rendered image's size in RAM would increase as the image got larger, but we know that the image (in this example) is only composed of three unique sprites, those from our tileset. We could compress the tilemap by instead representing it as indices into the tileset.

22222222
20110002
20001112
22222222

The sprites in a tileset are also known as tiles. So here we have represented a tilemap by referring (by index) to the tiles in the tileset that should be used when rendering the map.

Back to SDL, this leaves me with multiple ways to actually store the tile data. Currently my binary tilemap export only encodes a single tilemap (due to the way I drew the map in Aseprite). I could load that tileset and then parse individual tiles out of its pixel memory, create a surface for each unique tile, and then create a texture for each unique surface. Nothing wrong with this. But what I did instead was create a single texture for the tileset, and at render time, I index into the texture to render only a portion of the tileset texture (the tile I want) to the window. The upside of this is that there are fewer resources to deal with, fewer allocations. The downside is that all tiles are stored in a single texture, so I cannot apply per-tile modulation (recoloring, transforming, etc.) as easily.

Here is an excerpt of code I wrote showing how I render from this single tileset texture.

/**
 * Render a tilemap instance to a renderer. `tileset` is expected to contain
 * the entire tileset referred to by the tilemap, and the tile properties are
 * assumed to be compatible with dimensions specified in the tilemap.
 */
void tilemap_render(Tilemap* tm, SDL_Texture* tileset, SDL_Renderer* renderer) {
  // Number of tiles per row in the tileset image.
  uint16_t tileset_tiles_per_row = tileset->w / tm->tile_width_px;
  for (int i = 0; i < tm->height_tiles; i++) {
    for (int j = 0; j < tm->width_tiles; j++) {
      uint16_t tile_index = tm->tiles[i * tm->width_tiles + j];
      float tile_x = (float)(tile_index % tileset_tiles_per_row * tm->tile_width_px);
      float tile_y = (float)(tile_index / tileset_tiles_per_row * tm->tile_height_px);
      SDL_FRect src_rect = {
        tile_x, tile_y,
        (float)tm->tile_width_px, (float)tm->tile_height_px
      };
      SDL_FRect dst_rect = {
        (float)j * tm->tile_width_px,
        (float)i * tm->tile_height_px,
        (float)tm->tile_width_px,
        (float)tm->tile_height_px
      };
      SDL_RenderTexture(renderer, tileset, &src_rect, &dst_rect);
    }
  }
}

I haven't done much work on the general encapsulation; just standing things up right now.


2025-181

Today marks my first day working full-time on my own projects. I'm focusing on building a game. Toward learning how everything works end-to-end from first principles, I'm building a game engine. I will likely ship with SDL3, though.

My game will feature tilemaps built from pixel art tiles. I love using Aseprite for my art. At the time of this writing, Aseprite supports building tilemaps but only supports exporting them using an out-of-tool script. This script exports tilemaps and supporting data as JSON. I don't want to write or depend on a JSON parser, though, so I chose to fork and extend the script to output binary.

Aseprite scripts are written in Lua, so I needed to learn Lua. I half-jokingly searched for "learn lua in 10 minutes" and turned up Learn Lua in 15 Minutes. Using this, the official documentation, and asking an LLM questions here and there, I picked up Lua pretty quickly.

Some interesting things I learned: Lua has a way for one Lua file to import another (require), which caches the imported module so that its top-level code runs only on the first import; subsequent imports of the same file return the cached result without running anything again. There is another mechanism (dofile) which imports without caching, always running anything at the top-level scope of the imported file. Another thing is that Lua's only data structure is the table, which doubles as an array. Array indices start from 1, not 0. You can use negative indices when accessing the command-line arguments to find arguments passed to Lua itself rather than to your script.

Anyway, without showing the full glue code, I introduced this binary.lua to the exporter script, added command-line argument parsing to the main script, and allowed the caller to either export JSON or binary. I'll stick with binary and write a simple parser in my game.

When calling a non-graphical script from Aseprite, you do so from the command line. You pass args before specifying the script. This is for at least Aseprite v1.3.14.3.

aseprite -b foo.aseprite --script-param bar=baz --script my_script.lua

The content of binary.lua in its initial form is below.

-- Export an Aseprite tilemap and supporting data to a binary file.
--
-- Output Schema
-- -------------
-- schema_major_ver: uint8_t
-- schema_minor_ver: uint8_t
-- schema_patch_ver: uint8_t
-- canvas_width_px : uint16_t
-- canvas_height_px: uint16_t
-- tilesets        : array<tileset>
-- layers          : array<layer>
--
-- array
-- -----
-- An array `array<T>` is a single `uint64_t` indicating the number of instances
-- of `T` immediately after. The instances of `T` are contiguous in memory.
-- For example, a 2-element array of type `array<uint8_t>` is laid out as
-- follows in memory:
--   uint64_t (num elements)
--   uint8_t  (first element)
--   uint8_t  (second element)
--
-- string
-- ------
-- A string is a single `uint64_t` indicating the number of characters
-- immediately following and then said number of characters. The characters are
-- 8-bit bytes. Each byte is a single ASCII character.
--
-- tileset
-- -------
-- image_pathname: string
-- tile_width_px : uint16_t
-- tile_height_px: uint16_t
--
-- layer
-- -----
-- name        : string
-- tileset_id  : uint16_t
-- width_tiles : uint16_t
-- height_tiles: uint16_t
-- tiles       : array<index_into_tileset>
--
-- index_into_tileset
-- ------------------
-- index: uint16_t
--
-- Notes:
-- - Strings do not support Unicode.
-- - All numbers are encoded as little-endian.
-- - The current schema only supports a single cel per layer.

local binary = { _version_major = 0, _version_minor = 0, _version_patch = 1 }

-- Write select keys out of table `t` to the open file `f` as binary. See
-- the output schema in the file documentation for what will be written out to
-- `f`.
function binary.encode(f, t)
  -- Schema version.
  f:write(string.pack("<I1", binary._version_major))
  f:write(string.pack("<I1", binary._version_minor))
  f:write(string.pack("<I1", binary._version_patch))

  -- Canvas width, height.
  f:write(string.pack("<I2", t.width))
  f:write(string.pack("<I2", t.height))

  -- Tilesets.
  f:write(string.pack("<I8", #t.tilesets))
  for i = 1, #t.tilesets do
    local ts = t.tilesets[i]
    f:write(string.pack("<I8", #ts.image))
    f:write(ts.image)
    f:write(string.pack("<I2", ts.grid.tileSize.width))
    f:write(string.pack("<I2", ts.grid.tileSize.height))
  end

  -- Layers.
  f:write(string.pack("<I8", #t.layers))
  for i = 1, #t.layers do
    local l = t.layers[i]
    f:write(string.pack("<I8", #l.name))
    f:write(l.name)
    f:write(string.pack("<I2", l.tileset))

    if #l.cels > 1 then
      error("Layer " .. i .. " has more than 1 cel")
    end

    local cel = l.cels[1]
    f:write(string.pack("<I2", cel.tilemap.width))
    f:write(string.pack("<I2", cel.tilemap.height))
    f:write(string.pack("<I8", #cel.tilemap.tiles))
    for j = 1, #cel.tilemap.tiles do
      f:write(string.pack("<I2", cel.tilemap.tiles[j]))
    end
  end
end

return binary

2025-175

1:14 to 1:28.