As an unintentional follow-up to yesterday's post, I ended up writing a simple bump allocator. The motivation was that I wanted to build a routine to convert relative paths to absolute paths. For this, I need a place to store the absolute path. The relative paths are relative to the folder the game binary sits in, so I can't predict the full string length of the absolute path, meaning I need to dynamically allocate.
Here is the API for the bump allocator; no surprises if you've seen one before.
struct Allocator {
    void* base = nullptr;
    size_t size = 0;
    size_t used = 0;
};
/**
* Set up an allocator. The allocator's memory is guaranteed to be zeroed.
*
* @param a A pointer to an allocator to set up; cannot be `NULL`.
* @param size The number of bytes to allocate. Must be greater than 0.
* @param addr The beginning address to allocate from. Pass `NULL` to defer base
* address selection to the operating system. Note that any other
* value greatly reduces portability.
* @return `false` if setting up the allocator failed for any reason; `true`
* otherwise.
*/
bool allocator_init(Allocator* a, size_t size, void* addr);
/**
* Allocate memory from the allocator.
*
* @param a The allocator to allocate from; cannot be `NULL`.
* @param size The number of bytes to allocate. Must be greater than 0.
* @return If successful, returns a pointer to the beginning of the allocated
* region. If unsuccessful for whatever reason, returns `NULL`.
*/
void* allocator_alloc(Allocator* a, size_t size);
/**
* Reset an allocator, meaning that its memory pool is considered entirely
* unused and the next attempt to allocate will start at the beginning of the
* backing memory pool. This does not deallocate anything from the operating
* system's perspective, but anything previously allocated in the allocator's
* memory pool should be considered invalid after calling this.
*
* @param a The allocator to reset; cannot be `NULL`.
*/
void allocator_reset(Allocator* a);
/**
* Destroy an allocator.
*
* @param a The allocator to destroy; cannot be `NULL`.
* @return `false` if destroying the allocator failed for any reason; `true` otherwise.
*/
bool allocator_destroy(Allocator* a);
I give the user the option to specify the base address of the memory pool or let the operating system pick it. In production, I'll let the operating system pick it, but in development, I'll control the base address so I can predict the general location of things in memory across program runs.
In the implementation of allocator_init, I use mmap instead of malloc, like so:
bool allocator_init(Allocator* a, size_t size, void* addr) {
    assert(a);
    assert(size);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if (addr) {
        flags |= MAP_FIXED_NOREPLACE;
    }
    a->base = mmap(addr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    // mmap returns MAP_FAILED, not NULL, on failure.
    if (a->base == MAP_FAILED) {
        a->base = nullptr;
        return false;
    }
    a->size = size;
    a->used = 0;
    return true;
}
In short, we permit reading and writing on the allocated space, the memory region is private to our process (MAP_PRIVATE), we do not use file-backed memory (MAP_ANONYMOUS and -1 for the file descriptor parameter), and should the caller request a specific base address, we tell the operating system to map at exactly that address and to fail if the requested range is already mapped in our process (MAP_FIXED_NOREPLACE).
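For completeness, allocator_alloc and allocator_reset are the usual bump-allocator fare; roughly like this (a sketch against the struct above, with alignment handling omitted):
void* allocator_alloc(Allocator* a, size_t size) {
    assert(a);
    assert(size);
    // Fail if the request doesn't fit in what remains of the pool.
    if (size > a->size - a->used) {
        return nullptr;
    }
    void* p = (char*)a->base + a->used;
    a->used += size;
    return p;
}

void allocator_reset(Allocator* a) {
    assert(a);
    // Nothing is returned to the operating system; the pool is simply reused.
    a->used = 0;
}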
I'm using this as a frame allocator, meaning that at the beginning of each frame (at the beginning of each iteration through the game loop), I call allocator_reset. I set up the allocator right after initializing SDL near the top of main, outside of the game loop. This allows me to use the allocator before entering the game loop for any one-off scratch work I need to do; then, immediately upon entering the game loop (with the exception of capturing the frame start timestamp), the allocator is reset.
Before entering the game loop, I use my relative-path-to-absolute-path utility to build the full path to the game logic library so that I can check its last modification time via stat. I then periodically re-build the full path in the game loop when I go to re-check the modification time, so in the future, I will likely also introduce separate, persistent storage to cache such things. Instead of having to implement my own way to get the base path of the game binary across operating systems (using convenience functions on Windows and checking /proc things on Linux, for example), I can thankfully just leverage SDL_GetBasePath.
The interface of my routine to build absolute paths looks like this:
char* abs_path(const char* path, Allocator* a);
It takes an allocator to use to obtain storage for the final string. The Jai language I mentioned in a previous post does something similar for routines that need to dynamically allocate but passes the current allocator to the function implicitly in an argument called the context. I don't have that language feature in C/C++ without approximating it via OOP, globals, or function/method partials, so the allocator is an explicit parameter in my case. I don't like this signature much, but it accomplishes what I need. I haven't covered this detail yet, but I have also separated more of the game logic code from the "platform" and exposed a higher-level game_update function from the game logic library to the platform. The game_update function will likely eventually receive a handle to the frame allocator so that it can allocate whatever it needs in a single frame in a manner I can easily control and introspect. That is how game logic code will also be able to use things like abs_path without owning any particular allocator or backing memory pool.
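For illustration, here is a minimal sketch of what abs_path can look like, assuming SDL3's SDL_GetBasePath (whose result ends in a path separator and does not need to be freed in SDL3) and the allocator above; it needs <string.h>, and error handling is kept to the basics:
char* abs_path(const char* path, Allocator* a) {
    const char* base = SDL_GetBasePath();  // Folder the game binary sits in.
    if (!base) {
        return nullptr;
    }
    size_t base_len = strlen(base);  // Ends with a path separator.
    size_t path_len = strlen(path);
    char* full = (char*)allocator_alloc(a, base_len + path_len + 1);
    if (!full) {
        return nullptr;
    }
    memcpy(full, base, base_len);
    memcpy(full + base_len, path, path_len);
    full[base_len + path_len] = '\0';
    return full;
}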
Here are a couple other things I learned along the way. First, it is not only Linux that has alloca; Windows has it as well. The main difference besides the literal function name is that Linux's causes undefined behavior on stack overflow whereas Windows's will raise a structured exception. I could use this for building strings, as I don't anticipate the memory required for paths to be so large that it would cause a problem on the stack.
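If I went that route, the portability shim would be small; something like this (a sketch, not something I've adopted):
#if defined(_WIN32)
#include <malloc.h>
#define stack_alloc _alloca  // Raises a structured exception on stack overflow.
#else
#include <alloca.h>
#define stack_alloc alloca   // Undefined behavior on stack overflow.
#endif

// Usage inside some function, for scratch space that disappears on return.
size_t len = 256;  // Illustrative; the real length comes from the path math.
char* scratch = (char*)stack_alloc(len + 1);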
The second thing I learned is that man pages can even reference books. I noticed the man page for mmap(2) (Linux man-pages 6.7, 2023-10-31) referenced what appeared to be a book, and upon looking it up, indeed, it is "POSIX.4: Programming for the Real World" by Gallmeister. Interesting; I might want to skim through it.
In my game, I want to control the pool of memory from which all allocations are made. This will help with debugging and also general stability if I can allocate all the memory I will need up front and then parcel that memory out internally on demand. I haven't gone into much detail about why I am not using a game engine, but having more control over allocations, or at least over the memory pool being used, is one of the reasons.
SDL provides SDL_SetMemoryFunctions, which a developer can use to specify the functions to call in place of the typical malloc, calloc, realloc, and free. I have previously implemented my own simple linear allocator, from which all I ever did was "malloc" a few times. SDL is its own animal, though, and until I read its code, I need to treat it as if it could ({de|re})allocate at any time. This is unfortunate since it complicates the problem for me, but it's one price I pay for using a powerful abstraction like SDL.
Although I am compiling SDL3 from source for my game, I have not read into the code yet. To get a rough sense for how many allocations are happening and which memory-related functions are actually called, I wrote shims for the memory functions, registered them, and ran my game. One such shim looks like this:
static int my_malloc_count = 0;

void* my_malloc(size_t s) {
    printf("my_malloc called; c: %d\n", my_malloc_count++);
    return malloc(s);
}
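The shims get registered before SDL_Init so every SDL allocation routes through them; the registration itself is a single call (assuming my_calloc, my_realloc, and my_free are written like my_malloc above):
// Must happen before SDL allocates anything, i.e., before SDL_Init.
SDL_SetMemoryFunctions(my_malloc, my_calloc, my_realloc, my_free);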
What I see is that a lot of allocation happens initially, both with calloc and malloc. This is SDL initializing its subsystems. On shutdown, a lot of freeing happens, which is cleanup. On the first pass through the game loop, calloc is called hundreds of times, malloc is called a few times, and realloc is called maybe about ten times, extremely rarely relative to the other calls. Presumably SDL lazy-initializes some things, and the first pass through the game loop is having SDL finish its initialization as I use various subsystems. On all other iterations through the game loop, I see only the following (keep in mind my game loop is relatively sparse right now).
my_calloc called; c: 1160
my_free called; c: 251
my_free called; c: 252
my_malloc called; c: 297
Debugging shows that SDL_RenderPresent is the call of mine that in turn calls other functions responsible for the (de)allocations above. For what it's worth, I am using the GL rendering backend. I switched to the software rendering backend and experienced the same behavior.
In a previous life, the linear allocator I mentioned didn't need to support any efficient freeing because I only ever allocated a fixed number of times to set up buffers I would later reuse. What I see here is that SDL will routinely allocate and free. So should I want to point the memory allocators at my custom memory pool, I need to design the pool and the functions that use it in a way that will efficiently recycle memory.
By the way, I was initially calling SDL_Log in my allocator shims for debugging, but this was a bad idea, as SDL_Quit also uses the memory functions after it has turned at least some subsystems off, causing SDL to request re-initialization of the logging subsystem and then hang indefinitely in what appears to be a sleep-and-check loop.
my_free called
^C
Program received signal SIGINT, Interrupt.
(gdb) where
#0 in __GI___clock_nanosleep
#1 in __GI___nanosleep
#2 in SDL_SYS_DelayNS
#3 in SDL_Delay_REAL
#4 in SDL_ShouldInit_REAL
#5 in SDL_InitLog
#6 in SDL_CheckInitLog
#7 in SDL_GetLogPriority_REAL
#8 in SDL_LogMessageV_REAL
#9 in SDL_Log
#10 in my_free
#11 in SDL_free_REAL
#12 in SDL_RemoveHintCallback_REAL
#13 in SDL_QuitLog
#14 in SDL_Quit_REAL
#15 in SDL_Quit
#16 in main
(gdb) quit
I also noticed SDL_aligned_alloc in the SDL documentation. It has a corresponding SDL_aligned_free function as well. The documentation says that SDL_aligned_free must be used to free memory allocated with SDL_aligned_alloc, so I'm not concerned about having to accommodate these (de)allocations in my implementation. But their presence does raise the question of whether they would ever be called in my runtime. If they are called, my hopes of centralizing and controlling memory allocation are dashed. SDL3/SDL_Surface.h defines the surface flag SDL_SURFACE_SIMD_ALIGNED with the documentation "Surface uses pixel memory allocated with SDL_aligned_alloc()", and grep suggests the aligned (de)allocator is generally only used when leveraging SIMD. I also see the aligned memory functions used in implementations for the PS2, PSP, and PS Vita, but these are not targets of mine. I checked whether that surface flag was set on surfaces I am loading, and it is not. So I can probably safely ignore the aligned (de)alloc interface.
So now the question is, how much work is it going to be to implement my own memory pool and allocator to support SDL's behavior? Regardless of the implementation effort, there is also the testing effort, such as unit tests and then fuzzing with ASan enabled or something like that.
...
This explanation of slab allocators in older versions of Linux tells me I'm going to be spending a lot of time writing an efficient allocator. The slab allocator is an interesting concept and, in my opinion, an appropriate one to use in my case. One feature is that it caches previous allocations of particular sizes, anticipating that such allocations may be requested again. This is exactly the case in my game loop. It also does more advanced things such as adding "coloring" (padding) between slabs to try and keep different allocated regions in different cache lines, so that re-allocating one region does not evict neighboring regions from the cache.
Then there is the backing buddy allocator which handles coarser-grained allocations, typically on the order of pages. I don't need to bother much with this, as for my own pool I'd make a single mmap or malloc call to the system to get the memory for my pool.
I would like to leverage something like an arena allocator for per-frame storage and just clear the storage at the top of each iteration of the game loop, but that SDL_RenderPresent calls malloc and doesn't seem to free that allocation until the next call through tells me I shouldn't free the underlying memory on a frame boundary. The Jai language has a dedicated temporary storage that is recommended to be used in just this way for applications with something like a game loop in them. (Aside: Jai is my favorite language in the C++-contender space. I am just using my favorite dialect of C++, which is C with function and operator overloading, for early game development work because I want to reduce the amount of things I need to get better at all at once.) I'd even accept two memory pools, one persistent across the lifetime of the application for things like SDL subsystems, and the other being that frame-temporary storage. Alas, I don't think I can find the lifetime boundaries of allocations made by dependencies so easily and thus don't know when an arena could be reset.
So I wondered, could I use something like jemalloc and just hand over a memory pool to the library and have it allocate from there? The answer is yes, although this is non-trivial, and it of course brings in another sophisticated dependency. So at the expense of making my game harder to trace and debug, I would have more control over the memory (de)allocations. I'm not ready to make that trade.
Lastly, I am toying with the idea of just letting SDL use the system-provided allocators and introducing a simple bump or arena allocator for per-frame storage. The current conclusion of my investigation is to put this off to a later time. The control/complexity tradeoff is not in my favor yet.
I implemented basic code hot reloading. Say in your game, you are in the middle of playtesting some long, complicated level, and you find some undesirable behavior. You want to modify your game code to fix the behavior, but then you have to recompile your game and get back to where you found the issue in order to test your change--that or hack up a minimal reproducible example for this one case--to verify if your fix worked.
Another approach is to separate the code for the game logic from the platform code and then dynamically load the game logic into the running platform. This way, you can change the game logic while keeping the game with all of its game state alive and test changes much more quickly. I have to thank Handmade Hero for this idea. I'm currently developing on Ubuntu, so this post will cover how I achieved hot reloading on Linux.
First, a simple demonstration. The video below shows a running game instance where I have moved the player somewhere away from its origin. The tile the player currently occupies is initially red, but then I change the game logic to render it as green, recompile the game logic library, and the game instance detects the library change, reloads the library, and then the tile is rendered as green, all while keeping the game and its state alive.
Until now I kept all the game code in a single file, but I pulled some (rather arbitrary) code out into a separate file to have something to compile into the game logic library while I was standing up hot reloading. Once we do this, we need to add a layer of abstraction between the functions the game logic library provides and the callsites outside the library. Essentially, we replace direct calls into some code with calls through function pointers that we point to implementations of routines we want to call. The implementations exist somewhere in our game logic library which we will now be dynamically loading into our platform.
The gist is like this.
#include <dlfcn.h>
// ...
constexpr int64_t TARGET_FPS = 60;
constexpr int64_t GAME_LIB_CHECK_EVERY_N_FRAMES = TARGET_FPS; // Every second.
typedef int (*FooFn)(Foo*, Bar);
struct stat lib_stat = {0};
const char* lib_pathname = "./path/to/lib.so";
int stat_ret = stat(lib_pathname, &lib_stat);
assert(stat_ret == 0);
uint64_t lib_stat_frame_counter = 0;
void* lib_handle = dlopen(lib_pathname, RTLD_LAZY);
assert(lib_handle);
FooFn foo_fn = (FooFn)dlsym(lib_handle, "foo");
assert(foo_fn);
while (run_game_loop) {
    // ...
    // Maybe reload game logic library.
    // ...
    Foo f{};
    Bar b{};
    foo_fn(&f, b);
}
The typedef establishes an interface between the library to hot reload and the end user. Unlike what I show here, I'd put it in header files corresponding to the library to be reloaded. We then deal with stat(2) and stat(3), which we will be using to get at the file modification timestamp on our game logic library. stat needs to get at inode information and is thus a syscall, so it is "slow," but we also aren't going to be recompiling our library every frame, so we don't need to poll for the modification timestamp on every iteration through the game loop. That's where lib_stat_frame_counter comes in--we'll only poll every so many frames; in this case, once a second.
We next call dlopen, which loads a library and gives us back a handle to it. We perform lazy symbol resolution with RTLD_LAZY to avoid resolving all symbols our library might contain when only a few of them will be used on the platform side. To load symbols of interest, we call dlsym. We cast its return value to the type of the interface our requested symbol represents.
Not shown in the code above, but in the game loop, every GAME_LIB_CHECK_EVERY_N_FRAMES, we call stat again and check if the modification timestamp on our library has changed. If it has, we dlclose our current handle on our game logic library and then repeat the dlopen and dlsym work to get at the new implementation of our exported routines.
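That check is roughly the following (a sketch using the variables from the gist above; error handling is kept to asserts):
// Every GAME_LIB_CHECK_EVERY_N_FRAMES frames, compare modification times and
// reload the library if it changed.
if (++lib_stat_frame_counter % GAME_LIB_CHECK_EVERY_N_FRAMES == 0) {
    struct stat latest_stat = {0};
    if (stat(lib_pathname, &latest_stat) == 0 &&
        latest_stat.st_mtime != lib_stat.st_mtime) {
        lib_stat = latest_stat;
        dlclose(lib_handle);
        lib_handle = dlopen(lib_pathname, RTLD_LAZY);
        assert(lib_handle);
        foo_fn = (FooFn)dlsym(lib_handle, "foo");
        assert(foo_fn);
    }
}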
Finally, the game loop calls our routines through our function pointers; shown above is a call through foo_fn.
In a C++ context, the symbol name of the function foo will be mangled. To prevent this, we wrap our declaration of foo on the game logic side inside extern "C" { }.
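That wrapper looks roughly like this (foo here matches the FooFn typedef from the gist earlier):
// Exported with C linkage so dlsym can find "foo" by its unmangled name.
extern "C" {
int foo(Foo* f, Bar b);
}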
If you want to see symbol names in your library on Linux, you can do so with nm.
Here are actual (truncated) examples from the test code I hoisted into my game logic library, first mangled and then not.
$ nm -D build/Debug/libgame.so
000000000000120f T _Z17map_pixel_to_tileP7Tilemapm8Vector2f
000000000000130e T _Z24player_render_tile_underP6PlayerP7TilemapP12SDL_Renderer
00000000000011b9 T _Z5round8Vector2f
$ nm -D build/Debug/libgame.so
000000000000120f T map_pixel_to_tile
000000000000130e T player_render_tile_under
00000000000011b9 T _Z5round8Vector2f
I don't need round on the platform side, so I leave its symbol name mangled. In this test, I do want map_pixel_to_tile and player_render_tile_under, though, so I needed to not mangle their names.
Code hot-reloading is for development and debugging only. Release builds will link the game library in statically.
I implemented simple collision with the environment. This time, it is probably clearer for me to show it in action first.
There are a few new things going on in this video. The first is that the map looks different from last time. I updated the tilemap to add a few things the player can collide with besides the walls along the edges. I am currently exclusively using Aseprite for tilesets and tilemaps. Unlike map editors such as Tiled, Aseprite does not provide a way to annotate tiles in a tilemap with auxiliary information. In Tiled, you can do so, and one such attribute is drawing invisible bounding boxes which can be used to indicate collision areas.
The way I annotated collision in Aseprite was to use an additional tilemap layer. The tileset currently has only 2 tiles, technically 1 tile and the empty tile which occupies an index in the tileset. I set the collision layer to be higher in the layer stack than the map itself, set the opacity of the collision layer to somewhere between 25 and 50 percent, and then I drew over any tile I wanted to add collision to. I used a pink and black checkered pattern for the tile indicating a collision area in hopes that it would stand out from other tiles I may draw on the map.
This approach is not convenient because I need to manage at least 2 tilemaps for every level now. It's bug prone because I might change the underlying map without updating the collision layer or vice versa, or I may change the index values of tiles in the collision map but forget to do so in the game code. It also forces a particular software design on my game, having to parse a unique collision map for every tilemap I load. With all of that said, it's simple, and it works, so I'm going with it for now.
I could design an alternative tilemap representation tailored to my game, as I do control the binary format of the exported tilemap. One alternative is to set one of the high bits in the tile IDs in the map that logically have collision, essentially merging my collision map into a bitfield in each corresponding tile in the game map. If I recall correctly, Tiled does something similar with bit-flag attributes on tiles. At this point in time, it's not worth implementing this myself, though.
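Sketched out, the idea would be something like this (hypothetical; my exported tile indices are 16-bit per my binary export schema):
// Hypothetical: reserve the top bit of a 16-bit tile ID as a collision flag,
// so a separate collision layer is no longer needed.
constexpr uint16_t TILE_COLLIDES_BIT = 1u << 15;

bool tile_collides(uint16_t id) {
    return (id & TILE_COLLIDES_BIT) != 0;
}

uint16_t tile_index(uint16_t id) {
    return (uint16_t)(id & ~TILE_COLLIDES_BIT);
}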
The ultimate goal is an in-game level editor, but I'm far from that at this time.
The next thing in the video is the red square that seems to be following the player. This is a visual annotation of the tile the player is considered to occupy. I've added it for debugging. Unlike other kinds of software development, it's not so straightforward or even realistic to write traditional, automated unit tests for implementations of various things in games. Oftentimes we get more information more quickly from visual feedback. Testing this way could also explain how so many bugs slip through QA in games all the time.
Computing the tile the player occupies introduces us to what will become a handful of coordinate systems in the game. Currently, for ease of development and debugging, I have only one coordinate system in the game. (0, 0) is at the upper-left corner of the game window, and it is the origin for both the camera and the map. Furthermore, everything is measured in pixels, until now. Determining the tile coordinate from a pixel coordinate, assuming the pixel and tile coordinates share the same origin, is straightforward. In my case, positions support sub-pixel accuracy and so are floating-point numbers.
Vector2d round(Vector2f v) {
    return {
        .x = (int32_t)(v.x + 0.5f),
        .y = (int32_t)(v.y + 0.5f),
    };
}

Vector2f pixel_xy = // ... ;
Vector2d int_pos = round(pixel_xy);
size_t tile_pos_x = int_pos.x / TILE_WIDTH_PX;
size_t tile_pos_y = int_pos.y / TILE_HEIGHT_PX;
I designed the collision map and game map so that they are the same dimensions and so that the tile width and height are the same across maps. I can therefore compute the index into the collision map tiles and check whether a tile has collision area or not. If the player tries to move into a map tile with backing collision area, I disallow the position update. That's what we see with me running into walls in the first video.
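In code, the check is roughly this (a sketch; the collision-layer field names are illustrative, and a nonzero tile index in that layer means the tile has collision):
// Returns true if the proposed pixel position lands on a collision tile.
bool position_blocked(const Tilemap* collision, Vector2f proposed) {
    Vector2d p = round(proposed);
    size_t tile_x = (size_t)p.x / TILE_WIDTH_PX;
    size_t tile_y = (size_t)p.y / TILE_HEIGHT_PX;
    return collision->tiles[tile_y * collision->width_tiles + tile_x] != 0;
}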
Now that we color the tile the player is occupying, we can also see that although the player is currently rendered as an entire tile, the player position is represented as a single point, the upper-leftmost point of the player tile. Because of this, the player can somewhat walk into walls to their right and below them. I'll address this later on by changing the position of the player relative to the player's rendered tile or perhaps changing the player's collision point to a collision area.
Lastly, although it is difficult to tell, we stick to walls somewhat. For example, if I press against a wall on my left side and then attempt to move diagonally up-left or down-left, I won't move at all. This is because my proposed position is up-left or down-left of me which enters the collision area on my left, and my entire position update is disallowed. I'll fix this eventually, likely by checking whether we can move horizontally separately from whether we can move vertically.
I implemented basic player movement in the game. First, I created a simple tileset for the player and loaded that into the game as a single SDL texture. Each tile in the tileset is 16x16 pixels, and there are 4 tiles. The tiles are blue squares with a white arrow showing which direction the player is facing, so I have one tile per cardinal direction.
I then created a simple Player struct to hold a pointer to the texture and other supporting information such as the cardinal direction the player is facing and the player's position.
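Roughly, something like this (a sketch; the exact fields will evolve):
struct Player {
    SDL_Texture* texture = nullptr;   // The player tileset loaded as a texture.
    Direction facing = Direction::DOWN;
    Vector2f position = {};           // Pixel position; sub-pixel accurate.
};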
On each iteration through the game loop, I poll for and handle keyboard events. In the event handling logic, I save off the new key states into an instance of my own Buttons struct. I don't modify the player state immediately in the event-handling code. The simple Buttons struct looks as follows.
struct Button {
    bool pressed = false;
};

struct Buttons {
    Button up = {};
    Button down = {};
    Button left = {};
    Button right = {};
};
For now, we just keep track of whether the button is down on the current frame. We don't track whether it was pressed this frame or has been held down for some number of frames. With this setup, if we wanted to understand the input delta from the last frame, we could simply create a second instance of Buttons and store the latest frame's inputs in the second instance at the end of the game loop so that it is available on the next iteration through the loop.
Buttons previous_buttons = {};
while (running) {
    Buttons buttons = {};
    SDL_Event event;  // The event we poll into.
    while (SDL_PollEvent(&event)) {
        switch (event.type) {
            case SDL_EVENT_KEY_UP:
                // Fall through.
            case SDL_EVENT_KEY_DOWN: {
                if (event.key.key == SDLK_DOWN) {
                    buttons.down.pressed = event.key.type == SDL_EVENT_KEY_DOWN;
                } else if (/* ... */) {
                    // ...
                }
            }
            break;
            default:
                break;
        }
    }
    // ...
    previous_buttons = buttons;
}
When it comes time to update the player based on input, I just check whether a certain button is pressed and then update the player's facing direction and a proposed position.
Vector2f proposed_position = player.position;
float dpos = (TARGET_FRAME_TIME_NS / 1000000000.0f) *
             RUNNING_VELOCITY_MS / METERS_PER_PIXEL;
if (buttons.down.pressed) {
    player.facing = Direction::DOWN;
    proposed_position.y += dpos;
} else if (buttons.up.pressed) {
    player.facing = Direction::UP;
    proposed_position.y -= dpos;
}
// Do the same for right, left.
player.position = proposed_position;
Here I assume that I always hit my target frame time, and then I multiply that time converted to seconds with my configured running velocity to get a distance the player would have moved in meters. The screen is drawn in pixels, though, so I convert meters to pixels, and that is my delta position (dpos).
I also handle down-up and right-left separately. This way the player can move both vertically and horizontally at the same time. I handle down before up, meaning that if the player presses both down and up at the same time, down takes priority. They will face and move downward. (Notice that I add, not subtract, dpos in the down case--SDL's coordinate system places (0, 0) at the top-left of the rendering space.)
Handling right-left separately means that the player will always prioritize facing a right-left cardinal direction instead of a down-up one if either right or left were pressed on a frame.
Although I simply update the player's position with the proposed position here, I am setting myself up for handling collision. It's possible we should not be able to move where we want to move, so I keep the proposed position separate so I can test it before updating the player state.
Here is a video of what I have so far. The grey border is supposed to be walls, and the white is just a placeholder tile for the ground. You can see the blue player moving around with the arrow updating in the direction the player is facing (or the prioritized direction if the player is moving diagonally). The frame rate is low, but you can tell by the diagonal movement that while the map is based on tiles, the movement is based on pixel positions and is not tile-based. Two outstanding issues are the fact that there is no collision yet and that the player is faster if moving diagonally as opposed to just vertically or horizontally.
When I went to implement basic player movement, I needed to capture keyboard input. This is done in SDL by polling for events and then processing keyboard-related ones.
SDL_Event e;
while (SDL_PollEvent(&e)) {
    switch (e.type) {
        case SDL_EVENT_KEY_UP:
            // Fall through.
        case SDL_EVENT_KEY_DOWN: {
            if (e.key.key == SDLK_DOWN) {
                // ...
            }
        }
        break;
        default:
            break;
    }
}
What I want to focus on in this post is the implementation of SDL_Event. If you look at its documentation, you'll see that it's a union. Its first member is type, which encodes the event type--it's what we're switching on in the example above. If you look at the implementation of any of the SDL_* union members, you will see the first field of each of those structs is an instance of SDL_EventType. SDL_EventType is implemented as a C enum, which is represented as an int behind the scenes, which is the same number of bits (although different semantics) as the Uint32 (uint32_t-equivalent) type member in the SDL_Event union on nearly all modern platforms.
For reference, see the implementations of, for example, SDL_ClipboardEvent, SDL_TouchFingerEvent, SDL_MouseButtonEvent, or SDL_QuitEvent.
In other words, no matter which member of the union you work with, the first field will always be something encoding the event type.
This construct is called a discriminated union (or tagged union). Here is an all-in-one example:
enum TypeTag { kFoo, kBar, kNone };

struct Foo {
    TypeTag tag;
    // ...
};

struct Bar {
    TypeTag tag;
    // ...
};

union DiscriminatedUnion {
    TypeTag tag;
    Foo foo;
    Bar bar;
};
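Consuming one mirrors how we switch on event.type above: check the tag, then touch only the matching member. A small sketch:
void handle(const DiscriminatedUnion* u) {
    switch (u->tag) {
        case kFoo:
            // Safe to read u->foo here.
            break;
        case kBar:
            // Safe to read u->bar here.
            break;
        default:
            break;
    }
}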
With discriminated unions, you can avoid inheritance and still compose one type that behaves as an instance of this or that type depending on some context, and you just check the value of tag to know how to access fields of the union. You lose anything you might have otherwise gained through polymorphism, and the total size of DiscriminatedUnion is the size of its largest member (plus padding, potentially; see the interesting note at the bottom of the SDL_Event implementation). However, the discriminated union construct supports data-oriented design by concentrating data of interest nearby in memory instead of potentially scattered about through a network of pointers (whether these pointers be pointers-to-base-class in some data structure, or members of instances pointing to this and that data, or the CPU figuring out which methods to call via dynamic dispatch).
Today a simple feature I added to the game was a target frame rate. This is typically done in one of three ways.
The first is by using timers in your game code. You capture the start time of when processing begins for a single frame, then you capture the time processing ends for that frame. Subtract the times, and if the time required to prepare the frame is less than your target frame time, sleep the thread for the difference in times.
The major issue with this approach is that the thread you're collecting timing information on can be preempted at any time by the operating system. Depending on which operating system (or language library) features you use to measure time, you may accidentally be ignoring the time your thread spent off the CPU. That time passes in the world outside the computer and so can cause inconsistencies between the time your game thinks elapsed and the wall-clock time that actually elapsed.
The second way to achieve a target frame time is by using performance counters. Performance counters are typically built into modern processors and track some kind of low-level hardware event. In this way, the counter itself is independent of operating system nuances such as scheduling. The operating system provides an API to access the performance counter value. As we don't necessarily know what is being measured by the performance counter, we don't know what the count is relative to. Thus, the counter value itself is not a measure of the passage of time. The operating system also provides an API to query the frequency of increments to the performance counter, and with this information, we can compute the passage of time.
dt_s = (perf_counter_now - perf_counter_then) / perf_counter_frequency
SDL3 provides SDL_GetPerformanceCounter and SDL_GetPerformanceFrequency for this purpose.
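A minimal sketch of that second approach using those two calls (TARGET_FRAME_TIME_S is an assumed constant, e.g. 1.0 / 60.0):
uint64_t freq = SDL_GetPerformanceFrequency();
uint64_t frame_start = SDL_GetPerformanceCounter();

// ... update and render one frame ...

uint64_t frame_end = SDL_GetPerformanceCounter();
double dt_s = (double)(frame_end - frame_start) / (double)freq;
if (dt_s < TARGET_FRAME_TIME_S) {
    // Sleep off the remainder of the frame budget (granularity is coarse).
    SDL_Delay((uint32_t)((TARGET_FRAME_TIME_S - dt_s) * 1000.0));
}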
The third way is to externalize synchronization, to rely on display hardware to indicate when a new frame can be drawn. This is done via "vertical synchronization," also called VSync. If your GPU supports VSync and you enable VSync in your renderer, your GPU synchronizes drawing the frame buffer with the refresh rate of your display hardware (ex: your monitor). If your monitor supports running at 60 Hz and 144 Hz, and if you configure your monitor to run at 144 Hz, then your GPU will send whatever is in your graphics buffer to the display 144 times per second, making your display frame rate 144 Hz.
To use VSync effectively in your game, you likely need to make your game loop frame rate-independent. If your player has a very fast refresh rate on their monitor but your game sometimes takes a while to draw its next frame, you might not hit your frame timing. It's no problem to draw the same thing twice while you're preparing your next frame, but you need to make sure you account for the passage of time independent of refresh rate (performance counters help with this). It helps to do general math in your game in SI units--meters, seconds, etc. This way regardless of whether you hit your frame time or not, the movement of things in your game (for example) is natural. Things don't move faster than expected on fast refresh rates and slower than expected on slower refresh rates.
SDL3 provides SDL_SetRenderVsync for this purpose.
Regarding the first and second approaches to hitting a target frame rate, they are complicated by two assumptions.
The first assumption is that your player's display refresh rate is a harmonic of your desired target frame rate. In other words, if your target frame rate is 60 FPS, then if your player's monitor is refreshing at 60 Hz or perhaps 120 Hz, you most likely will not encounter screen tearing. Your 60 Hz game loop might be out of phase with the display hardware's refresh cycle, but even then you'll always have your next frame prepared by the time the display hardware draws its next image. Also, your game's cycle time is not outpacing or falling behind the cycle time of the display hardware, so you will likely never be partway through preparing a new frame when the display hardware starts reading the frame data to draw the next image. If your cycle timing is not a harmonic of the display hardware's, your game loop will, theoretically, eventually be updating a frame while the display hardware is reading the frame buffer to draw its next image.
The second assumption is that you always hit your frame time. Even if your cycle time is a harmonic of that of your display hardware, if your game loop runs longer than your target frame time, you are at risk of screen tearing.
With all this said, unfortunately none of these three techniques guarantees smooth rendering. See this great article for why.
As for me, for now, I am using the performance counter technique for simplicity.
Today I implemented simple tilemap rendering. In my previous post, I discussed exporting a tilemap from an Aseprite file as a binary file. I parsed that binary file in my game and then used SDL3_image to load the tileset image. I was considering using stb image to do the loading, but since I'm committing to SDL, I figured I'd learn SDL3_image. If it feels too heavy, I'll switch to stb.
After building and using SDL3_image, when launching my game, the main window would open, close quickly after, and then open again. The fix for this was calling SDL_CreateWindow with the SDL_WINDOW_HIDDEN flag and then showing the window only after I create the renderer (which I do separately from creating the window).
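The relevant calls end up in this order (SDL3; the title and size are placeholders, and error handling is elided):
// Create the window hidden so it never flashes, then reveal it only after the
// renderer exists.
SDL_Window* window = SDL_CreateWindow("Game", 1280, 720, SDL_WINDOW_HIDDEN);
SDL_Renderer* renderer = SDL_CreateRenderer(window, NULL);
SDL_ShowWindow(window);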
The gist of accelerated graphics rendering in SDL is to load your media into a surface and then from a surface create a texture. In my case, I am dealing with tilemaps which themselves are built from tilesets. For the uninitiated, I'll give a quick outline.
A tileset consists of sprites. A sprite is essentially an image. Using text as make-believe images, an example tileset might contain these three "sprites": _.= (an underscore, a period, and an equals sign). Using the previous three characters, we might build the map:
========
=_..___=
=___...=
========
Who knows what that represents, but it is a map built from the "sprites" in our tileset. If we were talking about actual computer graphics, our map would be some larger, rendered image. The rendered image's size in RAM would increase as the image got larger, but we know that the image (in this example) is only composed of three unique sprites, those from our tileset. We could compress the tilemap by instead representing it as indices into the tileset.
22222222
20110002
20001112
22222222
The sprites in a tileset are also known as tiles. So here we have represented a tilemap by showing which tiles to use when rendering the map by referring to tiles (by index) in a tileset.
Back to SDL, this leaves me with multiple ways to actually store the tile data. Currently my binary tilemap export only encodes a single tilemap (due to the way I drew the map in Aseprite). I could load that tileset and then parse individual tiles out of its pixel memory, create a surface for each unique tile, and then create a texture for each unique surface. Nothing wrong with this. But what I did instead was create a single texture for the tileset, and at render time, I index into the texture to render only a portion of the tileset texture (the tile I want) to the window. The upside of this is that there are fewer resources to deal with, fewer allocations. The downside is that all tiles are stored in a single texture, so I cannot apply per-tile modulation (recoloring, transforming, etc.) as easily.
Here is an excerpt of code I wrote showing how I render from this single tileset texture.
/**
 * Render a tilemap instance to a renderer. `tileset` is expected to contain
 * the entire tileset referred to by the tilemap, and the tile properties are
 * assumed to be compatible with dimensions specified in the tilemap.
 */
void tilemap_render(Tilemap* tm, SDL_Texture* tileset, SDL_Renderer* renderer) {
    uint16_t tileset_tile_width = tileset->w / tm->tile_width_px;
    for (int i = 0; i < tm->height_tiles; i++) {
        for (int j = 0; j < tm->width_in_tiles; j++) {
            uint16_t tile_index = tm->tiles[i * tm->width_in_tiles + j];
            float tile_x = tile_index % tileset_tile_width * tm->tile_width_px;
            float tile_y = tile_index / tileset_tile_width * tm->tile_height_px;
            SDL_FRect src_rect = {
                tile_x, tile_y,
                (float)tm->tile_width_px, (float)tm->tile_height_px
            };
            SDL_FRect dst_rect = {
                (float)j * tm->tile_width_px,
                (float)i * tm->tile_height_px,
                (float)tm->tile_width_px,
                (float)tm->tile_height_px
            };
            SDL_RenderTexture(renderer, tileset, &src_rect, &dst_rect);
        }
    }
}
I haven't done much work on the general encapsulation; just standing things up right now.
Today marks my first day working full-time on my own projects. I'm focusing on building a game. Toward learning how everything works end-to-end from first principles, I'm building a game engine. I will likely ship with SDL3, though.
My game will feature tilemaps built from pixel art tiles. I love using Aseprite for my art. At the time of this writing, Aseprite supports building tilemaps but only supports exporting them using an out-of-tool script. This script exports tilemaps and supporting data as JSON. I don't want to write or depend on a JSON parser, though, so I chose to fork and extend the script to output binary.
Aseprite scripts are written in Lua, so I needed to learn Lua. I half-jokingly searched for "learn lua in 10 minutes" and turned up Learn Lua in 15 Minutes. Using this, the official documentation, and asking an LLM questions here and there, I picked up Lua pretty quickly.
Some interesting things I learned are that Lua has a way to allow one Lua file to import another (require) which caches anything that the imported file runs on import, so that when additional imports of the same file happen, nothing top-level runs again. There is another mode (dofile) which imports without caching, always running anything exposed at the top-level scope in the imported file. Another thing is that Lua only has a table data structure, which doubles as an array. Array indices start from 1, not 0. You can use negative indices when accessing the command-line arguments to find arguments passed to Lua itself, not to your script.
Anyway, without showing the full glue code, I introduced this binary.lua to the exporter script, added command-line argument parsing to the main script, and allowed the caller to either export JSON or binary. I'll stick with binary and write a simple parser in my game.
When calling a non-graphical script from Aseprite, you do so from the command line. You pass args before specifying the script. This is for at least Aseprite v1.3.14.3.
aseprite -b foo.aseprite --script-param bar=baz --script my_script.lua
The content of binary.lua in its initial form is below.
-- Export an Aseprite tilemap and supporting data to a binary file.
--
-- Output Schema
-- -------------
-- schema_major_ver: uint8_t
-- schema_minor_ver: uint8_t
-- schema_patch_ver: uint8_t
-- canvas_width_px : uint16_t
-- canvas_height_px: uint16_t
-- tilesets : array<tileset>
-- layers : array<layer>
--
-- array
-- -----
-- An array `array<T>` is a single `uint64_t` indicating the number of instances
-- of `T` immediately after. The instances of `T` are contiguous in memory.
-- For example, a 2-element array of type `array<uint8_t>` is laid out as
-- follows in memory:
-- uint64_t (num elements)
-- uint8_t (first element)
-- uint8_t (second element)
--
-- string
-- ------
-- A string is a single `uint64_t` indicating the number of characters
-- immediately following and then said number of characters. The characters are
-- 8-bit bytes. Each byte is a single ASCII character.
--
-- tileset
-- -------
-- image_pathname: string
-- tile_width_px : uint16_t
-- tile_height_px: uint16_t
--
-- layer
-- -----
-- name : string
-- tileset_id : uint16_t
-- width_tiles : uint16_t
-- height_tiles: uint16_t
-- tiles : array<index_into_tileset>
--
-- index_into_tileset
-- ------------------
-- index: uint16_t
--
-- Notes:
-- - Strings do not support Unicode.
-- - All numbers are encoded as little-endian.
-- - The current schema only supports a single cel per layer.
local binary = { _version_major = 0, _version_minor = 0, _version_patch = 1 }
-- Write select keys out of table `t` to the open file `f` as binary. See
-- the output schema in the file documentation for what will be written out to
-- `f`.
function binary.encode(f, t)
  -- Schema version.
  f:write(string.pack("<I1", binary._version_major))
  f:write(string.pack("<I1", binary._version_minor))
  f:write(string.pack("<I1", binary._version_patch))
  -- Canvas width, height.
  f:write(string.pack("<I2", t.width))
  f:write(string.pack("<I2", t.height))
  -- Tilesets.
  f:write(string.pack("<I8", #t.tilesets))
  for i = 1, #t.tilesets do
    local ts = t.tilesets[i]
    f:write(string.pack("<I8", #ts.image))
    f:write(ts.image)
    f:write(string.pack("<I2", ts.grid.tileSize.width))
    f:write(string.pack("<I2", ts.grid.tileSize.height))
  end
  -- Layers.
  f:write(string.pack("<I8", #t.layers))
  for i = 1, #t.layers do
    local l = t.layers[i]
    f:write(string.pack("<I8", #l.name))
    f:write(l.name)
    f:write(string.pack("<I2", l.tileset))
    if #l.cels > 1 then
      error("Layer " .. i .. " has more than 1 cel")
    end
    local cel = l.cels[1]
    f:write(string.pack("<I2", cel.tilemap.width))
    f:write(string.pack("<I2", cel.tilemap.height))
    f:write(string.pack("<I8", #cel.tilemap.tiles))
    for j = 1, #cel.tilemap.tiles do
      f:write(string.pack("<I2", cel.tilemap.tiles[j]))
    end
  end
end
return binary