2025-188

15:55:33, atom feed.

In my game, I want to control the pool of memory from which all allocations are made. This will help with debugging and general stability: if I can allocate all the memory I will need up front, I can parcel it out internally on demand. I haven't gone into much detail about why I am not using a game engine, but having more control over allocations, or at least over the memory pool being used, is one of the reasons.

SDL provides SDL_SetMemoryFunctions, which a developer can use to specify the functions to call in place of the typical malloc, calloc, realloc, and free. I have previously implemented my own simple linear allocator, from which all I ever did was "malloc" a few times. SDL is its own animal, though, and until I read its code, I need to treat it as if it could ({de|re})allocate at any time. This is unfortunate since it complicates the problem for me, but it's one price I pay for using a powerful abstraction like SDL.

Although I am compiling SDL3 from source for my game, I have not dug into the code yet. To get a rough sense of how many allocations are happening and which memory-related functions are actually called, I wrote shims for the memory functions, registered them, and ran my game. One such shim looks like

#include <stdio.h>
#include <stdlib.h>

static int my_malloc_count = 0;

/* Count the call, then forward to the system malloc. */
void* my_malloc(size_t s) {
  printf("my_malloc called; c: %d\n", my_malloc_count++);
  return malloc(s);
}
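
Registering the shims is then one call, made before SDL_Init. A minimal sketch, assuming my_calloc, my_realloc, and my_free are shims analogous to my_malloc above:

#include <SDL3/SDL.h>

if (!SDL_SetMemoryFunctions(my_malloc, my_calloc, my_realloc, my_free)) {
  fprintf(stderr, "SDL_SetMemoryFunctions failed: %s\n", SDL_GetError());
}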

What I see is that a lot of allocation happens initially, both with calloc and malloc. This is SDL initializing its subsystems. On shutdown, a lot of freeing happens, which is cleanup. On the first pass through the game loop, calloc is called hundreds of times, malloc a few times, and realloc maybe ten times; realloc is extremely rare relative to the other calls. Presumably SDL lazy-initializes some things, and the first pass through the game loop has SDL finishing its initialization as I use various subsystems. On all other iterations through the game loop, I see only the following (keep in mind my game loop is relatively sparse right now).

my_calloc called; c: 1160
my_free called; c: 251
my_free called; c: 252
my_malloc called; c: 297

Debugging shows that SDL_RenderPresent is the call of mine that in turn calls the functions responsible for the (de)allocations above. For what it's worth, I am using the GL rendering backend. I switched to the software rendering backend and experienced the same behavior.

In a previous life, the linear allocator I mentioned didn't need to support efficient freeing because I only ever allocated a fixed number of times to set up buffers I would later reuse. What I see here is that SDL will routinely allocate and free. So if I want to point the memory functions at my custom memory pool, I need to design the pool and the functions that use it in a way that efficiently recycles memory.

By the way, I was initially calling SDL_Log in my allocator shims for debugging, but this was a bad idea: SDL_Quit also uses the memory functions after it has shut down at least some subsystems, so the SDL_Log call in my_free caused SDL to request re-initialization of the logging subsystem and then hang indefinitely in what appears to be a sleep-and-check loop.

my_free called
^C
Program received signal SIGINT, Interrupt.

(gdb) where
#0 in __GI___clock_nanosleep
#1 in __GI___nanosleep
#2 in SDL_SYS_DelayNS
#3 in SDL_Delay_REAL
#4 in SDL_ShouldInit_REAL
#5 in SDL_InitLog
#6 in SDL_CheckInitLog
#7 in SDL_GetLogPriority_REAL
#8 in SDL_LogMessageV_REAL
#9 in SDL_Log
#10 in my_free
#11 in SDL_free_REAL
#12 in SDL_RemoveHintCallback_REAL
#13 in SDL_QuitLog
#14 in SDL_Quit_REAL
#15 in SDL_Quit
#16 in main
(gdb) quit

I also noticed SDL_aligned_alloc in the SDL documentation, along with its corresponding SDL_aligned_free. The documentation says that SDL_aligned_free must be used to free memory allocated with SDL_aligned_alloc, so I'm not concerned about having to accommodate these (de)allocations in my implementation. But their presence does raise the question of whether they would ever be called in my runtime. If they are, my hopes of centralizing and controlling memory allocation are dashed. SDL3/SDL_surface.h defines the surface flag SDL_SURFACE_SIMD_ALIGNED with the documentation

Surface uses pixel memory allocated with SDL_aligned_alloc()

and grep suggests the aligned (de)allocator is generally only used when leveraging SIMD. I also see the aligned memory functions used in implementations for the PS2, PSP, and PS Vita, but these are not targets of mine. I checked whether that surface flag was set on surfaces I am loading, and it is not. So I can probably safely ignore the aligned (de)alloc interface.
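
The check itself is trivial; a sketch, assuming surface came from something like SDL_LoadBMP:

if (surface->flags & SDL_SURFACE_SIMD_ALIGNED) {
  printf("pixels were allocated with SDL_aligned_alloc\n");
}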

So now the question is: how much work is it going to be to implement my own memory pool and allocator to support SDL's behavior? Regardless of the implementation effort, there is also the testing effort, such as unit tests and then fuzzing with ASan enabled, or something like that.

...

This explanation of slab allocators in older versions of Linux tells me I'm going to be spending a lot of time writing an efficient allocator. The slab allocator is an interesting concept and, in my opinion, an appropriate one for my case. One feature is that it caches previous allocations of particular sizes, anticipating that allocations of those sizes will be requested again. This is exactly the pattern in my game loop. It also does more advanced things, such as adding "coloring" (offset padding) at the start of slabs so that objects in different slabs map to different cache lines, reducing cache conflicts between regions.
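
The size-class recycling is the feature I care about most, so here is a minimal sketch of just that idea, with names I made up; a real slab allocator carves objects out of page-sized slabs rather than falling back to malloc:

#include <stdlib.h>

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

typedef struct {
  size_t block_size;    /* every block in this cache is this size;
                           must be >= sizeof(FreeNode) */
  FreeNode *free_list;  /* previously freed blocks, ready for reuse */
} SlabCache;

void* cache_alloc(SlabCache *c) {
  if (c->free_list) {              /* reuse a recycled block */
    FreeNode *n = c->free_list;
    c->free_list = n->next;
    return n;
  }
  return malloc(c->block_size);    /* fall back to the system for the sketch */
}

void cache_free(SlabCache *c, void *p) {
  FreeNode *n = (FreeNode*)p;      /* thread the block onto the free list */
  n->next = c->free_list;
  c->free_list = n;
}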

Then there is the backing buddy allocator, which handles coarser-grained allocations, typically on the order of pages. I don't need to bother much with this: for my own pool, I'd make a single mmap or malloc call to the system to get the memory.
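
Getting the pool itself is the easy part. A sketch using POSIX mmap, with a placeholder size:

#include <sys/mman.h>

size_t pool_size = 64 * 1024 * 1024;  /* placeholder; I'd size this from profiling */
void *pool = mmap(NULL, pool_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (pool == MAP_FAILED) {
  /* no memory; bail early */
}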

I would like to leverage something like an arena allocator for per-frame storage and just clear the storage at the top of each iteration of the game loop, but the fact that SDL_RenderPresent calls malloc and doesn't seem to free that allocation until the next call tells me I shouldn't free the underlying memory on a frame boundary. The Jai language has a dedicated temporary storage that is recommended for exactly this pattern in applications with something like a game loop. (Aside: Jai is my favorite language in the C++-contender space. I am just using my favorite dialect of C++, which is C with function and operator overloading, for early game development work because I want to reduce the number of things I need to get better at all at once.) I'd even accept two memory pools: one persistent across the lifetime of the application for things like SDL subsystems, and the other being that frame-temporary storage. Alas, I don't think I can find the lifetime boundaries of allocations made by dependencies so easily, and thus don't know when an arena could be reset.
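
For my own per-frame data, at least, the mechanism itself is simple. A minimal sketch, with names of my own choosing:

#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint8_t *base;  /* backing memory, acquired once at startup */
  size_t cap;     /* total bytes in the arena */
  size_t used;    /* bytes handed out so far this frame */
} FrameArena;

void* arena_alloc(FrameArena *a, size_t n) {
  size_t aligned = (a->used + 15) & ~(size_t)15;  /* round up to 16 bytes */
  if (aligned + n > a->cap) return NULL;          /* out of per-frame memory */
  a->used = aligned + n;
  return a->base + aligned;
}

void arena_reset(FrameArena *a) {
  a->used = 0;  /* call at the top of each game-loop iteration */
}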

So I wondered, could I use something like jemalloc and just hand over a memory pool to the library and have it allocate from there? The answer is yes, although this is non-trivial, and it of course brings in another sophisticated dependency. So at the expense of making my game harder to trace and debug, I would have more control over the memory (de)allocations. I'm not ready to make that trade.

Lastly, I am toying with the idea of just letting SDL use the system-provided allocators and introducing a simple bump or arena allocator for per-frame storage. The current conclusion of my investigation is to put this off to a later time. The control/complexity tradeoff is not in my favor yet.