2025-188
15:55:33.
In my game, I want to control the pool of memory from which all allocations are made. This will help with debugging and with general stability: if I can allocate all the memory I will need up front, I can then parcel that memory out internally on demand. I haven't gone into much detail about why I am not using a game engine, but having more control over allocations, or at least over the memory pool being used, is one of the reasons.
SDL provides SDL_SetMemoryFunctions, which a developer can use to specify the functions to call in place of the typical malloc, calloc, realloc, and free. I have previously implemented my own simple linear allocator, from which all I ever did was "malloc" a few times. SDL is its own animal, though, and until I read its code, I need to treat it as if it could ({de|re})allocate at any time. This is unfortunate since it complicates the problem for me, but it's one price I pay for using a powerful abstraction like SDL.
Although I am compiling SDL3 from source for my game, I have not read through the code yet. To get a rough sense of how many allocations are happening and which memory-related functions are actually called, I wrote shims for the memory functions, registered them, and ran my game. One such shim looks like
#include <stdio.h>
#include <stdlib.h>

static int my_malloc_count = 0;

void* my_malloc(size_t s) {
    printf("my_malloc called; c: %d\n", my_malloc_count++);
    return malloc(s);  /* forward to the system allocator for now */
}
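Registering the shims is then one call, which has to happen before SDL makes its first allocation, i.e. before SDL_Init. A minimal sketch, assuming my_calloc, my_realloc, and my_free shims that mirror my_malloc above:

#include <SDL3/SDL.h>

int main(int argc, char **argv) {
    /* Must run before SDL allocates anything. */
    if (!SDL_SetMemoryFunctions(my_malloc, my_calloc, my_realloc, my_free)) {
        return 1;
    }
    if (!SDL_Init(SDL_INIT_VIDEO)) {
        return 1;
    }
    /* ... game loop ... */
    SDL_Quit();
    return 0;
}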
What I see is that a lot of allocation happens initially, with both calloc and malloc. This is SDL initializing its subsystems. On shutdown, a lot of freeing happens, which is cleanup. On the first pass through the game loop, calloc is called hundreds of times, malloc is called a few times, and realloc is called maybe ten times, which is extremely rare relative to the other calls. Presumably SDL lazy-initializes some things, and the first pass through the game loop has SDL finishing its initialization as I use various subsystems. On all other iterations through the game loop, I see only the following (keep in mind my game loop is relatively sparse right now).
my_calloc called; c: 1160
my_free called; c: 251
my_free called; c: 252
my_malloc called; c: 297
Debugging shows that SDL_RenderPresent is the call of mine that in turn calls the functions responsible for the (de)allocations above. For what it's worth, I am using the GL rendering backend; I switched to the software rendering backend and observed the same behavior.
In a previous life, the linear allocator I mentioned didn't need to support any efficient freeing because I only ever allocated a fixed number of times to set up buffers I would later reuse. What I see here is that SDL will routinely allocate and free. So should I want to point the memory allocators at my custom memory pool, I need to design the pool and the functions that use it in a way that will efficiently recycle memory.
By the way, I was initially calling SDL_Log in my allocator shims for debugging, but this was a bad idea: SDL_Quit also uses the memory functions after it has turned at least some subsystems off, causing SDL to request re-initialization of the logging subsystem and then hang indefinitely in what appears to be a sleep-and-check loop.
my_free called
^C
Program received signal SIGINT, Interrupt.
(gdb) where
#0 in __GI___clock_nanosleep
#1 in __GI___nanosleep
#2 in SDL_SYS_DelayNS
#3 in SDL_Delay_REAL
#4 in SDL_ShouldInit_REAL
#5 in SDL_InitLog
#6 in SDL_CheckInitLog
#7 in SDL_GetLogPriority_REAL
#8 in SDL_LogMessageV_REAL
#9 in SDL_Log
#10 in my_free
#11 in SDL_free_REAL
#12 in SDL_RemoveHintCallback_REAL
#13 in SDL_QuitLog
#14 in SDL_Quit_REAL
#15 in SDL_Quit
#16 in main
(gdb) quit
I also noticed SDL_aligned_alloc in the SDL documentation. It has a corresponding SDL_aligned_free function as well. The documentation says that SDL_aligned_free must be used to free memory allocated with SDL_aligned_alloc, so I'm not concerned about having to accommodate these (de)allocations in my implementation. But their presence does raise the question of whether they would ever be called in my runtime. If they are, my hopes of centralizing and controlling memory allocation are dashed. SDL3/SDL_Surface.h defines the surface flag SDL_SURFACE_SIMD_ALIGNED with the documentation "Surface uses pixel memory allocated with SDL_aligned_alloc()", and grep suggests the aligned (de)allocator is generally only used when leveraging SIMD. I also see the aligned memory functions used in implementations for the PS2, PSP, and PS Vita, but these are not targets of mine. I checked whether that surface flag was set on surfaces I am loading, and it is not. So I can probably safely ignore the aligned (de)alloc interface.
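The check itself is a one-liner; here surface is assumed to be an SDL_Surface* I loaded elsewhere, e.g. via SDL_LoadBMP:

/* Does SDL own SIMD-aligned pixel memory for this surface? */
if (surface->flags & SDL_SURFACE_SIMD_ALIGNED) {
    printf("pixel memory came from SDL_aligned_alloc\n");
}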
So now the question is, how much work is it going to be to implement my own memory pool and allocator to support SDL's behavior? Regardless of the implementation effort, there is also the testing effort, such as unit tests and then fuzzing with ASan enabled or something like that.
...
This explanation of slab allocators in older versions of Linux tells me I'm going to be spending a lot of time writing an efficient allocator. The slab allocator is an interesting concept and, in my opinion, an appropriate one to use in my case. One feature is that it caches previous allocations of particular sizes, anticipating that such allocations may be requested again. This is exactly the case in my game loop. It also does more advanced things, such as adding "coloring" (padding) between slabs to try to keep different allocated regions in different cache lines, so that re-allocating one region does not evict neighboring regions from the cache.
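To make the size-class caching idea concrete, here is a toy sketch of my own (all names are mine, and this is nothing like the real Linux implementation): freed blocks go onto a per-size-class free list and are handed back on the next request of that class instead of going to the system allocator.

#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES 8  /* size classes: 16, 32, 64, ..., 2048 bytes */

typedef struct FreeNode { struct FreeNode *next; } FreeNode;
static FreeNode *free_lists[NUM_CLASSES];

/* Map a request size to the smallest class that fits, or -1 if too large. */
static int class_for(size_t s) {
    size_t c = 16;
    for (int i = 0; i < NUM_CLASSES; i++, c <<= 1)
        if (s <= c) return i;
    return -1;
}

void *cache_alloc(size_t s) {
    int i = class_for(s);
    if (i >= 0 && free_lists[i]) {   /* reuse a cached block of this class */
        FreeNode *n = free_lists[i];
        free_lists[i] = n->next;
        return n;
    }
    /* Cache miss: fall back to the system allocator for the full class size. */
    return malloc(i >= 0 ? (size_t)16 << i : s);
}

void cache_free(void *p, size_t s) {
    int i = class_for(s);
    if (i < 0) { free(p); return; }  /* oversized blocks bypass the cache */
    FreeNode *n = (FreeNode *)p;     /* push onto this class's free list */
    n->next = free_lists[i];
    free_lists[i] = n;
}

One wrinkle for the SDL case: the free shim only receives a pointer, not a size, so a real pool would need to stash the size class in a small header in front of each block to know which list to push onto.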
Then there is the backing buddy allocator, which handles coarser-grained allocations, typically on the order of pages. I don't need to bother much with this, as for my own pool I'd make a single mmap or malloc call to the system to get the memory for my pool.
I would like to leverage something like an arena allocator for per-frame storage and just clear the storage at the top of each iteration of the game loop, but the fact that SDL_RenderPresent calls malloc and doesn't seem to free that allocation until the next call through tells me I shouldn't free the underlying memory on a frame boundary. The Jai language has a dedicated temporary storage that is recommended for use in just this way in applications with something like a game loop in them. (Aside: Jai is my favorite language in the C++-contender space. I am just using my favorite dialect of C++, which is C with function and operator overloading, for early game development work because I want to reduce the number of things I need to get better at all at once.) I'd even accept two memory pools, one persistent across the lifetime of the application for things like SDL subsystems, and the other being that frame-temporary storage. Alas, I don't think I can find the lifetime boundaries of allocations made by dependencies so easily and thus don't know when an arena could be reset.
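For my own frame-local data (as opposed to SDL's allocations), the arena itself is simple. A minimal bump-allocator sketch, with names of my own choosing:

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;   /* backing buffer, allocated once at startup */
    size_t cap;
    size_t used;
} Arena;

/* Bump-allocate from the arena; returns NULL when the frame budget is spent. */
void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (a->used + 15) & ~(size_t)15;  /* 16-byte alignment */
    if (aligned + size > a->cap) return NULL;
    a->used = aligned + size;
    return a->base + aligned;
}

/* Called at the top of each game-loop iteration; reclaims everything at once. */
void arena_reset(Arena *a) { a->used = 0; }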
So I wondered, could I use something like jemalloc and just hand over a memory pool to the library and have it allocate from there? The answer is yes, although this is non-trivial, and it of course brings in another sophisticated dependency. So at the expense of making my game harder to trace and debug, I would have more control over the memory (de)allocations. I'm not ready to make that trade.
Lastly, I am toying with the idea of just letting SDL use the system-provided allocators and introducing a simple bump or arena allocator for per-frame storage. The current conclusion of my investigation is to put this off to a later time. The control/complexity tradeoff is not in my favor yet.