
2019-10-14: fundamentals of game audio design

Speakers output analogue sound, so the computer has to represent audio internally as a stream of discrete samples using a technique known as pulse-code modulation (PCM), typically at 48,000 samples per second, i.e. 48 kHz.
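As a rough illustration of what PCM data looks like in memory, here is a minimal sketch that quantizes floating-point samples to bytes. The 16-bit little-endian sample format and the names `SAMPLE_RATE` and `to_pcm16` are assumptions made for this example; the sample width is not something stated above.

```python
import struct

SAMPLE_RATE = 48_000  # samples per second (48 kHz), as mentioned above


def to_pcm16(samples):
    """Quantize floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    The 16-bit width is an assumption for this sketch; other widths exist.
    """
    # Clamp first so out-of-range input cannot overflow the 16-bit range.
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [round(s * 32767) for s in clamped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

One second of mono audio in this format takes `SAMPLE_RATE * 2` bytes, which is where the memory cost of uncompressed audio comes from.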

However, this by itself only allows us to play a single sound, which is not that interesting. We can play more sounds by summing the waves of all of the output signals. Hence...

audio output = sum(all signal waves)
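The sum above can be sketched sample by sample. This is only an illustration: the function name `mix` and the clamping to the signed 16-bit range are assumptions of this example, added so that loud overlapping sounds cannot produce out-of-range values.

```python
def mix(signals, limit=32767):
    """Sum lists of integer PCM samples into one output list.

    Shorter signals are treated as silent once they run out. The result is
    clamped to [-limit - 1, limit] (signed 16-bit by default, an assumption).
    """
    length = max((len(s) for s in signals), default=0)
    out = [0] * length
    for signal in signals:
        for i, sample in enumerate(signal):
            out[i] += sample
    return [max(-limit - 1, min(limit, s)) for s in out]
```

For example, `mix([[1000, -2000], [500, 500]])` gives `[1500, -1500]`.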

To get this to the sound card and actually play it, we need a specific structure called a circular buffer, which stores the data stream for the constant audio input/output that occurs in a game, e.g. from sound effects and music.

A well-designed circular buffer must guarantee three things: that any stored value is returned in less time than the length of the buffer; that it can always finish processing the whole buffer; and that the output always contains valid audio data, with no exceptions or errors.
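To make the idea concrete, here is a minimal single-threaded sketch of a circular buffer. The class name `RingBuffer` and its methods are hypothetical, and padding with silence on underrun is one assumed way of keeping the output valid; a production buffer shared between threads would also need atomic read/write indices, which this sketch leaves out.

```python
class RingBuffer:
    """Fixed-capacity circular buffer of samples (illustrative sketch only)."""

    def __init__(self, capacity):
        self._data = [0] * capacity  # allocated once, reused forever
        self._capacity = capacity
        self._read = 0   # total number of samples read so far
        self._write = 0  # total number of samples written so far

    def available(self):
        """Number of samples written but not yet read."""
        return self._write - self._read

    def write(self, samples):
        """Store as many samples as fit; return how many were accepted."""
        free = self._capacity - self.available()
        count = min(free, len(samples))
        for i in range(count):
            self._data[(self._write + i) % self._capacity] = samples[i]
        self._write += count
        return count

    def read(self, count):
        """Return exactly `count` samples, padding with silence on underrun."""
        ready = min(count, self.available())
        out = [self._data[(self._read + i) % self._capacity] for i in range(ready)]
        self._read += ready
        out.extend([0] * (count - ready))  # silence is still valid audio data
        return out
```

The modulo on the index is what makes the buffer "circular": writes wrap around to the start once they reach the end of the storage.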

To meet these requirements, the buffer is typically fed by a "mixer thread" that is spawned or declared by the program or by an audio library. A proper mixer thread typically needs the following:

the mixer thread must run at high OS priority
it must be non-blocking, which means either lock-free (built on atomic operations) or wait-free
no memory allocations or deallocations
no I/O delay; e.g. console, IPC, disk, network, etc

Since video games are real-time applications, how the processor and memory are used is very important. There are a number of known options for getting audio data into the buffer, each with its own trade-off:

decompress the whole file into memory as raw PCM, reducing CPU cost at playback time
copy the compressed file into memory and then decompress it from the buffer, reducing memory cost
create another buffer layer and load it from an input source, such as a microphone or disk, and then buffer that into the original circular buffer
stream the data off of the disk (e.g. ifstream) using a double buffer; this can be very helpful for large audio files, such as music or ambience sounds
synthesize the audio in real time, which only works well if you want to generate a piece of audio entirely from scratch
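Of the options above, real-time synthesis is the easiest to show in a few lines. The sketch below generates a plain sine tone; the function name `sine_wave` and its parameters are made up for this example, and a real synthesizer would write into a preallocated buffer rather than build a new list.

```python
import math


def sine_wave(frequency_hz, duration_s, sample_rate=48_000, amplitude=1.0):
    """Generate floating-point samples of a sine tone.

    Each sample i is amplitude * sin(2 * pi * frequency * i / sample_rate).
    """
    count = round(duration_s * sample_rate)
    step = 2.0 * math.pi * frequency_hz / sample_rate
    return [amplitude * math.sin(step * i) for i in range(count)]
```

Nothing has to be loaded or decompressed here, which is why synthesis costs almost no memory, but it only helps when the sound can be described by a formula.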

In most modern game development, some sort of middleware is typically used to assist with creating and playing sounds or music during in-game events. Examples include FMOD, Audiokinetic Wwise, and CRI ADX2, while game engines such as Unity and Unreal feature their own built-in audio systems.

Sounds that are currently playing are read from the buffer into elements called channels. Most middleware offers features to check whether a channel is still playing or is paused, as well as the ability to set the channel's 3D position, i.e. the origin of the sound emission.
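A channel along these lines might look like the sketch below. To be clear, this is not the API of FMOD, Wwise, or any other middleware; the class `Channel` and every field and method on it are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One playing sound (hypothetical sketch, not a real middleware API)."""

    samples: list
    cursor: int = 0                      # index of the next sample to play
    paused: bool = False
    position: tuple = (0.0, 0.0, 0.0)    # 3D origin of the sound emission

    def is_playing(self):
        """True while unpaused and there are samples left to play."""
        return not self.paused and self.cursor < len(self.samples)

    def set_3d_position(self, x, y, z):
        self.position = (float(x), float(y), float(z))

    def read(self, count):
        """Return up to `count` samples and advance; paused channels yield none."""
        if self.paused:
            return []
        chunk = self.samples[self.cursor:self.cursor + count]
        self.cursor += len(chunk)
        return chunk
```

The mixer would call `read` on every active channel once per buffer and sum the results.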

A good sound engine is built as a finite state machine, so that it is in exactly one state at any given time, and it often has a variety of helpful features such as fadeouts, async loads, and virtual sounds. Such a state machine could have the following:

a playing state for when sounds are playing
stopping and stopped states for implementing fadeouts
for async loading, the initialize, loading, and try-to-play / devirtualize states are needed
virtual sounds need the virtualizing and virtual states; virtual sounds are sounds that are important enough to keep track of, but that the player may not actually hear
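The states listed above can be sketched as an enum plus a table of allowed transitions. The state names follow the list, but the transition table is purely an assumption for illustration, as are the names `State`, `ALLOWED`, and `Sound`; a real engine may wire these states together differently.

```python
from enum import Enum, auto


class State(Enum):
    INITIALIZE = auto()
    LOADING = auto()
    TO_PLAY = auto()
    PLAYING = auto()
    STOPPING = auto()      # fading out
    STOPPED = auto()
    VIRTUALIZING = auto()  # fading out before becoming virtual
    VIRTUAL = auto()       # tracked but not audible
    DEVIRTUALIZE = auto()


# Assumed transition table; adjust to match the engine's actual rules.
ALLOWED = {
    State.INITIALIZE: {State.LOADING, State.TO_PLAY},
    State.LOADING: {State.TO_PLAY, State.STOPPED},
    State.TO_PLAY: {State.PLAYING, State.VIRTUALIZING, State.STOPPING},
    State.PLAYING: {State.STOPPING, State.VIRTUALIZING},
    State.STOPPING: {State.STOPPED},
    State.STOPPED: set(),
    State.VIRTUALIZING: {State.VIRTUAL},
    State.VIRTUAL: {State.DEVIRTUALIZE, State.STOPPING},
    State.DEVIRTUALIZE: {State.TO_PLAY},
}


class Sound:
    """A sound that is always in exactly one state."""

    def __init__(self):
        self.state = State.INITIALIZE

    def transition(self, new_state):
        """Move to `new_state`, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
```

Keeping the rules in one table makes illegal jumps, such as going from stopped straight back to playing, fail loudly instead of leaving the sound in a half-valid state.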

If the above are implemented in an audio engine, the developers have quite a lot of functionality to work with, and this allows the creation of effects such as a low-pass filter.
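As an example of such an effect, here is a minimal one-pole low-pass filter. The one-pole design is my choice for brevity, not something prescribed above, and the name `low_pass` and its `alpha` parameter are assumptions of this sketch.

```python
def low_pass(samples, alpha):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    `alpha` is in (0, 1]; smaller values smooth more (lower cutoff),
    and alpha == 1 passes the input through unchanged.
    """
    out = []
    previous = 0.0
    for x in samples:
        previous += alpha * (x - previous)
        out.append(previous)
    return out
```

Each output sample moves only part of the way towards the input, so fast changes (high frequencies) are damped while slow ones pass through.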