The Basics of XAudio2
If music be the food of love, play on […]
– Duke Orsino, Twelfth Night
In this tutorial, we will learn how to use XAudio2 and the Windows Media Foundation to load both uncompressed and compressed audio files from the hard drive and how to play them back using the event queue introduced in a previous tutorial.
To learn more about the history of the audio APIs under Windows, read the following excellent article written by Shane.
Introduction
XAudio2 is a rather low-level audio API for Windows, Xbox 360 and Windows Phone 8. It is the spiritual successor to DirectSound on Windows and an improved version of the original XAudio on the Xbox 360. XAudio2 is backwards compatible by operating through the XAudio API on the Xbox 360, through DirectSound on Windows XP, and through the low-level audio mixer WASAPI on Windows Vista and higher.
The XAudio2 library is included in the March 2008 DirectX SDK. The latest version of XAudio2 is 2.9, which was released for Windows 10.
Features
XAudio2 provides a signal processing and mixing foundation for games. It provides a flexible and powerful Digital Signal Processing (DSP) framework, with which, for example, cat meows can be turned into scary monster sounds.
XAudio2 also facilitates combining different voices into single audio streams, called submixing, to, for example, create an engine sound made up of composite parts, all of which are playing simultaneously. Another usage for submixing could be to combine all game sound effects and all game music in different sets to allow the user to set different volume levels for sounds and music.
DirectSound lacked support for compressed audio formats, and although the Windows Media Foundation makes it possible to load countless compressed formats, native support for compressed audio would still be great to have. With XAudio2, this dream has come true: it supports ADPCM natively.
The XAudio2 API is also “non-blocking”, meaning that the game can safely make a set of method calls to XAudio2 at any time, with a few exceptions, without long-running calls causing delays.
For a complete list of the most exciting features of XAudio2, check the MSDN.
Versions
This is a small list taken from the MSDN.
XAudio2 2.7 and earlier (Windows 7)
The first version of XAudio2, XAudio2 2.0, shipped in the March 2008 release of the DirectX SDK. The last version to ship in the DirectX SDK was XAudio2 2.7, available in the last release of the DirectX SDK in June 2010.
XAudio2 2.8 (Windows 8.x)
With Windows 8, XAudio2 was no longer part of the DirectX SDK; instead, XAudio2 now ships as a system component. It is automatically available and does not require redistribution with an app.
Here is a small list of changes from the previous versions:
- This new version supports Windows Store app development.
- Support for instantiating XAudio2 by CoCreateInstance has been removed.
- The Initialize function is now implicitly called by the creation process and has been removed from the IXAudio2 interface.
- The X3DAudio and XAPOFX libraries are merged into XAudio2. App code still uses separate headers, X3DAUDIO.H and XAPOFX.H, but now links to a single import library, XAUDIO2_8.LIB.
- xWMA support is not available in this version of XAudio2; xWMA will not be supported as an audio buffer format when calling CreateSourceVoice. Microsoft now recommends using the Media Foundation Source Reader.
XAudio2 2.9 (Windows 10)
The newest XAudio2 version ships as part of Windows 10, as XAUDIO2_9.DLL, alongside XAudio2 2.8 (to support older applications), and does not require redistribution.
XAudio2 2.9 has been updated with the following changes:
- New creation flags: XAUDIO2_DEBUG_ENGINE, XAUDIO2_STOP_ENGINE_WHEN_IDLE, XAUDIO2_1024_QUANTUM.
- xWMA support is available again in this version of XAudio2.
The XAudio2 Engine
To initialize XAudio2, as with all things DirectX-related, a pointer to an interface of an IXAudio2 object is required. With the IXAudio2 interface it is possible to enumerate the available audio devices, to configure global API properties, to create voices, and to monitor performance.
Most importantly, the interface can be used to create a mastering voice, which represents the actual audio output device. Once a mastering voice is created, it can be used to create sound effects, bind them to the mastering voice, and play them back.
To initialize XAudio2 the XAudio2Create helper function can be used:
IXAudio2 **ppXAudio2
If the function call was successful, the first parameter returns the address of a pointer to an interface of an XAudio2 object.
UINT32 Flags X2DEFAULT(0)
For now we will simply set this to 0 and forget about it, i.e. we will use the default value.
XAUDIO2_PROCESSOR XAudio2Processor X2DEFAULT(XAUDIO2_DEFAULT_PROCESSOR)
We can set this to XAUDIO2_DEFAULT_PROCESSOR, which tells XAudio2 to use the default sound processor, or simply leave it at the default value.
Once a pointer to the main XAudio engine is available, creating a master voice is done using the IXAudio2::CreateMasteringVoice method. This method takes many parameters, but all of them are initialized to the default values already — and we won’t use anything else in this tutorial. For now, just note that the first parameter returns the address of the new mastering voice (if the function call was successful):
As you can see, the creation of XAudio2 is straightforward:
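A minimal sketch of the engine creation might look as follows (the function and variable names are my own, and error handling is reduced to simply returning the HRESULT):

```cpp
#include <wrl/client.h>    // Microsoft::WRL::ComPtr
#include <xaudio2.h>
#pragma comment(lib, "xaudio2.lib")

// creates the XAudio2 engine and a mastering voice representing the audio output device
HRESULT createXAudio2Engine(Microsoft::WRL::ComPtr<IXAudio2>& xAudio2, IXAudio2MasteringVoice*& masterVoice)
{
    // create the engine with default flags on the default processor
    HRESULT hr = XAudio2Create(xAudio2.GetAddressOf(), 0, XAUDIO2_DEFAULT_PROCESSOR);
    if (FAILED(hr))
        return hr;

    // create a mastering voice; it is owned by the engine and is
    // later destroyed by calling masterVoice->DestroyVoice()
    return xAudio2->CreateMasteringVoice(&masterVoice);
}
```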
Now that we have a mastering voice, we have to take a quick look at the key concepts of XAudio2. How do we play sound? Well, in XAudio2 the audio data (probably read from a file on the hard drive) must be passed to a SourceVoice, which is responsible for channeling the audio data to the mastering voice, which in turn then sends the audio from all source voices to the actual audio device (most likely the speakers or a headset).
Thus the only difficulty is to submit the audio data to a source voice. XAudio2 has no native support for loading sound files, and thus we have to read in all associated metadata, like the number of channels, bits per sample, and so on, ourselves. Having read the metadata, we must locate and read the actual audio data and submit it to a source voice.
Loading Audio Files
Audio files supported by XAudio2 use the Resource Interchange File Format (RIFF). We won’t elaborate on the details of the RIFF format just yet, but you can check out the MSDN for more information.
To make things a bit easier (at least, I think it is easier), we will use the Windows Media Foundation (WMF) API to load sound files from the hard drive into a buffer. An additional benefit of using the Windows Media Foundation is that it comes with support for compressed files, such as mp3.
As we are only interested in using the WMF to decode audio files, we basically only need one aspect of the huge WMF complex, the IMFSourceReader, which is a universal decoder for audio and media formats.
WMF uses Media Types to specify the format of a media stream. There are two parts to a Media Type, the Major Type specifies the type of the media data, i.e. audio or video, while the Sub Type specifies the format of the data, for example compressed mp3 or uncompressed wav. We will use the source reader to get the details of the media we are reading from disk, and then branch our program off accordingly.
Okay, enough theory, let us learn how to use the WMF’s source reader to read in any type of supported audio, compressed or uncompressed, and to extract the audio data into a buffer that can be used with XAudio2.
Initializing the Windows Media Foundation
First things first, we have to include a few headers and load a few libraries into our application:
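The exact set of headers and libraries depends on which parts of the WMF are used; for the source reader, something like the following should suffice:

```cpp
// Windows Media Foundation headers
#include <mfapi.h>        // MFStartup, MFShutdown, attributes
#include <mfidl.h>        // media type interfaces
#include <mfreadwrite.h>  // IMFSourceReader

// link against the WMF libraries
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")
```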
To initialize the WMF framework, a call to MFStartup is enough:
The Version parameter simply sets the desired version of the WMF to use. The dwFlags parameter is optional in C++ and we won’t use it; we can thus call the function in a completely straightforward manner:
Shutting the WMF down is just as easy, a simple call to MFShutdown is enough:
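As a sketch, wrapped in two small helper functions of my own naming:

```cpp
#include <mfapi.h>
#pragma comment(lib, "mfplat.lib")

// start the Windows Media Foundation; MF_VERSION selects the
// version defined in the headers, the flags parameter is left at its default
HRESULT initializeWMF()
{
    return MFStartup(MF_VERSION);
}

// shut the Windows Media Foundation down again once we are done
void shutdownWMF()
{
    MFShutdown();
}
```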
Initializing the Source Reader
Before being able to read files from the disk, we have to configure the source reader. To configure a WMF object, IMFAttributes interfaces, which provide a generic way to store attributes of an object, are used. To create such an attribute interface, a single call to the MFCreateAttributes method is enough:
The function receives a pointer to the attribute interface and the initial number of elements allocated for the attribute store.
Once we have the attribute interface, we can configure the object as we desire. What we actually do desire is to tell the source reader that we want no latency, we are in Need for Speed (sic!). To do so, we use the IMFAttributes::SetUINT32 method:
The first parameter is the GUID of the value to set and the second parameter is the new value to set. The GUID for low latency is MF_LOW_LATENCY.
Here is our function call:
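A sketch of creating and configuring the attribute store (the function name is my own):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")

// creates an attribute store for the source reader and requests low latency
HRESULT createSourceReaderConfiguration(Microsoft::WRL::ComPtr<IMFAttributes>& sourceReaderConfiguration)
{
    // create an attribute store with room for one initial element
    HRESULT hr = MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1);
    if (FAILED(hr))
        return hr;

    // we are in Need for Speed: request low latency
    return sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true);
}
```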
Reading Audio Files
Now that the source reader is properly configured, loading in a file from the hard drive is done using the MFCreateSourceReaderFromURL function:
The first parameter specifies the location of the data on the hard drive, the second parameter holds the attributes of the source reader we have just defined, and the last parameter receives a pointer to the actual source reader.
Calling this method is straightforward again:
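Assuming the attribute store from above, the call might look like this (the file name is just an example, of course):

```cpp
#include <mfreadwrite.h>
#include <wrl/client.h>
#pragma comment(lib, "mfreadwrite.lib")

// create a source reader attached to a file on the hard drive,
// configured with the attributes we defined earlier
Microsoft::WRL::ComPtr<IMFSourceReader> sourceReader;
HRESULT hr = MFCreateSourceReaderFromURL(L"Assets/menuMusic.mp3",
                                         sourceReaderConfiguration.Get(),
                                         sourceReader.GetAddressOf());
```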
To make sure we are reading from an audio stream, we will disable all other streams, using the SetStreamSelection method:
The first parameter specifies the stream to set. It can be set to MF_SOURCE_READER_FIRST_VIDEO_STREAM to select the first video stream, to MF_SOURCE_READER_FIRST_AUDIO_STREAM to select the first audio stream, and to MF_SOURCE_READER_ALL_STREAMS to select all streams.
The second parameter is a simple boolean specifying whether a stream should be selected (true) or deselected (false).
Thus, what we have to do, is to deselect all streams and then select the first audio stream:
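Assuming the sourceReader created above, this amounts to two calls:

```cpp
// deselect all streams, then select only the first audio stream
HRESULT hr = sourceReader->SetStreamSelection(MF_SOURCE_READER_ALL_STREAMS, false);
if (SUCCEEDED(hr))
    hr = sourceReader->SetStreamSelection(MF_SOURCE_READER_FIRST_AUDIO_STREAM, true);
```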
Now that the source reader is attached to a file on the hard drive, we can query the source reader for the native media type of the file, which will allow us to act accordingly, i.e. we will check whether the file is indeed an audio file and whether it is in a compressed or uncompressed format. If the file is uncompressed, we can simply manipulate its data; if not, we will have to decode, or uncompress, it first.
To get the media type of the file, a call to the GetNativeMediaType function is enough:
Here the first parameter specifies the stream to query; we will set this to the first audio stream. The second parameter specifies which media type to query for; we will set this to 0. The last parameter returns a pointer to an IMFMediaType interface holding the information we desire.
As before, when configuring the source reader, to actually get the information we want, we have to work with attributes using the GetGUID method as follows:
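A sketch of the check, assuming PCM and IEEE float count as uncompressed formats (the function name is my own):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

// queries the native media type and checks whether the stream
// really is an audio stream, and whether it is compressed
HRESULT checkAudioStream(IMFSourceReader* sourceReader, bool& isCompressed)
{
    Microsoft::WRL::ComPtr<IMFMediaType> nativeMediaType;
    HRESULT hr = sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf());
    if (FAILED(hr))
        return hr;

    // make sure this is really an audio stream
    GUID majorType = {};
    hr = nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType);
    if (FAILED(hr) || majorType != MFMediaType_Audio)
        return E_FAIL;

    // check whether the audio data is compressed or uncompressed
    GUID subType = {};
    hr = nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType);
    if (FAILED(hr))
        return hr;

    isCompressed = !(subType == MFAudioFormat_PCM || subType == MFAudioFormat_Float);
    return S_OK;
}
```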
Now if the audio file is uncompressed, everything is fine, but if we are working with a compressed format, such as mp3, for example, we have to decode it first. To do so, we simply request the source reader to decode it for us. The source reader will then look through the system registry to find a suitable decoder and perform the decoding for us.
To tell the source reader what exactly we want it to do, we create a media type, set it to the format we want and then set the current media type of the source reader appropriately.
Creating the media type is done using the MFCreateMediaType function, which only takes one parameter, the address of an IMFMediaType interface:
As we are used to now, we will set attributes by using the SetGUID method:
To submit our request to the source reader, we can use the SetCurrentMediaType method:
The first parameter once again specifies the stream to configure. As always, we will set this to the first audio stream. The second parameter is reserved, and we will set it to NULL. The last parameter is a pointer to the media type to set.
Here is how to call this function in our example:
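Sketched as a helper function (again, the name is my own), requesting uncompressed PCM audio:

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

// asks the source reader to decode the first audio stream into uncompressed PCM
HRESULT requestUncompressedAudio(IMFSourceReader* sourceReader)
{
    // create a partial media type describing what we want
    Microsoft::WRL::ComPtr<IMFMediaType> partialType;
    HRESULT hr = MFCreateMediaType(partialType.GetAddressOf());
    if (FAILED(hr))
        return hr;

    // major type: audio; sub type: uncompressed PCM
    hr = partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    if (SUCCEEDED(hr))
        hr = partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);

    // the source reader will look for a suitable decoder and decode for us
    if (SUCCEEDED(hr))
        hr = sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, NULL, partialType.Get());

    return hr;
}
```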
Okay, now that the source reader is properly configured to decode the audio file, we have to make the necessary preparations to store the decoded audio data in a format that XAudio2 can use. XAudio2 natively works with audio files in the Resource Interchange File Format (RIFF), such as .wav files.
To do so, we create a WAVEFORMATEX object, which specifies the data format of a wave audio stream, using the MFCreateWaveFormatExFromMFMediaType function:
The first parameter is a pointer to an IMFMediaType interface, specifying the type of the media to use, i.e. the current media type of the source reader.
The second parameter returns the address of the WAVEFORMATEX structure that was just filled with the fmt chunk specifying the audio data.
The third parameter receives the address of an unsigned int that will be filled with the size of the above structure once the function returns.
The last parameter is a flag that we do not need to use yet.
Here is how to use this function to create a wave format description from the source reader:
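A sketch, querying the current (now uncompressed) media type first (the function name is my own; note that the returned structure is allocated by the WMF):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")

// fills a WAVEFORMATEX structure describing the audio the source reader will deliver;
// the caller must later free *waveFormat with CoTaskMemFree
HRESULT getWaveFormat(IMFSourceReader* sourceReader, WAVEFORMATEX** waveFormat, UINT32& waveFormatLength)
{
    // get the media type the source reader will actually output after decoding
    Microsoft::WRL::ComPtr<IMFMediaType> uncompressedMediaType;
    HRESULT hr = sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedMediaType.GetAddressOf());
    if (FAILED(hr))
        return hr;

    // create the WAVEFORMATEX structure from the media type
    return MFCreateWaveFormatExFromMFMediaType(uncompressedMediaType.Get(), waveFormat, &waveFormatLength);
}
```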
Finally, there is only one step left to do: read all the audio data into a vector that we can later use to fill an XAudio2 audio buffer structure.
To do so, we read samples of the audio file, convert the sample into a contiguous buffer and then store that buffer in an array, or vector, or whatever, of bytes.
To read a sample of an audio file, we can use the ReadSample method:
As you can guess, the first parameter specifies the stream to pull the data from, we will set this to the first audio stream.
The second parameter sets control flags, which we do not need at the moment.
The third parameter receives the zero-based index of the stream the sample came from; we will simply set this to nullptr, as we do not need this information for now. We will cover this in greater detail in the next tutorial.
The fourth parameter returns flags specifying the state of the source reader; we can use this to determine whether we have reached the end of the file, for example.
The fifth parameter receives the time stamp of the sample, in 100-nanosecond units. We don’t need this for now and will set it to nullptr.
The last parameter receives a pointer to the IMFSample interface that we want filled with the audio data.
To convert the audio sample into a contiguous buffer, the ConvertToContiguousBuffer method can be used, which only takes one parameter, the address of an IMFMediaBuffer object to be filled with the audio data.
The buffer can be locked and unlocked to load and store data.
And finally, behold the code to load data from an audio stream into a buffer:
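A sketch of the loading loop, appending each contiguous buffer to a byte vector (the function name is my own):

```cpp
#include <vector>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")

// reads all samples from the first audio stream and stores the raw bytes in a vector
HRESULT loadAudioData(IMFSourceReader* sourceReader, std::vector<BYTE>& audioData)
{
    while (true)
    {
        Microsoft::WRL::ComPtr<IMFSample> sample;
        DWORD streamFlags = 0;

        // read the next sample; we don't need the stream index or the time stamp
        HRESULT hr = sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &streamFlags, nullptr, sample.GetAddressOf());
        if (FAILED(hr))
            return hr;

        // stop once the end of the file has been reached
        if (streamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
            break;
        if (sample == nullptr)
            continue;

        // convert the sample into a contiguous buffer
        Microsoft::WRL::ComPtr<IMFMediaBuffer> buffer;
        hr = sample->ConvertToContiguousBuffer(buffer.GetAddressOf());
        if (FAILED(hr))
            return hr;

        // lock the buffer, copy its contents into our byte vector, and unlock it again
        BYTE* localAudioData = nullptr;
        DWORD localAudioDataLength = 0;
        hr = buffer->Lock(&localAudioData, nullptr, &localAudioDataLength);
        if (FAILED(hr))
            return hr;

        audioData.insert(audioData.end(), localAudioData, localAudioData + localAudioDataLength);
        buffer->Unlock();
    }
    return S_OK;
}
```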
Creating Sound Events
Now, with the ability to load audio files from the hard drive, let us think back to the last tutorial. We want to create audio events to be played back in our game, using our event queue.
A sound event can loosely be defined as follows:
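For example, a loose sketch of such a structure might look like this (the member names are my own; falloff and priority are included for completeness but unused in this demo):

```cpp
#include <vector>
#include <xaudio2.h>

// a sound event: raw audio data together with the XAudio2
// structures needed to play it back
struct SoundEvent
{
    IXAudio2SourceVoice* sourceVoice;  // channels the audio data to the mastering voice
    WAVEFORMATEX waveFormat;           // the format of the audio data
    unsigned int waveLength;           // length of the wave format structure
    std::vector<BYTE> audioData;       // the raw audio data read from the file
    XAUDIO2_BUFFER audioBuffer;        // the XAudio2 buffer handed to the source voice
    float fallOff;                     // sound falloff (unused in this demo)
    unsigned int priority;             // sound priority (unused in this demo)
};
```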
I am sure all the members are self-explanatory; just remember that the source voice is responsible for submitting the audio data to the mastering voice of the XAudio2 engine.
In the demo created for this tutorial, we won’t use the concepts of sound falloff, sound priorities, or playing multiple short sounds to combat monotony; we will simply load the audio data read from a file into an XAudio2 buffer.
To do so, we simply convert the byte data read from the file into an XAudio2 audio buffer structure. To decouple this from the actual XAudio engine, we will create a new class, called the AudioComponent class:
On initialization, we simply create the XAudio2 engine:
The load file function of the audio component calls the load function from the XAudio2 engine that we just discussed and then creates the appropriate XAudio2 structure:
The source voice is created using the CreateSourceVoice method:
Luckily for us, most of those parameters come preinitialized; all we have to do is pass in the address of our source voice (first parameter) and a pointer to the source format (second parameter).
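With the hypothetical names used in this tutorial (the engine pointer and the sound event structure are my own), the call might look like this:

```cpp
// create a source voice matching the wave format read from the file;
// all other parameters are left at their default values
HRESULT hr = xAudio2->CreateSourceVoice(&soundEvent.sourceVoice, &soundEvent.waveFormat);
if (FAILED(hr))
    return hr;
```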
To fill the XAudio2 audio buffer, we simply point it to the data collected from the file on the hard drive.
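Sketched with the hypothetical soundEvent structure from above, this might look as follows:

```cpp
// fill the XAudio2 buffer by pointing it at the raw audio data
ZeroMemory(&soundEvent.audioBuffer, sizeof(XAUDIO2_BUFFER));
soundEvent.audioBuffer.AudioBytes = (UINT32)soundEvent.audioData.size(); // size of the audio data in bytes
soundEvent.audioBuffer.pAudioData = soundEvent.audioData.data();         // pointer to the actual audio data
soundEvent.audioBuffer.pContext = nullptr;                               // no callback context needed
```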
Playing Audio Files
To play an audio file, all that is left to do is to submit the audio data to the source voice and to start the source voice.
Submitting audio data to a source voice is done using the IXAudio2SourceVoice::SubmitSourceBuffer method, which simply takes an XAudio2 buffer structure as input.
Starting a voice is done using the IXAudio2SourceVoice::Start method.
To stop a voice, use the IXAudio2SourceVoice::Stop method.
Here is the C++ code to play (and stop) an audio file:
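As a sketch, again using the hypothetical SoundEvent structure from above:

```cpp
// submit the audio buffer to the source voice and start the voice
HRESULT playSoundEvent(SoundEvent& soundEvent)
{
    HRESULT hr = soundEvent.sourceVoice->SubmitSourceBuffer(&soundEvent.audioBuffer);
    if (SUCCEEDED(hr))
        hr = soundEvent.sourceVoice->Start();
    return hr;
}

// stop the voice; already submitted buffers remain queued
HRESULT stopSoundEvent(SoundEvent& soundEvent)
{
    return soundEvent.sourceVoice->Stop();
}
```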
To add the audio component to our event queue, we simply play or stop a sound, depending on the type of the received message. The actual sound event is passed in the message parameter of the Depesche:
As an example, I downloaded a few sounds, a meow sound, a barking sound, a button click sound and a menu music sound from freesound, which is an excellent source for free audio files, created by the following people:
- Big Dog Barking by mich3d
- Button Click by fins
- Cat Meow by Noise Collector
- Nodens (Field Song) by axtoncrolley
Here is an example of how to load the menu music and how to play it using the event queue:
We have certainly learned a lot in this tutorial. You can download the source code from here.
Here is a video of the previous tutorial game of Cosmo chasing cats with music and sound files added:
In the next tutorial, we will learn how to use submix voices to band source voices together into larger sets.
References
Literature
(in alphabetic order)
- Game Programming Algorithms, by Sanjay Madhav
- Game Programming Patterns, by Robert Nystrom
- Microsoft Developer Network (MSDN)
- Tricks of the Windows Game Programming Gurus, by André LaMothe
- Wikipedia
Audio
- button by fins
- Big Dog Barking by mich3d
- Button Click by fins
- Cat Meow by Noise Collector
- Nodens (Field Song) by axtoncrolley
Art
- Cat and Dog by pzUH
- GUI Buttons by looneybits
- Menu Buttons by Soundemperor
- Music by ironflower86
- TexturePacker
- Wikipedia