Zafehouse’s audio engine: ADPCM versus xWMA

I might have mentioned somewhere that I’ve converted Zafehouse 2’s audio engine from Managed DirectX to XAudio2 under SlimDX. I don’t believe I mentioned it here, though.

First, history. The original Zafehouse lacked many elements one normally expects from a game. Graphics. A proper tutorial. Sound. These are slowly being rectified in the sequel. Graphics are being taken care of handily, and a tutorial will be implemented once, well, the game is.

Sound… sound was an interesting one. There are many ways to play sounds in Windows and .NET. You can use the basic, in-built functionality in My.Computer. Or you can venture into the slightly more complex world of System.Media.SoundPlayer.

Both, sadly, are kind of garbage if you’re trying to make a game. That is, a game with more than one or two sounds playing at once. There’s also the problem of being limited to PCM as an audio file format. Hey, if I was willing to have Z2 weigh in at a couple of hundred megabytes, then PCM would be fine. Hilariously so.

But I think my download server (and everyone downloading from said server) would be upset. So upset they might convert from downloadees to “I-hate-you-Logan-for-wasting-my-download”… ees.

So I started investigating Managed DirectX. MDX is a wrapper around the standard DirectX libraries so you can use them in C# and VB .NET, among other languages. DirectSound under MDX didn’t look too forbidding, and I went ahead and implemented a basic audio engine capable of playing multiple sounds and background music.

Now MDX will play anything that resembles a RIFF. Well, anything that resembles a RIFF and contains a PCM or ADPCM stream. Anything else and it will spit at you like a pretentious hydra being served broiled heads instead of boiled ones. Because hydras like boiled heads.
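If you want to know what is inside a RIFF before the hydra gets to it, the format tag in the file's 'fmt ' chunk tells you. Here's a rough Python sketch of that check. It is not how MDX does it internally (I have no idea how it does), and the names riff_format and make_header are made up for illustration. The tag values (1 for PCM, 2 for Microsoft ADPCM, 0x11 for IMA ADPCM) are the standard WAVE format tags.

```python
import struct

# Standard WAVE format tags; anything not listed is reported as unknown.
FORMAT_NAMES = {
    0x0001: "PCM",
    0x0002: "MS ADPCM",
    0x0011: "IMA ADPCM",
}


def riff_format(data: bytes):
    """Return (form_type, format_name) for a RIFF file held in memory."""
    if len(data) < 12 or data[0:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    form_type = data[8:12].decode("ascii")
    pos = 12
    # Walk the chunk list until the 'fmt ' chunk turns up.
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return form_type, FORMAT_NAMES.get(tag, f"unknown (0x{tag:04X})")
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")


def make_header(tag: int) -> bytes:
    """Hypothetical helper: build a minimal RIFF/WAVE header for testing."""
    # tag, channels, sample rate, bytes/sec, block align, bits per sample
    fmt = struct.pack("<HHIIHH", tag, 1, 22050, 44100, 2, 16)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    print(riff_format(make_header(0x0001)))
    print(riff_format(make_header(0x0002)))
```

In practice you would read the first few hundred bytes of the real file and pass them in; the fake header is only there so the sketch runs on its own.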

ADPCM isn’t really a compressed stream, at least not in the MP3 sense. It’s essentially PCM squeezed from 16 bits per sample down to 4. Other stuff happens to maintain sound quality, but you end up with something roughly a quarter the size of the original PCM. The only problem is, to (barely) match the compression ratio of MP3, Vorbis or WMA, you have to cut out a channel.

Stereo to mono. Which ain’t so bad. Truly, it’s not. And, in some cases, ADPCM can produce audio that sounds better than what a psychoacoustic codec can crank out. As long as you don’t mind a bit of hissing.
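For the curious, the arithmetic behind that trade-off looks something like the sketch below. The numbers are my assumptions, not measurements from the game's files: a 44.1 kHz sample rate, a 128 kbps bitrate standing in for a typical MP3, Vorbis or WMA encode, and no allowance for ADPCM's per-block headers (which add a little on top).

```python
SAMPLE_RATE = 44_100  # assumed sample rate in Hz


def bytes_per_second(bits_per_sample: int, channels: int,
                     rate: int = SAMPLE_RATE) -> float:
    """Raw data rate of an uncontainerised sample stream."""
    return rate * channels * bits_per_sample / 8


pcm_stereo = bytes_per_second(16, 2)    # 16-bit stereo PCM
adpcm_stereo = bytes_per_second(4, 2)   # 4-bit stereo ADPCM, headers ignored
adpcm_mono = bytes_per_second(4, 1)     # 4-bit mono ADPCM, headers ignored
lossy_128k = 128_000 / 8                # assumed 128 kbps psychoacoustic codec

if __name__ == "__main__":
    for name, rate in [("PCM stereo", pcm_stereo),
                       ("ADPCM stereo", adpcm_stereo),
                       ("ADPCM mono", adpcm_mono),
                       ("128 kbps codec", lossy_128k)]:
        print(f"{name:15s} {rate:9.0f} B/s  ({pcm_stereo / rate:.1f}:1 vs PCM stereo)")
```

Under those assumptions stereo ADPCM is only 4:1 against stereo PCM, mono ADPCM gets to 8:1, and the 128 kbps codec sits at about 11:1 while keeping both channels.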

But I wasn’t satisfied. I knew I could do better. I was particularly certain of this betterness when I read that Microsoft was encouraging developers to move from MDX to XNA. It gently encouraged this by flipping the bird to MDX.

Yes, I had just written an audio engine in an API no longer supported by MS. Sweet, I thought, and began searching for alternatives.

There are a bunch of free audio engines that work in .NET, but if you ever want to commercialise your product, then you have to fork out megabucks. And I didn’t want to lock myself into that sort of situation. The idea of using a pre-packaged solution didn’t tickle me the right way either.

I fiddled around with Vorbis, but had trouble tracking down a native implementation in VB .NET or C#. I did find one in the Mono repository, and even got it working. Problem was it was slow (Vorbis-to-PCM conversion, specifically), and it still relied on MDX.

Sucky? You bet. Extra sucky because I know squat about the inner workings of Vorbis and had no idea how to optimise. I didn’t really want to spend time doing it. I have a game to finish, after all.

XNA appeared my only option. After getting the libraries loaded, I realised it was too high level. Which means it didn’t give me enough control over what it was doing. Much sighing occurred.

Then… I found XAudio2. Its documentation was hidden away in MSDN, but there it was. XAudio2 is Microsoft’s replacement for DirectSound, and it’s the underlying tech for XNA’s audio stuff. XAudio2 is nifty. It would have been even more nifty if all the documentation wasn’t as verbose as a mute sports announcer. Oh, the documentation is for C++ only, so make that a mute sports announcer who only speaks Esperanto.

But XAudio2 supports xWMA out of the box. xWMA is a stripped-down version of WMA encased in a RIFF container. Yay! It was exactly what I was after. I grabbed SlimDX, which allows you to access XAudio2 (and other multimedia-related libraries) via a managed wrapper, and buried myself to the armpits in code for a weekend.

Eventually, I came up with something – dare I say it – sound. An audio engine blindingly superior to what I’d previously concocted. It plays up to 64 xWMA-encoded sounds flawlessly and with only a minor hit to memory consumption (much less than the MDX monster I was working with).

Only now I need to include SlimDX with Zafehouse 2. I went from a 9MB audio package encoded in ADPCM, to a 7MB xWMA one with a 3MB DLL. To be fair, the new audio is all stereo (which makes the music sound that much better), but I still feel like I’ve run in a giant circle.

The circle does crack out some fine dual-speaker tunes, though.