I'm designing a game that depends heavily on audio; I will have some 300+ speech files (most of them just a word or two long). This can very quickly escalate the size of my final game.

What's the optimal way to encode/compress speech files to keep the size minimal without getting audio artifacts?

Please address both per-file compression/encoding and zipping/compressing the whole set of speech files together in your answer, because I'm not sure which of the two (or which combination) will give me the best results.

Edit: I need this to run in Silverlight and Android, so I'm presumably stuck with only MP3 as my option (other than uncompressed wave files).

Do you have to use MP3, or do you also consider other codecs? There are codecs optimized for speech, such as Speex...
– bummzack, Sep 2 '11 at 13:29

Just as an aside, if you use a good codec, attempting to compress further with a generalised compression tool like zip will do little to nothing. It may even make things worse.
– Matthew Scharley, Sep 2 '11 at 13:33

@bummzack I may be wrong, but I interpreted the question as "I have MP3s; how can I encode/compress them to make them smaller?" Totally agreed on Speex, too. Mumble uses it as a low-bandwidth codec for real-time speech, and it sounds great.
– Matthew Scharley, Sep 2 '11 at 13:34

I've updated my question. I presumably can only use MP3 or WAV files, unfortunately.
– ashes999, Sep 2 '11 at 13:36

From experience that is now 4-5 years old, you're not going to get voice without artifacts at compression ratios of 8:1 or higher. The artifacts aren't horrible until you push it hard, and the technology may have advanced since then.
– Patrick Hughes, Sep 3 '11 at 2:55

3 Answers

With regard to compressing all your files together for distribution, a bare (uncompressed) tarball or similar may be your best bet. When you try to compress files that are already heavily compressed, such as video and audio, the result can sometimes be larger than the input because of the archive's headers and bookkeeping overhead. Be wary; your mileage may vary.
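A quick way to see this effect is to run a general-purpose compressor over high-entropy data. The sketch below uses random bytes from `os.urandom` as a stand-in for codec output (an assumption: real MP3 data is not perfectly random, but it behaves in a similar way), and a repetitive buffer for contrast. The helper name `deflate_ratio` is made up for illustration.

```python
import os
import zlib


def deflate_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(payload, 9)) / len(payload)


# Stand-in for already-encoded audio: random bytes are effectively incompressible.
encoded_like = os.urandom(200_000)
# Stand-in for raw, highly redundant data.
redundant = b"\x00\x01" * 100_000

print(f"high-entropy data: {deflate_ratio(encoded_like):.3f}")
print(f"redundant data:    {deflate_ratio(redundant):.3f}")
```

The first ratio should come out at roughly 1.0 (often slightly above it), while the second is tiny. Running the same check on a handful of your real encoded files would tell you whether an extra compression pass is worth it.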

Speex is really great for voice encoding. Funny thing is, the first hit when you click on the Android link leads to a blog from the company I used to work for. I wrote the Android client, and we used a native Speex library for it. However, Android did not have a JIT compiler at that time.
– Lurca, Sep 2 '11 at 16:36

Ogg/Opus is, as best I understand, one of the most compact speech formats available that does not significantly degrade quality. I worked with it for a time in a former job, and the files compressed to a fraction of the size of the comparable Speex files, let alone the huge improvement over storing them as MP3 or WAV. The only drawback is that there's a limited amount of pre-built material for doing the Ogg encoding/decoding. I recommend using the OpusFile library to save a lot of time and trouble.

The only other caveat is that Opus natively supports only a limited set of sample rates between 8 kHz and 48 kHz (8, 12, 16, 24 and 48 kHz), and the libopus library defaults to 48 kHz, so you'll have to downsample and upsample if you don't want to do a little modification of the source code (which I can aver is not difficult).
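If you go the resampling route, picking the target rate is simple. Here is a minimal sketch, assuming you want the lowest Opus-supported rate that doesn't throw away any of the source bandwidth; `opus_target_rate` is a hypothetical helper, not part of libopus or OpusFile, and the actual resampling is left to whatever audio tool you already use.

```python
# Sample rates the Opus encoder accepts, in Hz.
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)


def opus_target_rate(source_rate: int) -> int:
    """Pick the lowest Opus-supported rate that is >= source_rate.

    Sources above 48 kHz fall back to 48 kHz, so they must be downsampled.
    """
    for rate in OPUS_RATES:
        if rate >= source_rate:
            return rate
    return OPUS_RATES[-1]


for source in (8000, 22050, 44100, 96000):
    print(source, "->", opus_target_rate(source))
```

For short speech clips you may well choose a lower rate on purpose (for example 16 kHz) to save space; that is a quality trade-off the helper does not make for you.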

G.726 is only available at 16, 24, 32 and 40 kbps (and the only free implementation I could find, Asterisk's, supports only 32 kbps). If you're strapped for space, this isn't the ideal solution.
– Martin Sojka, Sep 2 '11 at 14:28
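For a sense of scale, the G.726 bitrates mentioned above can be turned into a rough total size. This is only back-of-the-envelope arithmetic: the 300 clips come from the question, the one-second average clip length is an assumption, and container headers are ignored. `total_size_kib` is a made-up helper name.

```python
def total_size_kib(bitrate_kbps: float, seconds_per_clip: float, clips: int) -> float:
    """Estimate the payload size in KiB for a set of constant-bitrate clips."""
    bits = bitrate_kbps * 1000 * seconds_per_clip * clips
    return bits / 8 / 1024


# G.726 rates from the comment above; 1.0 s per clip is an assumed average.
for kbps in (16, 24, 32, 40):
    print(f"{kbps} kbps: {total_size_kib(kbps, 1.0, 300):.0f} KiB")
```

Plugging in your real average clip length and the bitrate of whatever codec you settle on gives a quick comparison before you encode anything.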

I'm not sure I understand your answer. Given a recorded WAV or MP3, are there tools that will convert my file into something ADPCM-encoded that will work in Silverlight and Android?
– ashes999, Sep 2 '11 at 15:39