Audio Codecs

Introduction

The basic format of an audio file in a computer is a Wave (.WAV) file. It contains uncompressed PCM audio, so a 4-minute song at CD quality will be about 40MB in size. Audio codecs (encoder-decoders) are programs that reduce this file size, and they can be split into two main categories: "lossy" and "lossless". See the Hydrogenaudio wiki for more information about these and other terms.
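The 40MB figure can be checked with a little arithmetic. The sketch below assumes the usual CD parameters (44.1 kHz, 2 channels, 16 bits per sample) and ignores the small WAV header; the function name is made up for illustration.

```python
def pcm_size_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of the raw PCM payload in bytes (WAV header not included)."""
    return seconds * sample_rate * channels * bytes_per_sample


size = pcm_size_bytes(4 * 60)      # a 4-minute song
print(size)                        # 42336000 bytes
print(round(size / 2**20, 1))      # 40.4 (binary megabytes)
```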

This page describes various audio codecs and links to resources useful to a developer who wants to add support for a format to Rockbox. It also provides a chart detailing each codec's current support status in Rockbox.

Support was added on 05 June 2007. -c1000 to -c4000 decode in realtime on Gigabeat S, -c1000 to -c3000 decode in realtime on Gigabeat F and Coldfire, only -c1000 and -c2000 decode in realtime on PortalPlayer targets.

Other Codecs

SID

Added on 18 Jul 2006. Works very well on all supported targets.

MOD

Added to repository on 21 May 2008. Works very well on all supported targets.

Realtime means that the codec is able to decode a file as fast as it needs to be played (i.e. a one-minute file is decoded in one minute). Codecs should be a good deal faster than this, though, to leave time for buffering, crossfading, etc.
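As a rough illustration of this definition, decoding speed can be expressed as a ratio of audio duration to decoding time. This is only a sketch: the function names and the 1.5x headroom value are assumptions chosen for the example, not Rockbox measurements or thresholds.

```python
def realtime_factor(audio_seconds, decode_seconds):
    """How many seconds of audio are decoded per second of CPU time."""
    return audio_seconds / decode_seconds


def is_comfortable(audio_seconds, decode_seconds, headroom=1.5):
    """True if decoding leaves spare time for buffering, crossfading etc.

    The headroom of 1.5 is a hypothetical value for illustration only.
    """
    return realtime_factor(audio_seconds, decode_seconds) >= headroom


print(realtime_factor(60, 60))   # 1.0: exactly realtime, no spare time
print(is_comfortable(60, 60))    # False
print(is_comfortable(60, 30))    # True: decodes twice as fast as playback
```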

Development discussion

Lossy Codecs

A "lossy" codec (e.g. MP3, Ogg Vorbis, AAC) uses knowledge of human hearing to discard as much of the original audio signal as possible, whilst keeping the audio sounding as close as possible to the original. These codecs typically achieve a filesize of 10%-20% of the original.

From what I can tell from the website, the Helix decoder is MP3 only (i.e. no layer-I or layer-II support), is written in C++, and is licensed under the Real Networks Public Source License (RPSL). For those reasons, and the fact that MAD is tried and tested, I think we should stick with MAD -- DaveChapman

The fixed int Helix MP3 decoder is not written in C++, it's written in C. RPSL is an OSI license and GPL compatible. It is tried and tested - Motorola use it in their phones. -- AlastairS

The RPSL is most definitely not GPL compatible. They do list it as a compatible license, but that more or less just means that the GPL is RPSL compatible, rather than the other way round. Which is of course a huge joke. The license even has a note to that effect. -- JonasHaeggqvist

Another option is Stephane TAVENARD's MPEGDEC library as ported to the Coldfire here. I have set this up to work with Rockbox, but the gains weren't as great as we hoped. I have no further time/desire to work on it, but if anyone feels like picking it up, it's available here. -- ThomJohansen

FAAD2 (Free Advanced Audio Decoder version 2, with HE-AAC support) from http://www.audiocoding.com comes under GPL (unlike the outdated FAAD version, which is under LGPL and doesn't allow HE-AAC decoding). FAAD(2) is both an MPEG-2 and an MPEG-4 AAC decoder!

For MP4 parsing, FAAD's mp4ff library would be nice, but it uses malloc rather excessively and the code in CVS for ALAC seems to work, with some minor changes for AAC support. Probably best to roll our own.

As of March 09, 2007, the FFmpeg project has a WMA encoder. -- BlakeJohnson - 19 Oct 2007

A/52 (aka AC3)

liba52 is a GPL'ed implementation with an integer-only mode that would run without problems on the iRiver's hardware.

AC3 is the most common audio format for DVDs, so support for this format would allow you to rip the audio from a DVD and play it directly on your DAP without re-encoding. An obvious next step (if technically possible) would be "AC3 pass-thru" via the optical digital output to a standalone AC3 surround decoder.

RealAudio is a container format and a range of different codecs can be used. However, the most common is the "cook" codec.

Currently Rockbox supports playback of RealAudio files with any of the following codecs: cook, AAC, AC3 and ATRAC3.

???

Lossless Codecs

A "lossless" codec (e.g. FLAC) performs the same function as WinZip - i.e. it compresses an audio file without discarding any of the information. These codecs typically achieve a filesize of 50%-60% of the original, and the audio playback will be bit-for-bit identical to the original file.

".wav" is the de-facto container format for storing uncompressed PCM audio, but may contain compressed data (most common formats are pcm, adpcm, alaw, mulaw, and dvi_adpcm). Obviously, no encoding or decoding needs to be done on PCM data, although byte-swapping may be needed because WAV files are little-endian. The other formats use various compression methods - thus the need for a codec. Details, Official specs, Sample WAV files.
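To illustrate the byte-swapping point for 16-bit PCM, here is a minimal sketch. The helper names are invented for this example, and it only handles raw sample data, not the WAV header or the compressed formats.

```python
import struct


def le16_to_samples(data):
    """Interpret raw bytes as signed 16-bit little-endian PCM samples."""
    if len(data) % 2:
        raise ValueError("16-bit PCM data must have an even length")
    return list(struct.unpack("<%dh" % (len(data) // 2), data))


def swap16(data):
    """Swap the two bytes of every 16-bit sample (little <-> big endian)."""
    if len(data) % 2:
        raise ValueError("16-bit PCM data must have an even length")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]   # high bytes move to the even positions
    swapped[1::2] = data[0::2]   # low bytes move to the odd positions
    return bytes(swapped)


raw = b"\x01\x00\xff\x7f"                    # samples 1 and 32767, little-endian
print(le16_to_samples(raw))                  # [1, 32767]
print(struct.unpack(">2h", swap16(raw)))     # (1, 32767) read as big-endian
```

A big-endian target would need the swapped form before the samples can be used as native 16-bit integers.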

As decoder.

AIFF

Audio Interchange File Format (AIFF) is the Apple version of WAV (developed jointly by Apple and SGI), and is the standard container for uncompressed PCM audio on a Mac. AIFF-C is a similar format containing compressed audio. A description of the format is available from Apple.

jmac, a Java implementation of Monkey's Audio, is also an option (under LGPL).

The codec is heavily x86-centric with lots of x86 assembly to speed up parts of the code - particularly a neural network. Unless it's very heavily optimized for 68K and PortalPlayer, it won't run real-time. And some compression modes (Extra high, Insane) probably won't run no matter how much you optimize it - RobertoAmorim

Other Codecs

There are other audio "Codecs" that don't fit into the "lossy" or "lossless" categories above. These are different because the source for the files was not a .WAV file, but rather they are formats used by music composers to store computer-generated music.

The sid format is music from Commodore 64 games and other productions. I don't know much about the format, but a good starting point would probably be sidplay2 which includes a standalone library. I don't know whether or not this is floating point.

A thing to note is that sid files have no notion of playing time. They are simply programs that go on forever, either with music that loops or silence. Traditionally this has been fixed by looking up playtimes in a database of file-md5 hashes which includes more or less every song available on the net.

Furthermore, each file may contain many subtunes.
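The database lookup described above could look roughly like the sketch below. It assumes the hash is taken over the whole file contents and that the database is a simple mapping from MD5 hex digest to a list of play times in seconds, one per subtune; both the layout and the names are hypothetical, not the format of any real song-length database.

```python
import hashlib


def lookup_lengths(sid_bytes, database):
    """Return the list of subtune lengths (seconds) for a SID file, or None.

    `database` is a hypothetical dict: MD5 hex digest -> list of lengths.
    """
    digest = hashlib.md5(sid_bytes).hexdigest()
    return database.get(digest)


# Made-up file contents and play times, for illustration only.
fake_sid = b"not a real sid file"
database = {hashlib.md5(fake_sid).hexdigest(): [185, 42]}

print(lookup_lengths(fake_sid, database))        # [185, 42]
print(lookup_lengths(b"unknown tune", database)) # None
```

A player would still need a fallback, such as a default play time, for files that are not in the database.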

DanHollis: sidplay2 appears to be completely integerized.

MartinArver: Unfortunately, libsidplay, which is the lib for sidplay2, is written in C++. I have been looking at an older version of libsidplay (1.36.59), which uses fixed-point math. But as it is written in C++, it seems we have to build the toolchain with C++ enabled for this to compile.

Encoding not possible.

SPC

The SPC format is music from Super Nintendo (Super Famicom in Japan) games and other productions. There are a multitude of players for the SPC format, most of them listed here. As with most sound emulator formats, the sound is not absolutely accurate, but the SNESamp Winamp plugin is generally regarded as the one which represents the sound best. I'm not sure if SNESamp is floating point (ChrisRobinson: or if the nature of the Winamp plugin is indeed useful for converting - I'm no programmer), but if not there are multiple other players.

SNESamp actually uses the SNESapu emulator for SPC decoding. It is indeed the most accurate SPC700 APU emulator publicly available. Unfortunately, all emulation code is written in x86 assembly (NASM) - RobertoAmorim

Unlike the sid format, SPCs are single songs with ID666 tag support, and every SPC file is 64 KB in size. However, the more popular RSN format acts more like a sid file and contains all songs from a game (RSNs are simply RAR files containing SPCs, renamed to RSN; SPCs have a 90% average compression rate). Some players support RSN; others, I believe, do not.

Encoding not possible.

PSF

The PSF format is music from Sony PlayStation 1 games. The decoders are fixed-point. There is source code in C from a GPL Linux XMMS plugin known as SexyPSF. I've looked at the code but am totally clueless as to how it works. It's available there for anyone willing to tinker with it. Neill Corlett's PSF Central also has a lot of PSF-related information as well as the actual PSF spec paper. I would love to see this format make it into Rockbox in the future. There are also a few other decoders out there, but SexyPSF is the only one I could find a source link to. I suppose some emulator decoders could work somehow too?

Encoding not possible.

GSF

GSF files are Game Boy Advance music files. Perhaps due to how new it is, there are almost no programs around that can play GSFs, let alone open source ones. I did find one, on http://www.caitsith2.com/gsf/ . However, it is a Winamp plugin.

Encoding not possible.

Tracker formats

This is actually a whole range of more or less similar sequencer formats. There are a few open-source libraries available, each of which supports a lot of formats.

Mikmod is for some reason often used, but I've been told that it lacks support for many features in some formats. This is a C library.

This is for the sound tracks from the original Nintendo Entertainment System (NES). The NSF file is actually an NES ROM, with all data not related to audio stripped out. The sound track for pretty much any 8-bit Nintendo (not Super Nintendo) game is available online. The NSFE file format is a bit better. Both are playable.

Encoding not possible.

GBS

This is a chiptune format for Game Boy games. These can be played with special players, or directly through a Game Boy emulator. It is similar to the NSF format in that a GBS file is nothing but a Game Boy ROM with all data not related to audio stripped out.

Encoding not possible.

HES

This is a chiptune format for the NEC PC Engine/TurboGrafx-16. Similar to the NSF and GBS formats, it is merely the audio portion of a TG16 ROM.

Encoding not possible.

SAP

This is a chiptune format for old Atari 8-bit computers. It is the data played by the POKEY chip. It's based on the code for ASAP, a GPL'd Atari music player. More info about the player can be seen here.

I have written a simple MIDI player for Linux in C++, and am now rewriting it in C in hopes of making it work with Rockbox. Currently the C port is able to allocate memory for the file, decode it, sequence events, and interpret them enough to play the file using the sound card and a very simple sinewave-based synthesizer I threw together. I am hoping to find an AdLib emulation engine that I can import into this thing, and possibly a wavetable engine if I can find good patches.

It plays sound, and in this state it may actually work on the iRiver if someone ports the sound output routine in pctest.c to write to the iRiver DSP.

I guess the MIDI codec would have to be more of a plug-in, as it loads the entire file at once and then plays it from memory...
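For readers unfamiliar with the idea, a sinewave-based synthesizer of the kind mentioned above can be sketched in a few lines. This is not the code from the player being discussed; the function names, the 22050 Hz sample rate and the amplitude are assumptions for illustration. The note-to-frequency formula is the standard equal-temperament mapping with MIDI note 69 at 440 Hz.

```python
import math


def note_to_hz(note):
    """Frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def sine_samples(note, seconds, sample_rate=22050, amplitude=0.5):
    """Render one note as signed 16-bit sample values of a plain sine wave."""
    count = int(seconds * sample_rate)
    step = 2.0 * math.pi * note_to_hz(note) / sample_rate  # phase per sample
    return [int(32767 * amplitude * math.sin(step * i)) for i in range(count)]


print(note_to_hz(69))                  # 440.0
samples = sine_samples(69, 0.01)       # 10 ms of A4
print(len(samples))                    # 220
```

A real player would mix several such voices per output buffer and would avoid floating point on targets without an FPU, for example by using a lookup table.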

You may want to look for the Gravis UltraSound patches; they are floating around and used by software MIDI players such as TiMidity++

I have looked at TiMidity++ as well as the music engine used by ScummVM... Does anyone know of a good description of the Gravis UltraSound patch format? Those patches store a good deal of information, such as waveform looping, envelope, etc., but I cannot find a guide that explains the fields in the file. Any ideas?

All right: playback, looping, interpolation (no more ghetto lowpass filter!), drums, panning, pitch wheel and all that work fine. I have just added envelope support. It works but could probably use some more exhaustive testing. I don't know how well the envelope stuff will work on the target, given the amount of extra work it puts on the processor compared with the difference it makes to the output. I guess at this point the code needs to be built for and tested on the target, but I don't really know what kind of functions it uses for file I/O, etc. Maybe someone can help me with that, someone who actually has an iRiver.
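As a generic illustration of what looping and interpolation mean in a wavetable synth, here is a minimal sketch using linear interpolation between neighbouring samples. It is not taken from the plugin; the function name and parameters are hypothetical, and the loop region is assumed to run from loop_start up to (but not including) loop_end.

```python
def render_looped(wave, loop_start, loop_end, step, count):
    """Read `count` samples from `wave`, advancing by fractional `step`.

    Playback starts at index 0; once the position reaches `loop_end` it
    wraps back into the loop. Values between two stored samples are
    linearly interpolated.
    """
    if not (0 <= loop_start < loop_end <= len(wave)) or step <= 0:
        raise ValueError("invalid loop points or step")
    loop_len = loop_end - loop_start
    out = []
    pos = 0.0
    for _ in range(count):
        i = int(pos)
        frac = pos - i
        nxt = i + 1
        if nxt >= loop_end:              # neighbour lies past the loop end
            nxt = loop_start + (nxt - loop_end)
        out.append(wave[i] * (1.0 - frac) + wave[nxt] * frac)
        pos += step
        while pos >= loop_end:           # wrap back into the loop region
            pos -= loop_len
    return out


print(render_looped([0, 10, 20, 30], 1, 4, 0.5, 9))
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 20.0, 10.0]
```

A step below 1.0 plays the sample at a lower pitch and a step above 1.0 at a higher pitch, which is how one stored waveform can serve several notes.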

Updated synth sample here. This plugin needs a separate soundset to work. This is available here. Extract its contents into the /.rockbox directory. Warning: file is around 22MB in size.

The plugin can play back MIDI at 22kHz in realtime on Coldfire-based targets, but is still not realtime on PortalPlayer-based targets.

Encoding not possible.

Faster MDCT Experiment

Work has started on an experiment aimed at writing a faster MDCT for the codec library. For details, see FasterMDCT.