Motivation

The in-progress port of Rockbox to the iRiver and other devices requires both software audio decoding and an abstraction of the audio hardware and playback features of the different target devices, neither of which is present in the current Archos-oriented code. The aim of this document is to:

Provide the Rockbox application (i.e. the code in apps/) with an abstract Audio API capable of playing and recording multiple audio formats.

Provide a CODEC API to support the Audio API in that task.

Provide a low-level Audio Device Driver layer inside the firmware to abstract the details of the ever-changing hardware supported by Rockbox and to enable the implementation of emulation of the audio hardware within the Rockbox UI simulator.

Architecture needed for software codecs

We need a dual-buffer system with a filter in between:

One buffer for compressed data, which is fed to the codec. ("buf1")

A transport/filter function, which receives uncompressed data from the codec and writes it to the uncompressed buffer. This filter is responsible for optional manipulation of the data, such as gap removal, crossfading, equalizing etc.

filter (codec->buf2): receives uncompressed data from the codec and decides where in buf2 it goes, and if something should be done to it first

feeder (buf2->dac): reads uncompressed data from buf2 and feeds it to the dac
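The filter and feeder stages above can be sketched as follows. This is only an illustration, assuming buf2 is a ring buffer of 16-bit samples; the names pcm_ring, filter_write and feeder_read, the buffer size, and the fixed-point gain (standing in for crossfading or equalizing) are all hypothetical and not part of any existing Rockbox code.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF2_SAMPLES 1024  /* hypothetical size; must be a power of two */

/* Hypothetical ring buffer standing in for "buf2" (uncompressed PCM). */
struct pcm_ring {
    int16_t data[BUF2_SAMPLES];
    size_t head;  /* running count of samples written by the filter */
    size_t tail;  /* running count of samples read by the feeder */
};

static size_t ring_used(const struct pcm_ring *r) { return r->head - r->tail; }
static size_t ring_free(const struct pcm_ring *r) { return BUF2_SAMPLES - ring_used(r); }

/* filter (codec->buf2): accepts decoded samples, applies an optional
 * manipulation (here a Q8 fixed-point gain, 256 == unity) and stores the
 * result in buf2. Returns the number of samples actually accepted. */
size_t filter_write(struct pcm_ring *r, const int16_t *pcm, size_t count, int gain_q8)
{
    size_t n = count < ring_free(r) ? count : ring_free(r);
    for (size_t i = 0; i < n; i++) {
        int32_t s = ((int32_t)pcm[i] * gain_q8) / 256;
        if (s > INT16_MAX) s = INT16_MAX;  /* clip instead of wrapping */
        if (s < INT16_MIN) s = INT16_MIN;
        r->data[(r->head + i) % BUF2_SAMPLES] = (int16_t)s;
    }
    r->head += n;
    return n;
}

/* feeder (buf2->dac): copies up to count samples out of buf2 into the
 * buffer handed to the DAC. Returns the number of samples delivered. */
size_t feeder_read(struct pcm_ring *r, int16_t *out, size_t count)
{
    size_t n = count < ring_used(r) ? count : ring_used(r);
    for (size_t i = 0; i < n; i++)
        out[i] = r->data[(r->tail + i) % BUF2_SAMPLES];
    r->tail += n;
    return n;
}
```

In a real implementation the feeder would probably run from a DMA or interrupt context, so the head and tail updates would need appropriate protection; that is left out of this sketch.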

Questions: How does the above architecture deal with different sampling frequencies, mono/stereo and (possibly) sample sizes? This shouldn't be a problem in buf1 (the compressed data will contain the relevant metadata), but it is an issue for buf2. Do we want to attempt cross-fading between a 44.1 kHz file and a 48 kHz file? Is a "gapless" change in playback frequency possible on the iRiver?

Keep in mind we need to make this work in reverse too, for recording:

adc->buf2

buf2->codec

codec->buf1

buf1->disk

For devices with hardware codecs, the chain is shortcut between the loader and the feeder:

loader (disk->buf1)

feeder (buf1->hwcodec)

NOTE: the possibility of implementing a dual-buffer approach for devices with hardware codecs was discussed on IRC (2005-02-16 - very start of the day) - for the MAS devices, buf1 would contain the MP3 data as read from the disk, and the "codec" would be a "swap-copy" routine to bitswap the data in preparation for sending to the hwcodec. The existing architecture bitswaps the data in-place, right after reading it from the disk. Implementing a dual-buffer scheme here will sacrifice some RAM.
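A "swap-copy" routine of the kind discussed in the note above might look like the sketch below. It assumes that "bitswap" means reversing the bit order within each byte; the function names are hypothetical, and the existing in-place routine is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of one byte (the assumed meaning of "bitswap"). */
static uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap neighbours */
    return b;
}

/* Hypothetical "codec" for hardware-codec targets: copies len bytes from
 * buf1 (src) to buf2 (dst), bit-reversing each byte on the way. The source
 * buffer is left untouched, unlike the current in-place approach. */
void swap_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = reverse_bits(src[i]);
}
```

A table-driven version would likely be faster on the target CPUs; the shift-and-mask form is used here only because it is easy to verify by eye.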

In addition, the audio API needs to support instant playback of short audio clips from memory or files - for Talkbox support, key beeps etc.

Overview of existing APIs

[Can someone who is familiar with the current playback and recording systems write a high-level description here?]

Audio API

This section describes the highest level of the API - namely that between the Rockbox application and the Rockbox firmware.

The first problem when playing an audio file is to determine the format. A simple approach, which is probably good enough, is to use the file extension in the first instance to decide whether the file is supported. This guess needs to be confirmed by the codec itself: a ".WAV" file, for example, could theoretically contain one of many types of data (e.g. GSM 6.10), not just uncompressed PCM. So the codec code needs the ability to double-check that the file is supported.
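The first-instance guess from the file extension could be sketched as below. The enum values, the table contents and the function name guess_format are hypothetical; the result is only a hint, and the codec would still have to confirm it by inspecting the file.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format identifiers, for illustration only. */
enum audio_format {
    FORMAT_UNKNOWN = 0,
    FORMAT_MPA,
    FORMAT_WAV,
    FORMAT_FLAC,
    FORMAT_VORBIS
};

struct ext_entry {
    const char *ext;
    enum audio_format format;
};

/* Hypothetical extension table; a real one would be per-target. */
static const struct ext_entry ext_table[] = {
    { "mp3",  FORMAT_MPA },
    { "wav",  FORMAT_WAV },
    { "flac", FORMAT_FLAC },
    { "ogg",  FORMAT_VORBIS },
};

/* Case-insensitive string equality, so ".WAV" and ".wav" both match. */
static int ext_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Guess the format from the text after the last '.' in the filename. */
enum audio_format guess_format(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    if (dot == NULL || dot[1] == '\0')
        return FORMAT_UNKNOWN;
    for (size_t i = 0; i < sizeof(ext_table) / sizeof(ext_table[0]); i++) {
        if (ext_equal(dot + 1, ext_table[i].ext))
            return ext_table[i].format;
    }
    return FORMAT_UNKNOWN;
}
```

A file that maps to FORMAT_WAV here could still be rejected later by the codec, for example if it turns out to hold GSM 6.10 data rather than PCM.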

The following is a list of proposed file formats to be recognised (but maybe not playable - that depends on the hardware) by Rockbox. I propose that the definition of a particular hardware device include a HARDWARE_PLAYBACK_FORMATS definition which is a combination of the following values:

Three tasks performed during the playback of a track require knowledge of the codec's file format:

Reading metadata from the file (including ID3-type tags and technical data such as sampling rate and total running time)

Loading compressed data from the file into the compressed data buffer

The actual decoding of the data to PCM suitable for the audio driver

Audio Driver API

This section describes how the audio hardware of the various devices can be abstracted.

[Please propose and discuss.... ]

CODEC API

This section describes the API for providing decoding and encoding of the audio codecs to be supported in Rockbox. Metadata (e.g. ID3 tags) are also a feature of the codecs and so the codec API needs to include the appropriate functions to read (and write?) the metadata in a file.

We need to remember to give credit to the codec authors and information about decoder versions in the "Info" menu screen and other relevant places.

Overview

A main design goal of Rockbox is to minimise battery usage by keeping the hard-disk powered down as much as possible, and performing as few power-hungry spin-ups as possible.

It is proposed that codecs be dynamically loadable, using a specialised version of the existing general-purpose plugin architecture already in Rockbox. This will remove any limitations on the number of codecs Rockbox can support. However, both the number of "codec slots" (the destinations for loadable codecs) and the number of codecs compiled into Rockbox should be configurable.

In order to allow Rockbox to support many different types of codecs (such as "non-streaming" codecs like SID/MOD, or codecs that offer "hybrid" compression like wavpack where two input files are needed to produce one output stream), it is proposed that the codecs themselves manage the memory buffer for the tracks that they are playing.

The lifetime of a codec

When Rockbox is initially started, no codecs will be activated. The user will add some songs to the playlist and the first codec will need to be loaded into a codec slot and initialised.

In order to allow the codec to make full use of the disk spin-up, it should start (in a separate thread) the loading of the data from disk. This should not be a CPU-intensive task, but it is desirable for the codec to remove any redundant data during this loading process in order to make the best use of the available memory. The codec should be able to peek into the playlist in order to load multiple files during the same load operation - subject to the available memory.

As soon as possible after the codec has loaded a small amount of the first file into memory, the codec should start decoding that data into either the cross-fade buffer or directly into the low-level audio buffer. Codec implementations should aim to minimise the amount of copying of data between buffers.

When a change of codecs is necessary, the audio system will need to load the second codec and initialise the decoding of the next file before the first one has finished.

API Details

The basic "decoding loop" in the audio system will ask the codec to provide X bytes (e.g. 4096 bytes) of uncompressed audio from the stream.

Function declarations

[this section is now out-dated by the above changes to the API overview]

int codec_init(???)

This is the general initialisation call to the codec - so the codec can allocate memory and perform any other housekeeping tasks before it is ready to actually load and decode a file. Return codes would include:

CODEC_OK
CODEC_OUT_OF_MEMORY

int codec_open_file(???, file_info_struct* file_info)

The codec_open_file function is responsible for initialising the codec for the decoding of a specific file.

The file_info_struct parameter is used to return the technical information about the file, such as the bitrate of the compressed data, the PCM sample rate (e.g. 44.1 kHz), the sample word size (e.g. 16-bit), the number of channels (e.g. 2) and the total number of samples in the stream.
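One possible layout for file_info_struct, based only on the fields listed above, is sketched here. The field names and types are hypothetical, and total_samples is assumed to count samples per channel; the two helper functions are likewise only illustrations of what the audio system could derive from the struct.

```c
#include <stdint.h>

/* Hypothetical layout; only the list of fields comes from the text above. */
typedef struct file_info_struct {
    uint32_t bitrate;          /* compressed data rate, bits per second */
    uint32_t samplerate;       /* PCM sample rate in Hz, e.g. 44100 */
    uint8_t  bits_per_sample;  /* sample word size, e.g. 16 */
    uint8_t  channels;         /* e.g. 2 for stereo */
    uint64_t total_samples;    /* assumed to be per channel */
} file_info_struct;

/* Total running time in milliseconds, or 0 if the sample rate is unknown. */
uint64_t file_info_duration_ms(const file_info_struct *fi)
{
    if (fi->samplerate == 0)
        return 0;
    return fi->total_samples * 1000u / fi->samplerate;
}

/* Size in bytes of the fully decoded PCM stream. */
uint64_t file_info_pcm_bytes(const file_info_struct *fi)
{
    return fi->total_samples * fi->channels * (fi->bits_per_sample / 8u);
}
```

For a 10-second 16-bit stereo stream at 44.1 kHz (441000 samples per channel) these helpers give 10000 ms and 1764000 bytes respectively.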

Return codes would include:

CODEC_OK
CODEC_ERROR_UNSUPPORTED_FILE

int codec_get_metadata(???, metadata_struct* metadata)

This function returns ID3-tag type information from the file. We may want to call it either before or after a file is opened, i.e. to read the metadata from a track we will be playing in the future, but without initialising a full decoder instance.

int codec_decode_data(char* pcmbuf, int* size)

This function would, with the help of the "read" callback, decode "size" bytes of PCM data from the input stream (in the format specified in the file_info_struct returned from the codec_open_file function). The size variable would be modified to return the actual number of bytes decoded. This may be less than the number requested if the read() callback fails or the end of the file is reached. Return values would include:

CODEC_OK
CODEC_READ_ERROR
CODEC_END_OF_FILE
CODEC_RECOVERED_FROM_ERROR // e.g. sync was lost, but decoding continued. The audio system could feed this back to the user.
CODEC_INTERNAL_ERROR // An unexpected internal error from the codec
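The basic decoding loop built on codec_decode_data could be sketched as follows. The enum values, the 4096-byte chunk size, the function decode_track and the mock codec are assumptions made for this example; in particular, the mock returns CODEC_END_OF_FILE together with the final (possibly short) chunk, which is only one possible reading of the semantics described above.

```c
#include <stddef.h>
#include <string.h>

/* Return codes from the list above; the numeric values are arbitrary. */
enum codec_status {
    CODEC_OK = 0,
    CODEC_READ_ERROR,
    CODEC_END_OF_FILE,
    CODEC_RECOVERED_FROM_ERROR,
    CODEC_INTERNAL_ERROR
};

#define CHUNK_BYTES 4096  /* example request size */

/* Mock codec state: bytes of PCM still available in the pretend stream. */
static int mock_remaining = 10000;

/* Mock implementation: produces silence until the stream runs out. */
int codec_decode_data(char *pcmbuf, int *size)
{
    int n = *size < mock_remaining ? *size : mock_remaining;
    memset(pcmbuf, 0, (size_t)n);
    mock_remaining -= n;
    *size = n;  /* report how many bytes were actually produced */
    return mock_remaining == 0 ? CODEC_END_OF_FILE : CODEC_OK;
}

/* Hypothetical audio-system loop: pulls chunks until end of file.
 * Returns the total number of bytes decoded, or -1 on a fatal error. */
long decode_track(void)
{
    char pcmbuf[CHUNK_BYTES];
    long total = 0;

    for (;;) {
        int size = CHUNK_BYTES;
        int rc = codec_decode_data(pcmbuf, &size);

        /* pcmbuf[0..size) would be handed to the filter stage here. */
        total += size;

        if (rc == CODEC_END_OF_FILE)
            return total;
        if (rc == CODEC_READ_ERROR || rc == CODEC_INTERNAL_ERROR)
            return -1;
        /* CODEC_OK and CODEC_RECOVERED_FROM_ERROR: keep decoding. */
    }
}
```

With the mock stream of 10000 bytes, the loop makes three calls (4096, 4096 and 1808 bytes) before seeing the end-of-file code.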

int codec_seek(long offset, int whence)

This function would seek on a sample-accurate basis in the file. For some codecs this could be an expensive operation, in which case we may want to allow the codec to "guess" at the appropriate seek point.

NOTE: Seeking is a complicated issue and possible seeking strategies for each supported codec need to be discussed before deciding on the semantics of this function. But some codecs (e.g. WAV, FLAC) are designed to allow sample-accurate seeking, so this should be the benchmark.
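For uncompressed PCM, sample-accurate seeking reduces to simple arithmetic, which is why it makes a reasonable benchmark. The sketch below assumes a hypothetical pcm_layout struct describing where the sample data starts and how it is laid out; none of these names come from the proposed API.

```c
#include <stdint.h>

/* Hypothetical description of an uncompressed PCM stream inside a file. */
struct pcm_layout {
    uint32_t data_offset;      /* byte offset of the first sample */
    uint16_t channels;
    uint16_t bits_per_sample;
    uint64_t total_samples;    /* samples per channel */
};

/* Map a sample index to the byte offset of that sample's frame.
 * Requests beyond the end of the stream are clamped to the end. */
uint64_t pcm_seek_offset(const struct pcm_layout *l, uint64_t sample)
{
    uint64_t frame_bytes = (uint64_t)l->channels * (l->bits_per_sample / 8u);
    if (sample > l->total_samples)
        sample = l->total_samples;
    return l->data_offset + sample * frame_bytes;
}
```

For 16-bit stereo data starting at byte 44, sample 10 lives at byte 44 + 10 * 4 = 84. Compressed formats have no such direct mapping, so their codecs would need seek tables, bisection or an approximate guess instead.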

void codec_close(???)

This function is called when the audio system is finished with a file - either when the end of the file has been reached, or the user has cancelled playback. It cannot fail. The codec is returned to the same state as that just following a call to codec_init().

void codec_finished(???)

This function causes the codec to release any memory. The codec then cannot be used again until a call to codec_init().
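The call ordering implied by the descriptions of codec_init, codec_open_file, codec_close and codec_finished can be summarised as a small state machine. This is a sketch of one reading of the text, not a settled design: the state and call names are hypothetical, codec_get_metadata is left out because it may be called with or without an open file, and calling codec_finished while a file is still open is assumed to be invalid.

```c
/* Hypothetical codec states derived from the function descriptions. */
enum codec_state {
    CODEC_STATE_UNINITIALISED,  /* before codec_init / after codec_finished */
    CODEC_STATE_READY,          /* after codec_init or codec_close */
    CODEC_STATE_FILE_OPEN       /* after a successful codec_open_file */
};

/* Hypothetical identifiers for the API calls being checked. */
enum codec_call {
    CALL_INIT,
    CALL_OPEN_FILE,
    CALL_DECODE_DATA,
    CALL_SEEK,
    CALL_CLOSE,
    CALL_FINISHED
};

/* Returns 1 and updates *state if the call is valid in the current state,
 * otherwise returns 0 and leaves *state unchanged. */
int codec_state_step(enum codec_state *state, enum codec_call call)
{
    switch (*state) {
    case CODEC_STATE_UNINITIALISED:
        if (call == CALL_INIT) {
            *state = CODEC_STATE_READY;
            return 1;
        }
        return 0;
    case CODEC_STATE_READY:
        if (call == CALL_OPEN_FILE) {
            *state = CODEC_STATE_FILE_OPEN;
            return 1;
        }
        if (call == CALL_FINISHED) {
            *state = CODEC_STATE_UNINITIALISED;
            return 1;
        }
        return 0;
    case CODEC_STATE_FILE_OPEN:
        if (call == CALL_DECODE_DATA || call == CALL_SEEK)
            return 1;  /* state is unchanged while decoding or seeking */
        if (call == CALL_CLOSE) {
            *state = CODEC_STATE_READY;  /* same state as after codec_init */
            return 1;
        }
        return 0;
    }
    return 0;
}
```

A checker like this could be compiled into debug builds or the simulator to catch out-of-order calls from the audio system.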

[Please propose and discuss.... ]

Implementation Progress

Now that code can be tested on the iRiver itself, it would be useful to see example implementations of simple "viewer" plugins which decode a track from a compressed format and write to a WAV file. These can be used for testing decoding speed and optimisation work can begin before the full audio API is developed and implemented.

Library source code and "codec2wav" test plugins are now in CVS for MPEG Audio (libmad), FLAC (libFLAC), AC-3/A-52 (liba52) and Ogg Vorbis (Tremor). If you are actively working on such an implementation for a different codec, add the name of the codec (and your name) to the following list. We are especially in need of someone to investigate the implementation of "non-streaming" codecs such as the various sequencer formats.