What is it

OggPCM is a pulse-code modulation (PCM) audio codec for Ogg. Similar to Microsoft's .wav or Apple's .aiff formats, it's a simple way to store and transfer uncompressed audio within an Ogg container.

Why is it

This format is intended as an interchange format, especially for use with OggStream. It is also useful for storing time-synced decoded audio and video during development, as opposed to keeping RIFF/WAV (.wav) and YUV4MPEG (.yuv) data in separate files, as we did with Theora.

It is also less complex than either .wav (RIFF) or .aiff (AIFF), since both of those formats were designed for generic multimedia (audio, video, etc.). Full compatibility with them includes support for non-PCM data.

Using raw PCM data, on the other hand, doesn't give us that all-important header which carries information about the number of channels, sample width, and sample frequency. So what we need is a header followed by raw PCM data - nothing more complicated.

Format

Packets are processed according to the value of their first byte. Packets with an unknown ID should be silently ignored, which provides a convenient way to add future extensions without breaking the data format. Multibyte fields in the header packet are packed in big-endian order. Other fields are stored MSB first. Multibyte fields in the data packet are packed in little-endian order.
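The dispatch rule above can be sketched as follows. This is only an illustration: the packet ID values (`HEADER_PACKET_ID`, `DATA_PACKET_ID`) and the field layouts (a 32-bit sample rate at the start of the header body, 16-bit signed samples in the data body) are assumptions made for the example and are not defined by the text above.

```python
import struct

# Hypothetical packet IDs: the draft text does not assign concrete values.
HEADER_PACKET_ID = 0x00
DATA_PACKET_ID = 0x01


def handle_packet(packet, handlers):
    """Dispatch on the first byte; silently ignore unknown packet IDs."""
    if not packet:
        return None
    handler = handlers.get(packet[0])
    if handler is None:
        return None  # unknown ID: ignored, so future packet types do not break readers
    return handler(packet[1:])


def parse_header_rate(body):
    # Assumed layout: the header body starts with a 32-bit sample rate.
    # Header multibyte fields are big-endian (">").
    (rate,) = struct.unpack_from(">I", body, 0)
    return rate


def parse_data_s16(body):
    # Assumed sample format: 16-bit signed integers.
    # Data multibyte fields are little-endian ("<").
    count = len(body) // 2
    return list(struct.unpack_from("<%dh" % count, body, 0))


handlers = {HEADER_PACKET_ID: parse_header_rate, DATA_PACKET_ID: parse_data_s16}
```

Because unknown IDs fall through to `None`, a reader written against this sketch would keep working if new packet types were added later.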

The granule position of a page is the total number of samples encoded up to and including all samples on that page. Samples must not be split across pages. The rationale here is that the granule position in the header of the last page tells how long the data coded by the bitstream is. A truncated stream will still return the proper number of samples that can be decoded fully.
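As a rough illustration of the granule position rule, the sketch below computes the running total for a sequence of pages and reads the stream length back from the last page. The function names are hypothetical and the per-page sample counts are made up for the example.

```python
def granule_positions(samples_per_page):
    """Return the granule position of each page: the running total of
    samples up to and including that page."""
    total = 0
    positions = []
    for count in samples_per_page:
        total += count
        positions.append(total)
    return positions


def decodable_samples(positions):
    """Length of a (possibly truncated) stream, taken from the last page seen."""
    return positions[-1] if positions else 0
```

If only the first two of three pages arrive, `decodable_samples` still reports exactly the samples that can be decoded in full, which is the point of not splitting samples across pages.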

This format can support any documented and registered sample format, since it uses an enumeration. Each logical stream can support up to 16 channels sharing a fixed sample rate. Logical streams from the same source may be multiplexed to provide up to 4096 channels per source, each logical stream with its own sample rate. Up to 256 sources may be multiplexed within a physical Ogg stream, unless an application takes other measures to logically partition the stream.
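The limits in this paragraph can be checked mechanically. The sketch below is a hypothetical validator, not part of the format: it takes one `(source_id, channel_count)` pair per logical stream and reports any violation of the three limits stated above.

```python
MAX_CHANNELS_PER_STREAM = 16
MAX_CHANNELS_PER_SOURCE = 4096
MAX_SOURCES = 256


def check_limits(streams):
    """streams: iterable of (source_id, channel_count), one per logical stream.
    Returns a list of human-readable error strings (empty if all limits hold)."""
    errors = []
    per_source = {}
    for source_id, channels in streams:
        if not 1 <= channels <= MAX_CHANNELS_PER_STREAM:
            errors.append("stream in source %d has %d channels" % (source_id, channels))
        per_source[source_id] = per_source.get(source_id, 0) + channels
    if len(per_source) > MAX_SOURCES:
        errors.append("%d sources in one physical stream" % len(per_source))
    for source_id, total in sorted(per_source.items()):
        if total > MAX_CHANNELS_PER_SOURCE:
            errors.append("source %d has %d channels" % (source_id, total))
    return errors
```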

Discussion

This seems to make it easy to support the simple/normal cases and possible to support the pathological cases, for instance:

| Source ID | Channel Bitfield    | Sample Rate (Hz) | Sample Format     | Comment              |
|-----------|---------------------|------------------|-------------------|----------------------|
| 0x00      | 0000 0000 0000 0011 | 96000            | OGGPCM_FMT_LE_S24 | Front Stereo Pair    |
| 0x00      | 0000 0000 0011 1100 | 44100            | OGGPCM_FMT_LE_S16 | Center And Surrounds |
| 0x00      | 0000 0000 0010 0000 | 8000             | OGGPCM_FMT_LE_S16 | LFE Channel          |
| 0x01      | 0000 0000 0000 0001 | 8000             | OGGPCM_FMT_U8     | PC Speaker           |
| 0x02      | 0000 0000 0000 0001 | 8000             | OGGPCM_FMT_U8     | Microphone           |
| 0x03      | 0000 0000 0000 0011 | 8000             | OGGPCM_FMT_LE_S16 | Voice Chat           |

Each row in the table is a logical Ogg stream. Arc is not convinced that the source ID and channel block are necessary, but figured he'd throw it out there.
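To make the Channel Bitfield column in the table above easier to read, here is a small sketch that turns a bitfield into a list of channel indices. It assumes that each set bit marks one channel and that bit 0 is the rightmost (least significant) bit; the text does not spell out the bit numbering, and both helper names are hypothetical.

```python
def parse_bitfield(text):
    """Parse a bitfield written as in the table, e.g. '0000 0000 0000 0011'."""
    return int(text.replace(" ", ""), 2)


def channels_in(bitfield):
    """Return the indices of the set bits (assumed: bit 0 = least significant)."""
    return [bit for bit in range(16) if bitfield & (1 << bit)]
```

Under these assumptions, the Front Stereo Pair entry `0000 0000 0000 0011` selects channels 0 and 1.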