What is it

OggPCM is a pulse-code modulation (PCM) audio codec for Ogg. Similar to Microsoft's .wav or Apple's .aiff formats, it is a simple way to store and transfer uncompressed audio within an Ogg container. In this document, the term PCM describes a digital representation of an audio signal in which amplitude samples are taken at regular, uniform intervals and then quantized into a digital (usually binary) code. A more complete definition of PCM and related terminology can be found on Wikipedia.

Why is it

This format is intended as an interchange format, for example for use with OggStream. It is also useful for storing time-synced decoded audio/video, as opposed to using RIFF/WAV (.wav) and YUV4MPEG (.yuv) in separate files as was done during Theora development. It is intended to be less complex to use than either RIFF or AIFF.

Stream Description

A stream is composed of a header packet, zero or more comment packets, and one or more data packets. Data packets may be of variable length, including zero. The only valid use of a zero-length data packet is to mark the end of the stream. Data packets must contain samples for all channels. That is to say, the length of a data packet must be a multiple of the number of channels times the storage size of a single sample. For instance, for a stream containing 6 channels at 2 bytes per sample, the length of each data packet must be a multiple of 12 bytes.
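The length rule above can be checked mechanically. The following is a minimal sketch in Python; the function name `frames_in_packet` and its parameters are illustrative assumptions, not part of this specification.

```python
def frames_in_packet(packet: bytes, channels: int, bytes_per_sample: int) -> int:
    """Return how many complete audio frames a data packet holds.

    Hypothetical helper: raises ValueError when the packet length is not a
    multiple of channels * bytes_per_sample, as the text above requires.
    """
    frame_size = channels * bytes_per_sample
    if frame_size <= 0:
        raise ValueError("channels and bytes_per_sample must be positive")
    if len(packet) % frame_size != 0:
        raise ValueError(
            f"packet length {len(packet)} is not a multiple of {frame_size}"
        )
    # A zero-length packet yields 0 frames; it only marks the end of stream.
    return len(packet) // frame_size
```

With 6 channels at 2 bytes per sample, a 24-byte packet holds 2 frames, while a 13-byte packet is rejected.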

The degenerate stream is a single header packet followed by the raw data packets. While this degenerate stream is not incredibly useful for long term storage or as a general purpose container, it is useful for applications where other data describing the stream is available out of band, for instance amongst cooperating applications in an inter-process communication scheme. Streams providing the extra defined comment packets are intended to be useful for long term storage and communication amongst diverse applications.

Packet Format

Header and comment packets are processed according to the value of their first byte. Packets of unknown ID should be silently ignored, which provides a convenient way to add future extensions without breaking the data format. An example of how this can be useful is the proposed ReplayGain extension to the .wav format: http://replaygain.hydrogenaudio.org/file_format_wav.html
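The dispatch rule described above might be sketched as follows in Python. The handler table, the ID value used in the usage note, and the choice to pass the whole packet to the handler are all assumptions made for illustration; this section does not define any concrete packet IDs.

```python
from typing import Callable, Dict, Iterable, List


def process_packets(
    packets: Iterable[bytes],
    handlers: Dict[int, Callable[[bytes], object]],
) -> List[object]:
    """Dispatch each packet on its first byte; skip unknown IDs silently."""
    results = []
    for packet in packets:
        if not packet:
            continue  # no first byte to dispatch on
        handler = handlers.get(packet[0])
        if handler is None:
            continue  # unknown ID: silently ignored
        results.append(handler(packet))
    return results
```

For example, with `handlers = {0x01: len}` (a made-up ID), a packet starting with byte 0x01 is handled and a packet starting with any other byte is dropped without error.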

The header packet contains a field indicating the number of comment packets preceding the raw data. Applications must either parse or skip exactly this many packets, in addition to the header packet, before treating the stream as raw data.
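As a rough illustration of this rule, the sketch below separates a packet sequence into the header packet, the comment packets, and the raw data packets. It assumes the comment-packet count has already been read from the header; the names `split_stream` and `comment_count` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_stream(
    packets: Sequence[bytes], comment_count: int
) -> Tuple[bytes, List[bytes], List[bytes]]:
    """Split packets into (header, comment packets, data packets).

    comment_count is the value of the header field described above,
    parsed elsewhere (its position in the header is not assumed here).
    """
    if comment_count < 0:
        raise ValueError("comment_count must be non-negative")
    if len(packets) < 1 + comment_count:
        raise ValueError("stream ends before all comment packets were seen")
    header = packets[0]
    comments = list(packets[1 : 1 + comment_count])
    # Everything after exactly comment_count packets is raw sample data.
    data = list(packets[1 + comment_count :])
    return header, comments, data
```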

Header Packet

Multibyte fields in the header packets are packed in big endian order, to be consistent with network byte order. A header packet contains the following fields:

Bit       Description
15 (MSB)  Interleaved/Chunked. If set, data in the packets is "chunked" by channel. In a data packet containing 3 channels and 2 samples/channel, the chunked storage order would be 001122. For the interleaved storage format (default), the order would be 012012.
others    Reserved
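The two storage orders can be illustrated with a short Python sketch that separates already-decoded samples into per-channel lists. It assumes a 16-bit flags value (suggested by bit 15 being labelled the MSB); the names `CHUNKED_FLAG` and `split_channels` are illustrative only.

```python
from typing import List, Sequence

# Assumption: the flags field is 16 bits wide, so bit 15 is its MSB.
CHUNKED_FLAG = 1 << 15


def split_channels(samples: Sequence, channels: int, flags: int) -> List[list]:
    """Return one list of samples per channel for either storage order."""
    if channels <= 0 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    frames = len(samples) // channels
    if flags & CHUNKED_FLAG:
        # Chunked: all samples of channel 0, then channel 1, and so on.
        return [list(samples[c * frames : (c + 1) * frames]) for c in range(channels)]
    # Interleaved (default): one sample per channel, frame after frame.
    return [list(samples[c::channels]) for c in range(channels)]
```

Applied to the example in the table, the chunked sequence 001122 and the interleaved sequence 012012 both produce the same three channels of two samples each.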

Applications conforming to version 1.0 of this spec MUST:

set all reserved flags to false (zero) when creating these streams.

preserve all values of all reserved flags when reading or modifying these streams, unless the application sets the minor version field to zero, in which case the reserved flags must be set to false as well.
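One possible reading of these two requirements is sketched below in Python. It assumes a 16-bit flags value in which only bit 15 is currently defined; the mask and function names are hypothetical.

```python
# Assumption: bit 15 (Interleaved/Chunked) is the only defined flag.
KNOWN_FLAGS_MASK = 1 << 15


def flags_for_new_stream(chunked: bool) -> int:
    """Flags for a newly created stream: every reserved bit is zero."""
    return KNOWN_FLAGS_MASK if chunked else 0


def flags_for_rewrite(flags_read: int, minor_version_written: int) -> int:
    """Flags to write back after reading or modifying an existing stream."""
    if minor_version_written == 0:
        # Writing minor version zero: reserved flags must be cleared.
        return flags_read & KNOWN_FLAGS_MASK
    # Otherwise reserved flags are preserved exactly as they were read.
    return flags_read
```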

Comment Packets

Data Packets

Data packets have no header word; this preserves the alignment of the data payload. The contents of a data packet are specified by the combination of the 'PCM Format ID' and 'Flags' fields. The length of a data packet must be a multiple of the number of channels specified in the header times the storage size of a single sample, as specified by the 'PCM Format ID' field.

Supported PCM Formats

Formats are identified within a header packet by a 16 bit "format type" field. While
most applications will treat this as an opaque type, it is possible to discern some
information about the format from the value of this field itself. Specifically, the
format's storage size, in bytes, and its byte ordering, can be discerned by parsing
the lower 6 bits of the value. These values are exposed so that it is possible to
extract individual samples without necessarily understanding the coding scheme involved.
For practical purposes, most applications will operate on a buffer directly because
of performance concerns, but it is nonetheless possible to work one sample at a
time.

Binary Value  Meaning
..xxxx00      N/A, or data not accurately described by this scheme.
..xxxx01      Least significant byte first. Bytes are MS bit first.
..xxxx10      Most significant byte first. Bytes are MS bit first.
..xxxx11      Data is machine endian.
..0000xx      Data cannot be described by this byte-packing scheme.
..0001xx      Samples are stored using one byte per sample.
..0010xx      Samples are stored using two bytes per sample.
..0011xx      Samples are stored using three bytes per sample.
..0100xx      Samples are stored using four bytes per sample.
..1000xx      Samples are stored using eight bytes per sample.
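The table above can be turned into a small decoder. The Python sketch below reads the lower 6 bits of a format value and uses them to pull raw integer values out of a buffer without interpreting the coding scheme. It treats the 4-bit size field as a plain byte count, which matches the listed rows but is an assumption for unlisted values; the function names are illustrative.

```python
import sys
from typing import List, Optional, Tuple

# Lower two bits: byte order. None means "not described by this scheme".
_BYTE_ORDERS = {0b00: None, 0b01: "little", 0b10: "big", 0b11: sys.byteorder}


def describe_format(format_id: int) -> Tuple[Optional[int], Optional[str]]:
    """Return (bytes per sample, byte order) from the lower 6 bits."""
    order = _BYTE_ORDERS[format_id & 0b11]
    size = (format_id >> 2) & 0b1111
    return (size or None), order


def raw_samples(buffer: bytes, format_id: int) -> List[int]:
    """Extract unsigned raw sample words; the coding scheme is not applied."""
    size, order = describe_format(format_id)
    if size is None or order is None:
        raise ValueError("format is not described by the byte-packing scheme")
    if len(buffer) % size != 0:
        raise ValueError("buffer length is not a multiple of the sample size")
    return [
        int.from_bytes(buffer[i : i + size], order)
        for i in range(0, len(buffer), size)
    ]
```

For instance, a value whose lower 6 bits are 001001 describes two-byte, least-significant-byte-first samples, so the bytes 0x34 0x12 yield the raw word 0x1234. Converting that word to an audio value still requires the coding scheme held in the remaining bits.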

The remaining 10 bits describe the coding scheme used to convert the digital value
to an audio signal. The following formats are defined for version 1.0 of this
format. For purposes of attribution, it should be noted that these formats are the
PCM formats supported by the Advanced Linux Sound Architecture (ALSA) project, and
should be fairly comprehensive.

Encapsulation in Ogg

Following standard terminology for uncompressed audio, an audio frame is the collection of samples for all channels for a single sampling period. For example, an audio frame for a stereo signal is a pair of sample values for the left and right channels.

The granulepos of an Ogg page indicates the presentation time of the last presentable element in the last complete packet within that page; for OggPCM, a granule is an audio frame. The granule position is therefore the total number of audio frames in the stream up to and including the last packet completed in that page. Audio frames must not be split across packets. The rationale is that the position specified in the header of the last page gives the length, in audio frames, of the data coded by the bitstream, and also provides the current stream position to seeking routines. A truncated stream will still report the proper number of audio frames that can be fully decoded.
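As a sketch of this bookkeeping, the Python function below computes the granule position of each data page from the frame counts of the packets that complete on it. The input layout and the function names are assumptions for illustration. The value -1 for a page on which no packet completes follows the general Ogg framing convention rather than anything stated in this section.

```python
from typing import List, Sequence


def granule_positions(pages: Sequence[Sequence[int]]) -> List[int]:
    """Granule position per page.

    pages[i] lists the audio-frame counts of the packets that complete on
    page i (data pages only, in stream order).
    """
    total_frames = 0
    positions = []
    for completed in pages:
        if not completed:
            positions.append(-1)  # Ogg convention: no packet ends here
            continue
        total_frames += sum(completed)
        positions.append(total_frames)
    return positions


def granule_to_seconds(granulepos: int, sample_rate: int) -> float:
    """Presentation time of a granule position, one granule per frame."""
    return granulepos / sample_rate
```

Because each position is a running total of fully decodable frames, the last valid position in a truncated stream still gives the amount of audio that can be played.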