AAC Decoder

The Microsoft Media Foundation AAC decoder is a Media Foundation Transform that decodes the following Advanced Audio Coding (AAC) and High Efficiency AAC (HE-AAC) profiles:

MPEG-2 AAC Low Complexity (LC) profile (multichannel).

MPEG-4 HE-AAC v1 (multichannel) with AAC-LC core.

MPEG-4 HE-AAC v2 (stereo) with AAC-LC core.

The AAC decoder supports both raw AAC streams with no headers and AAC in an audio data transport stream (ADTS).

Starting in Windows 8, the AAC decoder also supports decoding MPEG-4 audio transport streams with a multiplex layer (LATM) and a synchronization layer (LOAS). It can also convert an LATM/LOAS stream to ADTS.

Media Types

The AAC decoder supports the following media types.

Input Types

The AAC decoder supports the following audio subtypes:

Subtype                  Description             Header
MFAudioFormat_AAC        Raw AAC or ADTS AAC.    mfapi.h
MEDIASUBTYPE_RAW_AAC1    Raw AAC.                wmcodecdsp.h

For MFAudioFormat_AAC, the media type gives the sample rate and number of channels before the spectral band replication (SBR) and parametric stereo (PS) tools are applied, if present. The SBR tool doubles the decoded sample rate relative to the core AAC-LC sample rate, and the PS tool decodes stereo from a mono core AAC-LC stream.

MFAudioFormat_AAC is equivalent to MEDIASUBTYPE_MPEG_HEAAC, defined in wmcodecdsp.h. See Audio Subtype GUIDs.

The MF_MT_AAC_PAYLOAD_TYPE attribute specifies how an MFAudioFormat_AAC stream is packaged. The decoder supports the following values:

0: Raw AAC. The stream contains raw_data_block() elements only.

1: ADTS. The stream contains an adts_sequence(), as defined by MPEG-2. Only one raw_data_block() per adts_frame() is allowed.

3: Audio transport stream with a synchronization layer (LOAS) and a multiplex layer (LATM). Of the three types of LOAS, only AudioSyncStream is supported. The multiplex layer is AudioMuxElement, restricted to one audio program and one layer.

MF_MT_AAC_PAYLOAD_TYPE is optional. If this attribute is not specified, the default value 0 (raw AAC) is used.
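When the packaging of a stream is not known in advance, the sync words of the two framed formats can help choose a payload type. The C++ sketch below is a hypothetical helper (GuessAacPayloadType is not part of Media Foundation) that inspects the first two bytes: ADTS frames begin with the 12-bit sync word 0xFFF followed by a 2-bit layer field of 00, and LOAS AudioSyncStream frames begin with the 11-bit sync word 0x2B7. Raw AAC has no sync word, so a result of 0 only means that neither pattern matched; a more robust detector would also confirm that several consecutive frames line up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: guesses a value for MF_MT_AAC_PAYLOAD_TYPE from the
// first bytes of an AAC stream by looking for a sync word.
// Returns 1 (ADTS), 3 (LOAS AudioSyncStream), or 0 (fallback: assume raw AAC).
int GuessAacPayloadType(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size < 2) {
        return 0;  // not enough data to see a sync word
    }
    // ADTS: 0xFFF sync word; mask 0xF6 also requires the layer bits to be 00.
    if (data[0] == 0xFF && (data[1] & 0xF6) == 0xF0) {
        return 1;
    }
    // LOAS AudioSyncStream: 0x2B7 sync word = byte 0x56, then the top 3 bits set.
    if (data[0] == 0x56 && (data[1] & 0xE0) == 0xE0) {
        return 3;
    }
    return 0;  // no sync word found; raw AAC is only an assumption here
}
```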

The contents of the MF_MT_USER_DATA attribute depend on the subtype:

MFAudioFormat_AAC: Contains the portion of the HEAACWAVEINFO structure that appears after the WAVEFORMATEX structure (that is, after the wfx member), followed by the AudioSpecificConfig() data, as defined by ISO/IEC 14496-3.

MEDIASUBTYPE_RAW_AAC1: Contains the AudioSpecificConfig() data. This data is required; if it is missing, the decoder rejects the media type.
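As a minimal sketch, assuming the HEAACWAVEINFO fields that follow wfx are wPayloadType, wAudioProfileLevelIndication, wStructType, wReserved1 (all WORD) and dwReserved2 (DWORD), stored little-endian for 12 bytes in total, the user data for MFAudioFormat_AAC can be assembled as shown below. BuildAacUserData is a hypothetical name; on Windows the resulting bytes would be passed to IMFAttributes::SetBlob with the MF_MT_USER_DATA key. For MEDIASUBTYPE_RAW_AAC1, the blob is just the AudioSpecificConfig() bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the MFAudioFormat_AAC user data, i.e. the
// HEAACWAVEINFO fields after wfx (12 bytes, little-endian) followed by the
// AudioSpecificConfig() bytes.
std::vector<std::uint8_t> BuildAacUserData(std::uint16_t payloadType,
                                           std::uint16_t profileLevel,
                                           const std::vector<std::uint8_t>& asc) {
    std::vector<std::uint8_t> blob;
    auto putWord = [&blob](std::uint16_t v) {
        blob.push_back(static_cast<std::uint8_t>(v & 0xFF));  // low byte first
        blob.push_back(static_cast<std::uint8_t>(v >> 8));
    };
    putWord(payloadType);   // wPayloadType: should match MF_MT_AAC_PAYLOAD_TYPE
    putWord(profileLevel);  // wAudioProfileLevelIndication
    putWord(0);             // wStructType: 0 = AudioSpecificConfig() follows
    putWord(0);             // wReserved1
    putWord(0);             // dwReserved2, low word
    putWord(0);             // dwReserved2, high word
    assert(blob.size() == 12);
    blob.insert(blob.end(), asc.begin(), asc.end());
    return blob;
}
```

The profile/level value is left to the caller because the correct indication depends on the stream; this sketch does not validate it.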

The length of the AudioSpecificConfig() data is 2 bytes for AAC-LC or HE-AAC with implicit signaling of SBR/PS. It is more than 2 bytes for HE-AAC with explicit signaling of SBR/PS.

The value of audioObjectType as defined in AudioSpecificConfig() must be 2, indicating AAC-LC. The value of extensionAudioObjectType must be 5 for SBR or 29 for PS.
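The first two bytes of AudioSpecificConfig() carry the fields these rules refer to: a 5-bit audioObjectType, a 4-bit samplingFrequencyIndex, and a 4-bit channelConfiguration. The C++ sketch below (ParseAscPrefix and IsAcceptedCore are hypothetical names) reads those fields and applies the audioObjectType == 2 rule. It ignores escape values and does not parse an explicit SBR/PS extension, so it is a pre-check rather than a full parser. For example, the common 2-byte configuration 0x12 0x10 decodes to audioObjectType 2, frequency index 4 (44.1 kHz), and channel configuration 2 (stereo).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Leading fields of AudioSpecificConfig() (ISO/IEC 14496-3).
struct AscInfo {
    unsigned audioObjectType;         // 5 bits; 2 = AAC-LC
    unsigned samplingFrequencyIndex;  // 4 bits; e.g. 4 = 44.1 kHz, 6 = 24 kHz
    unsigned channelConfiguration;    // 4 bits; e.g. 2 = stereo
};

// Hypothetical helper: reads the first 13 bits of AudioSpecificConfig().
// Returns std::nullopt if the buffer is too short or an escape value
// (audioObjectType 31, samplingFrequencyIndex 15) is present, because those
// change the field layout and are not handled here.
std::optional<AscInfo> ParseAscPrefix(const std::uint8_t* asc, std::size_t size) {
    if (asc == nullptr || size < 2) {
        return std::nullopt;
    }
    AscInfo info{};
    info.audioObjectType = asc[0] >> 3;
    info.samplingFrequencyIndex = ((asc[0] & 0x07u) << 1) | (asc[1] >> 7);
    info.channelConfiguration = (asc[1] >> 3) & 0x0Fu;
    if (info.audioObjectType == 31 || info.samplingFrequencyIndex == 15) {
        return std::nullopt;
    }
    return info;
}

// Mirrors the documented rule that the core object type must be AAC-LC.
bool IsAcceptedCore(const AscInfo& info) {
    return info.audioObjectType == 2;
}
```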

Output Types

The decoder supports the following output types:

Subtype                Description
MFAudioFormat_Float    IEEE floating-point audio.
MFAudioFormat_PCM      16-bit PCM audio.
MFAudioFormat_AAC      Requires Windows 8. This output type can be used to convert an
                       AAC stream in the LOAS/LATM format to ADTS format.

To convert an LOAS/LATM stream to an ADTS stream, set the input type to MFAudioFormat_AAC with payload type 3 (LOAS). Then set the output type to MFAudioFormat_AAC with payload type 1 (ADTS). The decoder repackages the data in the new container format without decoding the bitstream.

Note
The decoder does not register MFAudioFormat_AAC as an output type. However, if the application sets the input type as described, the IMFTransform::GetOutputAvailableType method returns MFAudioFormat_AAC in the list of available output types.
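For reference, ADTS framing places a 7-byte header (without CRC), defined in ISO/IEC 13818-7, in front of each raw_data_block(). The C++ sketch below builds such a header from AudioSpecificConfig() fields. BuildAdtsHeader is a hypothetical helper that only illustrates the bit layout; it is not how the decoder implements the conversion. It assumes one raw_data_block() per frame, matching the ADTS restriction described earlier, writes ID = 0 (MPEG-4), and sets adts_buffer_fullness to 0x7FF (variable bit rate).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: 7-byte ADTS header (protection_absent = 1, no CRC).
std::array<std::uint8_t, 7> BuildAdtsHeader(unsigned audioObjectType,
                                            unsigned samplingFrequencyIndex,
                                            unsigned channelConfiguration,
                                            unsigned payloadBytes) {
    const unsigned frameLength = payloadBytes + 7;         // includes the header
    assert(audioObjectType >= 1 && audioObjectType <= 4);  // profile is 2 bits
    assert(samplingFrequencyIndex < 13);
    assert(channelConfiguration < 8);                      // 3 bits in ADTS
    assert(frameLength < (1u << 13));                      // 13-bit frame_length
    const unsigned profile = audioObjectType - 1;          // AAC-LC (2) -> 1
    const unsigned fullness = 0x7FF;                       // 0x7FF signals VBR
    std::array<std::uint8_t, 7> h{};
    h[0] = 0xFF;  // sync word, high 8 bits
    h[1] = 0xF1;  // sync word low 4 bits, ID=0 (MPEG-4), layer=00, protection_absent=1
    h[2] = static_cast<std::uint8_t>((profile << 6) | (samplingFrequencyIndex << 2) |
                                     (channelConfiguration >> 2));  // private_bit = 0
    h[3] = static_cast<std::uint8_t>(((channelConfiguration & 0x3) << 6) |
                                     (frameLength >> 11));  // copy/home bits = 0
    h[4] = static_cast<std::uint8_t>((frameLength >> 3) & 0xFF);
    h[5] = static_cast<std::uint8_t>(((frameLength & 0x7) << 5) | (fullness >> 6));
    h[6] = static_cast<std::uint8_t>((fullness & 0x3F) << 2);  // one raw_data_block
    return h;
}
```

For an AAC-LC, 44.1 kHz, stereo frame with a 100-byte payload, this produces FF F1 50 80 0D 7F FC.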

If the input stream contains more than two channels, the AAC decoder provides two options for the output format:

The same channel configuration as the input type.

Stereo fold-down.
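To illustrate what a stereo fold-down computes, the C++ sketch below mixes interleaved 5.1 float samples into two channels. The coefficients and channel order are assumptions, not the AAC decoder's documented behavior. The sketch uses the common ITU-R BS.775 weights (center and surround channels at 1/sqrt(2), LFE discarded) and the WAVEFORMATEXTENSIBLE order FL, FR, FC, LFE, BL, BR. FoldDown51ToStereo is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 5.1 -> stereo fold-down on interleaved float samples.
// Assumed channel order per frame: FL, FR, FC, LFE, BL, BR.
std::vector<float> FoldDown51ToStereo(const std::vector<float>& in) {
    assert(in.size() % 6 == 0);
    const float k = 1.0f / std::sqrt(2.0f);  // about -3 dB for center and surrounds
    std::vector<float> out;
    out.reserve(in.size() / 3);  // 2 output samples per 6 input samples
    for (std::size_t i = 0; i < in.size(); i += 6) {
        const float fl = in[i], fr = in[i + 1], fc = in[i + 2];
        const float bl = in[i + 4], br = in[i + 5];  // in[i + 3] (LFE) is dropped
        out.push_back(fl + k * fc + k * bl);
        out.push_back(fr + k * fc + k * br);
    }
    return out;  // no normalization or clipping is applied
}
```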

Format Constraints

After SBR is applied (if present), the decoded audio sampling rate must be one of the following:

8 kHz

11.025 kHz

12 kHz

16 kHz

22.05 kHz

24 kHz

32 kHz

44.1 kHz

48 kHz

Sampling rates above 48 kHz are not supported.
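Because the MFAudioFormat_AAC media type gives the core AAC-LC rate and SBR doubles it, this constraint has to be checked against the post-SBR rate. The C++ sketch below (IsDecodedRateSupported is a hypothetical helper) applies the list above that way: a 24 kHz core with SBR decodes at 48 kHz and is accepted, while a 48 kHz core with SBR would decode at 96 kHz and is rejected.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Decoded (post-SBR) sampling rates listed above, in Hz.
constexpr std::array<unsigned, 9> kSupportedDecodedRates = {
    8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000};

// Hypothetical check: coreRate is the AAC-LC rate carried in the media type;
// when SBR is present the decoded output runs at twice that rate.
bool IsDecodedRateSupported(unsigned coreRate, bool hasSbr) {
    const unsigned decodedRate = hasSbr ? coreRate * 2 : coreRate;
    return std::find(kSupportedDecodedRates.begin(), kSupportedDecodedRates.end(),
                     decodedRate) != kSupportedDecodedRates.end();
}
```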

The decoder supports up to 6 audio channels. For each speaker configuration, the decoder expects the AAC syntactic elements to appear in a certain order. The following table lists the supported speaker configurations. The third column of the table lists the expected syntactic elements and their order, using the following notation:

<SCE1>: The single_channel_element (SCE) associated with the front center speaker.

<SCE2>: The SCE associated with the back center speaker.

<CPE1>: The channel_pair_element (CPE) associated with the front speakers.

<CPE2>: The CPE associated with the back (or side) speakers.

<LFE>: The lfe_channel_element (LFE).

For more information about these syntactic elements, refer to ISO/IEC 13818-7.