Background

Advanced ( formerly Active ) Streaming Format was developed by
Microsoft in 1995-1998. Its main purpose is to serve as a universal format
for storing and streaming media. There are two versions of ASF. The version
known as 2.0 is well documented and its specifications are publicly available.
Unfortunately, they are not very helpful for developers because
this format is not widely used ( if used at all ).
On the other hand, there is
another version of the ASF format ( 1.0 ), and it is extremely popular. All files
with extensions .asf, .asx, .wmv and .wma that you can find on the 'Net are
stored in ASF 1.0. Microsoft never released any documentation covering
this format. There's a rumour that this format is even patented! This situation
is similar to the one with MPEG-4 specifications: Microsoft appears to take an active
part in the development of specifications for MPEG-4 but does not use these
formats in its products. Instead, it promotes their closed-source variations
( DivX ;-) and Windows Media Video ).
Since Microsoft does not provide implementations of an ASF reader
or writer for any platforms except Windows and Macintosh, it is necessary
to have at least a minimal specification of the format to implement tools
for working with ASF 1.0 on all other platforms. This document tries to organize
all available information covering the format, received from different sources.
Readers are encouraged to get acquainted with the ASF 2.0 specifications
to better understand the ideas behind the format and other features that it
offers.

Disclaimer

This specification was created by analyzing data contained
in freely-available media files. No reverse-engineering or other illegal activity took place
during collection of this information. Neither the author nor any contributors
guarantee that any part of this information is correct.

Data types

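UINT8, UINT16, UINT32, UINT64 - unsigned integer values, 8, 16, 32 or 64 bits long.
With the GNU C compiler on 32-bit platforms they correspond to the types 'unsigned char', 'unsigned short', 'unsigned long'
and 'unsigned long long'. On 64-bit platforms 'unsigned long' is usually 64 bits wide, so the fixed-width types
uint8_t, uint16_t, uint32_t and uint64_t from <stdint.h> are a safer choice.
Multi-byte values are stored in little-endian byte order ( least significant byte first ).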
FILETIME - unsigned 64-bit integer. Number of 100-nanosecond intervals
since midnight, January 1, 1601, GMT.
GUID - 128-bit value that can be generated on any system using a special
algorithm. The algorithm is designed to make such values unique in practice
( two different computers, or the same computer at different
moments of time, are not expected to generate the same GUID ).
BITMAPINFOHEADER - universal structure that describes format of a ( compressed ) image.

#include <stdint.h> // fixed-width types; 'long' is not 32 bits wide everywhere

typedef struct
{
    uint32_t biSize;          // sizeof(BITMAPINFOHEADER), normally 40
    int32_t  biWidth;
    int32_t  biHeight;
    uint16_t biPlanes;        // unused
    uint16_t biBitCount;
    uint32_t biCompression;   // fourcc of image
    uint32_t biSizeImage;     // size of image. For uncompressed images
                              // ( biCompression 0 or 3 ) can be zero.
    int32_t  biXPelsPerMeter; // unused
    int32_t  biYPelsPerMeter; // unused
    uint32_t biClrUsed;       // valid only for palettized images.
                              // Number of colors in palette.
    uint32_t biClrImportant;
} BITMAPINFOHEADER;

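WAVEFORMATEX - universal structure that describes format of ( compressed ) audio.

typedef struct // fixed-width types from <stdint.h>
{
    uint16_t wFormatTag;      // value that identifies compression format
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;     // size of a data sample
    uint16_t wBitsPerSample;
    uint16_t cbSize;          // size of format-specific data
} WAVEFORMATEX;
In the file this structure occupies 18 bytes ( a compiler may pad it to 20 bytes in memory )
and is immediately followed by an array of bytes of size cbSize.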

All time intervals are either measured in 100-nanosecond steps
and represented with a 64-bit type ( such values wrap around roughly every
58,000 years ), or measured in milliseconds and represented
with a 32-bit type ( wrapping roughly every 49.7 days ) or a 16-bit type ( every 65.5 seconds ).
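
As an illustration of the FILETIME type, here is a minimal sketch that converts a FILETIME value to seconds since the Unix epoch. The name filetime_to_unix is made up for this example; the constant is the number of seconds between January 1, 1601 and January 1, 1970.

```c
#include <stdint.h>

/* Seconds between 1601-01-01 and 1970-01-01 (134774 days * 86400). */
#define FILETIME_UNIX_DIFF 11644473600LL

/* Hypothetical helper: convert a FILETIME (100-nanosecond units since
   1601) to whole seconds since the Unix epoch. */
static int64_t filetime_to_unix(uint64_t ft)
{
    return (int64_t)(ft / 10000000ULL) - FILETIME_UNIX_DIFF;
}
```

Values before 1970 come out negative, which is why the result is signed.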

Basic information

An ASF 1.0 file consists of 'chunks'. They are similar to
chunks in the AVI format, but their fields are larger.
Chunk:

Field          Type     Size (bytes)
Chunk type     GUID     16
Chunk length   UINT64   8
Data           -        Variable

Chunk type describes type of content in the chunk. See below for list of
known chunk type GUIDs.
Chunk length corresponds to the entire chunk ( i.e. length of data only is
chunk length minus 24 ).
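
The chunk layout above could be read along these lines. This is only a sketch: the names asf_chunk_header, read_le and parse_chunk_header are invented for the example, and it assumes that integers are stored least significant byte first.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint8_t  guid[16];  /* chunk type, kept as raw bytes */
    uint64_t length;    /* length of the entire chunk, header included */
} asf_chunk_header;

/* Read an unsigned little-endian integer of n bytes (n <= 8). */
static uint64_t read_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the buffer is too short or the length
   field is smaller than the 24-byte header itself. */
static int parse_chunk_header(const uint8_t *buf, size_t size,
                              asf_chunk_header *out, uint64_t *data_len)
{
    if (size < 24)
        return -1;
    memcpy(out->guid, buf, 16);
    out->length = read_le(buf + 16, 8);
    if (out->length < 24)
        return -1;
    *data_len = out->length - 24;  /* length of the data field only */
    return 0;
}
```

A reader that does not recognize the chunk type can still skip the chunk by advancing data_len bytes past the header.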
The other important concept is 'packet'. Since the format is supposed to be
streamable, all actual data, such as compressed audio or video, is stored
in 'packets'. Unlike in ASF 2.0, all packets have fixed size.
Each valid file should contain at least two chunks. They are File Header Chunk
and Data Chunk. File Header Chunk contains all the information required
to start processing actual data, while Data Chunk contains data packets.

Headers

File Header chunk:

Field                 Type     Size (bytes)
Chunk type            GUID     16
Chunk length          UINT64   8
Number of subchunks   UINT32   4
Unknown               -        2
Chunks                -        Variable

This chunk is special because it contains other chunks in the data field.
There may be any number of such chunks, but we need to know about two
special kinds of them.

Header Object:

Field                                            Type       Size (bytes)
Chunk type                                       GUID       16
Chunk length                                     UINT64     8
Client GUID                                      GUID       16
File size                                        UINT64     8
File creation time                               FILETIME   8
Number of packets                                UINT64     8
Timestamp of the end position                    UINT64     8
Duration of the playback                         UINT64     8
Timestamp of the start position                  UINT32     4
Unknown, maybe reserved ( usually contains 0 )   UINT32     4
Flags ( usually contains 2 )                     UINT32     4
Minimum size of packet, in bytes                 UINT32     4
Maximum size of packet                           UINT32     4
Size of uncompressed video frame                 UINT32     4

Value 0x02 in flags probably means that the file is seekable.
Minimum and maximum sizes of packet are typically equal. It is not known how an ASF file should be handled when they differ.
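
Summing the field sizes in the Header Object table puts the minimum packet size at byte offset 92 and the maximum at offset 96, counted from the start of the object, which is 104 bytes long in total. A sketch of extracting the fixed packet size under these assumptions follows; the function names are invented for the example, and little-endian byte order is assumed.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a 32-bit little-endian value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* obj points at the chunk type GUID of a Header Object.
   Returns the packet size, or 0 if the object is too short or the
   minimum and maximum sizes differ (a case this sketch gives up on). */
static uint32_t header_object_packet_size(const uint8_t *obj, size_t size)
{
    if (size < 104)
        return 0;
    uint32_t min_size = read_le32(obj + 92);
    uint32_t max_size = read_le32(obj + 96);
    return (min_size == max_size) ? min_size : 0;
}
```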
Stream Object:

Field                                            Type     Size (bytes)
Chunk type                                       GUID     16
Chunk length                                     UINT64   8
Stream type (audio/video)                        GUID     16
Audio error concealment type                     GUID     16
Unknown, maybe reserved ( usually contains 0 )   UINT64   8
Total size of type-specific data                 UINT32   4
Size of stream-specific data                     UINT32   4
Stream number                                    UINT16   2
Unknown                                          UINT32   4
Type-specific                                    -        Variable
Stream-specific                                  -        Variable

Type-specific data is data whose meaning can be derived only from the stream type.
It may be followed by stream-specific data, whose layout depends on the value of the audio error concealment
type.
The second unknown value in this object seems to be absolutely random,
but if there is more than one stream in the file, they all hold the same
value here.
Type-specific data for video stream:

Field                     Type               Size (bytes)
Picture width             UINT32             4
Picture height            UINT32             4
Unknown                   UINT8              1
BITMAPINFOHEADER size     UINT16             2
Picture format            BITMAPINFOHEADER   Variable

Field 'Picture format' usually contains BITMAPINFOHEADER structure, which
is 40 bytes long, but it is not a good idea to rely on this fact, since it may contain
something of a larger size.

Type-specific data for audio stream:

Field                    Type           Size (bytes)
Sound format             WAVEFORMATEX   18
Sound format extension   -              Variable

Size of sound format extension is equal to cbSize member of WAVEFORMATEX
structure.
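
Counting the fields of the WAVEFORMATEX structure given in 'Data types' yields 18 bytes, with cbSize in the last two of them ( offset 16 ). A sketch of computing the total size of the audio type-specific data on that basis is shown below; the function name is invented, and little-endian byte order is assumed.

```c
#include <stdint.h>
#include <stddef.h>

#define WAVEFORMATEX_SIZE 18  /* sum of the field sizes, no padding */

/* Returns the size of the sound format plus its extension, or 0 if
   the type-specific data is too short to hold both. */
static size_t audio_format_size(const uint8_t *data, size_t size)
{
    if (size < WAVEFORMATEX_SIZE)
        return 0;
    size_t cb = (size_t)data[16] | (size_t)data[17] << 8;  /* cbSize */
    if (size - WAVEFORMATEX_SIZE < cb)
        return 0;
    return WAVEFORMATEX_SIZE + cb;
}
```

Reading the fields byte by byte avoids relying on sizeof(WAVEFORMATEX), which a compiler may round up for alignment.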

Stream-specific data for audio stream:

Field                                                    Type     Size (bytes)
H, Total number of audio blocks in each scramble group   UINT8    1
W, Byte size of each scrambling chunk                    UINT16   2
Block_align_1, usually = nBlockAlign                     UINT16   2
Block_align_2, usually = nBlockAlign                     UINT16   2
Unknown                                                  UINT8    1

This data is only present if 'Audio error concealment type' field in the
main structure contains corresponding GUID.
See section 'Audio error concealment' for details on this field.

All valid ASF files contain one Header Object, as well as one Stream Object
per stream.

Data chunk

Data chunk:

Field               Type     Size (bytes)
Chunk type          GUID     16
Chunk length        UINT64   8
Unknown             GUID     16
Number of packets   UINT64   8
Unknown             UINT8    1
Unknown             UINT8    1
Packets             -        Variable

As mentioned above, packets have fixed size. It can be found in the corresponding
field of Header Object.

Packets

Compressed video and audio data are usually organized into 'frames' or 'objects' of an arbitrary
size. When one needs to transfer such data in packets of a fixed size,
three cases are possible:
a) Frame size is close to the size of the packet. It is acceptable
to store the frame completely in one packet and pad it to the needed size.
b) Frame is larger than the packet. Then it needs to be 'fragmented'
into several fragments that are sent in different packets.
c) Frame is significantly smaller than the packet. In this case it is
a good idea to send multiple frames in the same packet. This is called
'grouping'.
<Packet>: <Header> <Segment> [<Segment>] ... <Padding>
There may be several formats of headers, but packets in most movies start with
the V82_Header:

Precise meaning of 'packet size' is not known. It rarely appears in ASF streams, and when it
does, it shows complete length of data in this packet ( from the beginning of packet header
to the end of the last segment ). Sometimes it's OR'ed with 0x10 or 0x8, but I've never seen
packets with specified nonzero padding size and 0x40 set in flags.
Segment:

Field                     Type    Size (bytes)
Stream ID                 UINT8   1
Sequence number           UINT8   1
Segment-specific fields   -       Variable

Most significant bit ( 0x80 ) is set in the stream ID if the segment
contains a keyframe.
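
A small sketch of splitting the stream ID byte. The keyframe bit comes from the description above; treating the remaining seven bits as the stream number is an assumption of this example, as are the function names.

```c
#include <stdint.h>

/* Assumption: the lower 7 bits hold the stream number. */
static int segment_stream_number(uint8_t stream_id)
{
    return stream_id & 0x7F;
}

/* Bit 0x80 marks a segment that contains a keyframe. */
static int segment_is_keyframe(uint8_t stream_id)
{
    return (stream_id & 0x80) != 0;
}
```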
Here things become a bit more complicated.
Segment-specific fields depend on whether this segment is grouped
( i.e. it contains more than one frame ) or not. This can be
deduced from flags value, which is inside segment-specific fields
itself!

Segment-specific fields, no grouping:

Field                             Type                      Size (bytes)
Fragment offset                   UINT8, UINT16 or UINT32   Variable
Flags                             UINT8                     1
Object length                     UINT32                    4
Object start time, milliseconds   UINT32                    4
Data length                       UINT8 or UINT16           0, 1 or 2
Data                              -                         Variable

"Fragment offset" is offset of this fragment in the object ( e.g. video frame )
that contains it. For complete frame in the fragment, fragment offset is
0 and data length is equal to object length.
"Flags" can be either 0x01 or 0x08. 0x01 means "grouping ( multiple objects
in segment )", and 0x08 means "no grouping ( single object or fragment )".
"Data length" field is not needed if this segment is the only one in
the packet, because in this case data takes all remaining space in the packet
( of course, taking padding into account ). Thus, it's only
present when bit 0x01 is set in packet flags.
"Fragment offset" field size is determined by 'Segment Type ID' packet header value.
Known possible values for the latter are 0x55, 0x59 and 0x5D, which correspond
to 1, 2 and 4 byte sizes.
"Data length" field size is determined by 'Number of segments' packet header value.
When 'Number of segments' field is present, its lower bits ( probably 6 of them ) contain
number of segments, set bit 0x40 means that 'Data length' segment field is 1-byte wide,
and set bit 0x80 means that 'Data length' segment field is 2-byte wide. Otherwise,
this field size defaults to 2 bytes.
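
The width rules above can be summarized in code roughly as follows. The function names are invented. Two points are interpretations rather than facts from the analysis: "defaults to 2 bytes" is read as "neither bit 0x40 nor bit 0x80 is set", and the combination of both bits is reported as unknown because it is not described.

```c
#include <stdint.h>

/* Size in bytes of 'Fragment offset', or -1 for an unknown
   'Segment Type ID' value. */
static int fragment_offset_size(uint8_t segment_type_id)
{
    switch (segment_type_id) {
    case 0x55: return 1;
    case 0x59: return 2;
    case 0x5D: return 4;
    default:   return -1;
    }
}

/* Number of segments: the lower 6 bits of the field. */
static int segment_count(uint8_t num_segments_field)
{
    return num_segments_field & 0x3F;
}

/* Size in bytes of 'Data length'. multiple_segments is nonzero when
   bit 0x01 is set in the packet flags, i.e. when the
   'Number of segments' field is present. */
static int data_length_size(int multiple_segments, uint8_t num_segments_field)
{
    if (!multiple_segments)
        return 0;                 /* data fills the rest of the packet */
    switch (num_segments_field & 0xC0) {
    case 0x40: return 1;
    case 0x80: return 2;
    case 0x00: return 2;          /* assumed default */
    default:   return -1;         /* both bits set: not described */
    }
}
```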

Segment-specific fields, grouping:

Field                             Type                      Size (bytes)
Object start time, milliseconds   UINT8, UINT16 or UINT32   Variable
Flags                             UINT8                     1
Unknown                           UINT8                     1
Data length                       UINT16                    0 or 2
Repeat until we run out of data length:
Object length                     UINT8                     1
Data                              -                         Variable
...

This structure is similar to the one with 'no grouping', but it does not
have a 'fragment offset' field, because fragmentation and grouping cannot
take place simultaneously.
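
The "repeat until we run out of data length" part of a grouped segment could be walked like this. It is a sketch only: walk_grouped_objects and the callback type are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Called once per object found in the group. */
typedef void (*object_cb)(const uint8_t *data, size_t len, void *user);

/* p points at the first 'Object length' byte, data_len is the value of
   'Data length' (or the space left in the packet when it is absent).
   Returns the number of objects, or -1 if an object runs past the end. */
static int walk_grouped_objects(const uint8_t *p, size_t data_len,
                                object_cb cb, void *user)
{
    int count = 0;
    size_t pos = 0;
    while (pos < data_len) {
        size_t len = p[pos++];         /* Object length, UINT8 */
        if (len > data_len - pos)
            return -1;
        if (cb)
            cb(p + pos, len, user);
        pos += len;
        count++;
    }
    return count;
}
```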
Each segment has a field called 'sequence number'. It can be used
to reassemble fragmented objects. Subsequent objects have sequence numbers
that differ by 1 ( there will be larger skips in 'sequence number' fields
when grouping takes place ). Different fragments of the same object have
the same sequence number and the same object start time.
Packets are usually organized in order of increasing timestamps. It is not
known if it's always true. Packets may be missing, and this case should be
properly handled.
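
Reassembling a fragmented object from the sequence number, object length and fragment offset could look like the following sketch. The reassembly structure and add_fragment are invented names; one such state would be kept per stream, and the caller supplies the buffer. Duplicate fragments are not detected.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint8_t *buf;      /* caller-provided buffer */
    size_t capacity;   /* size of buf in bytes */
    size_t length;     /* object length of the object being assembled */
    size_t received;   /* bytes collected so far */
    int seq;           /* its sequence number, -1 when idle */
} reassembly;

/* Add one fragment. Returns 1 when the object is complete, 0 when more
   fragments are expected, -1 if the fragment does not fit. */
static int add_fragment(reassembly *r, uint8_t seq, uint32_t object_len,
                        uint32_t frag_offset, const uint8_t *data,
                        size_t data_len)
{
    if (r->seq != seq) {
        /* A new sequence number starts a new object; an incomplete
           previous object (e.g. after a lost packet) is dropped. */
        r->seq = seq;
        r->length = object_len;
        r->received = 0;
    }
    if (object_len != r->length || object_len > r->capacity ||
        frag_offset > object_len || data_len > object_len - frag_offset)
        return -1;
    memcpy(r->buf + frag_offset, data, data_len);
    r->received += data_len;
    return r->received >= r->length;
}
```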

Audio error concealment

Sometimes compressed audio is stored in the stream in a special 'scrambled' manner.
It should be descrambled before passing data to the audio decompressor. This
technique is supposed to increase stream tolerance to errors.
All audio data is separated into 'audio blocks'. Size of an audio block
is a multiple of data sample size.
The process is defined with two variables: audio block length ( Width )
and number of audio blocks in 'scrambling chunk' ( Height ). This process
is easiest to demonstrate with a picture.

Here each [x] is data region with size specified in Block_align_1 field of
scramble definition structure. Width is first field of that structure, and Height
is second field, divided by third.
When total amount of data is not multiple of 'scrambling chunk' size
( in bytes, that's first field times second field ), the remaining part
is written as is, without scrambling.
Even when the GUID in the stream header indicates that audio is scrambled,
descrambling may be unnecessary, because very often the values of W or H are equal to 1.
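
A descrambling sketch follows. It rests on a loud assumption: that scrambling is a plain transposition of a matrix of Width x Height blocks, so that the block at ( row, column ) of the descrambled data is found at position column * Height + row of the scrambled data. Width, Height and the block size are taken as defined in the text above; if the real layout is the inverse transposition, width and height have to be swapped. All names are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Descramble one group of width*height blocks, block_size bytes each.
   ASSUMPTION: scrambled data holds the matrix column by column. */
static void descramble_group(uint8_t *dst, const uint8_t *src,
                             size_t width, size_t height, size_t block_size)
{
    for (size_t i = 0; i < width * height; i++) {
        size_t row = i / width, col = i % width;
        memcpy(dst + i * block_size,
               src + (col * height + row) * block_size, block_size);
    }
}

/* Descramble size bytes; dst and src must not overlap. Data past the
   last complete group is copied as is. */
static void descramble(uint8_t *dst, const uint8_t *src, size_t size,
                       size_t width, size_t height, size_t block_size)
{
    size_t group = width * height * block_size;
    size_t pos = 0;
    /* With width or height equal to 1 the transposition is an identity. */
    if (group > 0 && width > 1 && height > 1)
        for (; size - pos >= group; pos += group)
            descramble_group(dst + pos, src + pos, width, height, block_size);
    memcpy(dst + pos, src + pos, size - pos);
}
```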

Streaming over the Internet

Media content in ASF format can be streamed over the Internet
in several ways. The most popular way is streaming using the HTTP protocol. Other protocols,
such as UDP, may be supported as well.
URLs for ASF files may lead to 'redirectors'. A redirector is an XML file that
describes the media that it refers to and includes other URLs and additional data
needed for stream playback. Redirector files often have the extension .asx, but
it's probably not a requirement. Some details can be found at
http://msdn.microsoft.com/peerjournal/wm/g060199a.asp.

Streaming using HTTP protocol

ASF URLs that start with http:// or mms:// refer to streams that are delivered
to the end user over a protocol that's based on HTTP. They can consist of
redirectors, pre-recorded or live ( broadcast ) data. To start transmission,
the client program connects to the server using TCP ( often on port 80 ),
sends an HTTP request and listens for data.
Here are descriptions of HTTP requests, in sprintf()-compatible form.

The initial HTTP request of the media player.
It is used to query for the media type header of
the stream (needed for checking whether the codecs are
installed at the client) and for obtaining the type
of stream (live stream, pre-recorded content, etc.).
Note that the request-context changes with every
new HTTP request:

The HTTP request that starts downloading
prerecorded (=seekable) content.
The stream-offset parameter defines the start offset
in the ASF file on the server.
The stream-time is the timecode (milliseconds) for
seeking within the stream:

Pay attention to the lines with 'stream-switch-count' and 'stream-switch-entry'. The first line contains the number of streams that you want to receive. The second line contains a string in the following form:
ffff:1:0 ffff:2:2 ffff:4:2 ( etc. )
where each entry corresponds to one stream, the first value is always 'ffff', the second value is the stream ID from the ASF header and the third value is unknown.
Even if you request only selected streams, the server may send you all of them. So, a request with num_streams=1 and stream_selection="ffff:1:0" will sometimes give you all streams ( instead of one ). The same rules apply to the broadcast request, described further.
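
The stream selection string could be assembled as in the sketch below. The function name and parameters are invented; since the meaning of the third value is unknown, the caller passes it through unchanged.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "ffff:<id>:<third> ffff:<id>:<third> ..." into out.
   Returns the string length, or -1 if it does not fit. */
static int build_stream_selection(char *out, size_t out_size,
                                  const int *stream_ids,
                                  const int *third_values, int num_streams)
{
    size_t pos = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < num_streams; i++) {
        int n = snprintf(out + pos, out_size - pos, "%sffff:%d:%d",
                         i ? " " : "", stream_ids[i], third_values[i]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```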
This is the HTTP request that starts downloading
live (=broadcast) content.

The server's reply to these requests consists of an arbitrary number of lines
which are terminated by \n ( 0x0A ) or \r\n ( 0x0D 0x0A ) ( HTTP header ),
an empty line and the actual content.
First line of HTTP header has form:
"HTTP/1.%d %d %s", version, errorcode, string
where version is 0 or 1, errorcode is 3-digit HTTP error code and string
is an optional server message. Possible error codes include 200 - no error,
404 - file not found, and others.
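
Since the first line is given in sprintf()-compatible form, the matching sscanf() call can parse it. A minimal sketch, with an invented function name:

```c
#include <stdio.h>

/* Parses "HTTP/1.x NNN message". Returns 0 and stores the minor
   version and the error code, or -1 if the line does not match. */
static int parse_status_line(const char *line, int *version, int *code)
{
    return sscanf(line, "HTTP/1.%d %d", version, code) == 2 ? 0 : -1;
}
```

The optional server message is simply ignored here.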
Other important HTTP header lines:
"Content-Type: %s", content_type
Content type of data. Possible values:
application/octet-stream - 'real' binary ASF stream.
audio/x-ms-wax, audio/x-ms-wma, video/x-ms-asf, video/x-ms-afs,
video/x-ms-wvx,
video/x-ms-wmv, video/x-ms-wma - ASX redirectors.
"Pragma: features=%s",features
If "features" has substring "broadcast", the stream is live ( not prerecorded ).
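
Detecting a live stream from the header lines could be sketched as follows. The function name is invented, and the comparison is case-sensitive for brevity, which a real client might want to relax.

```c
#include <string.h>

/* Returns 1 if line is a "Pragma: features=..." header whose value
   contains "broadcast", 0 otherwise. */
static int is_broadcast_features(const char *line)
{
    static const char prefix[] = "Pragma: features=";
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return 0;
    return strstr(line + sizeof prefix - 1, "broadcast") != NULL;
}
```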
Headers are followed by actual content, separated into chunks. However,
these chunks are different from the ones described in previous sections.

Field                       Type     Size (bytes)
Basic chunk type            UINT16   2
Chunk length                UINT16   2
Sequence number             UINT32   4
Unknown                     -        2
Chunk length confirmation   UINT16   2
Body data                   -        Variable

Chunk length corresponds to data that starts from sequence number field.
Basic chunk type can be 0x4424 ( Data follows ), 0x4524 ( Transfer complete ) and 0x4824 ( ASF header chunk follows ).
For type 0x4824 'body data' should be parsed according to the same rules as a local ASF file. It is arranged so that ASF recorder program
would not need to leave any 'holes' in file while recording - this chunk includes all ASF content up to the beginning of first packet with compressed media.
For type 0x4424 'body data' contains a complete packet ( for example, first byte of this data is usually 0x82 ). Network transmission may send chunks that are
shorter than pktsize from ASF file header, by chopping off padding section.
Some fields in ASF file header may be empty, especially for the live stream.
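
The 12-byte header of a streaming chunk could be parsed as in this sketch. Because chunk length is counted from the sequence number field, the body is chunk length minus 8 bytes long. All names are invented; the sketch assumes little-endian byte order and that the length confirmation always repeats the chunk length.

```c
#include <stdint.h>
#include <stddef.h>

enum { CHUNK_DATA = 0x4424, CHUNK_END = 0x4524, CHUNK_HEADER = 0x4824 };

typedef struct {
    uint16_t type;      /* basic chunk type */
    uint16_t length;    /* counted from the sequence number field */
    uint32_t sequence;
    size_t   body_len;  /* length of the body data */
} stream_chunk;

/* Read an unsigned little-endian integer of n bytes (n <= 4). */
static uint32_t rd_le(const uint8_t *p, int n)
{
    uint32_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns the header size (12) on success, -1 on a short buffer or
   inconsistent length fields. */
static int parse_stream_chunk(const uint8_t *buf, size_t size,
                              stream_chunk *c)
{
    if (size < 12)
        return -1;
    c->type     = (uint16_t)rd_le(buf, 2);
    c->length   = (uint16_t)rd_le(buf + 2, 2);
    c->sequence = rd_le(buf + 4, 4);
    uint16_t confirm = (uint16_t)rd_le(buf + 10, 2);
    if (confirm != c->length || c->length < 8)
        return -1;
    c->body_len = (size_t)c->length - 8;
    return 12;
}
```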

Credits

Most of the information contained in this document was collected by
Avery Lee <uleea05 at umail.ucsb.edu> and by unknown author of
ASFRecorder program. Translated from C/C++ into readable English by
yours, truly <divx at euro.ru>. Comments and improvements are welcome.