In general, the format has been simplified and unnecessary sections have been removed (the label array, the field indices array, and the list indices array).

Overall, the new format should result in smaller files with faster access times.

File Format Conceptual Overview

Field Data Types

There will be a starting list of supported types, and new types can be added as needed. Because the TypeID is a 2-byte unsigned value, the file format can support as many as 65,535 different basic types (0xFFFF is taken by the Generic type).

Type          Type ID   Discussion
UINT8         0
INT8          1
UINT16        2
INT16         3
UINT32        4
INT32         5
UINT64        6
INT64         7
FLOAT32       8
FLOAT64       9
Vector3f      10
Vector4f      12
Quaternionf   13
ECString      14        An ECString is always a reference to elsewhere in the raw data (regardless of flags). The string is essentially a list of WCHARs.
Color4f       15
Matrix4x4f    16
TlkString     17        A TlkString is not actually a string, but a pair of UInt32 values. One is the index of a string in the TLK string table.

Strings are stored as a list of WCHARs.

There is also a "Generic" type, with type ID 0xFFFF, that is only usable in lists (and possibly in references).
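To make the table concrete, here is a minimal Python sketch that maps each TypeID to a `struct` format string and computes the unpadded size of one value. The layouts of the compound types (Vector3f as three floats, Matrix4x4f as sixteen floats, and so on) are assumptions inferred from the type names, and `TYPE_FORMATS` and `type_size` are hypothetical names.

```python
import struct

GENERIC_TYPE_ID = 0xFFFF  # only usable in lists, so it has no fixed value layout

# Hypothetical TypeID -> struct format mapping; compound layouts are assumed.
TYPE_FORMATS = {
    0: "B",     # UINT8
    1: "b",     # INT8
    2: "H",     # UINT16
    3: "h",     # INT16
    4: "I",     # UINT32
    5: "i",     # INT32
    6: "Q",     # UINT64
    7: "q",     # INT64
    8: "f",     # FLOAT32
    9: "d",     # FLOAT64
    10: "3f",   # Vector3f (assumed x, y, z)
    12: "4f",   # Vector4f (assumed x, y, z, w)
    13: "4f",   # Quaternionf (assumed four floats)
    14: "I",    # ECString: always a 4-byte reference into the raw data
    15: "4f",  # Color4f (assumed r, g, b, a)
    16: "16f",  # Matrix4x4f (assumed sixteen floats)
    17: "2I",   # TlkString: a pair of UInt32 values
}


def type_size(type_id: int) -> int:
    """Size in bytes of one value of the given base type, without padding."""
    # "<" selects standard sizes with no native alignment padding.
    return struct.calcsize("<" + TYPE_FORMATS[type_id])
```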

Field Labels

The Binary GFF uses 4-byte IDs to label each field. Within each struct, each field must have a unique ID. These IDs could be string hashes or numerical IDs; the only requirement is that the reader and writer of these files agree on them. A large list of common IDs can be found by opening Dragon Age\tools\plugins\EditorGff40.dll with any text editor and searching for the string "BinaryGFFIDList.h".

File Format Physical Layout

Platform dependence

The byte order (endianness) of the data is that of the target platform. For example, data files for Intel processors should be little endian, whereas data for PowerPC processors should be big endian.

There may be other differences between the files generated for different platforms in order to achieve proper alignment or the desired in-memory layout of the data.

Overall File Layout

Header

Struct Array

Field Array

Raw Data Block

Header

The header is located at the start of the file and contains the following values.

Value           Description
GFFMagicNumber  All GFF files start with the hexadecimal value 0x47464620, which is the ASCII value for "GFF ".
GFFVersion      4 bytes representing the version of the underlying GFF format. This should be "V4.0" (0x56342E30) for all files using this format.
TargetPlatform  4-byte field indicating the intended target platform for this file:
                  "PS3 " (0x50533320) for the PlayStation 3
                  "X360" (0x58333630) for the Xbox 360
                  "PC  " (0x50432020) for the PC
                There will most likely be more specialized targets for the PC in the future.
FileType        4-byte field used to identify this file type. By convention it should be the three-letter file extension followed by a space.
FileVersion     4-byte version of the FileType. By convention it should be "Vx.x" or "xx.x", where x is a digit.
StructCount     4-byte unsigned number of elements in the Struct Array.
DataOffset      4-byte unsigned offset from the beginning of the file to the Raw Data Block.

The first five fields are always stored in big-endian order and are never byte-swapped, which keeps them human-readable on any machine.
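The header can be read with a few `struct.unpack_from` calls. This is a minimal Python sketch, assuming that all seven header values are 4 bytes (28 bytes in total) and that PC files are little endian while PS3 and Xbox 360 files (both PowerPC-based) are big endian, in line with the platform notes above. `read_header`, `GffHeader`, and `PLATFORM_BYTE_ORDER` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Assumed byte order per target platform (PC = x86, PS3 and X360 = PowerPC).
PLATFORM_BYTE_ORDER = {b"PC  ": "<", b"PS3 ": ">", b"X360": ">"}


class GffHeader(NamedTuple):
    file_type: bytes
    file_version: bytes
    platform: bytes
    struct_count: int
    data_offset: int
    byte_order: str  # struct-module prefix for the platform-dependent values


def read_header(data: bytes) -> GffHeader:
    # The first five 4-byte fields are tags that are never byte-swapped.
    magic, version, platform, file_type, file_version = struct.unpack_from(
        ">4s4s4s4s4s", data, 0)
    if magic != b"GFF ":
        raise ValueError("not a GFF file")
    if version != b"V4.0":
        raise ValueError(f"unsupported GFF version: {version!r}")
    byte_order = PLATFORM_BYTE_ORDER.get(platform)
    if byte_order is None:
        raise ValueError(f"unknown target platform: {platform!r}")
    # StructCount and DataOffset (bytes 20-27) use the platform's byte order.
    struct_count, data_offset = struct.unpack_from(byte_order + "II", data, 20)
    return GffHeader(file_type, file_version, platform,
                     struct_count, data_offset, byte_order)
```

For example, a PC file whose header ends in the little-endian values 1 and 44 yields `struct_count == 1` and `data_offset == 44`.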

Struct Array

A struct is a grouping of data. A struct definition describes which data is in a struct. Many instances of a struct type may occur in a single file, but there will only be one definition for each struct type.

The Struct Array starts immediately after the header. The first element in the Struct Array is the Top-Level Struct for the GFF file and it describes what the file looks like at the top level. Since the Top-Level Struct is always present, every GFF file contains at least one element in the struct array.

The Struct Array looks like this:

Struct 0 (Top-Level Struct; always present)
Struct 1
Struct 2
...
Struct N-1, where N = Header.StructCount

The GFF Struct contains the values listed in the table below.

Value        Description
StructType   4-byte programmer-defined ID
FieldCount   4-byte number of fields in the struct
FieldOffset  4-byte unsigned offset from the beginning of the file to the first field in the struct
StructSize   4-byte unsigned size of the chunk of data representing the struct

All the fields of a struct are contiguous, so knowing the address of the first one and the number of fields is enough information to access any field in the struct.
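Because each struct entry is four 4-byte values and the fields of a struct are contiguous, locating a struct's fields is plain arithmetic. The Python sketch below assumes 16-byte struct entries directly after a 28-byte header, and 12-byte field entries (Label, FieldType, Index); `StructDef`, `read_struct_array`, and `field_entry_offset` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

HEADER_SIZE = 28        # seven 4-byte header values
STRUCT_ENTRY_SIZE = 16  # StructType, FieldCount, FieldOffset, StructSize
FIELD_ENTRY_SIZE = 12   # Label, FieldType, Index


class StructDef(NamedTuple):
    struct_type: int
    field_count: int
    field_offset: int   # from the beginning of the file
    struct_size: int


def read_struct_array(data: bytes, struct_count: int,
                      byte_order: str = "<") -> List[StructDef]:
    # The Struct Array starts immediately after the header.
    return [StructDef(*struct.unpack_from(byte_order + "4I", data,
                                          HEADER_SIZE + i * STRUCT_ENTRY_SIZE))
            for i in range(struct_count)]


def field_entry_offset(sdef: StructDef, i: int) -> int:
    # Fields are contiguous, so field i sits at a fixed stride from the first.
    if not 0 <= i < sdef.field_count:
        raise IndexError(i)
    return sdef.field_offset + i * FIELD_ENTRY_SIZE
```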

Field Array

The Field Array starts immediately after the Struct Array. Each field entry describes a piece of data contained in a struct. Each struct’s fields must be contiguous in the array and appear in increasing order of their labels. The fields for the Top-Level Struct appear first in the array.

The Field Array looks like this:

Struct 0 field 0
Struct 0 field 1
Struct 0 ...
Struct 0 field Struct.FieldCount - 1
Struct x field 0
Struct x field 1
Struct x ...
Struct x field Struct.FieldCount - 1
...

Each field looks like this:

Value      Description
Label      4-byte label used to look up the field
FieldType  4-byte value describing the type of the field (see below for explanation)
Index      4-byte unsigned offset to the location of the data

The label is just a 4-byte value used to find the correct field. It could be a string hash or some other numerical ID.
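A field entry can be decoded the same way. This sketch assumes the 4-byte FieldType is stored as the 2-byte TypeID followed by the 2-byte Flags, each in the platform's byte order; `FieldEntry` and `read_field_entry` are hypothetical names.

```python
import struct
from typing import NamedTuple


class FieldEntry(NamedTuple):
    label: int    # 4-byte label (a hash or numeric ID agreed on by reader and writer)
    type_id: int  # 2-byte TypeID
    flags: int    # 2-byte Flags
    index: int    # offset of the data from the start of the struct instance


def read_field_entry(data: bytes, offset: int, byte_order: str = "<") -> FieldEntry:
    # Assumption: FieldType = TypeID then Flags, two 2-byte values.
    label, type_id, flags, index = struct.unpack_from(byte_order + "IHHI", data, offset)
    return FieldEntry(label, type_id, flags, index)
```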

The Index value stores the location of the data as an offset from the beginning of the struct in the data block. This allows padding within structs, and the padding bytes can hold any value (although by convention they are usually filled with 0xFF). Padding is needed in particular to maintain alignment for certain data types, such as 16-byte alignment for vectors.
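The padding rule can be illustrated with a small layout helper that rounds each field's offset up to its alignment and fills the gaps with 0xFF. This is a hypothetical Python sketch: fields are given as (name, encoded bytes, alignment) tuples, with string names standing in for 4-byte labels, and the alignment values are whatever the writer chooses.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def pack_struct(fields, pad: int = 0xFF):
    """Lay out (name, data, alignment) fields in order; return (bytes, {name: Index})."""
    indices = {}
    pos = 0
    for name, data, alignment in fields:
        pos = align_up(pos, alignment)   # skip ahead to the aligned position
        indices[name] = pos
        pos += len(data)
    buf = bytearray([pad]) * pos         # every byte starts out as padding
    for name, data, _ in fields:
        start = indices[name]
        buf[start:start + len(data)] = data
    return bytes(buf), indices
```

With a 1-byte flag, a 16-byte-aligned Vector4f, and a 4-byte count, the vector lands at Index 16 and the count at Index 32, leaving 15 bytes of 0xFF padding after the flag.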

The type is broken up into two 2-byte values that describe the type of the field.

The type looks like this:

Value   Description
TypeID  A 2-byte unsigned number indicating the type
Flags   2 bytes of bit flags

The following flags are currently defined, starting from the most significant bit.

Bit      Flag Description
1 (MSB)  List flag. If this flag is set, then this type is a list of the described type.
2        Struct flag. If this flag is set, then this type is a struct.
         If the struct flag is not set, the TypeID indicates the type of the field by the integer ID of that type. If the struct flag is set, the TypeID indicates the index of this struct’s description in the Struct Array.
3        Reference flag.
         If the reference flag is set, then the data stored in the data block is actually an offset from the beginning of the data block to the location of the actual data. References can be used to mimic pointers in a GFF.
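Numbering the bits from the most significant one, the three flags correspond to the masks 0x8000, 0x4000, and 0x2000 of the 2-byte Flags value. A minimal Python decoder, with hypothetical names:

```python
from typing import NamedTuple

LIST_FLAG = 0x8000       # bit 1 (MSB): a list of the described type
STRUCT_FLAG = 0x4000     # bit 2: TypeID is an index into the Struct Array
REFERENCE_FLAG = 0x2000  # bit 3: the stored data is a reference


class DecodedType(NamedTuple):
    type_id: int       # base type ID, or Struct Array index when is_struct
    is_list: bool
    is_struct: bool
    is_reference: bool


def decode_field_type(type_id: int, flags: int) -> DecodedType:
    return DecodedType(
        type_id,
        bool(flags & LIST_FLAG),
        bool(flags & STRUCT_FLAG),
        bool(flags & REFERENCE_FLAG),
    )
```

For example, a list of instances of Struct Array entry 3 would have TypeID 3 and Flags 0xC000.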

Raw Data Block

The data block is where the actual data is stored. The data for the Top-Level Struct is stored at the beginning of the block. All other data is either stored in fields of the structs or accessed by reference.

Lists

The data stored for a list field is actually a reference to another location in the file, where the list itself is stored. The list begins with a 4-byte unsigned length, followed by the elements.

This is what the list looks like:

Length
Element 0
Element 1
...
Element Length - 1

An empty list can simply store a null reference instead of creating another block.

Generic lists (lists whose TypeID is 0xFFFF) store (type, reference) pairs: type is the FieldType (including flags) of the individual element, and reference is a standard reference pointing to the element's data. Therefore, each entry in a generic list is 8 bytes long.
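Reading a list then amounts to following the reference, reading the 4-byte length, and stepping over the elements. The Python sketch below assumes that `block` holds the raw data block (the file from Header.DataOffset onward), that elements are packed at a fixed stride equal to the element size, and, for generic lists, that each 4-byte type is stored as TypeID then Flags like a field's FieldType. All names are hypothetical.

```python
import struct
from typing import List, Tuple

NULL_REFERENCE = 0xFFFFFFFF


def list_element_offsets(block: bytes, ref: int, element_size: int,
                         byte_order: str = "<") -> List[int]:
    """Data-block offsets of each element of the list that `ref` points to."""
    if ref == NULL_REFERENCE:   # an empty list may be stored as a null reference
        return []
    (length,) = struct.unpack_from(byte_order + "I", block, ref)
    first = ref + 4             # elements follow the 4-byte length
    return [first + i * element_size for i in range(length)]


def read_generic_list(block: bytes, ref: int,
                      byte_order: str = "<") -> List[Tuple[int, int, int]]:
    """(type_id, flags, element_reference) for each 8-byte generic-list entry."""
    entries = []
    for off in list_element_offsets(block, ref, 8, byte_order):
        type_id, flags, elem_ref = struct.unpack_from(byte_order + "HHI", block, off)
        entries.append((type_id, flags, elem_ref))
    return entries
```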

References

A reference is a 4-byte unsigned offset from the beginning of the data block to the location of the data.

Null references are stored as 0xFFFFFFFF. Null references can be used in lists, as well as individual reference items.
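For a single field with the reference flag set, the field's slot holds the reference and the value lives wherever it points. A minimal Python sketch for a referenced UINT32, assuming `block` is the raw data block and `struct_start` is the offset of the struct instance within it (hypothetical names):

```python
import struct
from typing import Optional

NULL_REFERENCE = 0xFFFFFFFF


def read_referenced_uint32(block: bytes, struct_start: int, index: int,
                           byte_order: str = "<") -> Optional[int]:
    """Follow a reference-flagged UINT32 field; return None for a null reference."""
    # The field's slot (struct_start + Index) holds an offset from the
    # beginning of the data block to the actual value.
    (ref,) = struct.unpack_from(byte_order + "I", block, struct_start + index)
    if ref == NULL_REFERENCE:
        return None
    (value,) = struct.unpack_from(byte_order + "I", block, ref)
    return value
```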

Improvements and optimizations

Field look-up

Field look-up is faster in this version of the GFF because it requires an integer comparison instead of a string comparison. Additionally, because fields are stored in order of their labels, a specific field can be found with an efficient search such as a binary search. Finally, knowing what the data structure should look like allows the program to make a good guess as to where the data should be. If the guess is wrong (because the file is an older version), the program only has to start searching from the initial guess, which is still faster than searching without a good guess.
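One way to combine the sorted labels with a guessed position is sketched below in Python: check the guess first, and if it misses, binary-search only the side of the guess where the label must be, since labels are sorted. The function name and the `guess` parameter are hypothetical; this is one interpretation of the look-up described above, not a documented algorithm.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_field(labels: Sequence[int], label: int,
               guess: Optional[int] = None) -> Optional[int]:
    """Position of `label` in a struct's sorted field labels, or None."""
    lo, hi = 0, len(labels)
    if guess is not None and 0 <= guess < len(labels):
        if labels[guess] == label:
            return guess  # a good guess costs one integer comparison
        # A missed guess still narrows the search to one side of it.
        if labels[guess] < label:
            lo = guess + 1
        else:
            hi = guess
    i = bisect_left(labels, label, lo, hi)  # binary search within [lo, hi)
    if i < hi and labels[i] == label:
        return i
    return None
```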

Binding tables

It is possible to build a binding table for a type of file that is read often. The binding table allows data to be loaded from the GFF without parsing through the file each time.
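As a hypothetical illustration, a binding table can be as simple as a dictionary from label to (Index, TypeID, Flags), built once per struct definition and reused for every instance. The sketch below only handles UINT32 and FLOAT32 and defaults to little-endian data; all names are made up for the example.

```python
import struct
from typing import Dict, Iterable, NamedTuple, Tuple

SCALAR_FORMATS = {4: "I", 8: "f"}  # UINT32 and FLOAT32 only, for illustration


class Binding(NamedTuple):
    index: int    # offset of the value from the start of a struct instance
    type_id: int
    flags: int


def build_binding_table(entries: Iterable[Tuple[int, int, int, int]]) -> Dict[int, Binding]:
    """Build label -> Binding once from (label, type_id, flags, index) field entries."""
    return {label: Binding(index, type_id, flags)
            for label, type_id, flags, index in entries}


def read_bound(block: bytes, struct_start: int, binding: Binding,
               byte_order: str = "<"):
    """Read one scalar with a precomputed binding; no field search is needed."""
    fmt = byte_order + SCALAR_FORMATS[binding.type_id]
    return struct.unpack_from(fmt, block, struct_start + binding.index)[0]
```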

Direct loads to memory

Using this format, it is possible to optimize specific file types by writing the GFF exactly the way the data will be laid out in memory. After verifying that the file type is up to date, the game can read the data block into memory and cast the pointer directly to the corresponding C data type. This optimization does not result in any loss of generality, since the GFF is still accessible in the usual way.
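Python has no raw pointer casts, but `ctypes` offers a close analog: a `ctypes.LittleEndianStructure` whose fields mirror the GFF struct byte-for-byte can be created directly from the data block with `from_buffer_copy`. The `CreatureStats` layout and `load_direct` are entirely hypothetical, and the sketch assumes a little-endian PC file whose struct has no padding (all members here are 4-byte values).

```python
import ctypes


class CreatureStats(ctypes.LittleEndianStructure):
    # Hypothetical in-memory layout that matches the GFF struct exactly.
    _fields_ = [
        ("level", ctypes.c_uint32),         # UINT32
        ("health", ctypes.c_float),         # FLOAT32
        ("position", ctypes.c_float * 3),   # Vector3f
    ]


def load_direct(block: bytes, struct_start: int = 0) -> CreatureStats:
    # Reinterpret the bytes as the structure (a copy, unlike a true C cast).
    return CreatureStats.from_buffer_copy(block, struct_start)
```

Using `ctypes.Structure.from_buffer` on a writable buffer instead would share memory with the block rather than copy it, which is closer to the C cast.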