I'm working on a hobby project involving some game data files. I would like to edit some values in them and repackage them so the game accepts the modifications.

The directories themselves were packed in a proprietary archive format, which was easy enough to open. Inside, the files were compressed with zlib. Now I'm stumped, because there seems to be (at least) one more layer: after decompression the files look serialized, but none of the most common, obvious formats matched, and searching online wasn't helpful. I didn't find any magic bytes (which doesn't mean there aren't any; I just didn't spot them). If this is a commercial serialization format, how do I find out which one it is? If it isn't, how should I approach the problem?

A little background:

- The file is read by a Visual C++ application on Windows.
- I believe the file was XML-like before serialization.
- I've decompiled the .exe, but stepping through the process while the data files were being read didn't work out: it reads in 7 GB of data, and I couldn't locate where the file type I want to work with starts. Fishing for helpful strings didn't work out either.
- I've tried comparing the files to Python pickle, marshal, VC++ MFC marshal and various archive formats. No luck.

Distinctive features of the serialized files:

and so on. The other headings in the TOC are TOPO, CHNK, CLAS, PROP, STRG, TRAN, IMPR and EXPR, each followed by an offset and a length. Offset and length values are big-endian.
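A minimal Python sketch for pulling sections out by their TOC heading. It assumes (unconfirmed) that each TOC entry is a 4-byte ASCII tag followed by a 32-bit big-endian offset and a 32-bit big-endian length, that offsets are relative to the start of the file, and that the start of the TOC and the number of entries are already known. `parse_toc` and `section` are hypothetical helper names, and the sample bytes are synthetic, not real game data.

```python
import struct
from typing import Dict, Tuple

# Assumed layout: 4-byte tag, big-endian uint32 offset, big-endian uint32 length.
TOC_ENTRY = struct.Struct(">4sII")


def parse_toc(data: bytes, start: int, count: int) -> Dict[str, Tuple[int, int]]:
    """Read `count` consecutive TOC entries beginning at byte offset `start`."""
    toc = {}
    for i in range(count):
        tag, offset, length = TOC_ENTRY.unpack_from(data, start + i * TOC_ENTRY.size)
        toc[tag.decode("ascii")] = (offset, length)
    return toc


def section(data: bytes, toc: Dict[str, Tuple[int, int]], tag: str) -> bytes:
    """Slice out the bytes belonging to one TOC heading (e.g. "CLAS")."""
    offset, length = toc[tag]
    return data[offset:offset + length]


if __name__ == "__main__":
    # Synthetic file: two TOC entries (24 bytes) followed by their payloads.
    sample = TOC_ENTRY.pack(b"CLAS", 24, 4) + TOC_ENTRY.pack(b"STRG", 28, 2) + b"ABCDhi"
    toc = parse_toc(sample, 0, 2)
    print(toc)
    print(section(sample, toc, "CLAS"), section(sample, toc, "STRG"))
```

If the sliced CLAS section comes out as clean, readable strings, that is decent evidence the offset/length interpretation is right.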

The file body seems to be either type-length-value (TLV) encoded (the human-readable strings fall under the CLAS heading) or laid out as "type, different type, value" in 4-byte chunks. There are 4-byte blocks such as AA AA AA AA, AB AB AB AB or BB BB BB BB, which probably act as delimiters.
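One way to test the delimiter theory is to list every occurrence of those words and look at what sits between them. A minimal sketch, assuming the delimiters are aligned on 4-byte boundaries (an assumption; pass `align=1` to also catch unaligned hits). The helper names are made up for illustration.

```python
from typing import List, Tuple

# The three candidate delimiter words; extend the tuple as more turn up.
DELIMITERS = {bytes([b]) * 4 for b in (0xAA, 0xAB, 0xBB)}


def find_delimiters(data: bytes, align: int = 4) -> List[Tuple[int, bytes]]:
    """Return (offset, word) for every 4-byte word at an `align` step that is a delimiter."""
    hits = []
    for off in range(0, len(data) - 3, align):
        word = data[off:off + 4]
        if word in DELIMITERS:
            hits.append((off, word))
    return hits


def split_on_delimiters(data: bytes) -> List[bytes]:
    """Split the buffer into the runs of bytes between delimiter words."""
    pieces, prev = [], 0
    for off, _ in find_delimiters(data):
        pieces.append(data[prev:off])
        prev = off + 4
    pieces.append(data[prev:])
    return pieces
```

If the pieces between delimiters turn out to have a small set of recurring lengths, that would point toward fixed-size records rather than TLV.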

There are long stretches of data where the only thing that changes is a single byte increasing by 1 each time. It looks like an index of some sort.
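Runs like that usually reveal the record size. A minimal sketch that tries several candidate strides and counts how often consecutive records differ in exactly one byte that went up by 1. The candidate strides (multiples of 4 up to 64) are an assumption based on the 4-byte chunking, and the function names are hypothetical.

```python
from typing import List, Tuple


def one_byte_increments(data: bytes, stride: int) -> List[Tuple[int, int]]:
    """Compare each `stride`-byte record with the next one.

    Returns (record_offset, byte_position) for every pair of consecutive
    records that differ in exactly one byte, where that byte increased by 1.
    """
    hits = []
    n = len(data) // stride
    for i in range(n - 1):
        a = data[i * stride:(i + 1) * stride]
        b = data[(i + 1) * stride:(i + 2) * stride]
        diffs = [j for j in range(stride) if a[j] != b[j]]
        if len(diffs) == 1 and b[diffs[0]] == a[diffs[0]] + 1:
            hits.append((i * stride, diffs[0]))
    return hits


def guess_stride(data: bytes, candidates=range(4, 65, 4)) -> int:
    """Pick the candidate stride that yields the most one-byte increments."""
    return max(candidates, key=lambda s: len(one_byte_increments(data, s)))
```

Once a stride stands out, the reported byte position tells you where the counter lives inside each record, and the remaining bytes of the record are the fields to decode.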

The file may contain values of various data types.
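When the type of a 4-byte chunk is unknown, it helps to view it under several interpretations at once and see which one yields sensible values. A minimal sketch; the set of interpretations (32-bit integers and floats in both byte orders, plus ASCII) is just a reasonable starting guess, and the helper names are hypothetical.

```python
import struct


def interpret_word(word: bytes) -> dict:
    """Show one 4-byte chunk under several common interpretations."""
    if len(word) != 4:
        raise ValueError("expected exactly 4 bytes")
    return {
        "u32_be": struct.unpack(">I", word)[0],
        "u32_le": struct.unpack("<I", word)[0],
        "i32_be": struct.unpack(">i", word)[0],
        "i32_le": struct.unpack("<i", word)[0],
        "f32_be": struct.unpack(">f", word)[0],
        "f32_le": struct.unpack("<f", word)[0],
        "ascii": word.decode("ascii", errors="replace"),
    }


def dump_words(data: bytes, start: int = 0, count: int = 8) -> None:
    """Print `count` aligned words from `start` with their interpretations."""
    for off in range(start, min(start + 4 * count, len(data) - 3), 4):
        word = data[off:off + 4]
        v = interpret_word(word)
        print(f"{off:08x}  {word.hex(' ')}  be={v['u32_be']:<10} "
              f"le={v['u32_le']:<10} f32be={v['f32_be']:.6g}  {v['ascii']!r}")
```

For example, `3F 80 00 00` reads as 1.0 as a big-endian float but as 1065353216 as a big-endian integer, so a column of "round" floats is easy to spot this way.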

I had the chance to compare two different versions of the data files. Changing integer values in the unserialized file led to very small changes in the serialized file (typically, changing one number in the original changed one hex value in the resulting file).
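Comparisons like this can be automated at 4-byte granularity. A minimal sketch, assuming both versions have the same layout (an insertion or deletion would shift every following word and make the diff useless) and defaulting to big-endian only because the TOC is big-endian; the body's byte order is not confirmed. `diff_words` is a hypothetical name.

```python
import struct
from typing import List, Tuple


def diff_words(old: bytes, new: bytes, endian: str = ">") -> List[Tuple[int, int, int]]:
    """Return (offset, old_value, new_value) for every aligned 32-bit word that differs.

    `endian` is a struct byte-order prefix: ">" for big-endian, "<" for little-endian.
    Only the common prefix of the two buffers is compared.
    """
    fmt = endian + "I"
    changes = []
    limit = min(len(old), len(new)) // 4 * 4
    for off in range(0, limit, 4):
        (a,) = struct.unpack_from(fmt, old, off)
        (b,) = struct.unpack_from(fmt, new, off)
        if a != b:
            changes.append((off, a, b))
    return changes
```

Matching the reported old/new values against the numbers you changed in the unserialized file shows both where a field lives and which byte order it uses.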

The format is extremely space-inefficient: almost everything is stored in 4-byte chunks, and the file compresses by a factor of about 10. That, together with the human-readable strings, has led me to believe the file is not compressed or encrypted in any way; it's just serialized somehow.
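That conclusion can be checked numerically. A minimal sketch that computes byte entropy and the zlib compression ratio of a buffer; the function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, 8.0 for uniformly random data."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (level 9)."""
    return len(data) / len(zlib.compress(data, 9))
```

Compressed or well-encrypted data typically sits close to 8 bits per byte and barely compresses at all, so a ratio around 10 is consistent with plain serialized data.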