I've got 120,000 files (way more, actually; this is just an arbitrary subset) of an unknown type. The Linux file utility does not identify them (not that they're necessarily Linux files), nor does any other method I've tried. I currently have only two hints about them. One is that I suspect some compression is employed -- I have metadata that claims the file sizes are always somewhat larger than what I observe on disk.

The other is that in 100,000 of these files, the first 16 bytes are always:

ff ee ee dd 00 00 00 00 01 00 00 00 00 00 00 00

That really looks like a file header/magic number to me, but I just can't place it. Does anyone know what kind of files this would indicate? Alternatively, can anyone convince me that these suspiciously common bytes certainly do not indicate a specific file type?
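For anyone who wants to check a similar claim on their own collection, here is a minimal sketch of how the leading bytes could be tallied. The function name tally_headers and the default of 16 bytes are my own choices for illustration, not anything taken from the files themselves.

```python
from collections import Counter

def tally_headers(paths, n=16):
    """Count how often each distinct n-byte prefix occurs across the given files."""
    counts = Counter()
    for path in paths:
        with open(path, "rb") as fh:
            # Render the prefix as space-separated hex, e.g. "ff ee ee dd ..."
            counts[fh.read(n).hex(" ")] += 1
    return counts
```

Calling tally_headers(paths).most_common(5) lists the most frequent prefixes, which makes a shared header like the one above stand out quickly.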

UPDATE

I don't know the exact reverse-engineering details, but most of the files in our case are zips after the first 29(? or so) bytes are ignored. So in practice the problem is solved (we know how to process the files) but in theory the question is still unanswered -- I don't know which application routinely prepends about 29 bytes to its zips. [I'm not sure if I should leave the question open or not at this point.]
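Since the exact prefix length is uncertain, one option is to search for the ZIP local file header signature (the bytes PK\x03\x04) instead of hard-coding 29. The following is only a sketch under that assumption; the helper names find_zip_offset and open_embedded_zip are hypothetical, and an empty archive (which has no local file header) would not be found this way.

```python
import io
import zipfile

# Signature that starts each local file entry in a ZIP archive.
ZIP_LOCAL_HEADER = b"PK\x03\x04"

def find_zip_offset(data: bytes) -> int:
    """Return the offset of the first ZIP local file header, or -1 if absent."""
    return data.find(ZIP_LOCAL_HEADER)

def open_embedded_zip(data: bytes) -> zipfile.ZipFile:
    """Skip whatever precedes the first local file header and open the rest as a ZIP."""
    offset = find_zip_offset(data)
    if offset < 0:
        raise ValueError("no ZIP local file header found")
    return zipfile.ZipFile(io.BytesIO(data[offset:]))
```

Recording the offset returned by find_zip_offset for every file would also show whether the prefix really is a constant 29 bytes or varies from file to file.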

1 Answer

TrID is a utility designed to identify file types from their binary
signatures. While there are similar utilities with hard coded logic,
TrID has no fixed rules. Instead, it's extensible and can be trained
to recognize new formats in a fast and automatic way.

TrID has many uses: identify what kind of file was sent to you via
e-mail, aid in forensic analysis, support in file recovery, etc.

TrID uses a database of definitions which describe recurring patterns
for supported file types. As this is subject to very frequent update,
it's made available as a separate package. Just download both TrID and
this archive and unpack in the same folder...
...
...

Update
After reading your update, which says these are zip files with about 29 bytes added in front of them, maybe the prepended bytes are some kind of "glitch" caused by the way these files were obtained.

Example 1:
Maybe these files were extracted from a large single-file backup of a file server. For example, if you back up a server with NTBackup into a single file, NTBackup may prepend some attribute data to the data actually contained in the files.

Example 2:
Maybe these files were extracted from a database, where they were stored as BLOB objects.

Example 3:
Maybe these files were extracted from a raw CD/DVD image (the prepended bytes may come from a wrong interpretation of the file offset or file system).

There are an infinite number of hypotheses... If you know where these files come from, you can run some tests to see whether there is a utility, tool, database, or server that stores zip files inside some other file or data structure, prepending these 29 bytes.
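One possible check, offered only as a rough sketch: compare the prepended bytes across many files and see which positions ever change. Positions that never change look like a fixed magic number or version field, while positions that vary might hold a length, checksum, or timestamp (that interpretation is a guess, not something known about these files). The function name varying_positions is made up for this example.

```python
def varying_positions(prefixes):
    """Return the byte positions whose value differs between at least two prefixes."""
    # Only compare positions that exist in every prefix.
    length = min(len(p) for p in prefixes)
    return [i for i in range(length) if len({p[i] for p in prefixes}) > 1]
```

Here prefixes would be a list of bytes objects, each holding the bytes that precede the zip data in one file.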