I have been writing a program in C# (.Net 4.0) to help my dad reorganise his ebook collection (almost entirely MOBI format) by renaming all the files into a tidier format based on the author name of the folder they're in.

If possible I'd like to extend this to programmatically get the author name and title etc from the metadata within the MOBI file itself. I'd like to try to do this from the standard MOBI metadata since not all of these ebooks will necessarily have been generated by or processed using Calibre.

From what I've read so far, reading EXTH header information can be tricky because it can be compressed, some of it using Mobi's own secret compression scheme.

Am really just starting out on this, so was wondering if anyone had any information on programmatically reading EXTH header information and whether it's necessary to first get a routine to decompress the file? Looking at the wikipedia entry for mobi file format and EXTH header, I think I can probably easily read in the information I want once I can get at its XML format rather than the compressed version that seems to be in the MOBI files I have.

I don't want to write or update anything within the file, just read the metadata.

I had a shot at doing something similar last year (just for fun). I don't recall any problems with compression of the metadata (as opposed to the actual text of the book). But I never got round to testing it with a large sample of books.
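
For what it's worth, the compression type is stored in the first two bytes of the PalmDOC header at the start of record 0, and as far as I know it describes the text records only, not the headers or the EXTH block. Here is a minimal Python 3 sketch that reads that field. The function name `text_compression` is just made up for illustration, and the code values (1, 2, 17480) are the ones listed on the MobileRead wiki, so treat them as an assumption rather than a spec.

```python
import struct

# Compression codes as listed on the MobileRead wiki (assumed, not official).
COMPRESSION = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}

def text_compression(data):
    """Return the compression scheme named in the PalmDOC header.

    `data` is the raw bytes of the start of a MOBI file, long enough to
    include the beginning of record 0.
    """
    # The PDB record list starts at offset 78; the first 4 bytes of the
    # first entry give the file offset of record 0 (big-endian).
    (rec0,) = struct.unpack_from(">L", data, 78)
    # The PalmDOC header begins record 0; its first 2 bytes are the code.
    (code,) = struct.unpack_from(">H", data, rec0)
    return COMPRESSION.get(code, "unknown (%d)" % code)
```

So a book can report HUFF/CDIC here and the title and author can still be read as plain bytes from the headers.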

Don't know anything about Python so will have a nosey through and try to translate how I can achieve a similar thing in C# (although I imagine once I've streamed the text of the file in the code will be fairly similar).

You might also find the (non-obfuscated) Java code of gluggy's Java Mobi Metadata Editor useful, which you can view with JD-GUI or request from the developer.
There's also Alissa's MobiHandler, whose source code is included in the release.

Since there are many different tools to manipulate Mobi headers, I have put together a Python 2.7 program that will work with Amazon/Mobi ebooks created with the very latest version of Kindlegen.

This program will dump all known and unknown fields and all EXTH metadata in each mobi header that is found in the ebook. This includes the latest KF8 dual mobi style books that Kindlegen now generates which have two separate headers and two EXTH metadata storage areas.
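
As a rough illustration of the "two headers" point, the sketch below (Python 3) walks the PDB record list and reports every record that has the `MOBI` identifier right after a 16-byte PalmDOC header. This is not necessarily how DumpMobiHeader.py locates the second header; it is just a simple scan, and the function name `find_mobi_header_records` is hypothetical.

```python
import struct

def find_mobi_header_records(data):
    """Return the indexes of PDB records that appear to hold a MOBI header.

    `data` is the raw bytes of the whole file. A record is counted when the
    4 bytes after its 16-byte PalmDOC header are b"MOBI".
    """
    # Number of records is a big-endian 16-bit value at offset 76.
    (num_records,) = struct.unpack_from(">H", data, 76)
    found = []
    for i in range(num_records):
        # Each record list entry is 8 bytes; the first 4 are the offset.
        (offset,) = struct.unpack_from(">L", data, 78 + 8 * i)
        if data[offset + 16:offset + 20] == b"MOBI":
            found.append(i)
    return found
```

On an older single-format book I would expect this to return just `[0]`; on a dual KF8 book it should return record 0 plus the record where the second header starts.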

To run the program simply do the following:

python ./DumpMobiHeader.py PATH_TO_YOUR_EBOOK

on Mac or Linux

or

python .\DumpMobiHeader.py PATH_TO_YOUR_EBOOK

when running cmd.exe under Windows.

It should work on both DRM and non-DRM Amazon/Mobi style ebooks that use the latest header layout (since the headers and metadata are not encrypted themselves). Please note that Amazon ties its DRM to many of the metadata fields (watermark, tts, etc) to prevent them from being changed. Also some new metadata values are required for the ebook to be read properly. So be careful exactly what metadata values you change or delete. You may end up breaking the ebook.

I wrote this to document all that is known about the Mobi headers, based on other tools, our wiki page about the Mobi format, and from reversing the latest KF8 format mobis for the Mobi_Unpack program.

Even if you do not read/follow Python, the code itself documents what is known and should be easy enough to follow along.

If anyone knows of *any* corrections or extensions please let me know so we can keep this program updated to help properly document the mobi format.

Hi cool thanks. Please note that my intention is not to change ANY of the metadata, or to change the MOBI file at all. Merely to be able to read the title and author of the book from the metadata so I can programmatically rename the file.

Basically I have written a program for my dad that will tidy up and rename the ebooks based on the author folder they're in. But he has many ebooks which are in a miscellaneous folder with just 1 book per author, and it's not worth the time to manually put these into separate author folders. Since there's no reliable way to get the program to guess the author name and title based on the filename, programmatically examining the MOBI header seemed a good way to handle it.

So far I've just got to the stage where I can stream the array of bytes into my C# program, reading in the first X bytes (usually the header looks to be just over 1024 bytes so I am taking in 2048 to be on the safe side), since obviously we don't need the whole damn book, just the header!

Of course now the challenge is to parse these bytes and establish whereabouts the relevant author name and title can be found. I currently program in Delphi at work but formerly had many years experience with C# and VB, but never really used C++ or Python.
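
For anyone following along, here is a minimal Python 3 sketch of that parsing step, which should translate to C# fairly directly. It assumes the layout described on the MobileRead wiki: the PDB header with its record list from offset 78, record 0 starting with a 16-byte PalmDOC header followed by the MOBI header, and an EXTH block right after the MOBI header when bit 0x40 of the EXTH flags is set. The function name `read_mobi_metadata` is made up, and only two EXTH types are handled (100 for author, 503 for updated title). One caveat: record 0 starts after the record list, which takes 8 bytes per record, so on a book with a few hundred records a fixed 2048-byte read may not reach the header at all.

```python
import struct

# EXTH record types as listed on the MobileRead wiki (partial, assumed).
EXTH_AUTHOR = 100
EXTH_UPDATED_TITLE = 503

def read_mobi_metadata(data):
    """Return {'title': ..., 'author': ...} from the raw bytes of a MOBI file."""
    # PDB header: type/creator at offset 60, record list from offset 78.
    if data[60:68] != b"BOOKMOBI":
        raise ValueError("not a MOBI file")
    (rec0,) = struct.unpack_from(">L", data, 78)  # file offset of record 0

    # Record 0: 16-byte PalmDOC header, then the MOBI header.
    if data[rec0 + 16:rec0 + 20] != b"MOBI":
        raise ValueError("MOBI header not found in record 0")
    header_len, _mobi_type, encoding = struct.unpack_from(">LLL", data, rec0 + 20)
    codec = "utf-8" if encoding == 65001 else "cp1252"

    # Full name offset is relative to the start of record 0, not the file.
    name_off, name_len = struct.unpack_from(">LL", data, rec0 + 84)
    start = rec0 + name_off
    result = {
        "title": data[start:start + name_len].decode(codec, "replace"),
        "author": None,
    }

    # Bit 0x40 of the EXTH flags says whether an EXTH block follows.
    (exth_flags,) = struct.unpack_from(">L", data, rec0 + 128)
    exth = rec0 + 16 + header_len
    if not exth_flags & 0x40 or data[exth:exth + 4] != b"EXTH":
        return result

    _exth_len, count = struct.unpack_from(">LL", data, exth + 4)
    pos = exth + 12
    for _ in range(count):
        # Each record: type (4), total length including these 8 bytes (4), data.
        rtype, rlen = struct.unpack_from(">LL", data, pos)
        if rlen < 8:
            break  # malformed record, stop rather than loop on garbage
        value = data[pos + 8:pos + rlen].decode(codec, "replace")
        if rtype == EXTH_AUTHOR and result["author"] is None:
            result["author"] = value
        elif rtype == EXTH_UPDATED_TITLE:
            result["title"] = value
        pos += rlen
    return result
```

Usage would be something like `read_mobi_metadata(open(path, "rb").read())`. Reading the whole file is the lazy option; a tidier version would read the record list first and then only the bytes of record 0.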

Anyway, I will take a look through your code; hopefully it will be helpful documentation of how to parse the codes.

If I get anything up and running I'll try to make the C# source available in case any other .Net programmers out there are interested.

Please do. I for one would find it very useful (especially as I can understand C# better than Python).

Hi Mike.

Sorry for the delay, I've been kinda busy for several months and just got round to popping back on this forum.

Just to let you know I have something up and running now which will fully parse a .mobi file, PDBHeader, PalmDOCHeader and all of the Mobi header including the extended metadata, written entirely in C# (for .Net 4.0) without relying on MobiPerl or any other external modules.

The core classes also come with supporting features such as SortedDictionary properties on each header to easily set the contents of a header as a datasource on a grid etc, and overridden ToString() methods to pump out the properties and values in a plain text format.

The interface allows for any combination of mobi properties to be selected as columns so you can view all metadata for all books in a folder at the same time in a listview, Windows Explorer style, as well as the full metadata on individual books from a right-click popup dialog.

I actually wrote this a while ago but have recently moved house and with a number of other things on just haven't had time to pop on here.

Going to tidy things up a bit when I get home tonight and then I'll post the source code and description.