What language does this script use? Is it something that can easily be used with Windows 7? If so, can you please post it?

I believe I wrote it in PHP, which can be installed on Win7. It just accepts a UTF-8 input text file and spits out a character list to stdout; it's not very sophisticated. I'll see if I can find it when I go to work tomorrow. If not, I can probably rewrite it in a few minutes if you're still interested.

It'd be nice if the script ignored all characters that occur inside HTML tags. Those don't need to be part of any embedded font since they won't be rendered.

My (rather stupid) script expects pure utf-8 text files. You could get those by converting an epub to txt in calibre (remember to specify utf-8 as output encoding). Most authoring software can probably save to txt as well. Formatting doesn't really matter as long as every character is included. This could maybe have been more convenient, but parsing html is outside of my abilities, and I want those results before making an epub as well.
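For anyone who wants to try the idea without the original script, here is a minimal sketch in Python of what is described above: read a UTF-8 text file and print each distinct character once. It is not the original script; the function name `unique_chars` and the `U+XXXX` output format are my own assumptions.

```python
import sys


def unique_chars(path, encoding="utf-8"):
    """Return a sorted list of the distinct characters in a text file."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    # Line breaks and tabs never need a glyph, so leave them out.
    return sorted(set(text) - set("\r\n\t"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one character per line, with its code point, to stdout.
    for ch in unique_chars(sys.argv[1]):
        print(f"U+{ord(ch):04X}\t{ch}")
```

Saved as, say, `charlist.py` (a made-up name), it would be run as `python charlist.py book.txt`. On a Windows console the output may need redirecting to a file if the console code page can't display some of the characters.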

Since you might be interested only in special characters, you could just add the regular characters you're not interested in to disallowed = set('') in line 6, i.e.
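The script itself isn't reproduced in this thread, so the following is only an illustration of how such a `disallowed` set could work, not the actual line 6. It assumes that plain ASCII letters, digits, punctuation and whitespace are the "regular" characters; the helper name `special_chars` is hypothetical.

```python
import string

# Assumption: everything in basic ASCII counts as "not interesting".
disallowed = set(string.ascii_letters + string.digits
                 + string.punctuation + " \r\n\t")


def special_chars(text, disallowed=disallowed):
    """Return the distinct characters of text that are not in disallowed."""
    return sorted(set(text) - disallowed)
```

With this set, a string like "Café 10€!" would leave only "é" and "€" in the list.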

This is an attempt to modify the script so that only the text of an HTML document is parsed, and to allow other character encodings for input and output. The default is utf-8 if none is specified on the command line. I got it to work with both utf-8 and windows-1252.
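The modified script isn't shown here, so as a rough sketch of the same idea: Python's standard `html.parser` module can collect only the text between tags, and the encoding can be passed in as a parameter. The class name `TextOnly`, the function name `chars_in_html`, and the decision to skip `<script>` and `<style>` contents are my assumptions, not the poster's code.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the distinct characters of rendered text, ignoring markup."""

    def __init__(self):
        # convert_charrefs=True turns entities like &eacute; into characters.
        super().__init__(convert_charrefs=True)
        self.chars = set()
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Tag names and attribute values never reach this method.
        if not self._skip:
            self.chars.update(data)


def chars_in_html(raw, encoding="utf-8"):
    """Decode raw bytes with the given encoding and return the text characters."""
    parser = TextOnly()
    parser.feed(raw.decode(encoding))
    parser.close()
    return parser.chars
```

For a windows-1252 file, the call would be `chars_in_html(data, "windows-1252")`, where `data` is the file read in binary mode.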

BTW, Python-challenged ebook designers could simply compile an epub with KindlePreviewer/KindleGen and have a look at the detected Unicode ranges in the log file. For example, if you compile the book mentioned in roger64's post you'll see the following output:

I think that getting the ranges isn't fine-grained enough. We don't just want to check that our fonts cover the characters used, but to trim the fonts so that they cover only the characters used. Of course, making sure that all the needed characters are in the font will be part of this.

Actually, I am very interested in finding out if there are missing glyphs... Small squares in place of characters are a far larger problem than a few tens of kilobytes in size, IMO.

I suspect that most methods of subsetting would also give you a "free" coverage check in the bargain.
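To make the coverage check concrete, here is a small sketch: once you have the set of characters the book uses and the set of code points the font supports, the missing glyphs are just the difference. The function name `missing_chars` is hypothetical. How you obtain the font's code points is up to you; one option, assuming the fontTools library is installed, is the keys of `TTFont(path).getBestCmap()`.

```python
def missing_chars(used, supported_codepoints):
    """Return the characters in `used` that the font has no code point for.

    used: any iterable of characters (a string or a set).
    supported_codepoints: a set of integer code points the font covers.
    """
    return sorted(ch for ch in set(used)
                  if ord(ch) not in supported_codepoints)
```

An empty result means the font covers everything; anything else is a character that would show up as a small square.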