for embedding a font in a PDF document. However, this does not work for UTF-8 fonts, because we may use characters that are not in a contiguous series (arbitrary characters scattered throughout the UTF-8 range). On the other hand, we cannot embed the entire UTF-8 font, as it is too big for a PDF document (and most of it would be useless anyway).

How can I selectively embed a font with only a few characters? My question is about the PDF code: how should it be written?

POSSIBLE APPROACHES:

Manual: With the aid of programs like FontForge, we can create a custom font containing only the selected characters, but how do we point to those custom characters in the PDF document? PDF only asks for FirstChar and LastChar.

Automatic (Preferred): Embed the entire UTF-8 font in the PDF document, then optimize the PDF document (with tools like pdftk) to remove the unnecessary characters. Is there such a tool?

First of all, you don't describe your environment; what are your requirements for a solution? When you say "manual", you make it sound as if someone is there to put the bytes together by hand... And what do you mean by "PDF only asks for the FirstChar and LastChar"? Those values mostly restrict the dimension of width arrays.
– mkl, Dec 5 '12 at 6:22
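To illustrate the comment's point: for a simple (single-byte) font, /FirstChar and /LastChar only say which codes the /Widths array covers. The array has LastChar - FirstChar + 1 entries, and nothing about the embedded font program changes. A minimal Python sketch; the helper names and width values are hypothetical:

```python
def build_widths(first_char: int, last_char: int, advance_widths: dict[int, int]) -> list[int]:
    """Return the /Widths array for a simple font covering codes first_char..last_char.

    advance_widths maps a single-byte character code to its advance width in
    glyph-space units (1/1000 of text space). Codes inside the range with no
    entry get 0 here; codes outside the range fall back to /MissingWidth.
    """
    if not 0 <= first_char <= last_char <= 255:
        raise ValueError("simple fonts use single-byte codes 0..255")
    # One entry per code, so the array length is last_char - first_char + 1.
    return [advance_widths.get(code, 0) for code in range(first_char, last_char + 1)]


def font_dict_fragment(first_char: int, last_char: int, advance_widths: dict[int, int]) -> str:
    """Format the three entries as they would appear in a PDF font dictionary."""
    widths = build_widths(first_char, last_char, advance_widths)
    return (f"/FirstChar {first_char} /LastChar {last_char} "
            f"/Widths [{' '.join(str(w) for w in widths)}]")


# Hypothetical widths for 'A' (65) and 'C' (67); 'B' (66) is unused.
print(font_dict_fragment(65, 67, {65: 722, 67: 667}))
# /FirstChar 65 /LastChar 67 /Widths [722 0 667]
```

Nothing in these entries tells a viewer which glyph outlines are actually embedded; that is decided by the font program itself.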

1 Answer

Defining FirstChar and LastChar in a PDF file doesn't affect the actual font data at all. To embed fewer than all of the characters in a font, you need to 'subset' the font data itself: locate the description of each glyph, keep each description you need, and then generate the appropriate framework to hold those glyph descriptions, which depends on the font type.
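To make the difference concrete, here is a format-neutral toy model in Python. `ToyFont` and `subset_font` are hypothetical names, and a real font also needs its surrounding tables rebuilt (for TrueType, for example, `loca`, `hmtx` and the `cmap`), which this sketch leaves out. It keeps .notdef at glyph 0, copies only the glyph descriptions the text needs, and renumbers them:

```python
from dataclasses import dataclass


@dataclass
class ToyFont:
    """Hypothetical, format-neutral stand-in for a real font program."""
    glyphs: list[bytes]   # glyph descriptions indexed by glyph ID; ID 0 is .notdef
    cmap: dict[int, int]  # Unicode code point -> glyph ID


def subset_font(font: ToyFont, text: str) -> tuple[ToyFont, dict[int, int]]:
    """Keep only the glyph descriptions needed for `text`, renumbered from 0.

    Returns the subset font and the old-ID -> new-ID mapping.
    """
    needed = {0}  # .notdef is always kept, and stays at ID 0 after sorting
    for ch in text:
        needed.add(font.cmap.get(ord(ch), 0))  # uncovered characters fall back to .notdef
    kept = sorted(needed)
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_glyphs = [font.glyphs[old] for old in kept]
    # Rewrite the lookup so it points at the new IDs; drop entries for discarded glyphs.
    new_cmap = {cp: old_to_new[gid] for cp, gid in font.cmap.items() if gid in old_to_new}
    return ToyFont(new_glyphs, new_cmap), old_to_new


full = ToyFont(
    glyphs=[b".notdef", b"A-outline", b"B-outline", b"C-outline", b"euro-outline"],
    cmap={0x41: 1, 0x42: 2, 0x43: 3, 0x20AC: 4},
)
small, remap = subset_font(full, "CA\u20ac")
print(small.glyphs)  # [b'.notdef', b'A-outline', b'C-outline', b'euro-outline']
print(remap)         # {0: 0, 1: 1, 3: 2, 4: 3}
```

Because the glyph IDs change, every lookup that pointed at the old numbers has to be rewritten as well; that bookkeeping is part of building the framework around the kept descriptions.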

In general there is no such thing as a 'UTF-8' font; UTF-8 is a way of encoding Unicode text, not a kind of font. Fonts contain a series of instructions on how to draw a number of glyphs, and a means of indexing from a character code to the correct glyph description. For PostScript fonts this is given by the Encoding, for CIDFonts it is given by the CMap, and for TrueType fonts it is given by the cmap subtables embedded in the font.
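A short Python sketch of that distinction, using a toy Unicode cmap with invented glyph IDs: UTF-8 only decides how many bytes each code point takes, while the font's own lookup decides which glyph draws it.

```python
def glyph_ids_for_utf8(data: bytes, cmap: dict[int, int]) -> list[int]:
    """Map UTF-8 encoded text to glyph IDs through a font's own lookup table.

    UTF-8 only says how code points are stored as bytes (1 to 4 bytes each);
    the font's cmap decides which glyph draws each code point. Code points the
    font does not cover map to glyph 0 (.notdef).
    """
    code_points = [ord(ch) for ch in data.decode("utf-8")]
    return [cmap.get(cp, 0) for cp in code_points]


# Toy cmap with invented glyph IDs: 'A', 'n with tilde' and the euro sign.
toy_cmap = {0x41: 36, 0xF1: 120, 0x20AC: 188}
data = "A\u00f1\u20ac".encode("utf-8")
print(len(data))                           # 6 bytes for 3 characters (1 + 2 + 3)
print(glyph_ids_for_utf8(data, toy_cmap))  # [36, 120, 188]
```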

So to achieve your goal you will need to understand the font format you intend to use quite thoroughly (PostScript Type 1, Type 2, CIDFont or TrueType), be able to determine which glyph descriptions you need, extract those from the font, and then build a new font that contains just those glyph descriptions.
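Determining which glyph descriptions you need is more than one lookup per character. In TrueType, for instance, composite glyphs (an accented letter built from a base letter and an accent) refer to other glyphs, and those must be kept too. A sketch with hypothetical glyph IDs and a hypothetical `components` table standing in for the font's composite-glyph data:

```python
def glyph_closure(roots: set[int], components: dict[int, list[int]]) -> set[int]:
    """Return every glyph ID needed to draw `roots`, following composite references."""
    needed = {0}  # .notdef is always kept
    stack = list(roots)
    while stack:
        gid = stack.pop()
        if gid in needed:
            continue  # already collected; also guards against reference cycles
        needed.add(gid)
        stack.extend(components.get(gid, []))
    return needed


def needed_glyphs(text: str, cmap: dict[int, int], components: dict[int, list[int]]) -> set[int]:
    """Characters -> glyph IDs via the cmap, then add every component glyph."""
    roots = {cmap.get(ord(ch), 0) for ch in text}
    return glyph_closure(roots, components)


# Hypothetical IDs: 'e' = 72, 'e acute' = 140, built from 'e' plus an acute accent (201).
cmap = {0x65: 72, 0xE9: 140}
components = {140: [72, 201]}
print(sorted(needed_glyphs("\u00e9", cmap, components)))  # [0, 72, 140, 201]
```

The accent glyph (201) never appears in the text, yet it ends up in the set; a naive per-character extraction would miss it.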

This would be a lot of work. As @mkl says, you would probably be better off describing your intended workflow so that we can advise you better on how to achieve it. For example, both Adobe Acrobat Distiller and Ghostscript's pdfwrite device will subset fonts when converting PostScript to PDF.
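For the Ghostscript route, a sketch that drives pdfwrite from Python. `-sDEVICE=pdfwrite`, `-o`, `-dEmbedAllFonts` and `-dSubsetFonts` are standard Ghostscript options (subsetting is already pdfwrite's default, so the last flag only makes the intent explicit); the helper names and the assumption that `gs` or `gswin64c` is on PATH are illustrative.

```python
import shutil
import subprocess


def pdfwrite_command(ps_path: str, pdf_path: str, gs: str = "gs") -> list[str]:
    """Build a Ghostscript command line that converts PostScript to PDF.

    -o sets the output file and implies -dBATCH -dNOPAUSE.
    """
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dEmbedAllFonts=true",
        "-dSubsetFonts=true",  # embed only the glyphs actually used
        "-o", pdf_path,
        ps_path,
    ]


def convert(ps_path: str, pdf_path: str) -> None:
    """Run the conversion; assumes a Ghostscript executable is installed."""
    gs = shutil.which("gs") or shutil.which("gswin64c")
    if gs is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    subprocess.run(pdfwrite_command(ps_path, pdf_path, gs), check=True)


print(" ".join(pdfwrite_command("input.ps", "output.pdf")))
# gs -sDEVICE=pdfwrite -dEmbedAllFonts=true -dSubsetFonts=true -o output.pdf input.ps
```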