Introduction

I have always been curious about Adobe Reader and PDF files. Have you ever tried opening a PDF file in a text editor? It's amazing! In this project I try to bring to light what is hidden inside a PDF file. This simple application lets you create PDF files just as you create txt files in Notepad (hence the name pdfpad). Type your text in the editor and save it as a PDF file. Of course, you need Acrobat Reader to view the created file. You cannot open an existing PDF file in this editor; you can only create one, and once it is created, it's done. The most interesting feature of this project is the digital signature: pdfpad.exe automatically adds an invisible digital signature to every PDF file it creates, which shows the very basics of how a signature is added.

Due to lack of time, many of the details couldn't be included. Please bear with me.

Background

First, you should know the basics of the PDF format. I recommend downloading a copy of the PDF Reference manual and going through it (oops, it's 1000 pages!).

Download the application demo, enter some text, and save the file as a PDF. Now open the file in Notepad and read on…

If one says C++ is object oriented, I would say PDF is more object oriented. In a PDF, everything is treated as an object; every object has its own properties and can refer to other objects. This lets a viewer navigate a large PDF file (say, a 1000-page book you just downloaded) randomly and quickly.

A PDF file is read from the end. There is a token called startxref; this is where everything begins. A viewer application reads this entry to get the byte offset of a table called xref (the cross-reference table). The table lists the objects used in the file along with their byte offsets within the PDF file. The format of the entries matters greatly here: each entry must be exactly 20 bytes long, including the carriage return and the line feed.
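As a rough illustration of that first step, here is a minimal sketch of how a viewer might locate the xref offset. The function name findStartXref is my own for this example (it is not part of pdfpad), and it assumes the whole file has already been read into a std::string.

```cpp
#include <cstdlib>
#include <string>

// Returns the byte offset written after the last "startxref" keyword,
// or -1 if the keyword (or the number after it) is missing.
// Assumption: the entire file content is already in memory.
long findStartXref(const std::string& pdf) {
    const std::string key = "startxref";
    std::size_t pos = pdf.rfind(key);  // search backwards from the end of the file
    if (pos == std::string::npos) return -1;
    pos += key.size();
    // Skip the end-of-line characters that follow the keyword.
    while (pos < pdf.size() &&
           (pdf[pos] == '\r' || pdf[pos] == '\n' || pdf[pos] == ' ')) {
        ++pos;
    }
    if (pos >= pdf.size() || pdf[pos] < '0' || pdf[pos] > '9') return -1;
    return std::strtol(pdf.c_str() + pos, nullptr, 10);
}
```

The returned number is the position of the xref keyword itself, so the next step would be to seek to that offset and read the table.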

Objects are usually numbered sequentially from 0 to n (though this is not required). If you look at the xref section, you will find a '0' followed by a number n. This means that the table contains n entries, starting with object 0. Take a look at one of them: in 0000000074 00000 n, the first field is the byte offset, 00000 is the generation number, and n means the object is in use. Only the first entry has a non-zero generation number, and it is marked f (free). Read the reference manual for more details.
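To make the 20-byte rule concrete, here is a small sketch that formats one entry. The helper name formatXrefEntry is hypothetical and chosen only for this example.

```cpp
#include <cstdio>
#include <string>

// Formats one cross-reference entry:
// 10-digit offset, space, 5-digit generation, space, 'n' or 'f', CR, LF.
// That is 10 + 1 + 5 + 1 + 1 + 2 = 20 bytes.
// Assumption: the offset fits in 10 decimal digits.
std::string formatXrefEntry(unsigned long offset, unsigned generation, bool inUse) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%010lu %05u %c\r\n",
                  offset, generation, inUse ? 'n' : 'f');
    return std::string(buf);
}
```

Calling formatXrefEntry(74, 0, true) gives the entry shown above, and formatXrefEntry(0, 65535, false) gives the usual first (free) entry.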

A PDF document can be regarded as a hierarchy of objects contained in the body section of a PDF file. At the root of the hierarchy is the document's catalog dictionary. Most of the objects in the hierarchy are dictionaries. Each page of the document is represented by a page object, which is a dictionary that includes references to the page contents and other attributes such as its thumbnail image and any annotations associated with it. The individual page objects are tied together in a structure called the page tree, which in turn is located via an indirect reference in the document catalog.

The root of a document's object hierarchy is the catalog dictionary, located via the Root entry in the trailer of the PDF file. The catalog contains references to other objects that define the document's contents, outline, article threads, named destinations and other attributes.

To start, the reader reads the value of the Root entry in the trailer. This is the root of all the references that follow. The reader then looks up the byte offset of the root object and jumps to it. The root object is the catalog dictionary, which in turn contains many other references. Our application writes only the minimum set of entries so that the file is easy to understand.
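To show how these pieces hang together, here is a minimal sketch that lays out a one-page file so a reader can follow the chain trailer → Root → catalog → page tree → page → contents. It is not the pdfpad code: the function name buildMinimalPdf, the object numbers 1 to 4, the page size and the Helvetica font are all assumptions made for this example, and no compression or signature is included.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Builds a one-page PDF around an already-prepared content stream.
// Object 1 = catalog, 2 = page tree, 3 = page, 4 = content stream
// (illustrative numbering only).
std::string buildMinimalPdf(const std::string& contentStream) {
    std::vector<std::string> objects = {
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R "
        "/Resources << /Font << /F1 << /Type /Font /Subtype /Type1 "
        "/BaseFont /Helvetica >> >> >> >>",
        "<< /Length " + std::to_string(contentStream.size()) +
            " >>\nstream\n" + contentStream + "\nendstream"
    };

    std::string pdf = "%PDF-1.4\n";
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        offsets.push_back(pdf.size());  // remember where "N 0 obj" starts
        pdf += std::to_string(i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n";
    }

    // Cross-reference table: one free entry for object 0, then one per object.
    const std::size_t xrefOffset = pdf.size();
    pdf += "xref\n0 " + std::to_string(objects.size() + 1) + "\n";
    pdf += "0000000000 65535 f\r\n";
    char entry[32];
    for (std::size_t off : offsets) {
        std::snprintf(entry, sizeof entry, "%010lu 00000 n\r\n",
                      static_cast<unsigned long>(off));
        pdf += entry;
    }

    // The trailer names the catalog as Root; startxref points back at "xref".
    pdf += "trailer\n<< /Size " + std::to_string(objects.size() + 1) +
           " /Root 1 0 R >>\n";
    pdf += "startxref\n" + std::to_string(xrefOffset) + "\n%%EOF\n";
    return pdf;
}
```

The returned string should be written to disk in binary mode, so that the byte offsets recorded in the table stay correct.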

Now let's see what happens to the text that we enter in the edit box. First, every end-of-line is replaced with the PDF operator that moves to the next line. Then the operators for showing the text on the page are added to the page's contents. This content is stored as a stream, which is called a content stream. To compress the text I have used zlib (courtesy of zlib), a freely downloadable library. It implements the Flate (deflate) compression method, which the Adobe viewer supports.
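Here is a minimal sketch of that conversion, leaving out the zlib compression step. The names escapePdfText and textToContentStream are mine, and the font name /F1, the 12-point size, the 14-unit leading and the start position are assumed values, not necessarily those pdfpad uses.

```cpp
#include <string>

// Escapes the characters that are special inside a PDF literal string.
std::string escapePdfText(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Turns editor text into content-stream operators: one Tj (show text)
// per line, with T* (move to next line) between lines.
// Assumed values: font /F1 at 12 pt, leading 14, start at (72, 720).
std::string textToContentStream(const std::string& text) {
    std::string out = "BT\n/F1 12 Tf\n14 TL\n72 720 Td\n";
    std::string line;
    bool first = true;
    auto flush = [&]() {
        if (!first) out += "T*\n";
        out += "(" + escapePdfText(line) + ") Tj\n";
        line.clear();
        first = false;
    };
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r') {
            // Treat CR LF as a single end-of-line.
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            flush();
        } else if (c == '\n') {
            flush();
        } else {
            line += c;
        }
    }
    flush();  // last (or only) line
    out += "ET";
    return out;
}
```

The result is what would go between stream and endstream; if it is compressed first, the stream dictionary also needs a /Filter /FlateDecode entry and a /Length that matches the compressed size.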

The most amazing thing is the digital signature. I haven't implemented a real-life digital signature using cryptographic libraries. All I intend to show is how to add a digital signature to a PDF document. The entries here are all dummy entries. The signature can be made real by replacing the Contents entry in the signature dictionary with the actual signed hash of the document.

I won't be covering the details of digital signatures here; I will stick to the details of PDF. PDF has two types of digital signatures, invisible and visible. Our application uses invisible signatures. The signature can be viewed in the signature panel. The entries in the signature dictionary can be changed programmatically, using the user's input, to hold your name, the time of signing, the location, etc. This is left to you.

When a digital signature is added to a document, Adobe Acrobat's signature handler calculates a checksum based on the content of the document at that time and embeds the checksum in the signature. When the signature is validated, the handler recalculates the checksum for that signed version of the document and compares it with the value in the signature. If the signed version has changed in any way, the signature handler detects the change and marks the signature as invalid.

You can also use CryptoAPI to create the hash, sign using digital certificates, and so on, which I hope to cover in my next article. While creating the hash, the byte range must be specified correctly. The ByteRange entry is an array of integer pairs, each giving a starting offset and a number of bytes; two pairs are normally used, one for the bytes before the Contents value and one for the bytes after it. In this way the byte range excludes the Contents entry of the signature dictionary from the hash. This entry is initially filled with a temporary placeholder so that the total file size is known when the hash is calculated. After the hash has been created, the real Contents value is written. If the Contents entry were included in the hash, writing the signature would change the hashed bytes and the signature would fail verification.
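The arithmetic behind the byte range can be sketched as follows. The function names computeByteRange and bytesToHash are hypothetical, and the sketch assumes you already know where the Contents value (including its < and > delimiters) starts and ends in the file.

```cpp
#include <array>
#include <string>

// contentsStart: offset of the '<' that opens the Contents value.
// contentsEnd:   offset one past the closing '>'.
// Returns the four ByteRange integers:
// [0, bytes before the value, offset after the value, bytes to end of file].
std::array<std::size_t, 4> computeByteRange(std::size_t fileSize,
                                            std::size_t contentsStart,
                                            std::size_t contentsEnd) {
    std::array<std::size_t, 4> br = {{0, contentsStart, contentsEnd,
                                      fileSize - contentsEnd}};
    return br;
}

// Concatenates the two covered ranges; these are the bytes that get hashed.
std::string bytesToHash(const std::string& pdf,
                        const std::array<std::size_t, 4>& br) {
    return pdf.substr(br[0], br[1]) + pdf.substr(br[2], br[3]);
}
```

For a 13-byte file in which the placeholder occupies bytes 4 to 9, this gives the array [0 4 10 3].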

Once you get a grip on the reference manual, you can modify the code to add more pages, add drawings, etc.

Using the code

The main function that creates the PDF file is added to the ***doc class; it is called CreatedPdf(CString text). I prefer manipulating CString objects to using char buffers; you can change this to make the code more efficient. The code is well commented to explain the details.

This is a part of the Doc class, that should be modified to write the files in PDF format:

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Hi
The first thing a reader looks for is the version number of the PDF. Then it tries to figure out whether the file contains binary data. If it does, the second line is a comment such as
%AAAA, where the four characters after the % have values of 128 or greater.
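
A minimal sketch of those two checks, assuming the file is already in a std::string (the function names pdfVersion and hasBinaryMarker are made up for this example):

```cpp
#include <string>

// Extracts e.g. "1.4" from a "%PDF-1.4" header line;
// returns an empty string if the header is missing.
std::string pdfVersion(const std::string& pdf) {
    const std::string magic = "%PDF-";
    if (pdf.compare(0, magic.size(), magic) != 0) return "";
    const std::size_t end = pdf.find_first_of("\r\n", magic.size());
    return pdf.substr(magic.size(),
                      end == std::string::npos ? std::string::npos
                                               : end - magic.size());
}

// True if the second line is a comment holding at least four bytes
// with values of 128 or more, which marks the file as binary.
bool hasBinaryMarker(const std::string& pdf) {
    const std::size_t eol = pdf.find_first_of("\r\n");
    if (eol == std::string::npos) return false;
    std::size_t pos = pdf.find_first_not_of("\r\n", eol);
    if (pos == std::string::npos || pdf[pos] != '%') return false;
    int high = 0;
    for (++pos; pos < pdf.size() && pdf[pos] != '\r' && pdf[pos] != '\n'; ++pos) {
        if (static_cast<unsigned char>(pdf[pos]) >= 128) ++high;
    }
    return high >= 4;
}
```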

Regards
Shiraz

The Best Religion is Science.
Once you understand it, you will know God.

Hi
Implementing a digital signature is only possible if you know how to use CryptoAPI. There is a whole lot of APIs for searching the certificate chain, encrypting with the private key, and verifying with the public key.
The process is:
1. Create a hash of the PDF file, specifying the byte range.
This byte range excludes the portion of the file where you intend to keep the signature content. This is quite obvious.

2. Take the certificate from the certificate store (refer to CryptoAPI).
3. Take the private key and encrypt the hash. This is called the signature content.
4. Store the signed hash in the file (inside the excluded byte range portion). We know the size of the signed hash.
5. That's all. It's all signed.
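
The file-handling side of steps 1, 3 and 4 can be sketched like this. No real cryptography is done here: the `sign` callback is a hypothetical stand-in for the CryptoAPI hashing and signing calls, the names toHex and embedSignature are mine, and the placeholder is assumed to be a hex string of '0' characters between < and >.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hex-encodes raw bytes for the <...> form of the signature content.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// placeholderStart: offset of '<'; placeholderEnd: one past '>'.
// `sign` is hypothetical: it receives the covered bytes and returns the
// raw signature bytes (in real code, the CryptoAPI result).
std::string embedSignature(
    std::string pdf, std::size_t placeholderStart, std::size_t placeholderEnd,
    const std::function<std::string(const std::string&)>& sign) {
    // Step 1: everything except the placeholder is covered by the hash.
    const std::string covered =
        pdf.substr(0, placeholderStart) + pdf.substr(placeholderEnd);
    // Step 3: produce the signature content.
    const std::string hex = toHex(sign(covered));
    const std::size_t room = placeholderEnd - placeholderStart - 2;
    if (hex.size() > room) {
        throw std::length_error("signature larger than placeholder");
    }
    // Step 4: overwrite in place. Unused room keeps its '0' padding, so the
    // file size and every byte offset stay exactly as they were.
    pdf.replace(placeholderStart + 1, hex.size(), hex);
    return pdf;
}
```

The placeholder has to be made large enough beforehand, because growing it afterwards would shift the offsets that were hashed.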

Hope this helps you.

Regards
Mohamed Shiraz


Hello
I have been working on a digital signature plug-in for Adobe Acrobat 5.0 and 6.0 and am now making v3 of the software. I want to add a feature that allows only the users defined in some kind of list (certificates would be used) to sign. I put the list of certificates of the persons who can sign the document in the first field. Everything works fine, but the problem is when we already have blank signature fields and try to add. Anyway, what I actually want is to enumerate the signature fields in the order in which they were signed, and for that purpose I want to use some way other than the Adobe SDK.

hi..
My understanding is:
You want to know how many persons have signed the document, their information, etc., without using the SDK. Right?
If so, you must create a parser that uses the information in the xref table to look for the signature dictionaries and get the information.
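
A starting point for such a parser could look like the sketch below. The function name parseXrefTable is made up; it assumes a classic xref table with exactly 20-byte entries and does not handle cross-reference streams or files with incremental updates (/Prev).

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Parses one xref section starting at xrefOffset and returns
// object number -> byte offset for the entries marked 'n' (in use).
std::map<int, long> parseXrefTable(const std::string& pdf, std::size_t xrefOffset) {
    std::map<int, long> offsets;
    std::size_t pos = xrefOffset;
    if (pos > pdf.size() || pdf.compare(pos, 4, "xref") != 0) return offsets;
    pos += 4;
    auto skipEol = [&]() {
        while (pos < pdf.size() &&
               (pdf[pos] == '\r' || pdf[pos] == '\n' || pdf[pos] == ' ')) {
            ++pos;
        }
    };
    skipEol();
    // Each subsection begins with "<first object number> <entry count>".
    while (pos < pdf.size() && pdf[pos] >= '0' && pdf[pos] <= '9') {
        char* end = nullptr;
        const long first = std::strtol(pdf.c_str() + pos, &end, 10);
        const long count = std::strtol(end, &end, 10);
        pos = static_cast<std::size_t>(end - pdf.c_str());
        skipEol();
        for (long i = 0; i < count && pos + 20 <= pdf.size(); ++i, pos += 20) {
            // Byte 17 of an entry is the type: 'n' in use, 'f' free.
            if (pdf[pos + 17] == 'n') {
                offsets[static_cast<int>(first + i)] =
                    std::strtol(pdf.substr(pos, 10).c_str(), nullptr, 10);
            }
        }
    }
    return offsets;
}
```

With the offsets in hand, the next step would be to visit each object and check whether its dictionary is a signature dictionary.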

Enjoy !


Actually, the article is great.
Now I want to add an image to the PDF file. I just create a bitmap like this:
Bitmap bmp(L"myImage.jpg"); Now I want to add bmp to a PDF file.
Can you please help me with my problem?

I downloaded the Acrobat 5.0 SDK. There is a sample in Acrobat 5.0 SDK\PluginSupport\Samples\PDF Creation and Editing\AddImage.
When I compiled it, it created a file called "AddImage.api".
How do I use that file to add an image to a PDF? I don't know what that file is.

hi
I think you are in the wrong place. This article has nothing to do with the Adobe SDK. It is all about doing it without Adobe.

The *.api file is the plug-in that you compiled. If you copy it to the plug-ins folder of Adobe Acrobat (not the free Reader), you will find a new button when you open Acrobat. By clicking it you can add an image to the opened PDF file.

There is another forum for SDK users, run by Adobe. That's the place where programmers using the Adobe Acrobat SDK post their questions.

http://www.adobe.com/support/forums/main.html

see u there..

Regards
Mohamed Shiraz T K


hi
I suggest that the starting point be to implement the digital signature in the above application after removing the compression. (I heard that the compression is causing some problems, making the PDF files useless.)

Then, using CryptoAPI, create the digital signature and place it as mentioned in the article. You should look in the PDF reference manual to see how to add the signature so that the Acrobat signature handler can recognise it. Add the signature accordingly.

Adding a signature to an existing document is a little more difficult.
Hope this helps you.

Regards
Shiraz


Do you have any experience creating the /Contents value using CryptoAPI?

I'm trying to produce the adbe.pkcs7.sha1 contents using CAPICOM, but I get a different result. I don't know where or how to insert the SHA-1 digest. The signature that CAPICOM gives me is shorter than the one created by Acrobat.