John Brinkman on form design

Posts in Category "Performance Tuning"

I enjoyed attending MAX last week. The best part was putting some faces to names of people who had previously been cyber-personalities.

We did a pre-conference lab on LiveCycle best practices. For the lab I prepared four exercises on various aspects of form design. For those who weren’t able to attend, I’ll post those exercises as a series of blog entries.

Exercise #1: Measure and Improve JavaScript Performance

In this exercise we optimized a fairly simple script – a few lines in a loop. Depending on your machine, you should be able to improve the performance anywhere from a factor of three up to a factor of seven.

Let’s talk about image size (again), but this time in the context of data capture. We have this wonderful image field that allows your users to attach image files to their form. This is great, but it’s also scary. You likely don’t want to use this for the 10MB images that came straight from their digital camera. Consider that these images will be base64-encoded, inserted with the rest of the form data, and stored in the PDF. You want to limit them to images that are reasonably sized.

Today’s sample has an image field with a validation script that checks the size of the loaded image. For an image field, field.rawValue will return the base64 value of the embedded image. The length of that string will tell you how big the image is. My validation script does a rough conversion — image size is roughly 3/4 the size of the base64 string. If that size is greater than your threshold you can choose to either reject the image (set the field to null) or simply mark the field as invalid.
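Here is a minimal sketch of that size check. It assumes the validate script receives the field’s rawValue as a plain base64 string; the function names and the 500K threshold are illustrative, not taken from the sample form.

```javascript
// Rough size check for an image field whose value is a base64 string.
// In an XFA validate script the string would come from this.rawValue;
// here it is a parameter so the logic can run on its own.
var MAX_IMAGE_BYTES = 500 * 1024; // illustrative threshold only

// Approximate decoded size: 3 bytes per 4 base64 characters, minus '=' padding.
function estimateImageBytes(base64) {
  if (!base64) return 0;
  var clean = String(base64).replace(/\s+/g, ""); // ignore embedded line breaks
  var padding = /==$/.test(clean) ? 2 : (/=$/.test(clean) ? 1 : 0);
  return Math.floor(clean.length * 3 / 4) - padding;
}

// True when the image is small enough (an empty field counts as valid).
function imageSizeIsValid(base64, maxBytes) {
  var limit = (maxBytes === undefined) ? MAX_IMAGE_BYTES : maxBytes;
  return estimateImageBytes(base64) <= limit;
}
```

In a validate event you could return `imageSizeIsValid(this.rawValue)` to mark the field invalid, or set the field to null to reject the image outright.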

By now I’ve mentioned a bunch of times that it is better to link your images than to embed them, most notably in this blog entry. Just to review the facts one more time:

Linked/embedded applies only to the form definition (the template). By the time you generate a PDF, the image is always embedded in the PDF. But the mechanics of how it is embedded in the PDF differ between embedded and linked images.

Images embedded in the template are stored in base64 format. Linked images are stored in binary form. Base64 data is 4/3 the size of the binary original (about 33% bigger).

Embedded images cause very large form templates. Linked images are stored outside the form template.

The same embedded image used <n> times is stored <n> times. The same linked image used <n> times is stored once.
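To put numbers on the last two points, here is a small worked calculation. The 30,000-byte logo and the function names are made up for illustration, and the sketch ignores XML markup and other PDF overhead.

```javascript
// Bytes an image contributes to the final PDF under each approach.
// Base64 encodes every 3 bytes (or part thereof) as 4 characters.
function base64Length(binaryBytes) {
  return 4 * Math.ceil(binaryBytes / 3);
}

// Embedded: each use keeps its own base64 copy in the template,
// and the template travels inside the PDF.
function embeddedBytes(binaryBytes, uses) {
  return uses * base64Length(binaryBytes);
}

// Linked: the PDF stores a single binary copy, however many times it is used.
function linkedBytes(binaryBytes, uses) {
  return uses > 0 ? binaryBytes : 0;
}

var logo = 30000; // hypothetical 30,000-byte logo placed on 5 pages
console.log(embeddedBytes(logo, 5)); // 200000
console.log(linkedBytes(logo, 5));   // 30000
```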

Now, just in case all of that rationale was a bit too much tech-speak (what’s a DOM anyway?) let me simplify the message: Do not embed images in your templates. When you see this palette, do not check the box.

Lost Images

I predict some of you will look at your forms and realize that you no longer have the original image that you embedded in your template. It is possible to recover these images and extract them to an external file. The process is a bit messy, but if you really need it, I’m sure you can figure it out. Here are the steps:

In Designer, select the image object and then switch to XML source view. Make note of the contentType. In my example, it is image/JPG. This tells you what file suffix to use when you save the decoded file.

Copy the base64 image data from the XML source, then go to the web site and paste it into the "Base64 to Decode" field. Click the "Process now" button.

Now click the "download binary file now" button. Save the file using a suffix that matches the content type; in this case, a .jpg extension.
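If you would rather not paste form content into a web site, the same decoding can be done locally. This is a hedged Node.js sketch, not a Designer macro; the suffix table and the function names are assumptions.

```javascript
// Decode base64 text copied from an image's XML source into a binary file.
const fs = require("fs");

// Map the contentType from the XML source to a file suffix.
// Only a few common types are listed; extend as needed.
function suffixFor(contentType) {
  const map = {
    "image/jpg": ".jpg", "image/jpeg": ".jpg", "image/png": ".png",
    "image/gif": ".gif", "image/tiff": ".tif", "image/bmp": ".bmp"
  };
  return map[String(contentType).toLowerCase()] || ".bin";
}

// Strip whitespace and line breaks, then decode to raw bytes.
function decodeImage(base64Text) {
  return Buffer.from(base64Text.replace(/\s+/g, ""), "base64");
}

// Writes e.g. "logo.jpg" in the current directory and returns the file name.
function saveImage(baseName, contentType, base64Text) {
  const file = baseName + suffixFor(contentType);
  fs.writeFileSync(file, decodeImage(base64Text));
  return file;
}
```

For the example above, `saveImage("logo", "image/JPG", pastedText)` would write logo.jpg.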

Once you have the image extracted, go back to your form design, re-specify the URL to the new image and de-select the embed option. This would be a great candidate for a Designer macro, especially given that mx.utils has a Base64Decoder class.

It strikes me that we don’t complain about big email attachments as much as we used to. We don’t mind as much when Aunt Grace sends us a batch of 2MB jpg files straight from her camera. The photo quality is the same as always (sigh), but at least it doesn’t bring our internet connection and email client to its knees anymore. Heck, even Dad and Mom now have a broadband connection.

But just because we *can* send big attachments doesn’t mean we should. Let’s have some professional pride in making sure that our files are as lean as possible. Today’s topic is about managing your fonts so that your PDF forms are tidy and small.

I’m sure you already knew this, but I’ll repeat the basics. PDF files can embed font definitions. The advantage of embedding a font is that it guarantees your PDF will look exactly the same no matter where it’s opened: you don’t have to worry whether the user opening your form has copies of the fonts you used or not. If you don’t embed fonts, and the user doesn’t have "Charlemagne Std" on their system, Reader will display the PDF with a substitute font. It won’t look the same.

Of course, the disadvantage of embedding fonts is that they’re big. They bump up the size of your PDF in a big hurry, often by around 200K per font.

Here are some notes to remember about font usage:

Note #1: If you’re using common fonts, and especially if you can tolerate some variance in your page display, then don’t embed fonts. Keep in mind that Designer embeds all fonts by default, so not embedding is something you have to choose explicitly.

Note #2: If you’re embedding fonts, use as few fonts as possible in your form design.

There was a reason I just finished updating the form reporter: it will tell you what fonts you have used and how many instances of each. I recently used it to review a form, and the report was revealing.

All fonts were embedded. The form had one object using Times New Roman. That one instance bloated the PDF by over 200K. After I consolidated all font instances to Myriad Pro, the form was a total of 600K smaller.
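If you want a quick count of font usage straight from the XDP source, a rough approximation is to scan the <font> elements. This regex-based sketch is hypothetical and much cruder than the form reporter: it ignores fonts named inside rich text.

```javascript
// Count how many <font> elements reference each typeface in XDP text.
function countTypefaces(xdpText) {
  const counts = {};
  const re = /<font\b[^>]*\btypeface="([^"]*)"/g;
  let m;
  while ((m = re.exec(xdpText)) !== null) {
    counts[m[1]] = (counts[m[1]] || 0) + 1; // tally this typeface
  }
  return counts;
}
```

A lone Times New Roman entry next to hundreds of Myriad Pro entries is exactly the kind of outlier worth consolidating.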

Note #3: Not all fonts are equal in size

I haven’t done an extensive accounting, but it appears that Myriad Pro is smaller than most. A small form with Myriad Pro embedded is 77K, while the non-embedded version is 13K. Why does embedding Myriad Pro cost only 64K while Times New Roman cost 200K? I’m told that the embedded Myriad Pro excludes character sets that are not in use (e.g. Cyrillic).

Note #4: Reader installs fonts

On my system, Reader X installed Minion Pro, Myriad Pro and Courier Std. Asian font packs for Reader are available for download. I’d like to think that, for most users, the fonts installed with Adobe Reader don’t need to be embedded in your PDFs.

Note #5: Fonts can be subset

If you use a font in an interactive form field, you need to have the entire font embedded. But if the font is used only in boilerplate text, then you only need to embed the definitions of the characters that appear in the PDF. In this case we can reduce the size by embedding only a subset of the font. Options to embed subset fonts are not exposed in Designer; this is done in server-side processing. And as long as you’re mucking in that area, you can also explicitly choose, on a font-by-font basis, which fonts are embedded and which are never embedded.

Edit Fonts in Designer

Here’s the problem: The form report shows you have one instance of Times New Roman in your form design. Now find that one instance among the 300 fields on your form and change it. If it were me, I’d probably switch over to source view. But that’s not very user friendly.

This becomes yet another case where Designer macros can be very helpful. Here is a zip file that contains a Designer macro to perform global font substitutions. When you run it, you’ll get a dialog that asks for the font to replace and the font to replace it with.

The macro will replace the font references it finds in <font> elements, as well as the font references it finds inside rich text values. Just be sure to type the names of the fonts correctly. If you misspell the replacement font, the macro will happily give you a form full of "Myirod Pro" references. When the macro completes, look in Designer’s log display for a summary of the changes.
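The core of such a substitution can be sketched as plain text processing over the XDP source. This is not the macro’s code and does not use Designer’s macro API; the function name, the regular expressions, and the assumption that rich text names fonts as font-family:'Name' or font-family:Name are all illustrative.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace a typeface in <font> elements and in rich-text font-family styles.
// Returns the rewritten text and the number of references changed.
function replaceFont(xdpText, fromFont, toFont) {
  const name = escapeRegExp(fromFont);
  let count = 0;
  // <font typeface="Times New Roman" .../>
  const fontAttr = new RegExp('(<font\\b[^>]*?\\btypeface=")' + name + '"', "g");
  // style="font-family:'Times New Roman';..." inside rich text
  const richText = new RegExp("(font-family:\\s*'?)" + name + "(?=['\";,])", "g");
  let out = xdpText.replace(fontAttr, (m, prefix) => { count++; return prefix + toFont + '"'; });
  out = out.replace(richText, (m, prefix) => { count++; return prefix + toFont; });
  return { text: out, count: count };
}
```

As with the macro, a typo in `toFont` is copied faithfully, so check the name before running it.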

In your form design you have two choices when it comes to labelling your fields. You can either place the descriptive label text in a field caption or you can create a separate text object with the label text.

There are pros and cons to each, but today I am going to try and convince you to use captions — and give you a tool to make it easier to create captions.

Whether your descriptive text is stored as a separate object or whether it is specified as a caption, in most cases we can achieve the same visual effect with either. Admittedly, there are some layout conditions where a caption cannot be used, e.g. where the label extends both above and to the left of the field. But this is the exception.

The advantages of using separate objects:

Absolute control over label positioning

Can be less effort to author in this mode

The advantages of using captions:

Far better accessibility experience

Results in smaller form templates with better performance

Lower form maintenance

Accessibility

Field captions can be used to generate a good accessible experience. The caption text can be used as the text that the screen reader uses to describe the field. Obviously when it is stored as a separate object we do not have this correlation. When the label is a separate object, we have extra “noise” as the screen reader traverses the read order of the document. The read order will encounter the label and the field as objects that will be read separately. If they’re not consecutive in the read order, there is potential for even more confusion.

Form Size

As I’ve mentioned before, form size matters. If your form is delivered with rights enabled and/or with certification, the time to open the form is directly impacted by the size of the form template. This is because both rights enabling (RE) and certification use encryption, and at open time we need to validate that the form has not been tampered with. To make a long story short, the validation involves loading the form template more than once. If there are fewer objects and less syntax, the form will load faster and open time will be reduced.

Lower form maintenance

When a caption is combined with a field, then all operations treat the field and caption atomically:

A script to make a field invisible will also hide the caption.

A page break will never separate a field from its caption, but could separate a field from label text.

Moving or deleting a field in Designer automatically does the same for the caption.

The Form Conversion Problem

My suspicion is that very few form authors consciously choose to separate the label from the field. In most cases this is a byproduct of a conversion process from some other format to XDP. The form author’s starting point is a form design where all fields are separated from their caption. The question becomes: is it worth the time and effort for an author to combine these objects? Is the net gain worth the investment? The answer is dependent on your own scenario — and dependent on the amount of effort to fix the form.

Designer has a built-in function that will combine text and field into a single field with a caption. You select the text and the field and choose “Layout/Merge as Caption”. The problem you’ll find is that while Designer does combine the objects, it does not preserve the layout, positioning, font and paragraph properties of the label when converted to a caption. I think you will see this function improve in future versions of Designer. But in the meantime, this is also a function that can be customized using a Designer macro. Here is a Designer macro you can install that will combine labels with fields when the label is positioned above the field.

You should find that the macro works fairly hard to give you a finished result with the same visual appearance as the original. Here is a sample PDF with the separated label/text on the left and the combined result on the right.

As I was writing the macro, I realized there are cases where the end-user might want a different behaviour than I’ve provided. For example, when the caption is wider than the field, should we expand the field to match the width of the caption? That was the choice I made. If you prefer a different result, you can tailor the macro to your own needs.
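The position and size arithmetic for the label-above case can be sketched like this. Boxes are plain {x, y, w, h} objects in a single unit with y growing down the page; captionPlacement and captionReserve stand in for the XFA caption’s placement and reserve attributes. The function is hypothetical, it ignores fonts and paragraph properties, and it follows the choice of widening the field to the caption’s width.

```javascript
// Merge a label positioned above a field into one field with a top caption.
// Returns null when the label is not entirely above the field.
function mergeLabelAbove(label, field) {
  if (label.y + label.h > field.y) return null;
  return {
    x: field.x,
    y: label.y,                        // merged object starts at the label's top
    w: Math.max(field.w, label.w),     // widen the field if the caption is wider
    h: field.y + field.h - label.y,    // covers label, gap and field
    captionPlacement: "top",
    captionReserve: field.y - label.y  // caption area = label height plus gap
  };
}
```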

Enjoy. Hopefully there’s enough of a starting point here that someone could expand the macro to also handle caption placement to the left, right or bottom of fields.

I have recently been involved in a couple of contexts where customers have worked hard to reduce the size of their XDP and resulting PDF files.

We know that size matters. The smaller the better. We want small PDF files for downloading. We know that encryption-based technologies such as certificates and rights enabling perform faster when the form definition is smaller. Today’s discussion is a couple of tips for reducing the size of your XDP and PDF.

Fonts are a simple discussion. Your PDFs are much smaller when the fonts are not embedded. If you need to embed fonts, use as few as possible. To see which fonts are referenced by your form, you can check out the form summary tool here.

But the main topic for today is syntax cleanup. The elements and attributes in the XFA grammar all have reasonable default values. We save lots of space by not writing out the syntax when the value corresponds to the default. For example, the default value for an ordinate is "0". If your field is at (0,0), we will not write out the x and y attributes.
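The principle is easy to show with a toy serializer. The DEFAULTS table is a hypothetical subset of the grammar’s defaults, and writeElement is not a Designer API.

```javascript
// Attribute defaults (hypothetical subset for illustration).
var DEFAULTS = { x: "0", y: "0", presence: "visible" };

// Write an empty element, omitting any attribute whose value equals its default.
function writeElement(tag, attrs) {
  var parts = [tag];
  Object.keys(attrs).forEach(function (key) {
    if (DEFAULTS[key] !== attrs[key]) parts.push(key + '="' + attrs[key] + '"');
  });
  return "<" + parts.join(" ") + "/>";
}

// writeElement("field", { x: "0", y: "0", w: "50mm", presence: "visible" })
// returns '<field w="50mm"/>'
```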

However, there are a couple of places where the syntax cleanup could use some help. Specifically: borders (edges and corners) and margins. There are scenarios where these elements can be safely deleted from your form.

Today’s sample is a Designer macro (with the sample form) for eliminating extra syntax. If you don’t want to understand the gory details, you can just download and install the Designer macro. (If you haven’t already, please read this regarding Designer macros). After running the macro, look in Designer’s log tab for a summary of the cleanup. Then double check the rendering of your form and make sure it hasn’t changed.

If you want to understand a bit more, I’ll explain what is going on. There are three instances of syntax bloat that the macro will clean up:

When you modify edge properties, Designer will inject <corner> definitions. However, the only time a corner definition has any impact on the rendering is when the radius attribute is non-zero. When none of the corners has a non-zero radius, the <corner> elements may all be safely removed without changing the rendering of the form.
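A rough sketch of this first cleanup, as a regex pass over XDP text: it drops <corner> elements that have no radius or a zero radius and keeps rounded ones. It is an illustration, not the macro’s implementation; as with the macro, check the rendering afterwards.

```javascript
// Remove <corner> elements whose radius is absent or zero.
// Handles both <corner .../> and <corner ...>...</corner>.
function removeSquareCorners(xdpText) {
  var removed = 0;
  var cornerRe = /<corner\b([^>]*?)(?:\/>|>[\s\S]*?<\/corner>)/g;
  var text = xdpText.replace(cornerRe, function (whole, attrs) {
    var m = /\bradius="([^"]*)"/.exec(attrs);
    if (m && parseFloat(m[1]) !== 0) return whole; // rounded corner: keep it
    removed++;
    return "";
  });
  return { text: text, removed: removed };
}
```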

I’ve also found borders where all the edges and corners are hidden; I’m not sure how I ended up with that configuration. This would be represented more efficiently as <border presence="hidden"/>, or better yet, no border element at all. But before cleaning up this syntax, bear in mind that there can be a useful purpose here. If you have script that toggles the presence of border edges, it is useful to have the edge properties (e.g. thickness, color) defined in the markup. If you remove the markup, you will need to set those properties via script.
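This second cleanup can be sketched the same way: a border whose only children are hidden edges and corners collapses to <border presence="hidden"/>. Given the scripting caveat, skip it for borders whose edges are toggled by script. The function is hypothetical and regex-based.

```javascript
// Matches one <edge> or <corner> element, self-closing or with content.
var CHILD_RE = /<(edge|corner)\b[^>]*?(?:\/>|>[\s\S]*?<\/\1>)/g;

function collapseHiddenBorders(xdpText) {
  return xdpText.replace(/<border\b(?:[^>]*[^\/>])?>([\s\S]*?)<\/border>/g, function (whole, body) {
    var children = body.match(CHILD_RE) || [];
    var leftover = body.replace(CHILD_RE, "").trim(); // anything besides edges/corners
    var allHidden = children.length > 0 && children.every(function (c) {
      return /^<\w+\b[^>]*\bpresence="hidden"/.test(c);
    });
    return (allHidden && leftover === "") ? '<border presence="hidden"/>' : whole;
  });
}
```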

Zero Margins

If you have edited object margins, you could end up in a situation where your margin element looks like:

We have had some feedback around the performance of paper forms barcodes on large forms. It seems that when customers use multiple barcodes to process large amounts of data, their form slows down.

The form slows down because the barcode recalculates every time a field changes. The calculation does several things:

Generate minimal, unique names for each data item. When the names are included in the output, we want them to be as terse as possible, while still uniquely identifying the data. To do this, each name needs to be compared to all other names.

Gather data values to be included in the output

Format the result as delimited text

It’s the first item that takes the bulk of the time. In order to find the minimal name, the script compares each data node name against all others and iteratively expands the names until they are unique. The algorithm appears to have O(n²) performance, which means that it degrades quickly when the number of data elements grows large.
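To see why the cost grows so quickly, here is a simplified sketch of minimal-name generation. It is not the barcode script itself: each data item is modelled as a path array, and every pass compares each name with every other name before lengthening the duplicates with their parent’s name. Repeated instances with identical paths (which the real script distinguishes with subscripts) are left as duplicates.

```javascript
// paths: e.g. [["form1", "home", "city"], ["form1", "work", "city"]]
function minimalNames(paths) {
  var depth = paths.map(function () { return 1; }); // trailing segments used per name
  function nameOf(i) { return paths[i].slice(-depth[i]).join("."); }
  for (;;) {
    var names = paths.map(function (_, i) { return nameOf(i); });
    // Compare every name against every other name: O(n^2) per pass.
    var dup = names.map(function (a, i) {
      return names.some(function (b, j) { return i !== j && a === b; });
    });
    var expanded = false;
    dup.forEach(function (isDup, i) {
      if (isDup && depth[i] < paths[i].length) { depth[i]++; expanded = true; }
    });
    if (!expanded) return names; // unique, or cannot be lengthened further
  }
}
```

Giving fields unique names lets the first pass finish with no duplicates, which is why the second technique below helps.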

There are three techniques you can use to improve the performance:

1. Do the work in the prePrint event

Move the barcode calculation to the prePrint event. In the barcode properties, uncheck "Automatic Scripting" and move the script from the calculate event to the prePrint event. Now, instead of recomputing the barcode every time a field changes, we compute the barcode only once — just before it gets printed.

2. Use unique field names

When the script encounters duplicate field names, it does lots of extra work to resolve them. So don’t ask the script to do so much work: use unique field names. For example, instead of reusing the same field name in several subforms, give each field its own distinct name.

Not only will the script complete more quickly, but the names written to the barcode value will be shorter. When they are not unique, they get prefixed with their subform name. When they’re unique, they are left unqualified.

3. Do not include names — and modify the script

Since the bulk of the work that the script does is to come up with minimal unique names, let’s not write out the names. Uncheck the "Include Field Names" option. Unfortunately, the script goes through the effort to produce unique names even when names are not included in the output. You need to modify the script to prevent it from calculating names. Uncheck "Automatic Scripting" and add the two if (includeFieldNames) guards shown in the script below.

```javascript
function encode(node)
{
  var barcodeLabel = this.caption.value.text.value;
  if (includeLabel == true && barcodeLabel.length > 0)
  {
    fieldNames.push(labelID);
    fieldValues.push(barcodeLabel);
  }

  if(collection != null)
  {
    // Create an array of all child nodes in the form
    var entireFormNodes = new Array();
    if (includeFieldNames) {   // added: only needed when names are written
      collectChildNodes(xfa.datasets.data, entireFormNodes);
    }

    // Create an array of all nodes in the collection
    var collectionNodes = new Array();
    var nodes = collection.evaluate();

    for(var i = 0; i < nodes.length; ++i)
    {
      collectChildNodes(nodes.item(i), collectionNodes);
    }

    // If the form has two or more fields sharing the …
    // parents of these fields, as well as the subscript …
    // their parents, will be used to differentiate …
    // to take as little space in the barcode as possible, …
    // data in the object names only when necessary …
    if (includeFieldNames) {   // added: skip the expensive name resolution
      resolveDuplicates(collectionNodes, entireFormNodes, …);
    }
```

If you implement this option, odds are you won’t bother with the first two methods. The performance of the script is now O(n) and should work fine in the calculate event with non-unique names. And … is it just me? I get a kick out of watching the 2D barcode re-draw itself every time I modify a field.

I continue to spend time playing with the prototype macro capability found in Designer 9 (ES2). I have a new sample that uses it, but of course before showing you, I need to reiterate that the macro stuff is still an unsupported trial feature.

I wanted to do something to further help people debug their form script. The usual way this happens today is that you sit down and add console.println() commands to the various scripts that run. There are a few drawbacks to this mode of debugging:

It is tedious to add the debug/trace statements

It is not always clear which scripts need to have trace added

If you generate too much output, the console window fills up and stops showing content

FormCalc scripts cannot access the console object

Dumping large amounts of content to the console slows form execution considerably

To help matters along, I have written a macro that modifies the scripts in your form and automatically adds trace statements.

Note that when you use this debugging technique you must not leave this script in your production forms. It adds too much overhead. When you want to trace/debug your form, follow these steps:

Save a copy of your form

Add trace to the copy

Run the copy in Acrobat (not Reader)

When you want a trace dump, print the form. The pre-print event fires and prompts you to save the trace (when trace has been saved, cancel the print)

Debug, and migrate changes/fixes back to the original document

The actual macro is here. You need to rename it to have a .js suffix and put it in a subdirectory under the Scripts directory in your Designer install (see detailed instructions in the original blog entry).

I’ve found this tracing technique to work well in cases where form performance may be impacted by lots of events and scripts. Especially in cases where there are dependency loops (see the discussion under "Dependency Loops" in this blog entry).

Note that the macro does not add trace to the indexChange event. I ran into a scenario where this combination produced some bad behavior, so I elected to exclude that trace. Hopefully this problem gets sorted out in the next version of Designer.
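As a rough illustration of what trace injection involves, here is a sketch that operates on a simple list of script records. The record shape, the trace.log() call (standing in for a script object that buffers entries until the prePrint event saves them) and the function name are all hypothetical. Like the macro, it leaves indexChange scripts alone, and it only instruments JavaScript since the injected line is JavaScript.

```javascript
// Prepend a trace call to each JavaScript event script.
function injectTrace(scripts) {
  return scripts.map(function (s) {
    var isJavaScript = s.contentType === "application/x-javascript";
    if (!isJavaScript || s.event === "indexChange") return s; // leave untouched
    var traceLine = "trace.log(" + JSON.stringify(s.som + " " + s.event) + ");\n";
    return { som: s.som, event: s.event, contentType: s.contentType, body: traceLine + s.body };
  });
}
```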

Many of you are already aware that the JavaScript debugger inside Acrobat partially works with XFA forms. You can turn on the debugger in the JavaScript section of the Acrobat preferences.

When debugging is turned on, the Acrobat JavaScript console will allow you to navigate to objects in the XFA hierarchy and set breakpoints in a subset of events.

BUT (and you knew there was a "but" coming) there are some serious limits:

Using the debugger with XFA forms is not officially supported

We cannot debug script objects

Storing breakpoints doesn’t seem to work. This makes it hard to debug an initialization script (unless you force your form to do an xfa.form.remerge())

On the other hand, even with these limits, many of you will find the debugger useful.

There’s another "but" and this is really my main reason for posting on this topic. You need to turn off the debugger when you’re not using it.

There are two reasons:

Exception handling. The debugger preferences include an option to break when an exception is thrown. If you’re not expecting it, this option can circumvent normal JavaScript processing. For example, it is good JavaScript practice to use try/catch to quietly handle error conditions: we detect the error and allow the application to continue uninterrupted. But when the debugger has been told to break on exceptions, the error is no longer handled quietly. The form stops and you get a message in the console.
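For reference, this is the kind of quiet error handling that a break-on-exception setting interrupts. JSON.parse is used only as a convenient call that throws on bad input; parseSettings is a hypothetical helper, not part of any form.

```javascript
// try/catch lets a script recover from a bad value and keep going.
// With the debugger set to break on exceptions, the throw inside
// JSON.parse stops execution instead of being handled quietly.
function parseSettings(text, fallback) {
  try {
    return JSON.parse(text);
  } catch (e) {
    return fallback; // error handled: carry on with the default
  }
}
```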

Performance. Do you remember the Game of Life form? I used it to illustrate some performance characteristics in this blog entry. The form has a couple of buttons with around 30,000 lines of JavaScript in their click event. Under normal circumstances, these scripts finished in between 4 and 10 seconds. With the JavaScript debugger enabled, they took … 10 minutes. That’s right: a slowdown of 60 times or more in script performance with the debugger enabled.

The Inevitable Question

When will XFA have proper JavaScript debugging support? This is a hard question to answer. But it gets asked a lot. Believe me when I say that we’ve taken a run at this problem many times in the past. But the fact remains that there are some substantial technical barriers that are holding us back.

Hey, it has been a while since I wrote a blog entry. I just spent a week visiting a customer site and getting familiar with a major form deployment. Lots of learning happened in both directions. And lots of stuff for me to report back on at this blog — starting today.

Images and PDF sizes

We do not support linked images from PDF files. Remember, the "P" in PDF means "Portable". If the file has references to external content, it isn’t exactly portable. Another issue with linked images is that a PDF file with an external reference cannot reliably embed a digital signature.

In your XFA template you have the choice to link or embed images. But the final PDF will always have the images embedded — even if the template referenced them with a link.

Given that, what factors do you consider when you choose whether to embed or link images in your XFA templates? Choosing between the two has size implications in the generated PDF. Here is a bit of detail on how they get processed:

Embedded Images

Embedded images are stored in the XFA template XML as a base64-encoded value. As with all base64 encoded binary data, the size expands by a factor of 4/3. When Adobe Reader renders a dynamic form, it extracts the image data from the template and draws it to the screen. If a template embeds the same image multiple times, we carry all copies of the duplicated image in the template and consequently, inside the PDF.

Linked Images

When we create a PDF from an XFA template with linked images, the images are stored in a PDF resource area. We create an indexed name for the image based on the image file reference. There are two efficiencies gained here:

The images are stored in binary format — not base64 encoded

Multiple references to the same image are reconciled to a single copy of that image in the PDF
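The "store once, reference many times" idea can be sketched as a lookup keyed by the image file reference. The "Im<n>" resource names are hypothetical; this only illustrates the bookkeeping, not how PDF generation actually names resources.

```javascript
// Give each distinct image reference one indexed resource name;
// repeat references reuse the existing name.
function buildImageResources(hrefs) {
  var nameByHref = new Map(); // href -> resource name
  var names = hrefs.map(function (href) {
    if (!nameByHref.has(href)) nameByHref.set(href, "Im" + nameByHref.size);
    return nameByHref.get(href);
  });
  return { names: names, distinct: Array.from(nameByHref.keys()) };
}
```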

So clearly, if you are including images, and especially if you are including multiple copies of the same image in your XFA form, your final PDF will be smaller when you include those images as links instead of by embedding.