info: an array of info messages, each consisting of a return code and a (non-localized) text briefly describing the info

warning: an array of warning messages, each consisting of a return code and a (non-localized) text briefly describing the warning

error: an array of error messages, each consisting of a return code and a (non-localized) text briefly describing the error
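A script that post-processes a Quick Check result could react to these three arrays along the following lines. This is only a sketch: the key names code and text for the return code and the message text are assumptions, as is the function name summarizeMessages.

```javascript
// Sketch: summarize the info/warning/error arrays of a Quick Check result.
// ASSUMPTION: each message is an object with hypothetical keys "code" and "text".
function summarizeMessages(result) {
  const summary = {};
  for (const kind of ["info", "warning", "error"]) {
    // A missing array is treated as "no messages of this kind".
    const messages = Array.isArray(result[kind]) ? result[kind] : [];
    summary[kind] = messages.map((m) => `${m.code}: ${m.text}`);
  }
  summary.hasErrors = summary.error.length > 0;
  return summary;
}

// Usage with made-up sample data:
const sample = {
  info: [],
  warning: [{ code: 1, text: "example warning" }],
};
console.log(summarizeMessages(sample));
```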

The "direct" block

The "direct" block is a more or less direct translation of PDF syntax into JSON syntax. To request the Catalog (root) object, "$.direct.Root:true" must be used; to request entries in the trailer dictionary, such as Info or ID, use "$.direct.Info:true" and "$.direct.ID:true".
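The filter expressions named above can be assembled in the same array-of-strings form that the Process Plan sample under "Quick Check usage examples" passes as quickcheck_config. A minimal sketch; the helper name directConfig is hypothetical, and it simply switches both blocks off before switching on the requested "direct" entries, as that sample does.

```javascript
// Sketch: build quickcheck_config lines that request only selected entries
// from the "direct" block. The helper name directConfig is hypothetical.
function directConfig(keys) {
  // Switch off both blocks first, then switch on only what is needed.
  const lines = ["$.direct: false", "$.aggregated: false"];
  for (const key of keys) {
    lines.push(`$.direct.${key}: true`);
  }
  return lines;
}

// Request the Catalog (Root) and the trailer entries Info and ID.
const config = directConfig(["Root", "Info", "ID"]);
console.log(config.join("\n"));
// $.direct: false
// $.aggregated: false
// $.direct.Root: true
// $.direct.Info: true
// $.direct.ID: true
```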

For stream dictionaries, the stream portion will be omitted.

In the Quick Check configuration, specific parts of the PDF data structure can be requested by using the respective entry names in a concatenated path expression. For example, in order to request the ExtGState dictionary for pages in a PDF, the following filter expression could be used (which only works if the Page objects are direct children of the Kids element):

$.direct.Root.Pages.Kids.Resources.ExtGState: true

PDFs can include pages in very different ways. Pages may be listed as Kids entries directly under the Pages key, but, as in real life, Kids can have Kids, and these again can have Kids. This makes it very hard to predict where the pages of interest are located in the PDF data structure. Of course, one could simply retrieve all data below the topmost Pages entry, but this creates massive output for any multi-page PDF file that is not very small, and would also place an undue burden on the JavaScript code that has to parse and interpret the collected data.
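To illustrate the extra work this places on scripts, the following sketch collects the leaf Page objects from a Pages tree of arbitrary depth. It assumes, without confirmation from this documentation, that the "direct" output represents each PDF dictionary as a JSON object keyed by PDF entry names, and that only intermediate Pages nodes carry a Kids array; the function name collectPages and the sample tree are hypothetical.

```javascript
// Sketch: collect all leaf Page objects from a nested Pages tree.
// ASSUMPTION: dictionaries appear as JSON objects keyed by PDF entry names,
// and intermediate Pages nodes carry a "Kids" array (Page leaves do not).
function collectPages(node, out = []) {
  if (node && Array.isArray(node.Kids)) {
    // Intermediate Pages node: descend into every child, in order.
    for (const kid of node.Kids) {
      collectPages(kid, out);
    }
  } else if (node) {
    out.push(node); // Leaf: a Page object.
  }
  return out;
}

// Made-up tree: one page directly under Pages, two pages one level deeper.
const pagesRoot = {
  Kids: [
    { Resources: { ExtGState: { GS1: {} } } },
    { Kids: [{ Resources: {} }, { Resources: { ExtGState: { GS2: {} } } }] },
  ],
};
const pages = collectPages(pagesRoot);
// Gather ExtGState dictionaries regardless of nesting depth.
const extGStates = pages
  .map((p) => p.Resources && p.Resources.ExtGState)
  .filter(Boolean);
console.log(pages.length, extGStates.length); // 3 2
```

The sketch ignores attribute inheritance: in PDF, a Page may inherit its Resources from an ancestor Pages node, which a complete solution would also have to handle.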

Future versions of pdfToolbox will offer more elegant ways to walk nested trees of arrays, but for now the current approach has to be accepted as a known limitation.

Currently there is no mechanism to retrieve data inside stream objects. Usually this is not much of a problem: Quick Check is not the right approach to, for example, retrieve raw image data. There is, however, at least one type of data stored in stream objects that can matter: XMP metadata. In some scenarios it might be useful to retrieve raw XMP metadata in the context of using Quick Check. For now this is not supported. Depending on user demand, we may add extended capabilities in future versions of pdfToolbox. If this is of interest to you, please get in touch via our support email address, support@callassoftware.com, and help us understand why this would matter to you.

The "aggregated" block

The "aggregated" block contains several sub-divisions that reflect aggregated information from various areas, such as color or font resources, transparency and overprint, etc.

In principle, most of the information provided here could also be retrieved by accessing data structures through the "direct" block mechanism, but that would require a solid understanding of the underlying data structures and sometimes quite complicated processing. The "aggregated" block offers such information in a ready-to-use fashion. Still, if some information is needed beyond what "aggregated" offers, it might be feasible to retrieve it by processing "direct" data structures.

The best approach to find out how "aggregated" can be used is to request all data under "aggregated", find the area needed, and then build the configuration filter expressions by following the 'path' to that area (see Quick Check configuration syntax for a detailed description of how to configure Quick Check).
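Following the 'path' to an area can also be done mechanically. The sketch below lists the dotted paths of all object keys in a parsed Quick Check result, so that a candidate path can be turned into a filter expression. Following the document's own path examples (such as $.direct.Root.Pages.Kids.Resources.ExtGState, where Kids is an array), it steps through arrays without adding an index; the function name listPaths and the sample object are hypothetical.

```javascript
// Sketch: list the dotted filter paths present in a parsed Quick Check result.
// Arrays are stepped through without an index, mirroring paths such as
// $.direct.Root.Pages.Kids.Resources.ExtGState where Kids is an array.
function listPaths(value, prefix = "$", out = new Set()) {
  if (Array.isArray(value)) {
    for (const item of value) listPaths(item, prefix, out);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      const path = `${prefix}.${key}`;
      out.add(path); // A Set keeps each path once, even across array items.
      listPaths(value[key], path, out);
    }
  }
  return out;
}

// Made-up excerpt of a result, used only to show the mechanics.
const result = {
  aggregated: {
    pages: { page: [{ resources: { color: {} } }, { resources: { color: {} } }] },
  },
};
const paths = [...listPaths(result)];
console.log(paths.map((p) => `${p}: true`).join("\n"));
```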

Areas inside the "aggregated" block

"bookmarks" area

Creates output reflecting bookmarks in a PDF file (called Outline in the PDF syntax). A flat array contains a list of all bookmarks found in the PDF file. The nesting level of each bookmark is indicated by the level data element, reflecting the nesting of bookmarks as typically displayed in a PDF viewing program. The main piece of information actually conveyed is the text of the bookmark. The bookmarks array does not reflect the actual PDF data structures in any way.
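Because nesting is conveyed only through level, a script that wants to display the outline has to reconstruct the indentation itself. A minimal sketch, assuming each array entry is an object with a numeric level and a text property named text; that key name is an assumption, as is whether levels start at 0 or 1, which the sketch sidesteps by indenting relative to the smallest level found. The function name formatOutline is hypothetical.

```javascript
// Sketch: turn the flat bookmarks array into an indented outline.
// ASSUMPTION: entries look like { level: <number>, text: <string> };
// the key name "text" is hypothetical.
function formatOutline(bookmarks) {
  if (bookmarks.length === 0) return "";
  // Indent relative to the shallowest level, whatever its numbering base.
  const minLevel = Math.min(...bookmarks.map((b) => b.level));
  return bookmarks
    .map((b) => "  ".repeat(b.level - minLevel) + b.text)
    .join("\n");
}

// Made-up bookmarks, in document order as in the flat array.
const bookmarks = [
  { level: 1, text: "Introduction" },
  { level: 2, text: "Scope" },
  { level: 1, text: "Syntax" },
];
console.log(formatOutline(bookmarks));
// Introduction
//   Scope
// Syntax
```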

The following configuration

$.direct: false
$.aggregated: false
$.aggregated.bookmarks: true

when used for the PDF file for the ISO 32000-1 standard results in the following Quick Check output:

"embeddedfiles" area

Creates output reflecting the files embedded in the PDF (as represented in the EmbeddedFiles name tree of the PDF file). The following properties are reported for each embedded file:

name: file name of the embedded file

created: creation date of the embedded file (typically based on the creation of that file in the file system at the time the file was embedded into the PDF)

last_modified: last modification date of the embedded file (typically based on the last modification of that file in the file system at the time the file was embedded into the PDF)

bytes: file size of the embedded file in bytes. Note: embedded files are typically compressed inside the PDF file and use less space there than they would once extracted to a file system again.
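A script might, for example, list the embedded files and their combined size. This sketch uses the property names name and bytes listed above; that the area can be handled as a plain array of such entries is an assumption, and the function name summarizeEmbeddedFiles is hypothetical.

```javascript
// Sketch: summarize embedded files from the "embeddedfiles" area.
// Uses the documented properties name and bytes; the input being a plain
// array of such entries is an assumption.
function summarizeEmbeddedFiles(files) {
  const totalBytes = files.reduce((sum, f) => sum + f.bytes, 0);
  const names = files.map((f) => f.name);
  return { count: files.length, totalBytes, names };
}

// Made-up entries for illustration only.
const files = [
  { name: "data.csv", bytes: 1200 },
  { name: "logo.png", bytes: 3400 },
];
console.log(summarizeEmbeddedFiles(files));
```

Since bytes reports the file size rather than the compressed size, the total will usually be larger than the space the embedded files actually occupy inside the PDF.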

The following configuration

$.direct: false
$.aggregated: false
$.aggregated.embeddedfiles: true

when used for a demo PDF file results in the following Quick Check output:

For each page, it is also possible to request summary information about color usage, as well as information about color resources and font resources referenced by that page's Resources dictionary and by any Resources dictionaries of form XObjects referenced on that page.

Note: resources referenced by a page's Resources dictionary – or those referenced by form XObjects on that page – do not actually have to be used by that page or its form XObjects.

"pages.page.resources.color"

Creates information about color usage for a page, and information about color resources referenced by that page (or by form XObjects on that page).

The "color" area contains three sub-areas:

summary: a list of entries, each representing a certain aggregated aspect of color usage. For example, if DeviceCMYK has a value of 0, there is no graphics object on the page that uses DeviceCMYK (but there might be a graphics object that uses DeviceN with the colorants Cyan, Magenta, Yellow and Black, or one that uses ICC based CMYK). By contrast, Any_CMYK reports any use of CMYK, whether DeviceCMYK, ICC based CMYK, DeviceN with one, several or all of the Cyan, Magenta, Yellow and Black colorants, or a Separation color space for Cyan, Magenta, Yellow or Black.

colorspaces: a list of entries (only applicable entries are shown) reflecting the presence of certain color spaces.

spotcolors: a list of entries for the spot colors used, including their name, the alternate color space used, and the associated color values for a 100% tint of the spot color.

"summary"

Each entry under summary has as its value an integer reflecting how often the respective type of color is used. The meaning of each entry is described below:

Any_CMYK: any use of the colorants Cyan, Magenta, Yellow or Black, whether by means of DeviceCMYK, ICC based CMYK, DeviceN with one or several of the four colorants or Separation color spaces using one of the four colorants

DeviceCMYK: use of DeviceCMYK

ICCBased_CMYK: use of ICC based CMYK

Any_RGB: any use of RGB, whether by means of DeviceRGB, ICC based RGB or CalRGB

DeviceRGB: use of DeviceRGB

CalRGB: use of CalRGB

ICCBased_RGB: use of ICC based RGB

Calibrated_RGB: use of CalRGB or ICC based RGB

Lab: use of Lab color space

ICCBased_Lab: use of ICC based Lab

Any_Gray: use of any gray color space, whether DeviceGray, ICC based gray or CalGray

Any_Calibrated: use of any calibrated color space whether any ICC based color space, Lab, CalGray or CalRGB

Any_Spot: use of any spot color, whether Separation color space with a colorant name other than Cyan, Magenta, Yellow, Black, None or All, or DeviceN with at least one colorant with a colorant name other than Cyan, Magenta, Yellow, Black or None

Not_DeviceCMYK: any use of a color space that is not DeviceCMYK

Not_DeviceCMYK_Or_Spot: any use of a color space that is not DeviceCMYK or a spot color

Smooth_Shades: use of smooth shades

Pattern: use of patterns

Any_Separation: use of any Separation color space, whether for a spot color or with a colorant name of Cyan, Magenta, Yellow, Black, None or All

Separation_All: use of Separation All (also often referred to as registration color)

Separation_None: use of Separation None (any object using Separation None will not be rendered)

Separation_Cyan: use of Separation Cyan

Separation_Magenta: use of Separation Magenta

Separation_Yellow: use of Separation Yellow

Separation_Black: use of Separation Black

Separation_Any_Of_CMYK: use of a Separation color space with any of the colorants Cyan, Magenta, Yellow or Black

Separation_Any_Spot: use of a Separation color space for a spot color

Any_DeviceN: use of DeviceN

DeviceN_Any_Of_CMYK: use of DeviceN where at least one colorant has the name Cyan, Magenta, Yellow or Black

DeviceN_All_Of_CMYK: use of DeviceN where all four colorants Cyan, Magenta, Yellow and Black are present, with or without additional colorants (spot colorants or None)

DeviceN_All_Of_CMYK_And_Spot: use of DeviceN where all four colorants Cyan, Magenta, Yellow and Black are present, plus at least one spot colorant (one or more None colorants might also be present)

DeviceN_All_Of_CMYK_No_Spot: use of DeviceN where all four colorants Cyan, Magenta, Yellow and Black are present, but no spot colorant (one or more None colorants might also be present)

DeviceN_All_Of_Spot: use of DeviceN where all colorants are spot colors (one or more None colorants might also be present)
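Since each summary entry is a count, simple print-readiness rules reduce to comparisons against zero. The sketch below flags a page whose summary shows any RGB, any use of a color space other than DeviceCMYK or a spot color, or any Separation None. The rule set and the function name checkColorSummary are illustrative assumptions, not a pdfToolbox check, and treating missing entries as 0 is also an assumption.

```javascript
// Sketch: evaluate a page's color "summary" counts against a simple rule set.
// The rules are illustrative; missing entries are treated as a count of 0.
function checkColorSummary(summary) {
  const count = (key) => summary[key] || 0;
  const problems = [];
  if (count("Any_RGB") > 0) problems.push(`RGB used ${count("Any_RGB")} time(s)`);
  if (count("Not_DeviceCMYK_Or_Spot") > 0) {
    problems.push("color space other than DeviceCMYK or spot color used");
  }
  if (count("Separation_None") > 0) {
    // Objects in Separation None are not rendered at all.
    problems.push("Separation None used (objects will not render)");
  }
  return problems;
}

// Made-up summary: one DeviceRGB use alongside CMYK usage.
const summary = {
  Any_CMYK: 12, DeviceCMYK: 12,
  Any_RGB: 1, DeviceRGB: 1, Not_DeviceCMYK_Or_Spot: 1,
};
console.log(checkColorSummary(summary));
```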

"colorspaces"

Under colorspaces two entries can be found:

colorspace: an array listing the color spaces used

length: the number of color spaces listed in the colorspace array

The entries in the colorspace array can be any of the following:

DeviceCMYK: DeviceCMYK was used at least once

ICCBased: an ICC based color space was used at least once

ICCBased_CMYK: an ICC based CMYK color space was used at least once

Separation: a Separation color space was used at least once

Separation_Spot: a Separation color space with a spot colorant was used at least once

Separation_CMYK: a Separation color space with a colorant whose name is Cyan, Magenta, Yellow or Black was used at least once

Separation_All: a Separation All (registration) color space was used at least once

Separation_None: a Separation None color space (graphics objects using this color space will not be rendered) was used at least once

DeviceN: DeviceN was used at least once

DeviceN_SpotOnly: DeviceN with spot colorants but none of the CMYK colorants was used at least once

DeviceN_Spot_CMYK: DeviceN with a combination of spot and CMYK colorants was used at least once

DeviceN_CMYK: DeviceN with only CMYK colorants and no spot colorants was used at least once
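Because colorspace is a list of names, membership tests are enough to answer questions such as "does this page use registration color, or a DeviceN mixing spot and process colorants?". A sketch, assuming the entries are plain strings as listed above; it also checks that length agrees with the array. The function name colorspaceFlags and the flag names are hypothetical.

```javascript
// Sketch: query the "colorspaces" sub-area of one page.
// ASSUMPTION: colorspace is an array of the name strings listed above.
function colorspaceFlags(colorspaces) {
  const names = new Set(colorspaces.colorspace);
  if (colorspaces.length !== colorspaces.colorspace.length) {
    throw new Error("length does not match the colorspace array");
  }
  return {
    registration: names.has("Separation_All"),
    invisible: names.has("Separation_None"),
    mixedDeviceN: names.has("DeviceN_Spot_CMYK"),
  };
}

// Made-up sub-area for one page.
const colorspaces = {
  colorspace: ["DeviceCMYK", "Separation", "Separation_All"],
  length: 3,
};
console.log(colorspaceFlags(colorspaces));
// { registration: true, invisible: false, mixedDeviceN: false }
```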

"spotcolors"

Under spotcolors two entries can be found:

spotcolor: an array listing the spot colors used

length: the number of spot colors listed in the spotcolor array

The entries in the spotcolor array list each spot color using the following entries:

name: name of the spot color

alternatespace: alternate space for the spot color; can be DeviceCMYK, ICCBased_CMYK, DeviceRGB, CalRGB, ICCBased_RGB, Lab, DeviceGray, CalGray, ICCBased_Gray or undefined. undefined occurs when a DeviceN color space includes a spot color but provides no alternate space for that spot color (an alternate space is only provided for the DeviceN color space as a whole)

alternatevalues: an array of values that, when used in the alternate space, emulate the appearance of a 100% tint of the spot color
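A common task is to collect the distinct spot color names used in a document and to identify those for which no individual alternate space exists. The sketch assumes one spotcolors sub-area per page, as described above, and accepts both the string "undefined" and a missing value for alternatespace, since how undefined is encoded in the JSON is not specified here. The function name collectSpotColors and the sample data are hypothetical.

```javascript
// Sketch: merge the "spotcolors" sub-areas of several pages.
// Returns each distinct spot color name once (first occurrence wins), and
// the names whose alternatespace is reported as undefined.
function collectSpotColors(perPageSpotcolors) {
  const all = new Map();
  for (const sc of perPageSpotcolors) {
    for (const spot of sc.spotcolor) {
      if (!all.has(spot.name)) all.set(spot.name, spot);
    }
  }
  // ASSUMPTION: "undefined" may arrive as a string or as a missing value.
  const isUndefined = (s) =>
    s.alternatespace === undefined || s.alternatespace === "undefined";
  const withoutAlternate = [...all.values()].filter(isUndefined).map((s) => s.name);
  return { names: [...all.keys()], withoutAlternate };
}

// Made-up data for two pages.
const brandRed = { name: "Brand Red", alternatespace: "DeviceCMYK", alternatevalues: [0, 1, 1, 0] };
const pageSpots = [
  { spotcolor: [brandRed], length: 1 },
  { spotcolor: [{ name: "Varnish", alternatespace: "undefined" }, brandRed], length: 2 },
];
console.log(collectSpotColors(pageSpots));
```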

"pages" – aggregated information about page geometry boxes

Under the pages area, not only is the list of pages available in the form of the page array, but also several other sub-areas that cover various aspects of aggregated information about the page geometry of the pages in the PDF document.

For each type of page geometry box (e.g. CropBox or TrimBox), a substructure is used to convey information about all page geometry boxes of that type in the given PDF document (showing a TrimBox entry as an example):

num_portrait: number of occurrences where the page geometry box has a portrait orientation

num_square: number of occurrences where the width and the height of the page geometry box are the same

num_landscape: number of occurrences where the page geometry box has a landscape orientation

width_min: smallest width for this page geometry box in the current PDF document

width_max: largest width for this page geometry box in the current PDF document

height_min: smallest height for this page geometry box in the current PDF document

height_max: largest height for this page geometry box in the current PDF document
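These counters and extremes are enough to answer two frequent questions without looking at individual pages: whether all boxes of one type have the same size, and whether orientations are mixed. A sketch for one box substructure such as TrimBox; the function name describeBoxes is hypothetical, and sizes are compared with a small tolerance because dimensions are likely to be non-integer numbers.

```javascript
// Sketch: interpret one page geometry box substructure (e.g. TrimBox).
function describeBoxes(box, tolerance = 0.01) {
  const sameSize =
    Math.abs(box.width_max - box.width_min) <= tolerance &&
    Math.abs(box.height_max - box.height_min) <= tolerance;
  // Count how many of the three orientation buckets are in use.
  const orientations = [box.num_portrait, box.num_square, box.num_landscape]
    .filter((n) => n > 0).length;
  return { sameSize, mixedOrientation: orientations > 1 };
}

// Made-up TrimBox summary: all portrait, but two different widths.
const trimBox = {
  num_portrait: 4, num_square: 0, num_landscape: 0,
  width_min: 595.28, width_max: 612, height_min: 792, height_max: 792,
};
console.log(describeBoxes(trimBox)); // { sameSize: false, mixedOrientation: false }
```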

For the TrimBox and CropBox page geometry types, there are two additional sub-areas, effective_TrimBox and effective_CropBox. They contain essentially the same information as the other variants, except that the entries for the smallest and largest dimensions (width_min, width_max, height_min and height_max) take the page scaling factor (UserUnit) into account.
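In PDF, UserUnit (introduced in PDF 1.6) gives the size of one default user space unit in multiples of 1/72 inch and defaults to 1.0. Assuming the width and height values are expressed in default user space units, the effective size is the plain size multiplied by UserUnit. The sketch illustrates that arithmetic, with a conversion to millimeters; it is not a description of how pdfToolbox computes the effective values internally, and the function name effectiveSizeMm is hypothetical.

```javascript
// Sketch: effective box size under a page's UserUnit scaling factor.
// ASSUMPTION: width/height are in default user space units (1/72 inch each).
const MM_PER_INCH = 25.4;

function effectiveSizeMm(width, height, userUnit = 1.0) {
  // One unit equals userUnit / 72 inch.
  const toMm = (v) => ((v * userUnit) / 72) * MM_PER_INCH;
  return { width: toMm(width), height: toMm(height) };
}

// A 1000 x 500 unit box with UserUnit 2 is 2000 x 1000 points, i.e. about
// 705.6 x 352.8 mm.
const eff = effectiveSizeMm(1000, 500, 2);
console.log(eff.width.toFixed(1), eff.height.toFixed(1));
```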

The following Quick Check configuration

$.direct: false
$.aggregated: false
$.aggregated.pages: true

when used for a demo PDF file results in the following Quick Check output:

Quick Check usage examples

/*
Sample configuration for a Quick Check Process Plan step
- requests document info for the document and TrimBox data for all pages
- adds the Quick Check result to app.vars.quickcheck_sample
- the name of the variable does not have to be 'my_config', any name can be used
*/
var my_config = {
    // sub-path below app.vars where the Quick Check result is stored
    app_vars_sub_path: "quickcheck_sample",
    // filter expressions: switch whole blocks off, then switch selected parts on
    quickcheck_config: [
        "$.direct: false",
        "$.direct.Info: true",
        "$.aggregated: false",
        "$.aggregated.page.pages.TrimBox: true"
    ]
};
// the script's final expression evaluates to the configuration object
my_config

The same sample configuration as above, in the format used for Quick Check on the command line:

The large amount of data is mostly caused by using "$.direct:true", which collects all data structures from the PDF syntax (skipping stream data). Using "$.direct:true" is typically not practical; instead, it is recommended to focus on the options in the "aggregated" block.