Parser - XML Printer (http://xmlprinter.com/support/parser)

Parser

Parser is a built-in XML Printer utility that filters the text elements of the output XML based on their relative or absolute location. For example, if you’re using XML Printer to archive printed invoices, you might want to post-process those invoices. In that case you need to extract the actual data from each invoice and work only with that data. Parser helps you do that.

The basic idea behind document types and Parser in general is that you may have documents that are pretty much alike (e.g. an invoice from a specific supplier or system). To filter the data out, you will create a document type called (for example) InvoiceFromSupplier1 and add a sample document to that type. Then you tell XML Printer what fields to search for and where to search for them by selecting the fields on the sample document. XML Printer will then be able to learn and recognize other similar invoices from that supplier.

Apart from invoice processing, other use cases include delivery reports, bank account statements or any kind of form data.

Nomenclature used in this document

Document is the output from XML Printer that gets parsed. A document contains pages with elements (text, image, path), of which only text elements are of interest here.

Element is a text block, image or path on a document page. Only text elements are processed in parsing; all others are ignored.

Document type is a definition of a document layout (e.g. all documents of one kind printed from a system are considered one document type). Parser learns to recognize document types and then tries to assign each document to a specific document type.

Field is a node in a document type (i.e. a document type is a tree of fields). The term field is used in this document in two meanings. One is a general field (which might be a normal field, a structure or a formula); the other is the normal field in its natural meaning. The exact meaning should be clear from the context.

Creating a new Document type

Configuring Parser

Apart from creating the new document type, you have to tell XML Printer to actually use it on all printed documents (or only some, depending on your setup). Navigate to the XML Printer Settings page and, in the lower part (Processing section), add a processing step with the action Parse. A combo box then appears with a list of existing document types to pick from. If you want XML Printer to pick the document type that best matches the printed document, select “(autoselect)”. Don’t forget to click the Add button so that you don’t lose your changes.

Fields

Every piece of data you want to extract from the XML is represented as a Field (e.g. invoice number, supplier name, etc.). To add a field, right-click the document type name on the left side and choose the Add field… command.

The name of the field must contain only letters, numbers, hyphens and underscores; other characters will not be accepted. Let’s ignore structures and formulas for now (we’ll discuss them later), as well as absolute/relative positioning.
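The naming rule above can be checked before a name is typed into Composer. The following is a minimal sketch, not XML Printer's own validation: the function name is hypothetical, and it assumes that "letters" and "numbers" mean ASCII characters, which the documentation does not specify.

```python
import re

# Assumption: only ASCII letters and digits count; the documentation
# does not say whether accented or non-Latin letters are accepted.
_FIELD_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_field_name(name: str) -> bool:
    """Return True if name uses only letters, digits, hyphen and underscore."""
    return _FIELD_NAME.fullmatch(name) is not None


# Sample names invented for illustration.
ok_name = is_valid_field_name("invoice-number_1")
bad_name = is_valid_field_name("invoice number")
```

Under these assumptions a name such as invoice-number_1 passes, while a name containing a space does not.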

Default value is the value that will be included in the output XML when no text element matches the field well enough to be picked. This is useful if you need some output for optional fields or for static fields (fields that have a constant value on every document). The default value may contain a function, for example “REPLACE({_self}, 'i', '1')” (see below).

The Optional flag tells Composer whether or not to force you to select the field on the document. It has no effect on the actual parsing; it is only used to determine whether all fields have been selected on the sample documents.

Temporary fields can be used as an intermediate step, either in complicated formulas or when extracting several pieces of information from a single text element (you create one temporary field and two or more formula fields that use the value of the temporary one). All temporary fields are removed from the output document before the XML is saved.

The function text area is where you enter formulas to evaluate. An icon tells you whether the entered formula is valid.
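The temporary-field pattern can be pictured with a small sketch. It does not use Parser's formula language; plain Python stands in for the formulas, and the field names RawTotal, Amount and Currency as well as the sample text are invented for illustration.

```python
from typing import Dict, Set

# Hypothetical field names; RawTotal plays the role of the temporary field.
TEMPORARY: Set[str] = {"RawTotal"}


def derive_fields(raw_total: str) -> Dict[str, str]:
    """Build two 'formula' fields from one temporary field."""
    fields = {"RawTotal": raw_total}       # value of the single text element
    amount, currency = raw_total.split()   # assumes the form "<amount> <currency>"
    fields["Amount"] = amount
    fields["Currency"] = currency
    return fields


def drop_temporary(fields: Dict[str, str], temporary: Set[str]) -> Dict[str, str]:
    """Remove temporary fields, as is done before the XML is saved."""
    return {name: value for name, value in fields.items() if name not in temporary}


result = drop_temporary(derive_fields("120.50 EUR"), TEMPORARY)
```

Only Amount and Currency remain in result; the intermediate RawTotal value is discarded.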

For a field that has a corresponding text element, a default value and a formula, the order of evaluation is as follows: first the result is set to the content of the matching text element; then the default value is processed (the existing value is overwritten either with the static value or with the result of the function in the default value); finally the formula is evaluated, and its result becomes the final value.
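The evaluation order can be sketched as a three-step function. This is a literal reading of the paragraph above, not XML Printer's implementation: formulas are modelled as hypothetical Python callables that receive the current value (the role {_self} appears to play), and Python's str.replace is assumed to behave like the REPLACE example.

```python
from typing import Callable, Optional, Union

# Hypothetical stand-ins for Parser formulas and default values.
Formula = Callable[[Optional[str]], Optional[str]]
Default = Union[str, Formula]


def evaluate_field(element_text: Optional[str],
                   default: Optional[Default] = None,
                   formula: Optional[Formula] = None) -> Optional[str]:
    # Step 1: start from the matching text element (None if nothing matched).
    value = element_text
    # Step 2: process the default value; a static default overwrites the
    # value, a function default is applied to the current value.
    if default is not None:
        value = default(value) if callable(default) else default
    # Step 3: the formula, if present, produces the final value.
    if formula is not None:
        value = formula(value)
    return value


def replace_i_with_1(value: Optional[str]) -> Optional[str]:
    # Mirrors REPLACE({_self}, 'i', '1'); Python semantics are an assumption.
    return None if value is None else value.replace("i", "1")


fixed = evaluate_field("INV-i23", default=replace_i_with_1)
missing = evaluate_field(None, default="N/A")
chained = evaluate_field(" INV-i23 ", default=replace_i_with_1, formula=str.strip)
```

Here fixed and chained both end up as INV-123, and missing falls back to the static default N/A.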

Structures

Structures are a special kind of field that groups other fields. For example, you may want to create a structure named Supplier with sub-fields Name, Address, etc. Structures are also very handy for tabular data (e.g. invoice items), where they group corresponding elements into lines. Another reason for using structures is the readability of the document type structure and of the parsed data.
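To picture how a structure groups corresponding elements into lines, here is a small sketch. It only illustrates the idea with plain Python data; the field names and values are invented, and the actual layout of the XML that Parser produces is not shown here.

```python
from typing import Dict, List


def group_into_lines(columns: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Turn per-field columns of text into one record per table line."""
    names = list(columns)
    # zip(*...) walks the columns in parallel, yielding one tuple per line.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Hypothetical invoice items, one list per sub-field of an "Items" structure.
items = group_into_lines({
    "Description": ["Widget", "Gadget"],
    "Quantity": ["2", "1"],
    "Price": ["10.00", "25.00"],
})

# A non-repeating structure is simply a named group of sub-fields.
supplier = {"Name": "Example Ltd.", "Address": "1 Sample Street"}
```

Each entry in items corresponds to one line of the table, with the sub-field names as keys.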

Creating a structure is very similar to creating a field, except you select the Structure radio button in the field properties. When adding sub-fields into the structure, you have to right-click the structure, instead of the tree root.

Selecting fields on a document

Once you have the basic structure of the document type created, you need to tell XML Printer where in the document to look for the fields. This is done by assigning fields to actual text elements on the sample document(s).

Right-click a field in the document type tree and select Select element. Composer will ask you to click on the correct element in the document.

Once you click the element, it gets marked as belonging to the field.

This way you can assign all fields one by one.

Assigning structures (repeating structures especially) is a bit more complicated, as XML Printer needs to know which elements belong to the same group (line). Therefore you cannot select an element for a single field inside a structure; instead, you select elements for all the fields of the structure in one pass. Right-click the structure name and pick Select elements. You’ll then get a prompt for the first field.

After you select the element for the first field, you’ll immediately be prompted for the next one, and so on until you reach the last one, after which you get the choice to stop the selection.

This way you can select elements for the structure multiple times and finish by clicking the Done button. If the structure contains an optional field and you don’t want to select an element for it, the Skip button will be enabled for that field.

Removing a selection

You can remove any existing field-element binding by right-clicking the element and picking Remove field binding.

Absolute/relative references

(This part of documentation is still to be written)
