Given is a list of TIFF images, some of which are compressed using "Group 4 Fax" (CCITT T.6) compression. Because this compression causes problems in some application contexts, it may be necessary to remove it from a large set of TIFF input files.

The diagram below gives an overview of the components used in the workflow.

Notes on reading the diagram

The green boxes are operating nodes that apply a characterisation or file format conversion to a file.

The purple boxes at the top of the diagram are input parameters; those at the bottom are output results.

The violet boxes, such as "readFile", "Read_Text_File", "Flatten_List" and "Get_Image_From_URL", are local services that Taverna provides by default, so they can be used in a wide variety of data analysis and conversion contexts.

The dark blue boxes are so-called XML splitters. They create input slots from the XML description of a parameter, for example as defined in a Web Services Description Language (WSDL) document, so that individual values can be connected to them.

Finally, the brown boxes (URL2List, Beanshell) are so-called Beanshells: customisable components that define their own input and output parameters and process them using BeanShell, a Java scripting language. External Java libraries can also be used by making them available to Taverna and declaring the Beanshell's dependency on the library.

The "Get_list_of_images" component is drawn with a surrounding box because it is a nested workflow used inside the containing workflow.

The workflow has the following parameters to configure a workflow run:

"url_to_textfile_with_image_urls" is the URL of a text file containing URL references to the images that should be processed (a kind of batch process).

"csresult_regex" is a regular expression used to identify the compression scheme. For example, the expression .*Group 4 Fax.* matches the items for which FITS identified the compression scheme T6/Group 4 Fax.

"convert_compression" is an integer that selects the compression scheme applied when converting the images that matched the regular expression above: 0 means no compression (None), and the values 1 to 6 select LZW (1), PACKBITS (2), DEFLATE (3), JPEG (4), CCITT G3 Fax (5) and CCITT G4 Fax (6).

"convert_numcolors" is the number of colours the target image should have.

Workflow execution

For the batch processing, the workflow takes as input a URL reference to a text file that contains a list of URL references to the TIFF image files.
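As a rough illustration of this input format, the following Java sketch splits the content of such a text file into a list of image URIs, one per non-empty line. In the workflow this is done by Taverna components (presumably the URL2List Beanshell plays this role); the class ImageUrlList, its method and the example URLs are hypothetical, and downloading the text file itself is left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one image URI per non-empty line of the text file. */
public class ImageUrlList {

    static List<URI> parseUrlList(String fileContent) {
        List<URI> urls = new ArrayList<>();
        // \R matches any line break (\n, \r\n, \r); trailing empty strings are dropped by split
        for (String line : fileContent.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                urls.add(URI.create(trimmed)); // throws IllegalArgumentException on malformed input
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String content = "http://example.org/images/page1.tif\n"
                + "\n"
                + "http://example.org/images/page2.tif\n";
        // Each URI would then be handed to the characterisation step one by one.
        for (URI u : parseUrlList(content)) {
            System.out.println(u);
        }
    }
}
```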

Taverna's list handling then passes these images one by one to the FITS operation "characteriseFile", which tries to identify the file format and some file properties. It creates an XML description of the identification result based on the set of identification tools that FITS uses (FITS wraps tools such as DROID and JHOVE 1, among others, and normalises their characterisation output).
The "Read_Text_File" component reads the XML identification result and uses an XPath expression to extract the value of the compression scheme property.
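A minimal sketch of such an XPath extraction in plain Java (javax.xml.xpath) might look as follows. The original XPath expression is not shown in this text; the sketch assumes the compression scheme is reported in an element named compressionScheme and uses local-name() so that the namespace of the FITS output does not have to be declared. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class CompressionSchemeExtractor {

    // Assumed element name; local-name() ignores the namespace of the characterisation output.
    private static final String COMPRESSION_XPATH = "//*[local-name()='compressionScheme']";

    /** Returns the text of the first compressionScheme element, or "" if there is none. */
    static String extractCompressionScheme(String fitsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(fitsXml.getBytes(StandardCharsets.UTF_8)));
            // Converting a node-set to a string yields the text of the first match in document order.
            return XPathFactory.newInstance().newXPath().evaluate(COMPRESSION_XPATH, doc).trim();
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read characterisation XML", e);
        }
    }

    public static void main(String[] args) {
        // Simplified, hypothetical fragment of a characterisation result.
        String xml = "<fits><metadata><image>"
                + "<compressionScheme>T6/Group 4 Fax</compressionScheme>"
                + "</image></metadata></fits>";
        System.out.println(extractCompressionScheme(xml)); // T6/Group 4 Fax
    }
}
```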

In the example setting, most of the images have the compression scheme value "uncompressed", and some have the value "T6/Group 4 Fax".

The goal is to identify the images with the compression scheme value "T6/Group 4 Fax"; the "Beanshell" component is used to select them.

The Beanshell component takes the characterisation result list charactres_in_list and the image list tiff_images_in_list as input and picks out the images whose characterisation result matches the regular expression csresult_regex, for example .*Group 4 Fax.*.

The Beanshell uses a short Java code snippet to select the "Group 4 Fax" compressed images.
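The original snippet is not reproduced in this text. The following Java sketch shows comparable filtering logic under two assumptions: the two input lists are index-aligned (the i-th characterisation result belongs to the i-th image), and the whole result string must match the expression, which is why the pattern is written as .*Group 4 Fax.*. In an actual Taverna Beanshell the input ports are available directly as script variables; here they are parameters of a hypothetical class Group4Filter so that the example runs stand-alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class Group4Filter {

    /**
     * Returns the images whose characterisation result matches csresultRegex.
     * Assumes both lists are index-aligned (result i belongs to image i).
     */
    static List<String> filterByCharacterisation(List<String> charactresInList,
                                                 List<String> tiffImagesInList,
                                                 String csresultRegex) {
        if (charactresInList.size() != tiffImagesInList.size()) {
            throw new IllegalArgumentException("Input lists must have the same length");
        }
        Pattern pattern = Pattern.compile(csresultRegex);
        List<String> matchingImages = new ArrayList<>();
        for (int i = 0; i < charactresInList.size(); i++) {
            // matches() requires the whole string to match, hence the leading and trailing .*
            if (pattern.matcher(charactresInList.get(i)).matches()) {
                matchingImages.add(tiffImagesInList.get(i));
            }
        }
        return matchingImages;
    }

    public static void main(String[] args) {
        List<String> results = List.of("uncompressed", "T6/Group 4 Fax", "uncompressed");
        List<String> images = List.of("http://example.org/a.tif",
                "http://example.org/b.tif", "http://example.org/c.tif");
        System.out.println(filterByCharacterisation(results, images, ".*Group 4 Fax.*"));
        // prints [http://example.org/b.tif]
    }
}
```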

The output list of the Beanshell component then contains only the images with the "Group 4 Fax" compression scheme. These images are handed over to the operation convertTIFFtoTIFFByURL, a conversion service based on the image manipulation program "The GIMP". The service is configured by the convert_compression and convert_numcolors parameters; in this scenario, convert_compression is set to 0 (NONE) and the number of colours is set to 2 (bitonal).

The GIMP service uses a Java wrapper that executes GIMP on the command line.

To execute the command, the wrapper uses the Java class ProcessBuilder, which takes the command as an array of strings.
The following array of command strings is an example of a GIMP command that can be handed over to the ProcessBuilder for execution.

Here, /usr/bin/gimp is the GIMP executable, -b passes a command to be run in batch mode, -i runs GIMP without its user interface, and -d (--no-data) skips loading data such as brushes, patterns, gradients and palettes. The "convertTIFFtoTIFF" script is then called with 4 parameters: the input file, the output file, the number of colours and the compression scheme to be used (0 := NONE). The Java wrapper takes care of passing the parameters from the workflow layer (Taverna) down to the Script-Fu command execution layer. Finally, (gimp-quit 0) ends the batch process.
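The following Java sketch shows how a wrapper could assemble such a command and pass it to ProcessBuilder. The argument order (-i, -d, then the -b batch commands), the class GimpCommand and its methods are assumptions for illustration; the exact strings used by the original wrapper are not known. Because ProcessBuilder passes each list element to GIMP as a separate argument without a shell, the Script-Fu expression only needs Scheme string escaping, not shell quoting.

```java
import java.io.IOException;
import java.util.List;

public class GimpCommand {

    /** Quotes a value as a Scheme string literal for use inside a Script-Fu expression. */
    static String schemeString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds the argument list for a batch GIMP run (hypothetical layout).
     * Parameter order of convertTIFFtoTIFF follows the description above.
     */
    static List<String> buildCommand(String gimpExecutable, String inputFile, String outputFile,
                                     int numColors, int compression) {
        String convertCall = "(convertTIFFtoTIFF " + schemeString(inputFile) + " "
                + schemeString(outputFile) + " " + numColors + " " + compression + ")";
        return List.of(gimpExecutable, "-i", "-d", "-b", convertCall, "-b", "(gimp-quit 0)");
    }

    /** Starts the process, forwards its output to this process and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Bitonal output (2 colours) without compression (0), as in the scenario above.
        List<String> cmd = buildCommand("/usr/bin/gimp", "/tmp/in.tif", "/tmp/out.tif", 2, 0);
        System.out.println(cmd);
        // run(cmd); // only on a machine where GIMP and the convertTIFFtoTIFF script are installed
    }
}
```

Calling run(...) requires GIMP and the convertTIFFtoTIFF script to be installed; the wrapper can then inspect the exit code returned by waitFor().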

The following Script-Fu script (GIMP's scripting language) shows the source of the convertTIFFtoTIFF script, which does the actual image conversion: