Data processing rules

Data processing rules are a powerful means of cleaning up and standardising metadata. You can also create rules that apply search filter codes, set file properties based on the metadata that you receive, and automatically enrich your metadata using your vocabulary.

Processing conditions

By configuring processing conditions, you can specify which files are processed and when. You may want a rule to be processed only for new files, only for changed files, or always.

You can create conditions so that a rule is processed only for files from a specific supplier or supplier group, or for all files except those from specific suppliers. It’s also possible to create metadata-based conditions, for example to process a rule only if certain values are found in certain fields.

Data refinery

Use this action to remove unwanted characters from a field, remove everything before or after any one of a list of specified separators, remove web and e-mail addresses, remove certain words, and find and replace words.

For example, if the credit field for files from supplier XYZ always contains a copyright sign, then the photographer’s name followed by a slash, and then a company name, you can create a rule that ensures only the photographer’s name remains. In this case, that is a rule that removes the copyright sign and then strips everything after (and including) the slash.
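The refinery actions used in that example, plus the removal of web and e-mail addresses, can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names, the separator handling and the regular expressions for addresses are hypothetical, and Infradox XS may recognise addresses differently.

```python
import re
from typing import List

# Hypothetical patterns: a simplified e-mail shape, and anything that starts
# with http://, https:// or www. up to the next whitespace.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)


def remove_characters(value: str, chars: str) -> str:
    """Remove every occurrence of any character in `chars`."""
    return value.translate({ord(c): None for c in chars})


def strip_from_separator(value: str, separators: List[str]) -> str:
    """Drop everything from the first occurrence of any separator onwards."""
    positions = [value.find(sep) for sep in separators if sep in value]
    return value[: min(positions)] if positions else value


def remove_addresses(value: str) -> str:
    """Remove e-mail and web addresses, then collapse leftover whitespace."""
    value = EMAIL_RE.sub("", value)  # strip e-mail addresses first
    value = URL_RE.sub("", value)    # then web addresses
    return " ".join(value.split())


def clean_credit(credit: str) -> str:
    """The supplier XYZ example: drop the copyright sign and the '/ company' part."""
    credit = remove_characters(credit, "©")
    credit = strip_from_separator(credit, ["/"])
    return credit.strip()


print(clean_credit("© John Smith / XYZ Pictures"))  # John Smith
print(remove_addresses("Contact info@example.com or visit www.example.com"))  # Contact or visit
```

The names John Smith and XYZ Pictures are placeholder data. Chaining small single-purpose functions mirrors how a rule applies one action after another to the same field.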

Filter rules

Create rules that apply filter codes that you can use for your custom search filters. For example, if you have created a custom search filter for “Gender”, you can set up data processing rules to assign the filter codes to your files: if the words man, men, boy, boys, guy or male are found in the keywords field, apply the filter code for Male. You can also apply default filter codes, which are applied if none of your rules result in a filter code.
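A minimal Python sketch of how such keyword-to-filter-code rules could be evaluated, including the default fallback. It assumes keywords arrive as a list and are compared case-insensitively as whole keywords; the function name and the default code `GFU` are hypothetical, while GFF and GFM are the codes used in the gender example in the Filter rules section.

```python
from typing import Dict, List, Set


def apply_filter_rules(
    keywords: List[str],
    rules: Dict[str, Set[str]],
    default_codes: List[str],
) -> List[str]:
    """Return the filter codes whose trigger words appear among the keywords.

    `rules` maps a filter code to the lower-cased words that trigger it.
    If no rule fires, the default codes are returned instead.
    """
    found = {k.strip().lower() for k in keywords}
    codes = [code for code, words in rules.items() if found & words]
    return codes or list(default_codes)


gender_rules = {
    "GFF": {"woman", "women", "girl", "girls", "lady", "ladies"},
    "GFM": {"man", "men", "boy", "boys", "guy", "guys", "male"},
}
print(apply_filter_rules(["Beach", "Boy", "Dog"], gender_rules, ["GFU"]))  # ['GFM']
print(apply_filter_rules(["Tree"], gender_rules, ["GFU"]))  # ['GFU']
```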

Create rules to look up terms in your controlled vocabulary and add synonyms either to the field that’s being processed or to a different field. You can choose to look up preferred terms only, or to also look for terms that are defined as synonyms. You can also add the broader term of any word that is found. Rules that extract words from fields can be created too, for example to automatically add keywords from the caption if those words are found in the vocabulary. You can limit the scope by selecting one or more levels in the vocabulary hierarchy.
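To make the lookup options concrete, here is a Python sketch of vocabulary enrichment with synonyms, broader terms and a level filter. The `Term` structure, the function name and the sample vocabulary are hypothetical assumptions; the real vocabulary model in Infradox XS is not described here. For simplicity the sketch writes results back into the same keyword list; writing to a different field would mean returning the additions separately.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Term:
    """Hypothetical vocabulary entry: a preferred term plus related data."""
    preferred: str
    synonyms: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # preferred form of the broader term
    level: int = 1                 # depth in the vocabulary hierarchy


def enrich_keywords(
    keywords: List[str],
    vocabulary: Iterable[Term],
    *,
    match_synonyms: bool = False,
    add_broader: bool = False,
    levels: Optional[Set[int]] = None,
) -> List[str]:
    """Add synonyms (and optionally broader terms) for keywords found in the vocabulary."""
    # Build a case-insensitive lookup, limited to the selected hierarchy levels.
    index = {}
    for term in vocabulary:
        if levels is not None and term.level not in levels:
            continue
        index[term.preferred.lower()] = term
        if match_synonyms:  # also find the term when a keyword is one of its synonyms
            for synonym in term.synonyms:
                index[synonym.lower()] = term

    result = list(keywords)
    seen = {k.lower() for k in result}

    def add(word: str) -> None:
        # Append without creating duplicates (case-insensitive).
        if word.lower() not in seen:
            seen.add(word.lower())
            result.append(word)

    for keyword in keywords:
        term = index.get(keyword.lower())
        if term is None:
            continue
        add(term.preferred)
        for synonym in term.synonyms:
            add(synonym)
        if add_broader and term.broader:
            add(term.broader)
    return result


vocabulary = [
    Term("dog", ["canine", "hound"], broader="animal", level=2),
    Term("animal", level=1),
]
print(enrich_keywords(["dog", "park"], vocabulary, add_broader=True))
```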

Create processing rules that find terms in your vocabulary for automated translation. You can replace translated terms in the source field, or output the translated term to a different field. Synonyms can be added automatically as well. You can limit the scope of the lookup by selecting one or more levels in the vocabulary hierarchy. The vocabulary has functions to export terms for off-line translation; you can then import the translated terms again with the job server.
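The two output modes (replace in the source field, or write to a different field) can be illustrated with a small Python sketch. It assumes translations are available as a simple mapping from lower-cased source term to translated term; the function name and the Dutch sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def translate_terms(
    keywords: List[str],
    translations: Dict[str, str],
    replace_in_source: bool,
) -> Tuple[List[str], List[str]]:
    """Return (source_field, target_field) after looking up each keyword.

    `translations` maps a lower-cased source term to its translated term.
    """
    found = [translations.get(k.lower()) for k in keywords]
    if replace_in_source:
        # Swap each known term for its translation; unknown terms stay as they are.
        return [t if t is not None else k for k, t in zip(keywords, found)], []
    # Leave the source untouched and collect translations for a separate field.
    return list(keywords), [t for t in found if t is not None]


dutch = {"dog": "hond", "beach": "strand"}
print(translate_terms(["Dog", "beach", "sunset"], dutch, replace_in_source=True))
print(translate_terms(["Dog", "beach", "sunset"], dutch, replace_in_source=False))
```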

Processing conditions

Processing conditions limit execution of a rule to certain files. A condition can be based on the metadata, or on the supplier or supplier group to which a file is linked. (Note that supplier and supplier group are generic terms; think, for example, of photographer and agent.)

The above screenshot shows how this rule is limited to the supplier group (agent) PUBLIS only. Note that the radio button Included is checked. This means that only the specified supplier groups are included. If the radio button Excluded is checked instead, then all supplier groups are included except the ones specified in the list. The screenshot also shows that two photographers are excluded. So all files from all suppliers belonging to the group PUBLIS are included, except files from the two photographers in the list.
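The Included/Excluded logic, combined with individually excluded suppliers, can be modelled in a few lines of Python. This is a hypothetical sketch, assuming that Excluded means “every group except the listed ones” and that an excluded supplier never matches; the class name and the photographer names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SupplierCondition:
    """Hypothetical model of the supplier / supplier group condition."""
    included: bool  # True = "Included" radio button, False = "Excluded"
    groups: Set[str]
    excluded_suppliers: Set[str] = field(default_factory=set)

    def matches(self, supplier: str, group: str) -> bool:
        # Individually excluded suppliers never match.
        if supplier in self.excluded_suppliers:
            return False
        listed = group in self.groups
        # Included: only the listed groups; Excluded: every group except the listed ones.
        return listed if self.included else not listed


# The PUBLIS example: group included, two photographers excluded.
rule = SupplierCondition(True, {"PUBLIS"}, {"Photographer A", "Photographer B"})
print(rule.matches("Photographer C", "PUBLIS"))  # True
print(rule.matches("Photographer A", "PUBLIS"))  # False
print(rule.matches("Photographer C", "OTHER"))   # False
```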

You can also create metadata-based conditions on the Processing conditions tab sheet. If you create multiple conditions here, all of them must be met for the rule to run. Let’s say you want to remove web and e-mail addresses from the captions of files entering the system from PUBLIS. You can then create a condition that tests whether the caption is not blank. This saves processing time, because there’s no need to run the action if the caption doesn’t contain any text.
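Because multiple metadata conditions must all be met, they behave like a logical AND. The Python sketch below shows that combination with a “caption is not blank” test; the metadata dictionary, the helper names and the `equals` condition on a category field are hypothetical illustrations rather than the product’s actual condition types.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, str]
Condition = Callable[[Metadata], bool]


def is_not_blank(field_name: str) -> Condition:
    """Condition: the field exists and contains something other than whitespace."""
    return lambda meta: bool(meta.get(field_name, "").strip())


def equals(field_name: str, value: str) -> Condition:
    """Condition: the field has exactly this value (case-insensitive)."""
    return lambda meta: meta.get(field_name, "").lower() == value.lower()


def all_conditions_met(meta: Metadata, conditions: List[Condition]) -> bool:
    # Multiple conditions are combined with AND: every one must hold.
    return all(condition(meta) for condition in conditions)


conditions = [is_not_blank("caption")]
print(all_conditions_met({"caption": "Visit www.example.com"}, conditions))  # True
print(all_conditions_met({"caption": "   "}, conditions))  # False
```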

Filter rules

The above screenshot shows a simple rule configuration that creates custom filters for gender. These filters can then be used on the client-facing pages to let users filter their search results to files about women or men only.

The first rule looks for the words wom[a?e]n,girl[s],lad[y?ies] in the keywords field. The letters between brackets are shortcodes; for example, wom[a?e]n is the same as writing woman,women. This particular string expands to: woman,women,girl,girls,lady,ladies. So if the keywords field contains any of these words, the filter code GFF is added. Infradox XS filter codes always start with the @ sign and end with the # sign. You don’t have to specify these; they are added automatically.

The next rule looks for the words man,men,boy,boys,guy and guys; if any of these words is found in the keywords field, the filter code GFM is added.
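The shortcode expansion described above can be reproduced with a short Python sketch. The rules are inferred from the example only: alternatives inside brackets are separated by `?`, and a bracket with a single option (such as `[s]`) makes that option optional. Commas inside brackets and nested brackets are not handled, and the function names are hypothetical.

```python
import itertools
import re
from typing import List

BRACKET_RE = re.compile(r"\[([^\]]*)\]")


def expand_pattern(pattern: str) -> List[str]:
    """Expand one pattern such as 'wom[a?e]n' into all the words it stands for."""
    parts = []  # each part is a list of alternatives
    pos = 0
    for m in BRACKET_RE.finditer(pattern):
        parts.append([pattern[pos:m.start()]])  # literal text before the bracket
        options = m.group(1).split("?")
        if len(options) == 1:
            options = ["", options[0]]  # single option: may be present or absent
        parts.append(options)
        pos = m.end()
    parts.append([pattern[pos:]])  # literal text after the last bracket
    return ["".join(combo) for combo in itertools.product(*parts)]


def expand_word_list(spec: str) -> List[str]:
    """Expand a comma-separated list of patterns."""
    words = []
    for pattern in spec.split(","):
        words.extend(expand_pattern(pattern.strip()))
    return words


print(expand_word_list("wom[a?e]n,girl[s],lad[y?ies]"))
# ['woman', 'women', 'girl', 'girls', 'lady', 'ladies']
```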

At the top of the tab sheet you can see that any existing filter codes starting with GF are removed first. This makes sure that old filter codes are removed when the metadata of a file is changed and the file is therefore reprocessed. For example, a file may initially have been assigned the filter code GFM (for male), but a later update may have removed the words that triggered it, so the filter code should be removed as well.
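The “remove codes with a prefix, then re-apply” step can be sketched in Python. This assumes, hypothetically, that a file’s filter codes are stored as one string of `@CODE#` entries, following the @/# convention described above; the storage format, the function names and the code `LOC1` are illustrative only.

```python
import re
from typing import List

CODE_RE = re.compile(r"@([^@#]+)#")


def wrap(code: str) -> str:
    """Add the @ and # delimiters around a bare filter code."""
    return f"@{code}#"


def reapply_codes(stored: str, remove_prefix: str, new_codes: List[str]) -> str:
    """Drop stored codes starting with `remove_prefix`, then add the new codes."""
    kept = [c for c in CODE_RE.findall(stored) if not c.startswith(remove_prefix)]
    for code in new_codes:
        if code not in kept:  # avoid duplicate codes
            kept.append(code)
    return "".join(wrap(c) for c in kept)


# A file that used to be tagged male is reprocessed and now matches the female rule.
print(reapply_codes("@GFM#@LOC1#", "GF", ["GFF"]))  # @LOC1#@GFF#
```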

Note that when you use contains (or does not contain) to find words in the metadata fields keywords, caption, category or subcategory, the function looks for whole words only. So if your keywords field contains boy,girlfriend,park,dog,trees and you test for the value girl, it will not match.
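The difference between a whole-word test and a plain substring test can be shown with a regular expression. This Python sketch assumes case-insensitive matching and treats letters, digits and underscores as word characters, which may not match exactly how Infradox XS splits words; the function name is hypothetical, and “does not contain” is simply the negation.

```python
import re


def contains_whole_word(field_value: str, word: str) -> bool:
    """True if `word` occurs in `field_value` as a whole word (case-insensitive)."""
    # The lookarounds require that no letter, digit or underscore touches the match,
    # so "girl" does not match inside "girlfriend".
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, field_value, re.IGNORECASE) is not None


keywords = "boy,girlfriend,park,dog,trees"
print(contains_whole_word(keywords, "girl"))  # False
print(contains_whole_word(keywords, "dog"))   # True
print("girl" in keywords)                     # True: a plain substring test would match
```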