Filter Component

Filter Component

Filter rows from the input to pass a subset of rows to the next component based on a set of conditions.

The function of this component is similar to the 'Data Filter' property found on data staging ('Query') orchestration components.

Properties

Property

Setting

Description

Name

Text

The descriptive name for the component.

Filter Conditions

Input Column

The name of the column from the input component to which the filter is applied.

Qualifier

Is (default) or Not describes if matching rows in the data should be included or excluded.

Comparator

Less than (default): Value in the Input Column must be less than that specified in the Value Column.
Less than or equal to: Value in the Input Column must be less than or equal to that specified in the Value Column. Equal to: Value in the Input Column must be equal to that specified in the Value Column. Greater than (default): Value in the Input Column must be greater than that specified in the Value Column. Greater than or equal to: Value in the Input Column must be greater than or equal to that specified in the Value Column. Like Case sensitive match of the value in the Input Column and the pattern in the Value Column.

Null: Checks if the value of the Input Column is the sql "null" value. Blank: Checks if the value of the Input Column is an empty string. Null or blank: Checks if the value of the Input Column is either an empty string or the sql "null" value.
Less than (default): Value in the Input Column must be less than that specified in the Value Column.
Less than or equal to: Value in the Input Column must be less than or equal to that specified in the Value Column. Equal to: Value in the Input Column must be equal to that specified in the Value Column. Greater than (default): Value in the Input Column must be greater than that specified in the Value Column. Greater than or equal to: Value in the Input Column must be greater than or equal to that specified in the Value Column. Like: Case sensitive match of the value in the Input Column and the pattern in the Value Column. ILike: Case insensitive match of the value in the Input Column and the pattern in the Value Column.Similar to: Matches the value in the Input Column with a SQL standard regular expression pattern in the Value column.Null: Checks if the value of the Input Column is the sql "null" value. Blank: Checks if the value of the Input Column is an empty string. Null or blank: Checks if the value of the Input Column is either an empty string or the sql "null" value.

Value Column

The value entered by the user to use for comparison.
If this value is enclosed in "double quotes" as so, then it will be treated as a column
in the generated SQL, which will allow for comparing columns.

Combine Conditions

AND or OR

When multiple Filter Conditions are present they can be separated by AND or OR. AND means all of the Filter Conditions must be true, OR means any of the Filter Conditions must be true.
Hint: For more complex conditions involving grouping several AND and OR conditions together consider creating a new field using a Calculate component that specifies your complex condition and then use that new field in a simple Filter Condition.

Strategy

Generates a where clause.

Example

In this example, we have a huge volume of flight data and we wish to take just a subsection of that data for transformation.

In this particular case, we're only interested in analysing flights that occurred during the second half of each month in 1999, between airports RSW and DFW. So, it will be useful to first reduce the dataset to only data rows that fulfill our conditions. Below, we show a short Transformation job for the data filtering.

Data is taken from a table using the Table Input component. This links to the Filter component before being output using the Table Update component. The Filter component properties are set up as follows.

If we sample the input data (below) we can see there are well over five million rows - most of which will be irrelevant to our analysis.

Thus we set up the following filters: The first filter ensures we only take flights dated during the year 1999. The next two filters narrow the origin and destination airports to RSW and DFW, respectively. Finally, the fourth filter ensures that our data is from days of the month after day 15.

It is important to note that we have set the 'Combine Conditions' property in the Filter component to 'AND' so that all filter conditions must be met by a given row. Applying this filter should drastically reduce the number of rows in our data set, which can be confirmed by sampling the output.

As you can see, we've gone from more than five-and-a-half million rows of data to just seventy-six rows of data.