JDBC Lookup

The JDBC Lookup processor uses a JDBC connection to perform lookups in a database
table and pass the lookup values to fields. Use the JDBC Lookup to enrich records with
additional data.

For example, you can configure the processor to
use a department_ID field as the column to look up department name values in a database
table, and pass the values to a new department_name output field.

When a lookup results in multiple matches, the JDBC Lookup processor can return the first
matching value or return all matching values in separate records.

When you configure JDBC Lookup, you specify connection information and custom JDBC
configuration properties to determine how the processor connects to the database. You
configure the SQL query to define the data to look up in the database, specify the
output fields to write the lookup values to, and choose the multiple match behavior. You
can configure the behavior for when the lookup returns no values and optionally
configure a default value for the same situation.

You can configure the processor to locally cache the lookup values to improve
performance.

To use a JDBC version older than 4.0, you can specify the driver class name and define a
health check query.

When you monitor a pipeline that includes the JDBC Lookup processor, you
can view stage statistics about the number of queries the processor makes and the
average time of the queries.

Installing the JDBC Driver

Before you use the JDBC Lookup, install the JDBC
driver for the database. You cannot access the database until you install the required
driver.

Lookup Cache

To improve pipeline performance, you can
configure the JDBC Lookup processor to locally cache the values returned from a database
table.

The processor caches values until the cache reaches the
maximum size or the expiration time. When the first limit is reached, the
processor evicts values from the cache.

You can configure the following ways to evict values from
the cache:

Size-based eviction

Configure the maximum number of values that the processor caches.
When the maximum number is reached, the processor evicts the
oldest values from the cache.

Time-based eviction

Configure the amount of time that a value can remain in the cache
without being written to or accessed. When the expiration time
is reached, the processor evicts the value from the cache. The
eviction policy determines whether the processor measures the
expiration time since the last write of the value or since the
last access of the value.

For example, you set the eviction policy to expire after the last
access and set the expiration time to 60 seconds. After the
processor does not access a value for 60 seconds, the processor
evicts the value from the cache.

When you stop the pipeline, the processor clears the
cache.

Retry Lookups for Missing Values

When you enable local caching, the processor also caches the configured default value
when a lookup for a given column fails and a default value is defined for that column in
the Column Mappings. The processor then always returns the default value for the column
to avoid unnecessary lookups.

You can configure the processor to retry lookups for known missing values by enabling the
Retry on Cache Miss property. Configure the processor to retry lookups when the lookup
table might be updated as the pipeline runs.

For example, if you expect that new values will be inserted in the table as the pipeline
runs, you’d want to configure the processor to retry the request rather than returning
the cached default value.

Note: If a lookup for a given column fails and a default value is not configured for that
column, then the processor handles the record based on the Missing Values Behavior
property.

Monitoring a JDBC Lookup

When you monitor a pipeline that includes the JDBC Lookup processor, the Summary tab
displays statistics about the queries that the JDBC Lookup processor performs. Use the
statistics to help identify any performance bottlenecks encountered by the
pipeline.

When you select the
JDBC Lookup processor in the canvas while monitoring a running pipeline, the Summary tab
displays the following stage statistics:

The Select Queries Meter displays the number of queries that the processor makes per
second. The Select Queries Timer displays the average amount of time that the queries
take to complete.

Use to override the default column to field mappings. By
default, columns are written to fields of the same name.

Enter the following:

Column Name - Name of the database column that
contains the lookup value. Enter a column name or
enter an expression that defines the column
name.

SDC Field - Name of the field in the record that
receives the lookup value. You can specify an
existing field or a new field. If the field does
not exist, JDBC Lookup creates the field.

Default Value - Optional default value to use
when the query does not return a value for the
field. If the query returns no value and this
property is not defined, the processor handles the
record based on the Missing Values Behavior
property.

To enter a default value for the Date
data type, use the following format: yyyy/MM/dd.
To enter a default value for the Datetime data
type, use the following format: yyyy/MM/dd
HH:mm:ss.

Data Type - Data type to use for the SDC Field.
Required when you specify a default value. The
processor uses the database column data type by
default.