Extended Tutorial

The extended tutorial builds on the basic tutorial, using an additional set of stages to
perform some data transformations and write to the Trash development destination. We'll also
use data preview to test stage configuration.

You can write to a real destination instead of the Trash destination. The Trash destination simply lets you run the pipeline without writing to a real destination system.

The extended tutorial continues with the following steps:

Configure a Field Type Converter to convert field types.

Manipulate data with the Expression Evaluator.

Use data preview to test and update pipeline configuration.

Complete the pipeline with the placeholder Trash destination.

Reset the origin and run the extended pipeline.

Convert Types with a Field Type Converter

Since the sample data is read from a file, the fields are all String. Let's use a
Field Type Converter to convert some data types.

Let's convert datetime fields to Datetime, and convert monetary fields as well as
the longitude and latitude fields to Double.

Add a Field Type Converter to the canvas.

To route all data from the pipeline through the new stage, connect the Field
Masker to the Field Type Converter as shown:

Click the Conversions tab.

Convert fields with datetime data to Datetime by configuring the Field Type
Converter properties as follows:

Fields to Convert: Click in the field. From the list of fields, select
/dropoff_datetime and /pickup_datetime.

Convert to Type: Datetime

Date Format: The date format used by the data. Select the following format:
yyyy-MM-dd HH:mm:ss.

To convert fields that contain monetary information to Double, click the
Add icon and configure the properties as follows.

Use the defaults for properties that aren't listed:

Fields to Convert: Click in the field and select the following fields:

/fare_amount
/dropoff_latitude
/dropoff_longitude
/mta_tax
/pickup_latitude
/pickup_longitude
/surcharge
/tip_amount
/tolls_amount
/total_amount

If a field doesn't display in the list, you can type in the field name and
press Tab or Enter to complete the action.

Convert to Type: Double

The pipeline and Field Type Converter should look like this:
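Conceptually, the conversions this stage performs are equivalent to the following Python sketch. The field names come from the sample taxi data above; the plain dict stands in for an SDC record, and this is an illustration of the logic, not Data Collector's implementation:

```python
from datetime import datetime

# Fields the Field Type Converter parses into other types
# (field names from the sample taxi data).
DATETIME_FIELDS = ["dropoff_datetime", "pickup_datetime"]
DOUBLE_FIELDS = [
    "fare_amount", "dropoff_latitude", "dropoff_longitude", "mta_tax",
    "pickup_latitude", "pickup_longitude", "surcharge", "tip_amount",
    "tolls_amount", "total_amount",
]

# The Java date format yyyy-MM-dd HH:mm:ss corresponds to this strptime pattern.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

def convert_types(record):
    """Return a copy of the record with matching string fields converted."""
    converted = dict(record)
    for field in DATETIME_FIELDS:
        if field in converted:
            converted[field] = datetime.strptime(converted[field], DATE_FORMAT)
    for field in DOUBLE_FIELDS:
        if field in converted:
            converted[field] = float(converted[field])
    return converted

record = {"pickup_datetime": "2013-01-01 00:12:00", "fare_amount": "12.50"}
out = convert_types(record)
# out["fare_amount"] is now the float 12.5, and out["pickup_datetime"]
# is a datetime object rather than a string.
```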

Manipulate Data with the Expression Evaluator

Now we'll use an Expression Evaluator to create pickup and dropoff location fields
that merge the latitude and longitude details. We'll also calculate the basic trip revenue
by subtracting the tip from the total fare.

Add an Expression Evaluator to the canvas and connect
the Field Type Converter to the stage.

On the Expressions tab, click the Add icon, and then enter the expressions
that generate the pickup and dropoff location data and calculate the trip
revenue:

Notice that the fields created by the stage (dropoff_location,
pickup_location, and trip_revenue) are highlighted in green.
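The logic behind these expressions can be sketched in Python as follows. The comma-separated location format is an assumption for illustration, and the field names match the converted sample record:

```python
def evaluate_expressions(record):
    """Add location and revenue fields, mirroring the Expression Evaluator."""
    out = dict(record)
    # Merge latitude and longitude into single location fields.
    out["pickup_location"] = f'{record["pickup_latitude"]},{record["pickup_longitude"]}'
    out["dropoff_location"] = f'{record["dropoff_latitude"]},{record["dropoff_longitude"]}'
    # Basic trip revenue: the total fare minus the tip.
    out["trip_revenue"] = record["total_amount"] - record["tip_amount"]
    return out

record = {
    "pickup_latitude": 40.730068, "pickup_longitude": -73.987342,
    "dropoff_latitude": 40.751416, "dropoff_longitude": -73.97375,
    "total_amount": 15.0, "tip_amount": 2.5,
}
out = evaluate_expressions(record)
# out["pickup_location"] == "40.730068,-73.987342"
# out["trip_revenue"] == 12.5
```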

Though it isn't
necessary for these calculations, let's see how you can edit preview data to test
stage configuration:

In the first input record, in the Input Data column, click the
pickup_latitude data, 40.730068, and add a negative sign before the value.
Press Enter or click outside the data.

As shown below, the edited input data turns red to indicate a change.

To test the change, click the Run with Changes icon.

Data Collector runs the preview with the change. Notice that the
corresponding output record now has -40.730068 for both pickup_latitude and
pickup_location.

You can see how this functionality might come in handy when you want to
test some cases that didn't come up in the preview data.

To revert that change, click the Revert Data Changes icon. This icon reverts
changes to preview data.

Note: Revert Data Changes does not revert changes to stage or pipeline
configuration. Manually revert any configuration changes that you don't want
to keep, as we did earlier in the tutorial.

When you're done exploring the preview data, click Close Preview.

Write to Trash

To wrap up the extended tutorial, let's use the Trash destination as a temporary
placeholder.

The Trash destination deletes any records that pass to it. This allows you to test a
pipeline without writing data to a production system.

If you prefer, you can use the Local FS destination to write to file as we did
earlier in the tutorial, or you can use another destination to write to a
development destination system available to you.

The Trash destination requires no configuration, so just add it to the canvas and
connect the Expression Evaluator to it:

Run the Extended Pipeline

Now that the extended pipeline is complete, let's reset the origin and run the
pipeline again.

Reset the origin when you want Data Collector to process all available data
instead of continuing from the last-saved offset. Not all origins can be
reset, but the Directory origin can.
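The last-saved offset idea can be illustrated with a minimal Python sketch. This models the concept only; it is not Data Collector's actual implementation:

```python
import os
import tempfile

class DirectoryOrigin:
    """Minimal model of an origin that remembers a last-saved offset."""

    def __init__(self):
        self.offset = set()  # names of files already processed

    def read_available(self, directory):
        # Only files not covered by the saved offset are processed.
        new_files = [name for name in sorted(os.listdir(directory))
                     if name not in self.offset]
        self.offset.update(new_files)
        return new_files

    def reset_origin(self):
        # After a reset, the next read processes all available data again.
        self.offset.clear()

with tempfile.TemporaryDirectory() as d:
    for name in ("taxi_1.csv", "taxi_2.csv"):
        open(os.path.join(d, name), "w").close()
    origin = DirectoryOrigin()
    print(origin.read_available(d))  # both files
    print(origin.read_available(d))  # [] -- already at the saved offset
    origin.reset_origin()
    print(origin.read_available(d))  # both files again, as after Reset Origin
```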

In the UI menu bar, click the More icon, and then click Reset Origin.

When the confirmation dialog box displays, click Yes,
then close the dialog box.

To start the pipeline, click the Start icon.

Data Collector goes into Monitor mode and the data alert triggers again. Before long, you'll see
some error records in the Jython Evaluator and the Field Type Converter.

For each stage, you can see the error messages for the latest error records.

To look at all the error records, you can review the error record files in
the directory that you specified. Error records are written in the SDC Record
data format, so you can create an error pipeline to process them. We'll show
you how to create an error pipeline for these records in a future tutorial.