Note that the Data Connector picked up custom fields like “ENV” and “RAILS_ENV” as well as standard fields like “$current_url”. Also, note that column names are normalized in the type: rename filter section.

Step 5: Load Data

Finally, submit the load job. It may take a couple of hours, depending on the size of the data. You need to specify the database and table into which the data will be loaded.

It is also recommended to specify the --time-column option, since Treasure Data’s storage is partitioned by time (see architecture). If the option is not provided, the Data Connector will choose the first long or timestamp column as the partitioning time. The column specified by --time-column must be of either long or timestamp type.

The above command assumes you have already created the database (td_sample_db) and table (td_sample_table). If the database or the table does not exist in TD, this command will not succeed, so create the database and table manually, or use the --auto-create-table option with the td connector:issue command to create them automatically:
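A minimal invocation might look like the following sketch, assuming the load configuration is in load.yml and your data has a timestamp column named time (adjust the names to your own setup):

```shell
# Submit the load job; --auto-create-table creates td_sample_db.td_sample_table if missing.
# --time-column assumes the data has a long/timestamp column named "time".
td connector:issue load.yml \
  --database td_sample_db \
  --table td_sample_table \
  --time-column time \
  --auto-create-table
```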

Scheduling Incremental Data Loading

Unless you are migrating off of Mixpanel completely, Mixpanel data must be incrementally loaded into Treasure Data regularly. The Data Connector’s scheduling function comes in handy for this purpose.

Once scheduled, the Mixpanel Data Connector’s successive runs increment the from_date parameter by fetch_days. For example, if the initial run in load.yml was

from_date: '2015-10-28'
fetch_days: 1

Then, the next run will be

from_date: '2015-10-29'
fetch_days: 1

You do not need to update load.yml once it is uploaded, since the from_date field is automatically updated on the server side.

Suppose you wish to schedule a daily upload. Make sure that the initial from_date is at least two days in the past and set fetch_days: 1 in load.yml. Then the following command creates a daily job called “daily_mixpanel_import”, which loads historical data into mixpanel_historical.app_name on Treasure Data every day.
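One way to create such a schedule is sketched below; the cron expression is an example, and the database and table names follow the mixpanel_historical.app_name convention above:

```shell
# Create a scheduled connector job named daily_mixpanel_import that runs
# every day at 00:10 UTC and loads into mixpanel_historical.app_name.
td connector:create daily_mixpanel_import "10 0 * * *" \
  mixpanel_historical app_name load.yml
```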

The historical runs of the import can be seen with td connector:history <name>, e.g., td connector:history daily_mixpanel_import.

Incremental Data Loading With Incremental Column

Some Mixpanel accounts have projects set up with an additional field that indicates when the data was processed by Mixpanel, for example mp_processing_time_ms.

You can add an additional parameter, incremental_column, to the Mixpanel input plugin. The maximum incremental_column value seen during a run is stored and used as a filter for the next run, e.g., where incremental_column > previous_run_max_value.
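As a sketch, assuming a load.yml shaped like the earlier examples and a project that exposes mp_processing_time_ms:

```yaml
in:
  type: mixpanel
  from_date: '2015-10-28'
  fetch_days: 1
  # Subsequent runs fetch only rows where this column exceeds
  # the previous run's maximum value
  incremental_column: mp_processing_time_ms
```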

Look back for data with back_fill_days

For device data (phones, tablets, etc.), Mixpanel may keep events on the user’s device for a while before sending them to the Mixpanel server, so data can appear in queries up to 5 days late. When incrementally importing data from Mixpanel, we may therefore miss data still cached on user devices.
To solve this issue, set the back_fill_days parameter (default: 5). The plugin will look back that number of days (from_date - back_fill_days). For performance reasons, this feature works only together with incremental_column.
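For example, to look back 5 days on each run (a sketch; note that incremental_column is required for back_fill_days to take effect):

```yaml
in:
  type: mixpanel
  from_date: '2015-10-28'
  fetch_days: 1
  incremental_column: mp_processing_time_ms
  # Re-fetch the previous 5 days to catch events buffered on devices
  back_fill_days: 5
```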

Split range query into smaller API queries with slice_range

In some cases, the returned data can be too large: the data returned by a single query can exceed Mixpanel’s limits and cause the job to fail. In that case, you can split the large range query into smaller ones using the slice_range configuration parameter.
This parameter is optional and defaults to 7.
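For instance, to split each request into 3-day slices (a sketch, reusing the hypothetical configuration from the earlier examples):

```yaml
in:
  type: mixpanel
  from_date: '2015-10-28'
  fetch_days: 30
  # Issue one Mixpanel API query per 3-day slice
  # instead of a single 30-day query
  slice_range: 3
```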