This documentation applies to Trifacta Wrangler. Registered users of this product or Trifacta Wrangler Enterprise should log in to Product Docs through the application.


A column is referenced by the name of the column, which can be inferred from the first row of data in your dataset.

When a dataset is loaded, the application inserts a few transform steps automatically. If the application can identify that the first row of data is likely to contain the column headers for the dataset, this row is promoted, and its values become the initial names of the columns.

In some cases, however, this auto-generation of column headers may not work as expected, or you may have chosen at import time to not detect the structure of the dataset.

This section describes how you can generate column headers from within the application.

If your data has a header row in row 1

If the initial transforms do not promote your first row of data to be the column headers, you can apply the following transform to do so:

header
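As a rough illustration of what promotion does, and not the application's implementation, the following Python sketch treats a dataset as a list of rows and uses the first row as the column names. The function name promote_first_row and the sample values are hypothetical.

```python
def promote_first_row(rows):
    """Split rows into (column_names, data_rows), using row 1 as the names."""
    if not rows:
        raise ValueError("dataset has no rows to promote")
    # The first row supplies the column names.
    column_names = [str(value) for value in rows[0]]
    # Every remaining row stays in the dataset as data.
    data_rows = rows[1:]
    return column_names, data_rows


# Hypothetical sample data whose first row holds the header values.
rows = [
    ["TransactionId", "Amount"],
    ["1001", "25.00"],
    ["1002", "13.50"],
]
column_names, data_rows = promote_first_row(rows)
print(column_names)
print(data_rows)
```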

In some cases, the first row of data might not contain the headers or might not contain all of them.

For example, you may have some columns that contain nested data, and the column headers may not be immediately accessible.

Tip: After you unnest data in one or more columns, the first row might contain column headers. You can apply the header transform to promote these new values to be the names of the columns. The other column headers should not be overwritten.

If your data has a header row after row 1

In some cases, data may be imported such that header information is stored in a row other than the first one in your dataset.

Steps:

Hover your mouse over the black dot to the left of the row that contains your header information. The popup displays something similar to the following:

If your data does not have a header row

If for some reason your source data does not include header information, you can insert header information using the following method.

NOTE: In general, it is easiest to manually rename columns through the application. See Rename a Column.

However, if your data contains a large number of columns, manually renaming each column may be time-consuming, and each column rename adds a step to your recipe. For wide datasets, this solution may be easier to execute and to maintain.

Steps:

Open the dataset in the Transformer page.

Open an application such as Microsoft Excel, which can write out CSV files.

For each column in the Transformer page, add a string name for the column in the other application.

If you are using Excel, enter the column names in the top row of the spreadsheet, placing each new name in the cell to the right of the previous one.

If you are using another application, make sure that you separate the values with commas and enclose each column name in double quotes.

NOTE: You may need to create a dummy second row, which forces the application to treat the imported dataset as multiple columns. Otherwise, it may treat the incoming CSV as a single columnar value.

Among the column headers, locate a column by which you are comfortable sorting the dataset.

For example, if your dataset includes transaction information, you may want to sort the data by the primary key TransactionId column.

Rename this column by prepending aaa to its name. For our TransactionId column, the new column name would be the following:

aaaTransactionId

This modification enables the sorting of the values in the dataset, starting with this row. All rows in the dataset are sorted according to this column.
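If you prefer to script the header file instead of typing it in a spreadsheet, the steps above could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name build_header_csv, the sample column names, and the placeholder value used for the dummy second row are all hypothetical, and the application may not require the dummy row for your data.

```python
import csv
import io


def build_header_csv(column_names, sort_column=None, prefix="aaa", dummy_value="x"):
    """Return CSV text with a quoted header row and a dummy second row."""
    # Prepend the prefix to the column chosen for sorting, if any.
    names = [
        prefix + name if name == sort_column else name
        for name in column_names
    ]
    buffer = io.StringIO()
    # QUOTE_ALL wraps every value in double quotes; the writer inserts the commas.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(names)
    # Dummy second row (assumed placeholder values), one value per column.
    writer.writerow([dummy_value] * len(names))
    return buffer.getvalue()


header_csv = build_header_csv(
    ["TransactionId", "CustomerName", "Amount"],  # hypothetical column names
    sort_column="TransactionId",
)
print(header_csv)
```

Saving the returned text to a .csv file gives a file with one quoted name per column, with the sort column already renamed using the aaa prefix.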