Why use Tableau Data Extracts

This is part two in a three-part series about Tableau Data Extracts. In the first post, we looked at how Tableau Data Extracts are built and used by Tableau. If the content of the first post did not already sell you on the benefits of TDEs, then here are several reasons that Tableau Data Extracts (TDEs) are valuable—even essential—to Tableau users:

7 reasons for using Tableau Data Extracts:

Performance

An extract not only improves performance when the underlying data source is unacceptably slow, but can also speed up queries that are slowed down by the use of Custom SQL (see here).

Reduced load

Replacing a live connection to an OLTP database—or any database—with a TDE reduces the load on the database that can result from heavy Tableau user traffic.

Portability

A TDE can be bundled with Tableau visualizations in a packaged workbook for easy sharing and collaboration.

Pre-aggregations

When creating a TDE, Tableau gives you the option to aggregate your data for all visible dimensions. This is known as an aggregated extract. An aggregated extract is smaller and contains only aggregated data, as the name implies—not all of the row-level data that is stored in a standard TDE. Accessing the values for additive aggregations in a visualization becomes near-instantaneous because all of the work to derive the values has already been done. So, the most basic reason to use an aggregated extract is performance.

You can also choose to roll the aggregations up to a selected level (month, quarter, year, and so on) of one of the date fields in the underlying data source. This further reduces the size of the extract by reducing the number of aggregate values stored in it and, for that particular level of aggregation, further increases performance. For more information, check out the following articles here and here.
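To make the idea concrete, here is a minimal sketch in plain Python of what aggregating on the visible dimensions and rolling dates up to the month level amounts to. It is only an illustration of the concept, not how Tableau builds a TDE internally; the sample rows, the field names, and the aggregate_extract function are all made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical row-level source data: (order_date, region, sales)
rows = [
    (date(2014, 1, 5), "East", 100.0),
    (date(2014, 1, 20), "East", 50.0),
    (date(2014, 1, 7), "West", 70.0),
    (date(2014, 2, 3), "East", 30.0),
    (date(2014, 2, 14), "West", 20.0),
    (date(2014, 2, 28), "West", 10.0),
]


def aggregate_extract(rows):
    """Pre-compute SUM(sales) per (month, region).

    Dates are rolled up to the month level, so each (month, region)
    pair is stored once instead of once per source row.
    """
    totals = defaultdict(float)
    for order_date, region, sales in rows:
        month = (order_date.year, order_date.month)
        totals[(month, region)] += sales
    return dict(totals)


extract = aggregate_extract(rows)

# Six source rows collapse to four pre-aggregated rows.
for key, total in sorted(extract.items()):
    print(key, total)
```

Because SUM is additive, a visualization that needs sales per region or per month can be answered from these four stored values without touching the six source rows again.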

Materialization of calculated fields

When you optimize a Tableau extract, the eligible row-level calculated fields that have been defined are converted to static values upon the next full refresh. At that point, they essentially become additional data fields that can be accessed and aggregated as quickly as any other field in the TDE. The increase in performance can be especially strong for string calculations, which are significantly slower than numeric or date calculations. So, as was the case with aggregated extracts, the most basic reason to optimize a TDE is again performance.
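As a rough sketch of what materialization means, the plain Python below evaluates a row-level string calculation once and stores the result as an ordinary field, so later reads are simple lookups. This is a conceptual illustration only, not Tableau's implementation; the sample rows, the clean_name calculation, and the materialize helper are hypothetical.

```python
# Hypothetical extract rows with a raw customer name field
rows = [
    {"customer": "  alice SMITH "},
    {"customer": "Bob jones"},
]


def clean_name(row):
    """A hypothetical row-level string calculation."""
    return row["customer"].strip().title()


def materialize(rows, field_name, calc):
    """Evaluate calc once per row and store the result as a regular field.

    Returns new row dicts; the original rows are left unchanged.
    """
    return [{**row, field_name: calc(row)} for row in rows]


optimized = materialize(rows, "customer_clean", clean_name)

# After materialization, reading the value no longer runs the calculation.
for row in optimized:
    print(row["customer_clean"])
```

The string work is paid for once, at refresh time, instead of every time a view queries the field.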

Publishing to Tableau Public and Tableau Online

Tableau Public only supports TDEs. While Tableau Online can connect live to cloud-based data sources, TDEs are the most common data source used in that environment.

Support for functionality not available when using MS Jet

Versions 8.1 and earlier of Tableau use the MS Jet engine for accessing Excel, MS Access and text files. By creating an extract, certain features not supported by Jet—count distinct, for example—can be used. (In version 8.2, Tableau replaced MS Jet for accessing Excel and text files with a new, more performant and functional engine.)

Example use cases

Representing all of the possible use cases for TDEs would not be possible in a blog post as short as this one. What follows is meant to give the reader a sense of the unique kinds of things that can be done with TDEs to extend the functionality of Tableau.

Compare an aggregate for all rows in an underlying source with the same aggregate for a subset of the rows. By blending a data source with an aggregated extract based on the same data source, you can filter and slice data to compare aggregations of the subset to the entire data set (this can also be done using RAW SQL functions or Custom SQL).

Create “double aggregates.” For instance, if the default aggregate for a measure is SUM, creating a pre-aggregated extract would allow you to calculate an AVG of SUMs in the visualization.

Build a KPI-style dashboard that combines worksheets based on aggregated extracts with worksheets based on live connections. This design pattern has performance advantages in that KPI-style aggregations are pre-calculated and do not require a live connection, reducing the load on the underlying data source(s). By the way, here’s a nice article from this same blog about KPIs.

Fig. 1 A dashboard that combines aggregated extracts and live connections for ease of navigation and performance
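The first two use cases above can be sketched with a little arithmetic in plain Python. This is an illustration of the logic only, under made-up data; it does not use any Tableau API, and the sample rows and the sum_by helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical row-level data: (region, customer, sales)
rows = [
    ("East", "A", 100.0),
    ("East", "A", 50.0),
    ("East", "B", 30.0),
    ("West", "C", 70.0),
    ("West", "D", 10.0),
]


def sum_by(rows, key_index):
    """SUM(sales) grouped by the column at key_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


# Double aggregate: the "aggregated extract" stores SUM(sales) per customer,
# and the visualization then takes the AVG of those SUMs.
sum_per_customer = sum_by(rows, 1)
avg_of_sums = sum(sum_per_customer.values()) / len(sum_per_customer)

# For contrast, a plain row-level AVG gives a different answer.
row_level_avg = sum(sales for _, _, sales in rows) / len(rows)

# Subset versus whole: the grand total plays the role of the aggregated
# extract, while the filtered rows play the role of the primary data source.
grand_total = sum(sales for _, _, sales in rows)
east_total = sum(sales for region, _, sales in rows if region == "East")
east_share = east_total / grand_total

print(avg_of_sums, row_level_avg, round(east_share, 3))
```

The point of the second half is that the grand total is unaffected by the filter applied to the subset, which is what blending with an aggregated extract of the same source gives you.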

Hopefully this post has given you an even better sense of how and why Tableau Data Extracts can help you use Tableau to see and understand your data. Next week, we’ll wrap up the series with an extensive list of tips, tricks and best practices.

Comments

Submitted by Joshua Milligan on July 31, 2014 - 12:00pm

I've really been enjoying this series and found the first part very helpful in understanding some details about extracts. I also find the use cases presented in this part very useful.

I do feel that reason # 5: Materialization of calculated fields, should have a little more explanation. Not all calculated fields are materialized. Aggregations and table calculations are definitely not. Row-level calculations using user functions or date functions are not. And even some other row-level calculations may not be materialized based on what the optimizer determines.

In 8.2, are .tde files no longer created and stored in the same folder as .twbx files? In the past, when I created a .twbx file, the .tde file would always appear in the same folder as the .twbx. Now it looks like the .tde automatically goes to the "C:\Users\achandarana\AppData\Local\Temp\TableuTemp" folder. I tried following this article, which didn't help: http://kb.tableausoftware.com/articles/howto/changing-the-file-path-for-extracts?lang=en-us
I haven't tried this one yet, because I think I'd need admin access on my laptop from our corporate IT: http://kb.tableausoftware.com/articles/knowledgebase/default-location-when-creating-extracts?lang=en-us
The bigger issue I'm having related to this is that when I go back into my source Excel data files and update the data, the .twbx doesn't pull in new columns (of text formatted as General in Excel) properly. It pulls in the headers for the columns I add, but not the data below them, which all comes in as null.
What's even more bizarre is that when I take that same column and put it on a new Excel tab with just the unique identifier, Tableau recognizes the new tab and pulls in the new column of data just fine.

I have use cases where most user activity on a dashboard could use aggregated extract data, but, occasionally, those users might need to drill to row-level detail. Is there an easy way to design a dashboard so that the transition between using an extract and drilling to row level in the original data source is transparent to the user?

This was very informative. It would be nice to know if there are situations where you wouldn't recommend using TDEs. Because TDEs tend to like flat files there might be some more complicated data models where they may not be appropriate.
Any suggestions on whether TDEs should be used when the dataset contains 1B records and:
1. Number of columns: Is there a recommended max limit? What if it would contain 150+ columns?
2. Complicated model: Needs to resolve many-to-many relationships
3. Data modification: Underlying source data experiences inserts/updates/deletes.
4. Limited Refresh window: Refresh needs to complete within 12 hours.

What we'd like to do is build some kind of decision matrix that will help guide us to make the appropriate choice to get the best performance from Tableau using an appropriate data source that works with the constraints for that project.