Why doesn’t Tableau perform as well on Hadoop? And how to make it faster

Tableau, one of the most widely used business intelligence
tools, is designed to simplify analysis for business users by letting them
interact with their data visually, using drag-and-drop operations, without the
need to write complex code or SQL queries. Depending
upon their requirements, users can create their dashboards and have real-time
conversations with data that lives across multiple systems and platforms.

But what happens when the size of data rises to big data
proportions? Do the users still have the same agility as they had when
analyzing smaller datasets?

Tableau users often report slowdowns when they connect
directly to Hadoop and try to perform complex analysis on large volumes of
data.

Why Tableau slows
down

Tableau is designed to enable the Cycle of Visual Analysis: get data,
view data, ask questions, get answers, repeat. This cycle becomes challenging
on Hadoop. The main reason is that Tableau generates and executes a SQL
statement for every interaction and on every visualization. As the size
of data increases, direct queries from Tableau to Hadoop take very long to
return – sometimes minutes and hours instead
of seconds. Additionally, when the
number of dimensions and cardinalities increase, it takes even longer to fetch
query results.

From the user’s perspective, as the analysis deepens, they want to drill down, drill up, or drill across their data to get useful business insights. However, Tableau has to connect to Hadoop to fetch the results for each interaction. This leads to a lag in dashboard refreshes, and interactivity suffers. Though Tableau is the perfect tool to help business users visualize their data, it is not designed to handle big data analysis.

Optimize Tableau performance

Analysts and data experts have
tried several approaches to maximize the performance of Tableau on Hadoop. They
put in massive amounts of effort to tune their queries. Instead of pulling
detailed reports, they make do with summary reports. Sometimes they reduce the
size of their datasets to smaller subsets and then run queries on them instead
of running them on the entire dataset. These approaches are restrictive and cannot
meet the analytical needs of a growing business.

Another popular method is to pull data out of the big data
platform in the form of extracts, and then make Tableau work on it. However, there
are several scalability and performance limitations on the amount of data that
can be processed this way. Also, this approach is resource-heavy and introduces
latency as the data is not live.

Over the years, several enhancements have been made to Tableau
to speed up big data analytics such as adding query optimizations capabilities, providing named
connectors for Hadoop and other big data platforms, and many more. However, it
is still difficult to match the speed at which the size of the data is growing.

Instant analytics on
Hadoop

An innovative way to improve the performance of Tableau on
Hadoop is to create a BI acceleration layer between the big data
platform and Tableau that enables quick access to big data. Once this layer is
in place, Tableau can connect to this layer instead of connecting directly to
the big data platform.

This layer is built using OLAP
on big data technology. In this approach, data is pre-aggregated
using the processing and storage capacity of Hadoop. Once the OLAP cubes are
ready, queries become incredibly light-weight, and response times are instant,
even for the most complex queries. Tableau can connect to this layer using
standard connectors. Another key advantage of this approach is that this layer
is transparent to the end-users. They can continue using their Tableau
interface for big data analytics and enjoy the same speed and interactivity as
before, without limitations on the amount of data they can bring into their
visualizations.

Conclusion

Tableau
was initially developed to work on relational technologies. Therefore,
it is unfair to expect it to deal with massive volumes of data on Hadoop. By
creating a BI acceleration layer on Hadoop, you can quickly scale up and use
Tableau’s compelling visualizations for analyzing big data with high
performance and unlimited scalability.