Leverage the power of Python in Tableau with TabPy

TabPy is a new API that enables evaluation of Python code from within a Tableau workbook.

When you use TabPy with Tableau, you can define calculated fields in Python, thereby leveraging the power of a large number of machine-learning libraries right from your visualizations.

This Python integration in Tableau enables powerful scenarios. For example, it takes only a few lines of Python code to get the sentiment scores for reviews of products sold at an online retailer. Then you can explore the results in many ways in Tableau.
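As a sketch of what such a calculated field might compute, here is a toy lexicon-based scorer standing in for a real sentiment library such as NLTK's VADER. The word lists and function names are illustrative, not part of TabPy:

```python
# Toy lexicon-based sentiment scorer -- a stand-in for what a real library
# (e.g. NLTK's VADER) would do inside a TabPy calculated field.
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "broken"}

def sentiment_score(review):
    """Return a score in [-1, 1]: +1 if all sentiment words are positive,
    -1 if all are negative, 0 if the review contains no sentiment words."""
    words = [w.strip(".,!?") for w in review.lower().split()]
    hits = [1 if w in POSITIVE else -1 for w in words
            if w in POSITIVE or w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0

# In Tableau, the same logic would run over the list of reviews that TabPy
# passes in as _arg1, returning one score per review.
scores = [sentiment_score(r) for r in
          ["Great product, love it", "Terrible, arrived broken"]]
```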

You might filter to see just the negative reviews and read their content to understand the reasons behind them. You might want to get a list of customers to reach out to. Or you might visualize how overall sentiment changes over time.

Other common business scenarios include:

Lead scoring: Create a more efficient conversion funnel by scoring your users' behavior with a predictive model.

Churn prediction: Learn when and why users leave, and predict and prevent it from happening.

You can easily install the TabPy server on your computer or on a remote server. Configure Tableau to connect to this service by entering the service URL and port number under Help > Settings and Performance > Manage External Service Connection in Tableau Desktop. Then you can use Python scripts as part of your calculated fields in Tableau, just as you’ve been able to do with R since Tableau 8.1.
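For example, a minimal calculated field that runs Python through TabPy might look like the following (the field name is illustrative; Tableau passes SUM([Sales]) to the script as the list _arg1, one element per partition row):

```
SCRIPT_REAL("
return [x * 2 for x in _arg1]
",
SUM([Sales]))
```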

TabPy uses the popular Anaconda environment, which comes preinstalled and ready to use with many common Python packages including scipy, numpy, and scikit-learn. But you can install and use any Python library in your scripts.

If you have a team of data scientists developing custom models in your company, TabPy can also facilitate sharing those models with others who want to leverage them inside Tableau via published models.

Once published, all it takes to run a machine-learning model is a single line of Python code in Tableau regardless of model type or complexity. You can estimate the probability of customer churn using logistic regression, multi-layer perceptron neural network, or gradient boosted trees just as easily by simply passing new data to the model.
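A sketch of how such a model might be published with the TabPy client is shown below. The scoring function is a toy stand-in for a real trained model, and the endpoint name and server URL are examples, not fixed values:

```python
def churn_probability(tenure_months, support_tickets):
    """Toy churn scorer: longer tenure lowers risk, more support tickets
    raise it. A stand-in for a real trained model (logistic regression,
    neural network, gradient boosted trees, etc.)."""
    return [min(1.0, max(0.0, 0.5 - 0.01 * t + 0.1 * s))
            for t, s in zip(tenure_months, support_tickets)]

def publish_endpoint(server_url="http://localhost:9004/"):
    """Publish the model to a running TabPy server (not called here;
    requires the tabpy_client package from the GitHub download)."""
    import tabpy_client
    client = tabpy_client.Client(server_url)
    client.deploy("ChurnProbability", churn_probability,
                  "Estimates probability of customer churn")
```

Once deployed, a calculated field can invoke the endpoint with a single line of Python, e.g. SCRIPT_REAL("return tabpy.query('ChurnProbability', _arg1, _arg2)['response']", SUM([Tenure]), SUM([Tickets])).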

Using published models has several benefits. Complex functions become easier to maintain, share, and reuse as deployed methods in the predictive-service environment. You can improve and update the model and code behind the endpoint while the calculated field keeps working without any change. And a dashboard author does not need to know or worry about the complexities of the model behind this endpoint.

Together, Tableau and Python enable many more advanced-analytics scenarios, making your dashboards even more impactful. To learn more about TabPy and download a copy, please visit our GitHub page.



Comments

Submitted by Wilbur (not verified) on November 4, 2016 - 10:44

This is really amazing. This opens up potential opportunities for Tableau to be used in semi-real-time applications. This can be used to call microservices or to trigger actions.

Chris S.: Not sure what you mean. Perhaps you can provide a specific example? Pure Python is generally comparable to or faster than pure R on most benchmarks. Of course, in real-life use cases it boils down to the specific libraries you use. Like R (with Rcpp), most of the data libraries in Python (Pandas, NumPy, etc.) call underlying code written in a lower-level language like C/Fortran (e.g. BLAS/LAPACK), so the performance difference should be a non-issue.

Question though -- how do the SCRIPT_XXX() functions determine whether a script is R or Python? Does Tableau somehow infer that from the script? Do you have to configure a workbook to use either R or Python, but not both? For example, if the script is just the number 1, which is valid in both Python and R, where does Tableau send that script to execute?

Hi Ashish,
Did you download the zip file from GitHub and follow the install instructions, or did you try 'pip install tabpy'? There is no such package on PyPI at this point, so the latter will fail. If you don't want to go through the full install, there are also instructions on GitHub for users who already have Anaconda configured. The provided steps use pip with the local package contained in the GitHub download.

It was more than three years ago when I first wrote about my desire to have a feature like this in Tableau. At the time, R integration was just a proposed, upcoming idea. With each passing year I have to keep updating that article, because Tableau keeps delivering new integrated technologies. It's becoming a full-time job trying to keep up with Tableau.

Looking at the error, it seems the script is being sent to R instead of Python. Can you check whether the Manage External Services dialog is pointing to your TabPy server? Also note that this will only work with Tableau 10.1 or higher.

Looks like the Python library used in the example got an update with some breaking changes. I reinstalled from scratch and observed the same error, so please try the following instead. We will update the image in the blog post with the new script.

My company network is not allowing me to install and run the TabPy server. I was advised to configure a proxy, but I get a ProtocolError('Connection aborted') error message.
Could you please suggest some steps to resolve this?

Hi, I'm getting this problem while using TabPy. I'm using the sample Superstore dataset and want to cluster the sub-categories using SUM([Profit]) and SUM([Sales]), but it returns this error: ValueError: n_samples=1 should be >= n_clusters=2. Here's my script:
SCRIPT_STR("
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2)
tmp = []
for i in range(len(_arg1)):
    tmp.extend([[_arg1[i], _arg2[i]]])
KMmodel = kmeans.fit(tmp)
labels = KMmodel.labels_
return labels
",
SUM([Profit]), SUM([Sales]))
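For reference, the pairing the script performs can be sketched in plain Python (the numbers below are made up). KMeans requires at least as many samples as clusters, and one common cause of the n_samples=1 error is Tableau's table-calculation addressing handing the script a single row per partition instead of the whole list:

```python
# Illustrative data only: the TabPy script receives one Python list per
# argument and pairs them element-wise into samples for KMeans.
profits = [120.5, -30.2, 45.0, 210.9]
sales = [300.0, 80.0, 150.0, 520.0]
samples = [[p, s] for p, s in zip(profits, sales)]

n_clusters = 2
# KMeans raises "ValueError: n_samples=1 should be >= n_clusters=2" when it
# receives fewer samples than clusters -- e.g. when the addressing sends
# the script one row at a time.
assert len(samples) >= n_clusters
```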

I have my Tableau Python server running. I open my code in Jupyter and do Run All. It does not give any errors, but it does not create the endpoint; I was looking under the staging folder.
Any idea why it is not creating endpoints? I don't see anything under query objects either.