Clearly and Simply – Intelligent Data Analysis, Modeling, Simulation and Visualization
http://www.clearlyandsimply.com/clearly_and_simply/

User-defined Excel Chart Tooltips
http://www.clearlyandsimply.com/clearly_and_simply/2016/09/user-defined-excel-chart-tooltips.html
How to provide different options of customized tooltips on a Microsoft Excel chart and let the user decide which one to display

One of the previous posts described four different techniques for creating customizable, meaningful tooltips on XY Scatter Charts in Microsoft Excel.

Two of the approaches used a camera object (aka linked picture) to allow more formatting options of the tooltip.

This idea can be taken one step further: with camera objects, you can easily provide more than one type of tooltip and let your user interactively select the tooltip which is most helpful for his analysis.

Today’s article describes how to enable your user to select from five different tooltips with a single click: dimensions and measures, a dimension only, and three additional charts displaying further information on the data point currently hovered over with the mouse. The post describes the idea and the implementation and, of course, makes the example workbooks available for free download.

The Idea – Let the user decide which tooltip shall be displayed

The basic idea is to predefine a certain number of tooltips showing different views on additional information of the selected data point (i.e. the one hovered over with the mouse) and let the user decide on the fly which one is most useful for the analysis.

In our example, we define 5 different tooltips and let the user select one with a simple list of radio buttons next to the chart:

Note: As in the previous post, the examples for today’s article use a data set of fertility rate (children per woman) and life expectancy (in years) for more than 200 countries. This time, however, the data includes all years from 1950 to 2013 (data source: Gapminder).

Now, let’s have a look at the results first: which tooltip types are provided?

Tooltip 1: Country, Region and Measure Values

The first option is very close to the tooltip used in the previous post. It displays the name of the country, an icon of its flag, the region, the exact values of the plotted measures and the population as a third measure which is not visualized in the main view:

Tooltip 2: Flag and Country Name

The second tooltip is the minimalistic option. It only shows the flag icon and the name of the country:

Tooltip 3: Trail (XY Scatter over time)

Option #3 shows country, flag icon and the same type of XY scatter chart (fertility rate on the X-axis and life expectancy on the Y-axis) in the tooltip, but for the selected country only and for all years, i.e. the trail or development of this country over time. The red dot highlights the selected year in the main view:

Tooltip 4: Sparkline Fertility Rate over time

Tooltip 4 takes a closer look at the development of the fertility rate in the selected country over time with a sparkline. Again, the red dot highlights the selected year:

Tooltip 5: Sparkline Life Expectancy over time

The last option is again a sparkline, this time showing the development of life expectancy:

Many more or different views on the data could be displayed in the tooltips, no doubt about it. However, with these 5 versions and the possibility to easily switch from one tooltip type to another by simply clicking on the radio buttons, the user of the workbook already has an interesting additional interactive option for exploring and analyzing the data.

The Implementation

The example in this article is based upon the techniques #2 and #4 described in the previous post, i.e. either taking advantage of the chart events (#2) or an invisible ActiveX Label sitting on top of the plot area (#4) plus a camera object to display the tooltip.

You might assume that implementing the user-defined tooltip options required quite a few changes to the VBA code, but this is not the case. All enhancements to make the feature available are done with standard Excel worksheet functionality.

Here is an overview of what has to be done to get from one static tooltip to 5 different options:

Step 1: The Data

The database on worksheet [Data] was enhanced by adding all years from 1950 to 2013. Furthermore, to facilitate data lookups, an ID was added to the data consisting of the country name and the year.
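As a small illustration, such an ID is just a concatenation in a helper column; the cell references below are assumptions, not necessarily the layout of the actual workbook:

=A2&B2    (country name in column A, year in column B, e.g. "Botswana2013")

This ID can then serve as the lookup key for the MATCH formulas described further down.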

Step 2: The Radio Buttons and the Control worksheet

The first step is to add five radio buttons and texts on the dashboard and link them to a new target cell on the worksheet [Control]. Everything else on the [Control] worksheet stays as it has been in the original workbooks of the previous post.

Step 3: The Calculations

On the worksheet [Calculations], the data sources for the charts are prepared. The sheet contains three data tables:

the first ([tab_xy_scatter_data]) consolidates the data source for the main chart

the second one ([tab_tooltip_chart_data]) calculates the data sources used in the tooltip charts

finally, [tab_hovered_point] is a one-row table for highlighting the data point currently hovered over in the main view.

The highlighting with the red dot in the main chart and in the tooltip charts is done with the usual, well-known Excel charting trick: the data source of the charts consists of two data series. The first contains the values of all data points (formatted in grey); the second has the measure value only in the row of the selected country or year and #N/A in all other rows. The second data series is then formatted with a red fill color and a slightly larger marker size.

The formulas to calculate these series are simple: IF, MATCH, INDEX and some string concatenations. Not much to explain here, I think.
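Still, as a sketch of what these formulas can look like – all cell and range references below are assumptions for illustration, the actual workbook may be laid out differently – the two series of a tooltip chart could be calculated like this:

grey series (all years):         =INDEX(Data!$D:$D, MATCH(Control!$B$2&$A2, Data!$C:$C, 0))
red series (selected year only): =IF($A2=Control!$B$3, C2, NA())

Here column A of the calculation table holds the years, Control!$B$2 and Control!$B$3 hold the selected country and year, and on the [Data] sheet column C contains the country-and-year ID and column D the measure. NA() produces the #N/A values, which Excel simply skips when plotting the markers.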

Step 4: The Tooltips

On the worksheet [Tooltips], the views of the different tooltip options are created in 5 different ranges:

Some static texts, some links to the [Control] worksheet, some formulas using INDEX to get the measure values, an embedded XY scatter chart and two embedded line charts, based on the data consolidated on sheet [Calculations] (see above). Nothing new under the sun.

The flags are camera objects linked to a named formula. We will come to this in a minute.

Step 5: The Named Ranges

Next, we define a name for each cell range on the [Tooltips] sheet:

Step 6: The Named Formula for the Flags

The flags inside these ranges are also camera objects. Worksheet [Flags] contains a sorted list of all countries and an icon of their flags:

To display the flag of the currently selected country, the camera objects inside the tooltips are linked to the named formula “myFlag”:

The named formula “myFlag” uses a simple INDEX function, looking up the currently selected country in the [Flags] worksheet and returns what is in the cell (the flag icon).
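A minimal sketch of such a named formula – assuming the country names sit in column A of the [Flags] sheet, the flag icons sit over the cells of column B, and the selected country is in Control!$B$2 – would be:

myFlag:  =INDEX(Flags!$B:$B, MATCH(Control!$B$2, Flags!$A:$A, 0))

Because INDEX returns a reference to the matching cell, a camera object linked to =myFlag always shows the cell underneath the flag of the currently selected country.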

Step 7: The Named Formula for the Tooltip Selection

Finally, we use another named formula called myTooltip to switch between the five named ranges defined in step 5 based on the target cell of the radio buttons:

This is done by a CHOOSE formula which selects the tooltip range based on the target cell of the radio buttons.
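A sketch of this named formula – assuming the radio buttons write 1 to 5 into Control!$B$5 and the five ranges from step 5 were named Tooltip_1 to Tooltip_5 (the actual names in the workbook may differ) – looks like this:

myTooltip:  =CHOOSE(Control!$B$5, Tooltip_1, Tooltip_2, Tooltip_3, Tooltip_4, Tooltip_5)

The camera object used as the tooltip is linked to =myTooltip, so clicking a radio button immediately switches the displayed range.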

That’s it.

The Download Link

The download link below contains three workbooks:

a fully-fledged version based on the ActiveX Label control approach including the flags. This version was used to create the screenshots and the explanations above

a simplified version of the ActiveX Label control workbook: it provides the same functionality as the first workbook, but does not contain the flag icons

a workbook using the chart events approach, also in a simplified version, i.e. without the flag icons

Disadvantages

As mentioned above, the workbooks of today’s article are built upon the techniques described in the previous post. Thus, they come with the same disadvantages of the techniques #2 and #4 already mentioned there.

Having said that, the fully-fledged workbook with the flags comes with additional drawbacks:

the tooltips are less responsive because of the flags and the additional camera objects, i.e. it takes a bit longer until the tooltip appears when hovering over a data point

Please decide for yourself whether spicing up the tooltip with a flag is worth accepting those disadvantages.

Important Advice

The workbooks provided above were developed using Microsoft Excel 2016. I recently noticed that when opening the workbooks in an earlier version (e.g. 2010), the size of the inner plot area of the chart and/or the size and position of the Label control may be changed by Excel. I have no clue why.

Hence, if you are using the workbooks with an earlier version than Excel 2016, you may have to adjust and align the plot area and the label control to have the same size and the same position. To do so, go into design mode on the developer tab, select the Label and resize and reposition it until it sits exactly on top of the inner plot area of the chart and finally turn off the design mode again.

I tried to do this automatically via VBA code, but I failed. It is pretty easy to align the sizes of the inner plot area and the label, but I couldn’t manage to position the label automatically. If anyone has an idea how to do this, I’d appreciate if you would let me know.
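For reference, the sizing part can be done with a few lines of VBA like the sketch below; the chart and control names are placeholders, and the positioning part is exactly the piece that is still missing:

Sub ResizeTooltipLabel()
    'Match the ActiveX label's size to the inner plot area of the chart (names are placeholders)
    Dim ws As Worksheet
    Set ws = ActiveSheet
    With ws.ChartObjects("Chart 1").Chart.PlotArea
        ws.OLEObjects("Label1").Width = .InsideWidth
        ws.OLEObjects("Label1").Height = .InsideHeight
    End With
End Sub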

Acknowledgement

It is certainly no surprise that it was again Leonid Koyfman (Leonid's LinkedIn Profile), who provided me with the idea for this article. Many thanks, Leonid. You are a true inspiration.

Stay tuned.

Posted by Robert in Charts, Microsoft Excel on Wed, 14 Sep 2016.

List of all Files in a Folder and its Subfolders
http://www.clearlyandsimply.com/clearly_and_simply/2016/08/list-of-all-files-in-a-folder-and-its-subfolders.html
How to create a list of all filenames inside a specified folder and its subfolders, including additional information like file types, sizes, dates and path lengths, in Microsoft Excel

From time to time, I am taking the liberty to post something totally off topic (see here or here), i.e. something that has nothing to do with data analysis and data visualization.

Today’s short article belongs to this category.

A couple of weeks ago, I ran into a problem with my offline backup software. The application threw a few errors, reporting that it could not back up a couple of my documents. The issue was easy to find: the paths of those files exceeded the maximum path length of the Windows API (260 characters).

It wasn’t so easy to fix, though, because the error log file of my backup software isn’t very helpful. I needed something to easily identify the files with a path length exceeding the 260-character limitation, so I could shorten the folder and file names.

Hence, I wrote a little tool which automatically creates a list of all files inside a specified folder and all its subfolders, including the file names, the paths, the file types, the dates (created, last modified, last accessed) and the lengths of the path and filename. This list, sorted descending by path length, made it easy to identify the files my backup software couldn’t handle.

According to this Microsoft article in the Windows Developer Center, the maximum length limitation is set to disappear in Windows 10, version 1607. The root cause for creating this workbook and code may therefore go away soon, but I assume a little tool for easily getting a list of all filenames inside a specified folder may be helpful on other occasions, too.

Download, unzip and open the tool, enable macros, click on the import icon at the top of the sheet, select a folder and wait until the code is finished (the status bar at bottom left shows the progress).

Please be advised that the code is not optimized for performance. Importing the ~41,000 files in my document folder took ~6 minutes on my machine. Not really fast, but since I am not doing this on a daily basis, it is good enough in my book.
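The VBA of the workbook is not reproduced here, but the core of such a tool is a plain recursion over the file system. A minimal sketch (not the author's code, without the type, size and date columns and without any error handling) could look like this:

Sub ListFiles()
    'Walk a start folder (placeholder path) and write one row per file to the active sheet
    Dim fso As Object, r As Long
    Set fso = CreateObject("Scripting.FileSystemObject")
    r = 1
    WriteFolder fso.GetFolder("C:\Data"), r
End Sub

Private Sub WriteFolder(ByVal fld As Object, ByRef r As Long)
    Dim f As Object, sf As Object
    For Each f In fld.Files
        Cells(r, 1).Value = f.Path        'full path
        Cells(r, 2).Value = f.Name        'file name
        Cells(r, 3).Value = Len(f.Path)   'path length, to spot paths beyond 260 characters
        r = r + 1
    Next f
    For Each sf In fld.SubFolders
        WriteFolder sf, r                 'recurse into each subfolder
    Next sf
End Sub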

Customizable Tooltips on Excel Charts
http://www.clearlyandsimply.com/clearly_and_simply/2016/08/customizable-tooltips-on-excel-charts.html

Back in December 2010, I published an article about Better Chart Tooltips with Microsoft Excel. The post described the weaknesses of Microsoft Excel’s standard chart tooltips and provided a VBA-based technique for getting customizable, meaningful and more useful tooltips in Excel.

Today, I would like to revisit this topic for several reasons:

Firstly, Microsoft hasn’t done anything about this shortcoming in versions 2013 and 2016. The problem of insufficient chart tooltips in Excel remains.

Secondly, the workbook provided in 2010 implemented the better chart tooltips on another dashboard with additional features, which was published here: Bluffing Tableau Actions with Microsoft Excel. Many people have asked for a simplified, generic template only providing the tooltips on a standard XY Scatter chart.

Next, reader Will Clark came up with the great idea of using a camera object instead of a simple textbox, in order to have more formatting options.

Today’s post will discuss again Excel’s shortcomings in terms of chart tooltips, briefly describe four different techniques for overcoming this issue and – as always – provide the Microsoft Excel workbooks for free download.

The Requirement and Excel’s Shortcoming

Tooltips are a very helpful feature for exploring and analyzing data. This is especially the case on XY Scatter or Bubble charts with a lot of data points.

An XY Scatter Chart is the perfect choice to display the relationship between two variables X and Y and the distribution of the data regarding these two measures. However, without further chart elements (like data labels) or interactive features (like tooltips), there is no way to easily explore and analyze the data in detail, e.g. examine the maximums and minimums or the outliers.

Let’s have a look at an example. Here is an XY Scatter chart comparing fertility rate (children per woman) and life expectancy in years of more than 200 countries in the year 2013 (data source: Gapminder):

You now easily see the distribution of the data and the relationship between the two variables. But what if you want to analyze the data in more detail and get some information about e.g. the country at bottom left with a relatively low fertility rate of ~2.5 and a life expectancy of only ~47 years?

Knowing the exact values of the two measures is certainly one thing you are interested in. More important, however, is the very simple question: which country is this?

Adding data labels is usually not an option, because with this amount of data points, labels will clutter the entire chart and make the view useless. Thus, tooltips are the best way to display additional information on user request, i.e. when hovering the mouse over a data point.

Shouldn’t be a problem, right? Excel does provide tooltips on charts, doesn’t it?

Yes, it does, and here is how this looks in our example:

Excel’s tooltip gives us the name of the data series (which can be helpful, if you have more than one), information about the point (Point “3”) and the exact values of the measures (2.6, 47.6).

You may assume that Point “3” means this country is the third row in the data source, but it isn’t. Point “3” means that this point belongs to the third category of the X-axis: all data points with an X-value between 2.6 and 3.5 belong to category 3, all points with an X-value between 3.6 and 4.5 belong to category 4, and so forth.

So, you can’t even use the point-information to look up the country in the data table.

The only way to get to the name of the country is to

remember the values of the measures

switch to the data table

sort the data table by fertility rate and life expectancy (this is optional, but definitely helpful for the next step)

and finally scroll through the data table to find the row with the values of fertility rate and life expectancy and get the name of the country

Now we see it. It’s Botswana. It works, but I am sure you will agree that this procedure is anything but convenient, effective and user-friendly.

Let’s have a look at other data analysis and data visualization applications. Based on the same data, it took me less than 3 (!) minutes to create the following view with Tableau Software (including the formatting of the tooltip):

Hovering over any given data point, the tooltip displays the values of a selection or (like in this example) even all dimensions and measures in the data.

Although it definitely should, Microsoft Excel does not provide a standard option to customize the chart tooltips. All you can do is turn them on or off in the advanced Excel options.

As always, VBA can help to overcome this shortcoming: a few extra formulas in a formatted cell range, a camera object and a few lines of VBA code, and you can provide your user with tooltip functionality comparable to Tableau’s:

The following sections will describe four techniques of how to create meaningful and easy-to-use tooltips on an XY Scatter chart.

The sub myEmbeddedChart_MouseMove is adjusted to detect the position of the mouse and – if the mouse hovers over a data point on the chart – assign the defined tooltip text to the textbox and make the tooltip visible. Otherwise, the textbox is set back to invisible. If you are interested in the details, have a look at the VBA code of the template (download link below).
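As a rough, simplified sketch of that mechanism (placeholder names, not the 117 lines of the template), the event sub asks the chart which element is under the mouse and shows or hides the tooltip shape accordingly:

'class module: Public WithEvents myEmbeddedChart As Chart
Private Sub myEmbeddedChart_MouseMove(ByVal Button As Long, ByVal Shift As Long, ByVal x As Long, ByVal y As Long)
    Dim ElementID As Long, SeriesIndex As Long, PointIndex As Long
    'Which chart element is currently under the mouse?
    myEmbeddedChart.GetChartElement x, y, ElementID, SeriesIndex, PointIndex
    With ActiveSheet.Shapes("myshpTooltip")            'the textbox used as the tooltip (placeholder name)
        If ElementID = xlSeries And PointIndex > 0 Then
            'Pull the prepared tooltip text of this data point from the data table (placeholder column)
            .TextFrame.Characters.Text = Worksheets("Data").Cells(PointIndex + 1, 5).Value
            .Visible = msoTrue
        Else
            .Visible = msoFalse                        'no data point hovered: hide the tooltip
        End If
    End With
End Sub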

That’s it. A text formula, a textbox and 117 lines of VBA code.

The disadvantage: the tooltips are only visible if the chart is selected, i.e. the user has to click on the chart first before the tooltips appear.

Technique 2 – Chart Mouse Move Event and a Camera Object

The second technique is a slight variation of the first one: instead of using a simple textbox and changing the text via VBA, a camera object is used as the tooltip:

The additional column in the data table is not needed, because the code does not assign a text to the shape anymore.

Instead, it writes the index of the current data point (i.e. the one the mouse hovers over) to a cell called [myPointIndex]

Based on the value of [myPointIndex], the relevant information is fetched from the data table by simple INDEX formulas in a cell range called [myTooltip] (see the sketch below)

The cell range [myTooltip] can now be formatted as desired (font type, sizes and color, number formats, alignment, etc.)

Finally a camera object is inserted on the worksheet with the embedded chart, named “myshpTooltip” and linked to the cell range [myTooltip]
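For illustration, the INDEX formulas in the [myTooltip] range could look like the sketch below; the column letters and the "+ 1" for the header row are assumptions:

country:          =INDEX(Data!$A:$A, myPointIndex + 1)
fertility rate:   =INDEX(Data!$B:$B, myPointIndex + 1)
life expectancy:  =INDEX(Data!$C:$C, myPointIndex + 1)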

The major advantage of technique 2 is the option to format the text of the tooltip as you like.

Technique 3 – Label Mouse Move Event and a Textbox

Last year, I discovered another technique to implement interactivity on Excel charts. Excel MVP Andy Pope uses an ActiveX label control on top of a chart to track and manage mouse positions. The main advantage compared to techniques 1 and 2: Andy’s approach doesn’t require activating the chart first.

Like in technique 1, we need an additional column in the data table with the tooltip texts

The heart of the solution is an ActiveX label control (insert one via the DEVELOPER tab and Insert):

Position and resize the label control to sit exactly on top of the plot area of the chart

Format the label control to be invisible (white back color and transparent back style)

On the additional worksheet Control, define three named ranges ([myCurrentX], [myCurrentY] and [myDisplayTooltip]) and a data entry cell to let the user define a tolerance deviation

Add the MouseMove event sub of the label control on the worksheet object of the sheet with the XY scatter chart

The code inside this sub calculates the values (not the coordinates) of the current mouse position and writes them to the cell ranges [myCurrentX] and [myCurrentY]

On the worksheet Control, an array formula using MATCH, SQRT and MIN detects the nearest neighbor in the data to the current position (cell C9 on worksheet Control; see the sketch below)

The named cell [myDisplayTooltip] contains a formula checking if the nearest neighbor is within the user-defined tolerance to the current mouse position. If it is, the formula returns TRUE, otherwise FALSE

The code checks the value of [myDisplayTooltip] and if it is TRUE, the text will be added to the textbox, it will be positioned on the worksheet and made visible. If [myDisplayTooltip] is FALSE, the code hides the tooltip
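A sketch of that nearest-neighbor array formula – the range names X_values and Y_values are assumptions for the two measure columns, and the braces are added by Excel when the formula is confirmed with CTRL+SHIFT+ENTER:

{=MATCH(MIN(SQRT((X_values-myCurrentX)^2+(Y_values-myCurrentY)^2)), SQRT((X_values-myCurrentX)^2+(Y_values-myCurrentY)^2), 0)}

It calculates the distance of every data point to the current mouse position and returns the row index of the closest one.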

That’s it. Agreed, compared to technique 1, you need an additional worksheet and more complex formulas. On the other hand, the code is even slimmer (only 52 lines).

Technique 4 – Label Mouse Move Event and a Camera Object

This is a variation of technique 3 and – similar to technique 2 – uses a camera object instead of a textbox.

Not so much to explain here, I guess.

The Disadvantages

All four techniques are workarounds and - like most workarounds - they all have their disadvantages:

Techniques 1 and 2 do not work seamlessly. You have to activate the chart first (i.e. click on the chart area) to make the tooltips work. This is not intuitive and the user has to know this

Techniques 3 and 4 provide a more seamless user experience, but they only work at a zoom level of 100% and if the entire chart is visible in the application window. The code forces the zoom level to 100% and this usually solves the issue, but it is still a drawback

Techniques 1 and 3 are very limited in terms of formatting options

Techniques 2 and 4 provide more formatting options, but the camera object is known to be a bit buggy sometimes and may lead to problems under certain circumstances

All techniques require VBA

All techniques require a considerable time for setting up the workbook, the chart and the code

Agreed, a lot of disadvantages, but the techniques work and provide much better chart tooltips than Excel’s standard.

Download Link

Here is a zipped folder containing one template for each of the four techniques described above for free download:

Acknowledgements

Many thanks go again to Microsoft Excel MVP Jon Peltier for providing the chart mouse event code and to Microsoft Excel MVP Andy Pope for sharing the idea of using a label control for chart interactivity (see here: Clocks). Thank you, gentlemen.

More new posts to come soon.

Stay tuned.

Posted by Robert in Charts, Microsoft Excel on Sat, 27 Aug 2016.

Writing and Reading Tableau Views to and from Databases and Text Files – Part 2
http://www.clearlyandsimply.com/clearly_and_simply/2016/06/writing-and-reading-tableau-views-to-and-from-databases-and-text-files-part-2.html
How to use R, a Tableau connection to R and Calculated Fields to store and load the underlying data of a defined Tableau View in a Database or Text File (part 2 of 2)

This is the second part of Michael Martin’s guest post on how to use Tableau and R to read and write the underlying data of a Tableau view from / to an ODBC database.

Michael is the Managing Partner at Business Information Arts, Inc., Tableau Partner, Tableau Certified Consultant, Tableau Instructor and leader of the Toronto Tableau User Group. Last, but not least, Michael is the only person I know who has been using Tableau Software since version 1.0.

NOTE: You may need to zoom into some of the screen shots below for legibility reasons. Also, the text below contains references to the "Level of Detail Shelf". This is synonymous with the Detail Pill in the Tableau Marks Card within the Tableau 9.x and 10.x user interface. As mentioned in the introduction to part one of this post, including a '$' character as part of an RODBC object name (for example crs$odbc, crs$odbc.tables, crs$dataset) could cause an error on your system, and the affected calculation will fail. If this happens, replace instances of $ within an RODBC object name with another character within the affected calculation, such as an underscore - i.e. _.

Supporting files and a README file are available for download at the end of this part of the post. Please look at the README file first.

Part one covered writing the Calculated Fields in Tableau to read the underlying data of a Tableau view from a database.

In the first part, we have only run SELECT queries, but we could also run INSERT, DELETE, UPDATE or INSERT INTO (append data) queries – all we need to do is pass a valid SQL statement to a Calculated Field that establishes an ODBC connection to the database and runs the SQL statement.
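To illustrate the pattern only – the DSN and calculation names below are assumptions, and the actual Calculated Fields are in the supplied workbook – such a wrapper calculation combines Tableau's SCRIPT_STR function with the RODBC package (note that the argument handed to the script has to be an aggregate):

SCRIPT_STR("
library(RODBC)
channel <- odbcConnect('Tableau_Superstore_Orders')
sqlQuery(channel, .arg1)
odbcClose(channel)
.arg1
",
[SQLINSERTStatement])

The R script opens the ODBC connection, runs the SQL string passed in by Tableau as .arg1, closes the connection and returns the string so the Table Calculation has something to display.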

Today’s second part of the post will

show how to insert data of a Tableau view into a database

demonstrate the necessity of aligning the orientation of the view and the Table Calculation

show how to perform data definition queries

provide the general caveats of the technique

demonstrate an analysis technique that leverages multiple exports of a Tableau View based on the Superstore Dataset to SQL Server

give some insights on performance and upper limits

provide some useful links regarding the integration of Tableau and R

Insert Data of a Tableau View into a Database

Our next example INSERTS data in a Tableau view into a table in the database.

Within the database with the Superstore Data, create a new table called Data_From_Tableau with the following column definitions:

The RowID column is optional and should be defined as an IDENTITY column in SQL Server that gives each row a unique number at load time. This will be the destination table for an INSERT statement that runs within a Tableau Calculated Field.

Create a new sheet in Tableau - connected to the Superstore Orders Table in your Database.

The pattern is familiar: an ODBC connection is established and a query is run. This time, we run an INSERT query to insert the data that Tableau has “materialized” in the view into the database.

As the SQLINSERTStatement Calculation needs to be a valid string from a Tableau point of view, we cannot include the literal text of SUM(Sales) in the Calculation, as the Calculation would then no longer evaluate to a text string because Sales is a numeric field. The other values to be inserted (Category, Segment, and Region) are string values. However, the destination field in the database for Sales is numeric (float). To get around this, and since in my case the destination database is SQL Server, I use a call to the CONVERT function to convert the result of the Tableau STR function back to a float.

Using the CONVERT Header “sub-calculation” within the SQLINSERTStatement Calculation results in Tableau passing the call to the CONVERT function to the database (SQL Server in this case) in the correct format.

Had the call to CONVERT been embedded as literal text inside the quoted part of the string, the INSERT would have failed because of the quote before the word CONVERT, as there is no SQL Server function named ‘CONVERT.

Using the Convert Header calculation within the INSERT statement leads to a different (and better) result at run time:

INSERT INTO Data_From_Tableau ([Category],[Segment],[Region],[Sales])

VALUES ('Furniture','Consumer','Central',

CONVERT(float,86229.218999999968))

Note the absence of a quote before the word CONVERT. This makes all the difference to the database at runtime, as the word CONVERT is passed directly to the database (in my case SQL Server) without the preceding quotation mark. Since CONVERT is a recognized T-SQL function, it works as expected. With a Microsoft Access database, we would have used the CDBL function. Like SQL Server, Oracle uses a CONVERT function to convert values into different datatypes.

NOTE: Some database platforms will coerce a STRING value to a NUMERIC one when INSERTING a new row of data (i.e. perform the data conversion). In this case, you would not need to define and use the CONVERT Header calculation referred to above.

Thanks to Leonid Koyfman for suggesting an alternate and simpler version of the SQLINSERTStatement Calculation below:

In the code above, the value of 1.0 is multiplied by STR(SUM(Sales)). This technique will perform the conversion from STRING to FLOAT calculation against a SQL Server back end - without having to use the CONVERT HEADER "sub-calculation" referred to previously, which simplifies the calculation. Hopefully, this technique will also work with your database server.
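A plausible shape of that simpler calculation – field and table names are taken from the examples above, and the exact Calculated Field is in the supplied workbook – would be:

"INSERT INTO Data_From_Tableau ([Category],[Segment],[Region],[Sales]) VALUES ('"
+ ATTR([Category]) + "','"
+ ATTR([Segment]) + "','"
+ ATTR([Region]) + "',"
+ "1.0*" + STR(SUM([Sales])) + ")"

At run time, SQL Server then receives the Sales value as the expression 1.0*86229.218999999968 and evaluates it as a float, so no explicit CONVERT is required.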

Turn automatic updates off in the new sheet you recently created.

Then lay out the new sheet as shown below:

Remember that calculations that use R are Table Calculations in Tableau. Configure the Insert INTO Data From Tableau Table Calculation to run Pane (Across then Down).

You could also put the Table Calculation in the Level of Detail Shelf as shown below:

Confirm that the Table Calculation is set to run Pane (Across then Down) – which matches the way the view is laid out.

Now do a manual update of the sheet (F9). After a few seconds (assuming that the Insert INTO Data from Tableau calculation is in the Level of Detail Shelf), your sheet will look like:

We can see how Tableau evaluated the SQL INSERT Statement at run time. This is useful for double checking your SQL statement syntax.

If you go to your database and query the Table you just inserted rows into, you should see:

This is the data that Tableau aggregated and “materialised” in the view.

You can make changes to the view (for example, filtering) and if you update the sheet manually, the updated data will be appended to the Data_From_Tableau Table in the Database.

If you connect to the Data_From_Tableau table in Tableau, you can make a visualisation based on the data you inserted as shown below:

Insert Filtered Tableau Views into a Database

Let’s look at another example, which takes what we have done so far a step further.

To start, create a second table named Data_From_Tableau_Nr_2 with the structure below in your database:

Once again, the RowID column is optional and should be defined as an IDENTITY column in SQL Server that gives each row a unique number at load time. This will be the destination table for an INSERT statement that runs within a Tableau Calculated Field.

In a new sheet, connected to the Superstore Orders Data in the back end Database, create the following calculations:

With a connection to the Superstore Orders table, build a view like the one below.

Note that many fields are set as filters. Within each region and customer segment, the view is showing in which categories fewer than 20 distinct products were ordered in a given year in a given region within a given customer segment. The sales metric is colored by profit, and the bars are sized by discount. Note that the aggregation for discount needs to be set to average.

The following fields are in the Level of Detail Shelf:

Order Count

Product Count

Profit

Discount

AvgItemsPerOrder

AvgYearlySales

Note that I have configured a Reference Line to show average sales as follows:

In the view above, the AvgYearlySales Table Calculation needs to be configured to run as follows:

The Table Calculation now performs the same calculation as the Reference Line I placed in the view.

If we want to INSERT a field or calculation into the database, it needs to be either in the view or in the Level of Detail Shelf. We can then include the field or calculation in the INSERT statement.

I have also created a Calculated Field named Comment:

The Comment describes how the view is currently filtered. When the data is inserted into the database, I’m reminded of the profile of the data that was inserted.

We need to define one last Calculated Field (Insert INTO Data From Tableau Nr 2 PANE ACROSS | DOWN) to run the INSERT query:

This calculation needs to be either in the view or in the Level of Detail Shelf in order to be run by updating the worksheet – but we’re not going to use it just yet – so don’t put the calculation in the view or in the Level of Detail Shelf – for now.

We can use the current view to explore the Data, find patterns of interest, tag these patterns using the Comment calculation (which means you need to edit the Comment Calculated Field so that your filtering is correctly described), and then INSERT the Data in the view (that Tableau “materialises” for us) into the database.

Alignment of the Orientation of the View and the Table Calculation

However, there’s a catch. If we don’t lay out the view a certain way, Tableau will not INSERT all of the data in the view, no matter how we configure the calculation Insert INTO Data From Tableau Nr 2 PANE ACROSS | DOWN to run. Only some partitions of the data – but not all of it – will be inserted.

Fortunately, there is a work-around. If I create a new worksheet and lay out the same view as shown below, the SQL INSERT will work as expected.

The view is laid out in a “wide” or “horizontal” orientation. Because of this, you need to confirm that the Table Calculation Insert INTO Data From Tableau Nr 2 PANE ACROSS | DOWN is configured to run as shown below:

This Calculation can be put in the Level of Detail Shelf - but before putting this Calculation in the Level of Detail Shelf, disable automatic updates to the worksheet. The configuration of the Table Calculation now matches the orientation of the view.

Also, the AvgYearlySales Table Calculation needs to be configured as shown below to run correctly in this view:

If you put the Calculation Insert INTO Data From Tableau Nr 2 PANE ACROSS | DOWN in the Level of Detail Shelf and manually update the view, the Insert INTO Data From Tableau Nr 2 PANE ACROSS | DOWN Calculation will run. If you go to the database, the data (in this case 20 rows) meeting the filtering condition should have been inserted:

These are the rows with fewer than 20 distinct products ordered by region, category, and customer segment within a single year.

Alternatively, you could lay out the data in the view with a “vertical” orientation, as shown below:

In the view above I use a copy of the Calculated Field Insert INTO Data From Tableau Nr 2 PANE ACROSS | DOWN which I have named Insert INTO Data From Tableau Nr 2 PANE DOWN | ACROSS and configured it as shown below:

Running the Table Calculation Pane (Down then Across) matches the orientation of the view.

Data Definition Queries

You can also run Data Definition (DDL) queries within a Calculated Field. For example, you could put the following SQL commands, which DROP and then RECREATE tables, into a Calculated Field named DDL_Queries:

Note the semicolon delimiter between the individual SQL commands. A delimiter is not needed after the last DDL statement (the one for orders in the East), as it is the last command in the batch.

Note: Thanks again to Leonid Koyfman for an alternate and simpler version of this calculation:

"SELECT * INTO Superstore_Sales_WEST FROM Orders WHERE Region='West';SELECT * INTO Superstore_Sales_SOUTH FROM Orders WHERE Region='South';SELECT * INTO Superstore_Sales_CENTRAL FROM Orders WHERE Region='Central';SELECT * INTO Superstore_Sales_EAST FROM Orders WHERE Region='East'"

You could then wrap this Calculation within a Wrapper Calculation named Run_DDL_Queries that would run the SQL DDL Code:

To run the DDL queries below, you could set up a View that looks like:

The Run_DDL_Queries Table Calculation needs to be set to run Table (Across) as shown below, given the “horizontal” setup of the view:

You can also run UPDATE queries within a Wrapper Calculation. As a hypothetical example, if you wanted to UPDATE the contents of a column named UpdatedComment in a table named [East SS], you would write a calculation like the one immediately below, named UPDATE Query SQL. You could then define a "wrapper calculation" named Run_Update_Query that would run the UPDATE Query SQL calculation as shown below.

Leonid Koyfman has suggested a simpler version of the UPDATE Query SQL calculation:

"UPDATE [East SS] SET UpdatedComment='This field was updated from Tableau'"

The Run_UPDATE_Query calculation above should be configured to run Table (Across) as shown below:

Assuming there was a table named [East SS] in your database, opening that table after running the Run_UPDATE_Query calculation would show that the UPDATE command worked as expected.

Of course, this was the simplest possible UPDATE query, but you can define and run UPDATE statements that JOIN tables, reference calculations in your Tableau workbook, and include WHERE and HAVING clauses.

General Caveats and Considerations

Based on the author’s experience, it appears that you can’t mix query types within an SQL String that runs within a Wrapper Table Calculation. In other words, trying to run an INSERT query and a DDL Query within the same Wrapper Calculation will fail. On the other hand, running multiple INSERT or DDL Queries within the same Wrapper Calculation works. The author has no theory as to why this is.

Filtering Views by a Table Calculation (such as AvgYearlySales) causes the SQL Statement within the Wrapper Calculation to run incorrectly. In the example below, we are filtering for Average Yearly Sales of at least $28k. If we were to try and INSERT these records into the Database, the filtering shown below would not be properly applied.

Filtering a view by field values (dimensions or measures), or normal Calculations, as we have done in the examples we have covered, works fine.

Obviously, all SQL commands you write must be composed using the SQL implementation your database expects, and must be supported by the RODBC package we have called in our Calculations. For more information about the RODBC package, check out https://cran.r-project.org/web/packages/RODBC/RODBC.pdf.

The examples in this post are aimed at an instance of Microsoft SQL Server.

RODBC and your Calculation that manipulates the database do not return error messages in case the Calculation fails – your Calculation will fail silently. Therefore, the author recommends that you place the Calculation containing your SQL Code either in the view or in a tooltip, so that you can see how Tableau evaluated your SQL command at run time. You can then examine it and (hopefully) see where things went wrong and make corrections.

The Superstore dataset that you use may be different from the Superstore dataset this writer has used, as Tableau periodically makes changes to this dataset. Therefore, row filtering may produce results that do not align with the screenshots used in this post.

The examples in this post that INSERT data into database tables assume your Tableau connection to the destination table is live (i.e. not a Tableau extract). To see the results of INSERT queries in Tableau connected to a live data source, refresh the data source. If your connection to a given destination table is through an extract, refresh the extract.

If you run UPDATE or DDL queries against tables you are connected to via a live connection, refresh the data source to see the results of your queries. If the connection is through an extract (assuming you did not add one or more columns to a table the extract is based on), refresh the extract. If a DDL query added one or more columns to a table materialised within an extract, you’ll need to remove the extract and delete the extract file, refresh the (now live) data source, and then re-create the extract to ensure that new columns are available to you in Tableau.

You need to take care that each SQL statement in the Wrapper Calculation that runs the SQL statement runs only when you want it to. The best way to do this is:

a. Build a view that you use to explore your data. Proceed to step b when you have a view you would like to save in your back end database.

b. Build another view that applies the same filtering and uses the same Calculations as the view referred to immediately above, but lay out the view either horizontally or vertically as per the examples in the post. Once you have built this sheet, you can re-use it.

c. Place the data fields you want to be INSERTED into the back end database on the Rows or Columns Shelf, or in the Level of Detail Shelf. In the examples we have seen, the sales measure was in the view, other calculations and fields were placed in the Level of Detail Shelf.

d. Edit the value of the Comment Calculated Field so that it accurately describes the Data View you created in step a, and put the Comment Field in the view or in the Level of Detail Shelf.

e. Create / modify the Calculations needed (SQL String and Wrapper Calculation to run the SQL string) to execute the SQL statement you have defined.

f. Bring the Calculation that defines the SQL statement into the view (on the Rows or Columns Shelf) and put it in a Tooltip so that you can easily see how Tableau evaluates the SQL statement.

g. After you have laid out the view, turn auto updating off in this worksheet.

h. Drag the Wrapper calculation that runs the SQL statement you defined to the Level of Detail Shelf in Tableau. As this is a Table Calculation, make sure it is configured to run correctly: Pane (Across then Down) or Pane (Down then Across), depending on how you laid out the view as per step b above.

i. Double check that you have done the previous steps.

j. Manually update the worksheet, which runs the Calculation containing the SQL statement.

k. Drag the Wrapper Calculation that runs the SQL statement out of the Level of Detail Shelf at this point. This prevents the SQL statement from being run again until you want to run it - otherwise the SQL statement will be run every time the worksheet is updated.

l. Turn auto updating for the worksheet back on again. You can now make changes to this sheet and see the updates right away, without running the last defined SQL statement.

m. In another sheet, connect to the table(s) in your back end database affected by the last SQL statement that ran and confirm that your SQL command worked as expected and you see the data you expect to see in the destination table in the back end database. If your SQL query didn’t work as expected, double check the SQL syntax within the Calculation that defines the SQL statement(s) and run the query again by manually updating the worksheet.

n. Return to step a and build another view.

What for?

Having covered all of this ground, one might ask: “What’s the point? Why bother going through all this just to move data from one place to another – especially as the data we see in Tableau came from a database of one type or another in the first place?”

When we aggregate, manipulate, and filter data in Tableau, we create (hopefully interesting) sub-sets of the entire dataset. Of course, we can create and combine Sets in Tableau, and filter by them. This writer filtered the Superstore data in a variety of ways, and exported these views to SQL Server - updating the value of the Comment Calculation with each change of filtering. The list below shows the different filtering applied to the dataset (based on the value of the Comment Field):

Each of these filtering conditions could have been saved to a Tableau Set. That being true, saving the data returned by each filtering condition (also a Set) in aggregated form in a database table enables a very fine-grained view of the relationships between members of one or more of these Sets, by filtering on the value of the Comment data field in a Tableau view. We can observe the various contexts in which one or more conditions exist – and “capture” these contexts within a “Super Set” created in Tableau.

An Example

The Tableau view below looks at years, categories, regions, and customer segments where the total yearly profit was at least $2500 - which in relation to the total dataset would be considered “Well Performing Categories, Regions, and Customer Segments”.

The view surfaces some interesting situations:

There are instances of good profitability in years, categories, regions, and customer segments where there were <60 orders, fewer than 20 distinct products ordered, and where total sales was below the yearly total dataset average per category, region, and year ($15,954). An example of this would be 2013 in the Technology category, in the South, within the Corporate Customer segment (see the black rectangles in the View above). Despite less than optimal business conditions, there was profit.

If we look at the Office Supplies category in 2013 and 2014, in the East within the Consumer Customer segment, we see that despite very high sales (>$28k) and a higher number of orders (>100), profit was only between $2500 and $5000. We could have reasonably expected total profit to exceed $5000 given the more favorable business conditions (see the red rectangle in the view above).

If we look at the Technology category in 2011, in the East and Consumer Customer segment we see that there were high sales (>28K) and higher levels of profit (>$5000) with fewer than 60 orders (see the purple rectangle in the view above).

The data in this view could be saved as a Set within Tableau named “Better Performing Years, Categories, Regions, Customer Segments: Profit >=$2500”. I can reset all filtering of the view, and filter using the Set as shown below (see the red rectangle in the Filters Shelf):

This type of analysis can be enhanced by filtering the data further within Tableau as shown below:

This view helps surface two situations in 2013 within the Technology category within the Central region and Corporate Customer segment and Southern region within the Consumer Business segment (see black rectangles above) where higher levels of Profits were achieved (>= $4000) despite there being <60 orders. The lower end cut-off for Profit has been raised from $2500 to $4000 through filtering by Profit. The lighter green rectangles now represent Years, Categories, Regions, and Customer Segments where profit was at least $4000.

Whatever orders there were, they tended toward higher profit.

A drilldown from this view could show us whether or not the products ordered in these Regions and Customer Segments differed from what was ordered in other Regions and Customer Segments within the Category in 2013.

We could then create a Set within Tableau, perhaps naming it “Profit >$4000 in Better Performing Years, Categories, Regions, and Segments”.

Below: the data is being filtered under the Specifications of a Set named “All Unprofitable Years, Categories, Regions, and Customer Segments”.

This View brings some interesting facts to light. There are instances of larger amounts of Product Sales (> the all-dataset average of $15,954, > $20,000 and even > $28,000), as well as instances with >60 and >100 orders, and yet none of the Years, Categories, Regions, or Customer Segments turned a Profit.

The ability to observe the various contexts in which one or more conditions exist – and “capture” these contexts within a Tableau “Super Set” is in the opinion of the author, a useful Analytic Technique.

Please Note

Remember that:

R is external to Tableau: the R platform must be downloaded and installed on your system or on a system your computer has permissions to connect to.

Tableau cannot use R if the Rserve package is not loaded and running before starting Tableau.

The R programming language uses libraries that are delivered in (documented) packages

Packages that your Tableau Calculated Fields will use must be installed within your R environment.

The library your Calculated Fields will use must be loaded within your R environment or by your Tableau Calculations (using a library(<library name>) statement).

Some R libraries have dependencies on other R libraries.

Some R libraries are trickier than others to load (example: rJava, zoo, etc.) You may need to change your system PATH statement to get some libraries to run (Control Panel -> System -> Advanced).

Tableau Calculated Fields that leverage R are Table Calculations and therefore will most likely require customisation to function as expected.

Performance and Upper Limits

The Superstore dataset used by the author contains just under 10,000 rows of data. On the author’s system (64 bit Windows 7, 64 bit Tableau, 64 bit SQL Server, 16 GB system RAM, I7 processor) the Tableau calculations that run SQL commands, or write data to text files using the Superstore data source run instantaneously.

Obviously, there is an upper limit as to how many rows can be manipulated by a SQL statement “hosted” within a single Tableau Calculation. Before your SQL runs, Tableau does the heavy lifting to aggregate the measures and calculations in a view based on the active data source and filter the results accordingly.

If you have written INSERT statements, the aggregated rows in a Tableau “materialised” view are inserted into a database table; it’s a question of how many rows (<100? >1,000? >100,000? >1,000,000?) are to be inserted in your back end database given the capabilities of your system, the back end database, RODBC, and communications over port 6311 on your system. If you have written DDL queries that DROP tables in a back end database, and / or SELECT data in tables in the back end database INTO other tables in your back end database, it’s a question of how quickly your back end database can execute these statements through an ODBC connection “hosted” by Tableau.

Ditto if you have written UPDATE queries that operate on tables in your back end database (and do not update these tables with any data from the current Tableau view). If you are updating rows in your back end data with some or all of the data in a Tableau view, there will be an upper limit as to how many rows you will be able to update within a “hosted” ODBC connection within a single or series of Tableau Calculations. If you want to push the envelope, you’ll need to experiment – the author makes no guess as to what will be possible on your system.

As Tableau does not give you the chance to recover unsaved work in the event of a crash, save your Tableau workbook files often.

Many thanks to Michael for this brilliant article and the amazing detailed and thorough step-by-step instructions. If you enjoyed Michael’s posts, please drop him a line in the comment section. Thank you.

Posted by Robert in Tableau on Thu, 09 Jun 2016.

Writing and Reading Tableau Views to and from Databases and Text Files – Part 1
http://www.clearlyandsimply.com/clearly_and_simply/2016/06/writing-and-reading-tableau-views-to-and-from-databases-and-text-files-part-1.html
How to use R, a Tableau connection to R and Calculated Fields to store and load the underlying data of a defined Tableau View in a Database or Text File (part 1 of 2)

My good friend Michael Martin is kind enough to contribute another guest post to Clearly and Simply.

In his article, Michael will show us how to store the underlying data of Tableau Views in a database or a text file using R and Tableau Calculated Fields. Michael's post will come in 2 installments.

NOTE: It may be necessary to zoom in on some of the graphics within the body of the text below for better legibility. If you are running 32 bit Microsoft Access under 64 bit Windows, and wish to output the data in a Tableau view to 32 bit Microsoft Access, please read the Appendix to this post before proceeding. Supporting files and a README file are available for download at the end of the second part of the post. Please look at the README file first.

The author would like to thank Leonid Koyfman for his suggestions.

Here is part one:

In the Bar Graph View below (based on the Superstore dataset that comes with Tableau), yearly sales are summed and colored by profit and sized by discount:

Reference lines show average sales within each region and customer segment by year for all categories.

How can we “capture” the data in this view and save it in a database for further analysis and re-use?

Tableau does let you view underlying data and copy it to the clipboard, export it to a text file, or even a Microsoft Access database, but these are manual operations. What if you could write a Tableau Calculated Field to export the underlying data in a Tableau View to any Database Server that supports the ODBC protocol?

You can, using R, and an R language library called RODBC.

Tableau opened the door to using R language libraries within Calculated Fields in version 8.1. This article walks you through downloading and installing R on a desktop computer, configuring Tableau to use your R installation, and writing the Calculated Fields required to export data from a Tableau view to a database such as SQL Server or Microsoft Access.

As of this writing, the most current version of R is version 3.2.5. The examples in this article should work with versions 3.0 and higher. The computer this writer uses runs R 3.2.4.

Download the R distribution, which should contain a 64 and 32 bit implementation of R. If you are running 64 bit Windows, you can install the 32 and 64 bit versions of R.

Note: If you are running 64 bit Windows and 32 bit Microsoft Office and want to write data to an MS Access database, make sure to install the 64 and 32 bit R implementations. If you are running 32 bit Windows, your only option is to download the 32 bit R distribution.

In your Windows Program Groups (assuming 64 bit Windows), you’ll see:

which confirms that both 32 and 64 bit R were installed. R comes with a simple editor called Rgui, that you can use to write R scripts. As you can see, this writer is using R version 3.2.4.

I suggest you head over to www.rstudio.com and download an alternative development environment named R Studio.

It’s a free download, easy to install, and easy to use.

Download, Install and Load R Packages Rserve and RODBC

You can extend the functionality of your base R installation by downloading and installing packages. R has a very dedicated and productive developer community, and most packages from this community are well documented. To run the examples in this post, you will need to install two R packages that are not included in your base R installation: Rserve and RODBC.

Start up RStudio and invoke the Install Packages… command from the Tools menu.

An Install Packages dialog will appear. Enter Rserve and RODBC (separated by a comma) in the packages field as shown below and click on Install.

R Studio will download and install these two packages within your R installation. You should then see something that looks like:

By the way, both of these packages should be available to the Rgui Editor, should you choose to use it.

The next step is to load the RODBC and Rserve packages. Within RStudio, by default on the right hand side of the screen, you will see a list of available packages:

Locate the RODBC and Rserve packages and tick the box to the left of each of the package names. RStudio loads the RODBC and Rserve packages. The R Scripts you will write as you work through this post also load RODBC, but pre-loading RODBC in RStudio ensures this package will be available when referenced from Tableau.

As Rserve is a binary .exe file, we also need to start it from within RStudio so that Tableau can access R packages within Calculated Fields.

Type Rserve() and press ENTER at the > prompt as shown above.
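For reference, the same setup can also be done entirely from the R console (RStudio or Rgui) with a few commands:

install.packages(c("RODBC", "Rserve"))   # one-time download and installation of the two packages
library(RODBC)                           # load the ODBC interface
library(Rserve)                          # load the Rserve package
Rserve()                                 # start the Rserve server so that Tableau can connect to R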

Configure Tableau to use your R Installation

To use Tableau (through version 9.2) with R, select Help -> Settings and Performance -> Manage R Connection as shown below

and enter localhost (or 127.0.0.1) in the Server field and 6311 as the Port.

In Tableau 9.3 and above, Select Help -> Settings and performance -> Manage External Service connection as shown below.

Enter 127.0.0.1 or localhost in the Server field and 6311 as the Port.

Then click on the Test Connection button, and you should see:

You are now ready to use R with Tableau.

NOTE: The supplied tutorial Tableau Workbook (a download link is at the end of part 2 of this post) was made in Tableau 9.3, and cannot be opened with earlier versions of Tableau Desktop. You can however use earlier versions of Tableau (starting with version 8.1) to re-create the examples shown in parts 1 and 2 of this post.

Set Up your Database and an ODBC Data Source

You then need to export the Excel Superstore Orders data that comes with Tableau to a database. The examples in this article use a SQL Server Database, but you could export the data to Microsoft Access, Oracle, or any database server that supports ODBC.

NOTE: The Microsoft SQL Server and Microsoft Access versions of the Superstore data set (shipped in Excel with Tableau) that the author used to create the tutorial workbook are provided for download, so you can use them when re-creating the examples discussed in both parts of this post. Download links for the data and other resources are at the end of the second part of this post.

If your version of Access is 32 bit but you're running 64 bit Windows, see the appendix at the end of this article, which describes how to configure a 32 bit ODBC connection under 64 bit Windows before proceeding. Make sure that you export the data in the Orders sheet in Excel to your database, so that the database ends up with a table named Orders.

The next step is to configure an ODBC Data Source (in Windows 7 or 8 via Control Panel -> Administrative Tools -> ODBC; in OS X, use ODBC Manager). To configure an ODBC data source in Windows 10, check out this link, which explains how to run the ODBC Connection Manager within Windows 10:

Again, if you are running 32 bit Access under 64 bit Windows, please read the appendix at the end of this post before proceeding.

Below are the screens for creating a Windows ODBC Data Source for a SQL Server database named Tableau_Superstore_Orders. This database contains the Superstore sales data (from Excel) that ships with Tableau.

Open the ODBC Administrator Tool, select the System DSN Tab and then Add to add a new Data Source as shown below:

For SQL Server, select either the 10.0 (SQL Server 2008) or 11.0 (SQL Server 2012) driver as shown below, and then click on Finish.

Another dialog will then appear:

Give the ODBC Data Source a name and indicate the SQL Server Instance name, and then click on Next.

Another dialog appears:

Indicate how the login to the database will be authenticated. By default, Integrated Windows authentication is selected. If you administer the machine SQL Server is installed on, this option should work. Otherwise, enter credentials for a SQL Server user who is authorized to log into the database. Then click on Next.

Change the default database to Tableau_Superstore_Orders in the next Dialog that appears as shown below, and then click on Next.

Another dialog will open. Accept the default entries and click on Finish.

A dialog appears that lets you test the ODBC connection you just configured. If there are no issues, you will see a message like what you see below:

Closing the dialog boxes above will bring you back to the main screen of the ODBC Administrator. Click on OK to close the ODBC Administrator.
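Optionally, before involving Tableau, you can sanity-check the new data source from the RStudio console with a few lines of RODBC code. This is only a sketch and assumes the DSN is named Tableau_Superstore_Orders and that the Orders table has already been loaded into the database:

library(RODBC)

# open a connection to the ODBC data source configured above
ch <- odbcConnect("Tableau_Superstore_Orders")

# list the tables visible through this connection; Orders should appear
sqlTables(ch)

# pull the Orders table and look at the first few rows
head(sqlQuery(ch, "SELECT * FROM Orders"))

# close the connection again
odbcClose(ch)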

Create a new Tableau Workbook and Connect to the Database

Now open Tableau. The author used Tableau 9.3 when writing this post. Create a new workbook and connect to the Tableau Superstore data you exported to a database; the author exported the data to SQL Server. The Excel Superstore sales data is in a sheet named Orders, so there should be a table named Orders in the database you exported the Excel data to. If you download the tutorial Tableau workbook mentioned earlier (in Tableau 9.3 format), you can also open it and use it for reference as you build your new workbook based on the content of this post.

Create Calculated Fields in Tableau

After connecting to the Superstore Data, define a new Calculated Field named OrdersSQLString.

The calculation is:

'SELECT * FROM Orders'

Note the single quotation marks surrounding the text of the SELECT statement.

Now define another calculation, named RunOrdersQuery, that connects to the Superstore data in SQL Server and writes the results of a SELECT query to a text file.

NOTE: It has been reported that including the '$' character in the name of an RODBC object (for example crs$odbc, crs$odbc.tables, crs$dataset) in a calculation can cause a runtime error ('Object not found') on some systems. Leonid Koyfman suggests replacing the '$' character with an underscore ( _ ) in the affected calculation.

Also note that the calculation above contains a reference to a hard coded path to a file on disk. You can alter this path in the calculation to a location of your choice on your system. You will need to create the required directories and sub-directories (i.e. folders) on the path to the destination file on your system before running the calculation. The RunOrdersQuery calculation will not create them.

Calls to R functionality within Calculated Fields have to follow certain conventions. The calculation must start with the word SCRIPT_ and then specify a return value, in this case a String (STR), followed by an open parenthesis and a quotation mark. The string return value of the calculation is the informational message “Query Results from refreshed Data Connection”.

The calculation then proceeds to connect to the Tableau_Superstore_Orders ODBC data source you created and runs the query contained in the incoming parameter .arg1.

The write.table statement within the calculation writes the results of the SQL command to the text file named in the command. You can change the file path and name as desired. Note that you need two backslashes to separate drive and directory names when constructing the file path. Also note the | (pipe) delimiter, which prevents scrambled data in the output text file if any data returned by the query contains commas in a data field.

The calculation then closes the ODBC connection and displays the informational message, which - as mentioned earlier - is the string return value of the calculation. Immediately following the informational message is a closing quotation mark that matches the quotation mark after the open parenthesis in the first line of the calculation.

The last line of the calculation contains a comma, and then the value of .arg1 – the value of the OrdersSQLString calculation you defined earlier.

The Calculated Field ends with a closing parenthesis.
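Putting the pieces described above together, the RunOrdersQuery calculation looks roughly like the sketch below. The object names (crs_odbc, crs_dataset), the example output path C:\\TableauExports\\Orders.txt and the ATTR() aggregation around OrdersSQLString are this writer's assumptions, not a verbatim copy of the original calculation; adjust the DSN name and path to your own system. Tableau requires the fields handed to SCRIPT_STR to be aggregated, hence the ATTR():

SCRIPT_STR("
library(RODBC)
# connect to the ODBC data source configured earlier
crs_odbc <- odbcConnect('Tableau_Superstore_Orders')
# run the SQL statement handed in from Tableau as .arg1 (it arrives as a vector)
crs_dataset <- sqlQuery(crs_odbc, .arg1[1])
# write the result set to a pipe-delimited text file; note the doubled backslashes
# and that the example path below must already exist on your system
write.table(crs_dataset, file = 'C:\\TableauExports\\Orders.txt', sep = '|', row.names = FALSE)
odbcClose(crs_odbc)
# the last expression is the string return value of the calculation
'Query Results from refreshed Data Connection'
",
ATTR([OrdersSQLString]))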

Disable Auto-Updating

For our Calculated Fields to work as expected, we need to disable Auto-Updating of worksheets.

Why do you have to update the view (and other views with similar queries) manually?

We do the manual update to control when the Table Calculation that executes an SQL command actually runs. This is similar to re-calculating an Excel worksheet manually. If we left auto updates on, every change made to the view (adding or removing fields, formatting, filtering, etc.) would cause any calculation running SQL commands to run, which could cause all sorts of problems in your database. In this example, the text file would simply be re-written to disk. Turning auto update off solves this problem.

After you have written these two Calculated Fields, set up a Tableau view that looks like:

Note that Tableau considers all calculations that use R to be Table Calculations.

Now run a manual update of the worksheet by clicking on the Run Update icon or pressing the F9 function key.

Running the update executes the RunOrdersQuery Calculation. The view should then look like:

We’re seeing the sum of sales from our connection to SQL Server, which is no surprise. We see the values of the Orders SQL String Calculation and the return value of the RunOrdersQuery Calculation above the bar.

Create a New Data Connection

We now want to configure a new data connection to the text file that the RunOrdersQuery calculation should have created.

Create a new sheet, configure a Text File connection, and point to the folder specified in the RunOrdersQuery calculation.

After making the connection, and back on the Tableau worksheet, set up a view that looks like:

If you see what appears above, the “proof of concept” we have worked through worked as expected. You’ve connected to a database using ODBC within a Calculated Field and redirected the output of a SQL query against a table in that database to a text file.

If you navigate to the folder on your system where the text file is, and open the text file in an editor, it will look like:

Parameters and Calculated Fields for User defined Conditions

We’ll now define another “proof of concept” calculation that includes a WHERE clause. Start by creating two parameters: Category Condition and Regional Condition.

This calculation, named RunFilteredOrdersQuery, works just like the RunOrdersQuery Calculation: it establishes an ODBC connection to the SQL Server database and runs the SQL query contained in the Full SQL Statement Calculation. Remember to specify the correct path to the output file in your version of the calculation.

The Full SQL Statement Calculation uses parameters to gather input for a WHERE clause. The parameters are currently set to the Office Supplies Category in the South.
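For illustration, a Full SQL Statement calculation along the following lines would produce the filtered query. The column names Category and Region are this writer's assumptions based on the Superstore data and may be spelled differently in your copy of the Orders table; the parameter values are simply spliced into the WHERE clause between single quotes:

"SELECT * FROM Orders WHERE Category = '" + [Category Condition] + "' AND Region = '" + [Regional Condition] + "'"

This simple string concatenation is fine for a proof of concept where the parameters offer a fixed list of values; be more careful with free-text parameters. RunFilteredOrdersQuery is then the same SCRIPT_STR wrapper sketched for RunOrdersQuery, with the Full SQL Statement calculation passed in as .arg1 instead of OrdersSQLString and a different output file name.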

Additional Tableau Worksheets

In a new sheet, with a connection to the Orders table in your back end database, disable automatic updates for this sheet as shown previously, and set up a view that looks like:

Now run the RunFilteredOrdersQuery Calculation by manually updating the worksheet.

You’ll then see:

Again, the sales numbers are coming from SQL Server.

Create a new sheet and make a Data Connection to the text file that the RunFilteredOrdersQuery calculation should have created.

When we go back to the Tableau worksheet, we see that sales are constrained to the Office Supplies Category in the South:

If you go back to sheet 3, change the parameter values, manually update the worksheet (which re-runs the RunFilteredOrdersQuery Calculation), and then come back to sheet 4 and refresh the FilteredOrdersQuery Data Source, the bar graph in sheet 4 will reflect the new filtering.

Appendix: 64 bit Windows and 32 bit Microsoft Access

If you are running 64 bit Windows and have 32 bit Microsoft Access, this appendix describes the steps to follow to configure your environment to run 32 bit R and create a 32 bit ODBC Data Source.

Download the R platform installer that contains 32 and 64 Bit versions of R. When running the installer you downloaded, 32 and 64 bit R will be installed by default – multiple versions of R can be installed on a single PC. If you have not already done so, download and install RStudio as described earlier in this post. If you are running 32 bit Windows, you are also running 32 bit Tableau, and only the 32 bit version of R will be installed on your system.

Next: create a 32 Bit ODBC Data Source using the 32 bit version of the Windows ODBC Connection Manager.

Using Windows Explorer, navigate to C:\Windows\SysWOW64\odbcad32.exe (under Windows 7 and Windows 8). If you are using Windows 10, check out the Windows 10 link mentioned earlier in this post.

Double-click on odbcad32.exe to run it. Select one of the available Microsoft Access database drivers to configure your Data Source. Create an ODBC data source as described earlier in this post (a User or System Data Source). Creating the data source includes a step where you navigate to your MS Access database file with Windows Explorer and select it with your mouse.

Next, open RStudio and tell it that you want to use the 32 bit version of R: select Global Options from the Tools menu as shown below and select the 32 bit R installation as your default R implementation.

Close and restart RStudio to save your changes.
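If you want to double-check which architecture RStudio actually launched after the restart, the R console can tell you:

R.version$arch            # "i386" under 32 bit R, "x86_64" under 64 bit R
.Machine$sizeof.pointer   # 4 under 32 bit R, 8 under 64 bit R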

Next: Install the Rserve and RODBC R Packages within your 32 Bit R Installation. From the Tools menu, select Install Packages and type in Rserve, a comma and then RODBC in the Packages field. Tick the “Install Dependencies” option - then click on Install.

As described earlier in this post, configure Tableau to use your 32 bit R installation.

Load the RODBC package and then load and start Rserve as described earlier in this post.

Use the 32 bit MS Access ODBC Data Source name you defined in your Wrapper Calculations that run SQL statements.

If you are running 64 bit Windows and want to switch back to 64 bit R later, follow the instructions you used for 32 bit R in reverse: select Global Options from the Tools menu and select the 64 bit R installation as your default R implementation. Close and restart RStudio to save the change.