My new foray into making this work would hopefully be via R-Integration. Can you think of a way to capture the vector of user-defined entries from a multi-select quick-filter and send these values into R ?

Here is a sample data set. It simply maps fictitious bocce ball teams to keywords, each with a word count. INDEX() is there, to show how many teams meet the quick filter criteria.

One caveat, however, is that the number of keywords will be very large. We could expect at least 100,000 of them, perhaps many more. For this reason, among others, the duplicated dimension approach isn't very attractive (memory & performance problems from the Quick Filter).

Any thoughts on how to capture the user provided vector and send these values into R ?

I thought I had a brilliant solution using data blending last night, but in the cold light of the morning it's broken. That leads to several more questions, because there are still multiple other possibilities:

3) Is the current performance for a Quick Filter with 100K values acceptable?

4) How many teams at a time could theoretically be selected by this intersection filter? (I'm looking for an order of magnitude here, are you talking 10, 100, 1000, 1000000, etc.?)

5) Could you be ok with a multi-step process, like choosing the filter criteria and then entering a parameter for the number of criteria selected, or choosing the filter criteria, then using those results to drive a Filter Action?

Hi Johnathan Robert, thanks for digging deeper. This one is a nutshell to crack. And I'm happy to have companions helping me to think through the puzzle pieces!

To answer Jonathan's questions:

3) Is the current performance for a Quick Filter with 100K values acceptable?

- not really

- if the INTERSECTION logic were driven from a single quick-filter (idea 3375), then perhaps

- by duplicating the keyword dimension as a "hack-ey work-around" we're now loading an extra 100k words into memory for each additional keyword in the ven diagram

- requiring the intersection at least three keywords, this means 300k+ worth of quick-filter & Tableau chokes on memory

An Available Compromise

Granted, 100k+ is a LOT of keywords. And bigger data isn't always better. So if the size of the keywords dimension were the only blocker, then as a compromise we could trim it down based on value. e.g. limit the keyword dimension to:

- Top N keywords per bocce team (~1000)

- Only keywords in the top N quantile(s) / percentile for each team

- etc

100k of them is nice to have. It is holistic, but also has a very long tail. Should the volume of keywords be the only blocker, then I can scale them back by "value-add" until a healthy compromise between performance & functionality is found.

4) How many teams at a time could theoretically be selected by this intersection filter? (I'm looking for an order of magnitude here, are you talking 10, 100, 1000, 1000000, etc.?)

At the moment, the total number of bocce teams is in the small-hundreds. So, we're starting from a list of <500 teams and, by way of intersecting keywords, seeking to narrow those down to just a handful that meet the user's criteria.

Remember the importance of "Show only Relevant Values". If the users chooses keyword 1, then we would hope & expect the options for keyword 2 to be "only relevant values". Wildcard matching is another, less attractive, but perhaps viable option.

5) Could you be ok with a multi-step process, like choosing the filter criteria and then entering a parameter for the number of criteria selected, or choosing the filter criteria, then using those results to drive a Filter Action?

Sure! As long as it works & is practical. Which means: "enter a few words & press the submit button", Yea!

But of course, anything that's kludgey or irritating will be met with resistance.

To note: the very purpose of this dashboard is to narrow the list of bocce teams. So to that end, a various parameterized filters are already in play. That is: the dashboard already has a variety of parameter-driven filters whose purpose is to narrow the list of bocce teams. The aim now is to integrate this new ability to narrow by keyword, as well.

6) What is the R code going to be doing?

Compensating for the fact that idea 3375 isn't a reality =). Without the ability to send the user's vector of keywords into R: I've not thought it through completely yet. It doesn't make sense yet to invest a ton of time until I know that I can send the user's vector of keywords into R.

Robert's blocker is a good example of something I would address with a data manipulation & programming environment like R: the ability to restructure, loop, & perform complex logic on the data.

If not R, then the another option is the JavaScript API. I lean towards R only because I'm familiar and I've not yet worked with the JSAPI. I do know that some folks are using the JSAPI as a means to escape the QuickSand (quick filter) effect on performance. So if that's the route, then I'm definitely interested to learn!

One advantage of the JSAPI: processing is done client-side in the browser.

Loosely, I envision the R-Integration would look something like this:

- send the user's vector of keywords into R

- perform the INTERSECTION logic

- return the list of bocce teams that meet the INTERSECTION criteria

Given that the Tableau-R integration works as "vector in vector out", I would likely structure the response from R as a single string which gets repeated in the response for every row in the table. I imagine that string to be composed of the "concatenated list of bocce teams which meet the user's keywords criteria". And from here, a boolean filter in the dashboard should doo the trick as follows:

[response from R] CONTAINS [bocce name]

That would be one way to go about it.

6a) The deeper question is can the R code accept Null or NA values?

Sure! If it has to. Performance wise, it seems wasteful (perhaps prohibitive ?) to pass extremely long vectors into R composed mostly of NA's. But if a few NA's exist, that's perfectly fine. The first step in R can be to simply:

na.omit([vector received from Tableau])

7) What is the goal for the final result?

The goal is precisely what we're working with in the attached bocce_teams_INTERSECTION.twbx. We want to begin with a list of <500 bocce teams and narrow that list down to just a handful, based on the INTERSECTION of keywords they have in common.

Beyond performance and practicality, the only one obligatory criteria is to integrate this new functionality into a dashboard / data source that already performs similar "narrowing down" with parameters built upon other, quantitative values.

Wow! That was a lengthy response to your questioning. And they are good questions!

Yes. Very nice work. Thank you! This is a good demonstration of what it takes to be an expert in Tableau: the secret sauce is the ability to dissect the data available, using the tools available to build new data components which themselves can be used as tools, and then building upon those to reach your objective.

i.e. "often"

- you can do anything in Tableau

- when you think like MacGyver =)

Thanks, Jonathan! I'm looking forward to catching up on my reading in the articles you've linked to, and also putting this into play on the larger data set to see how it performs. Should anything interesting come up, I will certainly let you know.

It's true. The exploring vs. composing continuum is a source of disappointment for many. They get their hopes up & are later disillusioned by how complex Tableau work can really be. You've described the scenario very acutely with your post about the letdown. It's one I've referenced often with clients, and is a scenario I've observed many times (in myself, and in others). The frustrations are real.

One note about the solution you've provided so far, I do find one small problem. The logic breaks with no keywords in the filter. When there's no filter, then we would expect the dashboard to show all teams.

But instead of going to zero, the calculation for [# of Keywords Selected] actually skyrockets upward to the total distinct count of all keywords across the data set. And this, in turn, breaks the [INTERSECTION] calculation. Result being: no teams are shown.

I'm only now taking a look to see what can be done, but it occurs to me the answer may be to produce yet another boolean to evaluate whether [# of Keywords Selected] is equal to TOTAL(COUNTD([keyword])) across the entire population (regardless of any filtering). And to then combine this new BOOLEAN with the INTERSECTION to drive the filtering.

To reach this new boolean I'm thinking of, it is necessary to maintain percent of total whilst filtering, for which I often choose the duplicated data source approach. It's do-able, but clunky. So I wonder: can you think of any other way to make the vis work correctly when there's no filter at all ?

First of all thanks a lot for this post. Its an awesome post and solution for what looks like a simple problem.

But I have a follow up problem. Lets say we also add another field to the original bocce_teams file called Category and each booce_team has a category that is its a one to one relationship. I added a column Category to the excel file. You can consider Category as one of the attributes of a team for instance.

Now I not only want to select multiple keywords ( get INTERSECTION result) but I also want to add a Category value quick filter ( select one value or multiple values ) - note that in this case I want the filter to behave like an OR - It cannot be an AND anyways since its a one to one relationship with the team.

So for the solution posted which resulted in 6 teams team for keywords 3d and awesome

it should only show me those 6 categories ( when i hit relevant values) corresponding to the teams which I obtained but it doesnt. This confuses the end user.