Project Wrap-Up

Chris has published his write-up about the project here. For my retrospective, I will highlight aspects of the data pipeline, the tool sets, and the collaboration.

Vectorization

Various pre-compute steps were executed independently within the overall workflow for each topic:

keyword parsing (Python)

keyword scoring (Python)

network coordinates generation (R)

network centrality measurements (R)

orchestration & data reshaping (Alteryx)

With 28 topics, you can imagine that I didn't want to run these five steps manually for each topic on every data refresh! So vectorizing these individual components inside the overarching workflow was essential for automation.
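As a rough sketch of the idea (the script names and command-line flags here are hypothetical stand-ins; the real orchestration lived in Alteryx), vectorizing the per-topic steps amounts to looping one parameterized pipeline over all 28 topics:

```python
import subprocess

# Hypothetical per-topic runner: the actual workflow was orchestrated
# in Alteryx, with the Python and R steps named in the list above.
TOPICS = [f"topic_{i:02d}" for i in range(1, 29)]  # 28 topics

STEPS = [
    ["python", "parse_keywords.py"],       # keyword parsing (Python)
    ["python", "score_keywords.py"],       # keyword scoring (Python)
    ["Rscript", "network_coordinates.R"],  # coordinates generation (R)
    ["Rscript", "network_centrality.R"],   # centrality measurements (R)
]

def run_all(topics=TOPICS, steps=STEPS, dry_run=True):
    """Build (and optionally execute) every pre-compute step for every topic."""
    commands = []
    for topic in topics:
        for step in steps:
            cmd = step + ["--topic", topic]
            commands.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    return commands

# 28 topics x 4 scripted steps = 112 runs automated in one pass
print(len(run_all()))  # 112
```

One loop, one refresh, and none of the 112 individual runs happen by hand.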

Multi-Disciplinary

Making use of four tools, Python > Alteryx > R > Tableau, our pipeline was rather sophisticated.

Each tool has an inherent strength, and it follows naturally that all four analytics environments had

Detailing Twitter mentions from across four years of the annual Tableau Conference in a collection of 45 interactive network graphs, this project was published in close collaboration with Chris DeMartini. He is also presenting a curated collection of his beautiful hive plots built from the same data.

Bringing It Together

My interest in the analysis of network graphs was first piqued while studying the Stanford Online MOOC, Social and Economic Networks: Models and Analysis. A graduate-level course intensive in math and theory, it was challenging, and it also left me wanting a real-world application of the concepts I had learned.

Bringing together my recent studies in R, Alteryx and Tableau, this project is that application.

If public data from Twitter is perhaps relatively benign, then consider the power of enabling visual exploration of other, more highly valued network

This post conveys the importance of innovation with color in data visualization, offers a variety of resources and reference materials, and encourages personal innovation with color as absolutely vital to moving your visual communication of data forward in Tableau.

Emotion and Behavior

The effective use of color is fundamental to the visual communication of data.

As our eyes take in color, they communicate with the hypothalamus, which in turn signals the pituitary gland. From there, in the endocrine system, the thyroid gland signals the release of hormones. Those hormones influence BEHAVIORS and EMOTIONS. Color is so powerful, in fact, that the effective use of color can improve learning by 75% and increase comprehension by up to 73%. 1

Important as these basics are, now is the time to move our conversation beyond the entry level. Now is the time to dramatically expand our thinking around color.

With behavior change, comprehension, and augmented decisioning as the purpose of data visualization, and with Tableau as our tool of choice in the field, we as visualization authors must become more sophisticated.

This post builds upon the theme of designing a performant data architecture for your high volume solutions in Tableau.

One core performance concept is that good design considers the entire solution stack. If you fail to design for performance at all vertical levels, then the worst performing layer will make the solution slow. A train is only as fast as the slowest car. And worse, if various layers have design problems, then your train likely isn’t moving at all.

We must consider the entire vertical solution together, as a holistic system, from top to bottom. And this design investment is best made at the outset. To focus performance efforts at only a single layer, or to return to a poorly performing system in hindsight in search of "one thing" to fix, is insufficient.

Of the various layers in the typical solution stack, this post is focused on two: User Interface Design and Semantic Data Architecture.

Some of my favorite mobile apps like Slack, Feedly and Google Maps have a slide-out menu that appears when I tap a small icon. That common design element makes plenty of room for user inputs and gets them out of the way when you're done - perfect for small screens.

To elaborate further on that concept, in this post I explain how we can leverage the idea of a "dynamic and collapsible menu" to tackle some additional, rather complex data design challenges.

Why Multiple Data Sources?

First off, why would we deliberately use multiple data sources in a single dashboard?

Well, on large data volumes, for performance! In fact, as your data volume grows large, data architecture decisions like this one quickly become imperative.

All of the 2014 conference materials are an excellent resource. There are ten different talks with the keyword "blending", and my Tableau Conference Television makes it easy to find what you're looking for.

So now, on with the show!

Slide Projector

As an analogy, think of Tableau as a slide projector for your data where each Tableau Data Source is a slide.

Born from a hackathon among Tableau’s engineers, Data Blending is indeed a clever hack! It allows us to place more than one slide into the projector at once :)

Starting in version 8, "Data Blending 2" also allows us to manually turn the linking fields on and off, regardless of whether those fields are utilized in the view. The difference

The most interesting thing about the scenarios in Jonathan's collection is not their individual solutions in isolation, but rather the underlying pattern behind those solutions: what they share in common.

And when Joe Mako helps me get through something on a Sunday afternoon, you know the answer is worth sharing!

The number one, most important facet of learning Tableau, and learning from Joe, is to recognize the patterns that recur. By recognizing common patterns when working with data, and by learning the behaviors of Tableau, one learns to reach a flow state with similar encounters in the future, even while the details may vary.

In this post I make the logical argument for Alteryx to evolve their pricing model towards freemium.

Two years ago when I first came across Alteryx at a meet-up in their San Francisco office space, the product wasn’t as mature as I find it today. And returning just now from the Inspire 2015 conference in Boston, I'm quite pleased by both the scope & pace of recent developments, as well as the future product roadmap.

During these past two years, my Twittersphere has also been increasingly abuzz about Alteryx. In fact, given the strong endorsement it receives from people whose technical opinions I rely upon, Alteryx is a tool that I would have tried again by now, if not for the entry price.

Below I will argue that tens of thousands of data workers just like me exist in the world, each a potential Alteryx customer, but that they will never try the tool in earnest until they have access to a more gradual on-ramp: free and low-cost pricing for simplified versions of the tool.

Back of the Napkin

To examine today’s pricing model with some napkin calcs, if we assume that a

A tweet recently arrived from Pam Gidwani, to let me know that now everyone can make use of my Tableau Conference Television vis. Thank you Tableau, for making all of this amazing knowledge from TC14 available to the public!

The gist of the finder problem goes like this: in TC14-TV, for example, each session recording has multiple keywords.

And we sometimes want a multi-select quick filter to find the intersection of the chosen keyword values, whereas a multi-select quick filter in Tableau normally finds the union.
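To make the union-versus-intersection distinction concrete, here is a small Python sketch (the session data is illustrative, not from the TC14 dataset):

```python
# Each session carries a set of keywords, like the TC14-TV recordings.
sessions = {
    "Session A": {"blending", "performance"},
    "Session B": {"blending"},
    "Session C": {"performance", "design"},
}

def union_filter(sessions, chosen):
    """Default quick-filter behavior: any one chosen keyword matches."""
    return {name for name, kws in sessions.items() if kws & chosen}

def intersection_filter(sessions, chosen):
    """Desired behavior: a session must carry every chosen keyword."""
    return {name for name, kws in sessions.items() if chosen <= kws}

chosen = {"blending", "performance"}
print(sorted(union_filter(sessions, chosen)))         # all three sessions
print(sorted(intersection_filter(sessions, chosen)))  # only Session A
```

With both keywords selected, the union returns every session touching either keyword, while the intersection returns only the session carrying both.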

A Series of Calculations

To get there, Jonathan taught me to think in terms of logical building block calculations:

# Keywords for Session

Of the various keywords chosen, how many of those exist for each conference session?

# Keywords Selected

In total, how many keywords have been chosen in the quick-filter?

I've wrapped the total calculation in a PREVIOUS_VALUE() wrapper to improve performance. From number 10 in The Next N Table Calcs, this works because all rows will print the value from a single computation.

(PS - though, if I'm not mistaken, I understand now that TOTAL() behaves differently from most other table calcs and is computed only once anyway :)

Keyword Intersection Filter

Now comes the good stuff. When filtering for the intersection, we only want conference sessions for which:

[# Keywords for Session] >= [# Keywords Selected]

AND / OR

A parameter decides whether to use AND vs. OR logic. And there's a bit of an edge-case workaround, to help the intersection logic behave correctly when none of the conference sessions contain all of the keywords you're searching for.
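A minimal Python sketch of this count-comparison logic (the function names are hypothetical stand-ins for the Tableau calculations; the edge-case workaround from the forum thread is not reproduced here):

```python
def n_keywords_for_session(session_keywords, chosen):
    # [# Keywords for Session]: of the chosen keywords, how many
    # exist on this particular conference session?
    return len(session_keywords & chosen)

def passes_filter(session_keywords, chosen, use_and=True):
    n_selected = len(chosen)  # [# Keywords Selected]
    n_for_session = n_keywords_for_session(session_keywords, chosen)
    if use_and:
        # Intersection: the session carries every chosen keyword
        return n_for_session >= n_selected
    # Union: the session carries at least one chosen keyword
    return n_for_session >= 1

chosen = {"blending", "performance"}
print(passes_filter({"blending", "performance", "design"}, chosen))  # True
print(passes_filter({"blending"}, chosen))                           # False
print(passes_filter({"blending"}, chosen, use_and=False))            # True
```

The `use_and` flag plays the role of the parameter: with AND, the per-session count must reach the selected count; with OR, a single match suffices.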

That part about working around the edge-case is described in detail with my final update to our forum conversation.

The Finder Concept

In addition to highlighting the intersection logic, I really hope this post helps to illuminate the useful concept of a finder dashboard.

In short, a series of quick filters can help you "find" the widgets you're searching for. Then, from the reduced list matching your criteria, dashboard filter actions bring other sheets into view.