Learning to Swim in the Data Deluge

Monthly Archives: June 2013

Technology Innovations in Statistics Education has published a paper by Noleine Fitzallen that I think many readers of this blog will find interesting. She examines a group of young students to see the ways that they use TinkerPlots to analyze data. Here’s the abstract and link:

Exploration of the way in which students interacted with the software package, TinkerPlots Dynamic Data Exploration, to answer questions about a data set using different forms of graphical representations, revealed that the students used three dominant strategies – Snatch and Grab, Proceed and Falter, and Explore and Complete. The participants in the study were 12 year 5-and-6 students (11-12 years old) who completed data analysis activities and answered questions about the data analysis process undertaken. The data for the inquiry were collected by on-screen capture video as the students worked at the computer with TinkerPlots. Thematic analysis was used to explore the data to determine the students’ strategies when conducting data analysis within the software environment.

It just seems to me that this is what data science was meant to do: give us fun toys. This particular “toy”, called Every Noise at Once, lets you explore the musical universe. Ours may be the last blog to comment on it–I think I stumbled upon this too late. But it provides a great example for our students about the power of data analysis.

The data come from the company EchoNest, and the visualization (although that’s a weak word for this: it’s visual and aural, maybe “visauralization”?) comes from their chief engineer Glenn McDonald. According to McDonald’s blog (via www.furia.com, on May 31 2013), songs are depicted in a 10-dimensional space, reduced to two dimensions here. The two dimensions are as follows. The vertical axis runs from “organic” (on the bottom) to “mechanical” (on top). I love the designation of “organic”, which says so much more than “acoustic”. The horizontal axis is what McDonald calls “bounciness”, with songs on the right being bouncier than songs on the left.

The joy of this visauralization is that it is interactive. Click on a genre and hear a representative sample. Click on the “>>” symbol next to the genre label, and it expands to show you practitioners of the genre.

I suppose part of me feels that music has too many labels. This graph gives this point of view some support. And yet, I confess, it was quite satisfying to learn that there is a difference between “indie pop” and “indie rock”. (Both are roughly equally bouncy, but pop is more mechanical.) “String Quartet” is its own genre, and if you double click, you see the names of actual string quartets. The Takacs Quartet is apparently more mechanical than the Borodin Quartet. The only recording of Takacs I have is of the Bartok quartets, and so I guess this makes sense. Still, string quartets consist of four stringed instruments, and so I suppose the scale of the variation here must be quite small. A mechanical string quartet is, I suppose, one that amps its strings: I couldn’t find the Kronos Quartet, which I looked for somewhere in the upper-right quadrant. Nor could I find my LA-based favorites the Calder Quartet, which I would expect to fall somewhere in the center-right of the graph.

Dimension reduction in all its many forms is an important part of the visualization world. Which raises the question: when do we teach this to our students? Can it be taught, in some form, in introductory statistics? These questions seem related to one of my pet peeves, namely that we don’t teach statistics students how to interpret maps. Maps are, today, summaries of data. Most are quite crude, but students should learn to be critical (in the constructive sense) of data maps. Is there a data-mapping framework that would allow us to teach how to be critical of heat maps, Google-type maps, traffic maps, and maps of musical genres?
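For readers who have not seen dimension reduction in action, here is a minimal sketch of one standard technique, principal component analysis (PCA), written in plain Python. It is not a claim about how McDonald built Every Noise at Once; his method is not described here. The ten-feature “songs” are simulated, and every function and variable name below is made up for illustration.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so the data are centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that have already been centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: approximate the largest eigenvalue and its eigenvector."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return v, value


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    v1, lam1 = leading_eigenpair(cov)
    # Deflation: remove the first component, then find the next largest one.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, lam2 = leading_eigenpair(deflated)
    coords = [(sum(row[j] * v1[j] for j in range(d)),
               sum(row[j] * v2[j] for j in range(d))) for row in centered]
    return coords, (v1, v2), (lam1, lam2)


# Simulated data (an assumption for illustration): 50 "songs", 10 features,
# driven mostly by two hidden factors plus a little noise.
rng = random.Random(42)
load_a = [rng.uniform(-1, 1) for _ in range(10)]
load_b = [rng.uniform(-1, 1) for _ in range(10)]
songs = []
for _ in range(50):
    a, b = rng.gauss(0, 3), rng.gauss(0, 1)
    songs.append([a * load_a[j] + b * load_b[j] + rng.gauss(0, 0.1)
                  for j in range(10)])

coords, (v1, v2), (lam1, lam2) = pca_2d(songs)
print("First three songs in two dimensions:", coords[:3])
print("Variance captured by each axis:", lam1, lam2)
```

Each song ends up as a pair of coordinates that could be plotted on a flat map. The catch, and a good discussion point for students, is that the new axes are blends of the original ten features, so giving them names like “bounciness” takes interpretation.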

You can win $1000 for turning your Ph.D. thesis into an interpretive dance. More importantly, you will also receive a call-out from Science and get to perform your dance at TEDx in Belgium. The contest is open not only to recent Ph.D.s, but to anyone who has earned a Ph.D. (in the sciences) and to students currently working on one.

Gonzolabs has tips and examples over on their website. So put on your dancing shoes, grab your Ph.D. advisor and do-si-do. Now, if I can only get the Jabbawockeez and figure out what a mixed-effects model looks like as a dance…

Ever since we wrote an article in which we analyzed the articles that had been published in the Statistics Education Research Journal (Zieffler et al., 2011), I have been thinking about the relationships within the network of literature published on statistics education. What are the pivotal articles? Which are foundational? How inter-connected are the articles?

This spring I started documenting those relationships by putting together a social network of articles published in Technology Innovations in Statistics Education and the articles they referenced. I just finished that work and used Gephi to produce a couple of network plots.

The first network graph (shown above) examines the community structure of the network by decomposing the network into sub-networks, or communities. I have made the nodes for the actual TISE articles larger for visual ease of interpretation. The node labels are the first author’s last name and year of publication. Currently (and not surprisingly), the subnetworks generally consist of the actual article published in TISE and the literature that was referenced therein. There are some commonalities between articles as well. For example, the two articles by McDaniel were identified as a single community. It will be interesting to see how these communities change as I add more literature into the network.

The second network graph has the size of the node and node label sized by in-degree. In this case, in-degree is a measure of how often a particular article was referenced. The most cited literature in TISE is:
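As an aside for anyone curious about how in-degree is computed, here is a minimal sketch in Python. The citation pairs are invented placeholders, not the actual TISE data, and the function name `in_degree` is made up for this example. The plots above were produced in Gephi, not with this code.

```python
from collections import Counter

# Hypothetical (citing article, cited work) pairs; NOT the real TISE network.
citations = [
    ("ArticleA 2010", "Reference1 2002"),
    ("ArticleA 2010", "Reference2 1999"),
    ("ArticleB 2011", "Reference1 2002"),
    ("ArticleC 2012", "Reference1 2002"),
    ("ArticleC 2012", "ArticleA 2010"),
]


def in_degree(edges):
    """Count how many times each work appears as the target of a citation."""
    return Counter(cited for _citing, cited in edges)


degrees = in_degree(citations)

# List works from most cited to least cited.
for work, count in degrees.most_common():
    print(f"{work}: cited {count} time(s)")
```

Each edge points from the citing article to the work it references, so a node’s in-degree is simply the number of edges arriving at it. A work that nobody cites has an in-degree of zero.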

I just finished reading An Accidental Statistician: The Life and Memories of George E. P. Box. The book reads like he is recounting his memories (it is aptly named) rather than as a biography. I enjoyed the stories and vignettes of his work and his intersections with other statisticians. The book also included pictures of many famous statisticians (George’s friends and family—Fisher was his father-in-law for a bit) in social situations. My favorite was the picture of Dr. Frank Wilcoxon on his motorcycle (see below).

There were some very interesting and funny anecdotes. For example, when George recounted a trip to Israel, he was told to get to the airport very early because of the intense security measures. After standing in a non-moving line for several hours, he apparently quipped that he had never before physically seen a stationary process.

My favorite sections of the book were the stories he told of writing Statistics for Experimenters, his book—along with William (Bill) Hunter and Stu Hunter—on experimental design. He wrote about how the book evolved from mimeographed notes for a course he had taught to the published version. It took several years for them to finish the writing of the book, only to be met with horrible reviews. (Note: This makes me feel slightly better about the year it took to write our book.)

In a chapter written about Bill Hunter (who was one of George’s graduate students at the University of Wisconsin), George relates that Bill started his PhD in 1960. After he finished (in 1963!) he was hired almost immediately by Wisconsin as an assistant professor. Three years later he was made associate professor, and in 1969 (nine years after he started his PhD) he was made full professor. Unbelievable!