In the 1984 movie, The Karate Kid, a teenager named Daniel LaRusso triumphs in a local martial arts tournament after being coached by his apartment’s handyman, Mr. Miyagi.

Mr. Miyagi used a unique coaching style that developed Daniel’s fundamental skills in martial arts before teaching their context and application. Initially, this confused Daniel as he went to Mr. Miyagi’s home expecting to learn standard karate moves (e.g., how to punch), but ended up doing repetitive household chores, such as painting fences and waxing cars. The only instructions Mr. Miyagi offered pertained to the chores themselves: “Left hand, right hand.” “Up, down.” “Wax on, wax off.” “Breathe in, breathe out.” Although Mr. Miyagi knew his motives and felt gratified in his teachings, Daniel was frustrated. Ultimately, he lost his cool and the following conversation ensued:

Daniel: “I’m being your #!$@ slave … We made a deal … You’re supposed to teach, and I’m supposed to learn … I haven’t learned a #!$@ thing.”

Mr. Miyagi: “You learn plenty.”

Daniel: “I’m going home, man!”

Employing considerable—and entertaining—artistic license, Mr. Miyagi subsequently demonstrated how the mundane chores translated to practicing karate. In a controlled setting, Mr. Miyagi punched and Daniel blocked using the arm motion from painting; Mr. Miyagi kicked and Daniel blocked again using the arm motion from waxing. It then became clear to Daniel that each chore had a purpose and taught defensive arm movements. From that point onward, Daniel trusted Mr. Miyagi’s coaching and eventually applied what he learned in a tournament to become a karate champion!

Today, many of us teach introductory statistics, or more broadly, introductory data analytics in a way that is similar to Mr. Miyagi’s coaching style. Like Mr. Miyagi, we think we are serving the students well by first teaching quantitative methods or tools for assessing data out of context. For example, we insist students master how to solve systems of equations or optimize functions before we give them an opportunity to summarize a data set. Also, the tools we use are demonstrated in a controlled setting. The data sets provided to students often come from contrived textbook problems and do not include technicalities or issues that real, modern data sets present. This means we do not require students to develop data-based, critical thinking skills, including the ability to:

Compartmentalize large problems into manageable pieces

Formulate and evaluate solutions with both quantitative and qualitative rigor

Make judgments that assimilate current information with new

Reflect upon such judgments

Yet, like Mr. Miyagi, we expect students to perform well and implement unpracticed skills when faced with tough, real-world problems.

Unlike in the movies, our students do not benefit from artistic license. Thus, many of our students miss having an “a-ha” moment in that they fail to connect classroom concepts to real-world applications. In particular, students who receive a Miyagi-like data analytics education may get caught up in mundane analytical calculations and not grasp how data analytics serves in real-world contexts. As professors of introductory data analytics classes, we can do better.

Here, we demonstrate how the use of interactive data visualizations (IDVs) may enable students to think critically about tough real-world problems before, during, and after they learn quantitative, analytical tools. IDVs rely on 1) mathematical models such as weighted multidimensional scaling (MDS), as introduced by Kruskal and Wish in 1978, to display complex, high-dimensional data sets in two dimensions and 2) methods that can reparameterize the models in response to user (e.g., student) interactions (with the current display) to create new displays. As a result, the new displays are based on data, a model, and judgments (as communicated by interactions) of students.

Crucially, when students have access to interpretable, mathematically driven displays of data and experience how the displays change in response to their judgments, the students have the potential to simultaneously gain an intuitive understanding of both the analytical methods underlying the displays and the information available in the data. In other words, while students are thinking critically with data, as defined by Linda Elder and Richard Paul’s eight “Elements of Thought” (EoT) (see Figure 1), published in A Thinker’s Guide to Analytic Thinking, they also are developing insight about mathematical or data analytics concepts underlying IDVs. The data analytics course we develop here exploits this “simultaneous learning” to emphasize quantitative methods and critical thinking with data jointly. In particular, we show how one may teach critical thinking with weighted MDS using IDVs.

Figure 1. “Eight Elements of Thought” by Elder and Paul (2010)

A New Teaching Approach for Weighted MDS

Contrary to standard practice, a course that relies on IDVs may emphasize both data analytics methodology and critical thinking within the context of realistic case studies. While addressing problems in the case studies, the students combine their current knowledge base with new, technical methods to extract information from data and master data analytics concepts. Our proposed teaching approach is similar in spirit to that described in the Change Agent for Teaching and Learning Statistics (CATALST) project. In response to the initiatives of CATALST, model-eliciting activities (MEAs) have been suggested for teaching both statistical concepts and thinking. These activities provide open-ended research questions and satisfy requirements to enable students to develop thoughtful, transferable problem-solving skills. In effect, our case studies could be considered examples of MEAs. However, we encourage students to take advantage of data visualizations to emphasize the role of intuition, personal judgment, assessment, and reflection in every data analysis.

Consider weighted-MDS, a data visualization scheme that seeks to find a low-dimensional (e.g., two-dimensional) representation or map of data that portrays how the data spread in the high-dimensional space. The relative distance between observations in the high-dimensional space is preserved in a weighted-MDS map (see Figure 2). The coordinates of the observations in the weighted-MDS map are determined by minimizing a stress function that, to some, is hard to conceptualize. Typical approaches for teaching weighted-MDS are Miyagi-like in that they first rely on explaining the abstract stress function and showing how to minimize it. Only after the students master the minimization scheme do they have an opportunity to apply weighted-MDS to a high-dimensional data set. Since the data sets often lack relevance to real-world scenarios, it is clear that the emphasis in typical teaching approaches is on only the method. Successful students are often those who memorize the weighted-MDS procedure and not necessarily those who can apply weighted-MDS effectively in nontraditional problems.

Figure 2. (Left) Three scaled dimensions of the Iris data set. (Right) Weighted-MDS mapping of the data in the left panel. Weighted-MDS preserves the relative distance between observations in the high-dimensional (in this case, three-dimensional) space. The two distant observations (denoted by red '*') in the left panel are also distant in the right panel.

In a course that relies on IDVs, the focus shifts from data analytics methodology to solving real-world problems. We recommend teaching weighted-MDS by presenting an open-ended, real-world case study and progressing through the following four phases:

Assess and explore

Methods

Implement

Reflect

During these phases, the students use IDVs to assess the case study, learn a data analytics technique (e.g., weighted-MDS, principal component analysis), implement a technique computationally, and reflect upon results and implications.

We define these phases so they correlate strongly with the EoT. The EoT is comparable to Chris Wild and Maxine Pfannkuch’s 1999 model of statistical thinking, called PPDAC, that appeared in the International Statistical Review. PPDAC is an initialism for five components of statistical thinking: Problem, Plan, Data, Analysis, and Conclusions. Relative to PPDAC, the EoT is more refined and general in that it describes eight quantitative and qualitative aspects of critical thinking that may apply to all problems, not just those with solutions that rely on statistics. When we use the EoT as our compass, students have the potential to develop critical thinking skills that may transfer to decision-making problems outside a data analytics classroom.

Similar to statistics, data analytics is the science of discovering knowledge and gaining insight from data. However, data analytics encompasses theories and methods from many research areas, including statistics, data mining, machine learning, computer science, and data visualization.

Weighted-MDS

Weighted-MDS is an extension of multidimensional scaling (MDS). Thus, to explain weighted-MDS, we start with MDS. Like weighted-MDS, MDS is a data visualization scheme that preserves pairwise distances between high-dimensional observations within a low-dimensional data representation (e.g., in two dimensions). To explain by example, consider the well-known “Iris” data set that was analyzed by Sir R. A. Fisher in 1936 (and is available in R). This data set includes four continuous variables (Sepal Length, Sepal Width, Petal Length, and Petal Width; three of these variables are plotted in Figure 2) and one categorical variable, Species. One application of this data set is to learn how iris species differ based on sepal and petal measurements.

MDS is neither a clustering nor a predictive algorithm. Rather, it is a dimension-reduction method that can enable us to plot high-dimensional data sets and identify clusters (if they are present) visually. For this example, we use MDS to reduce the dimension of the Iris data set from four to two. Let di = (di,1, …, di,p) denote each high-dimensional (in this case, four-dimensional) observation i (for i ∈ {1, …, n}, n = 150) so that D = [d1, …, dn]ʹ. MDS solves for R = [r1, …, rn]ʹ, where each ri = (ri,1, ri,2) represents a reduced, two-dimensional version of di. The solution R minimizes a stress function that calculates the difference in corresponding pairwise distances within D and R. That is, for points a and b, the low-dimensional distance between ra and rb, ||ra − rb||, approximates the high-dimensional distance between da and db, denoted δa,b. Mathematically, we write

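$$R = \underset{r_1, \ldots, r_n}{\arg\min} \sum_{i<j} \Big| \, \lVert r_i - r_j \rVert - \delta_{i,j} \Big| \qquad (1)$$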

The metric used to define δi,j is application specific and typically relies on a univariate distance function Dist(·), such as Euclidean distance, applied within each of the p dimensions. Given a definition for δi,j, solving Equation (1) is an optimization problem for which closed-form expressions exist under certain mathematical constraints.
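The stress described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the stress is taken to be the sum of absolute differences between low- and high-dimensional pairwise distances, δi,j is taken to be Euclidean distance, and the function names and toy data are illustrative rather than taken from the authors' software.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stress(D, R):
    """Sum over pairs i < j of | ||r_i - r_j|| - delta_ij |.

    Assumption: delta_ij is the Euclidean distance between d_i and d_j.
    """
    return sum(
        abs(euclidean(R[i], R[j]) - euclidean(D[i], D[j]))
        for i, j in combinations(range(len(D)), 2)
    )


# Toy data: three four-dimensional observations (made-up values, not the Iris data).
D = [(0.0, 0.0, 0.0, 0.0), (3.0, 4.0, 0.0, 0.0), (0.0, 0.0, 6.0, 8.0)]

# Two candidate two-dimensional maps of the same observations.
R_good = [(0.0, 0.0), (5.0, 0.0), (0.0, 10.0)]  # reproduces every pairwise distance
R_bad = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]    # squeezes the points together

print("stress of R_good:", stress(D, R_good))
print("stress of R_bad: ", stress(D, R_bad))
```

An MDS solver searches over candidate maps R for the one with the smallest stress; here the first map has lower stress because it preserves all three pairwise distances.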

Weighted-MDS extends MDS by assigning a weight to each of the p dimensions, w = [w1, …, wp], when computing δi,j. Based on w, weighted-MDS may emphasize (or de-emphasize) some dimensions in data D over others in the solution for R. When wi = wj for all {i, j} ∈ {1, …, p}, weighted-MDS and MDS solve for the same values of R. To portray the impact that specifications for w may have on data visualizations, we plot in Figure 3 three weighted-MDS maps of the Iris data that rely on different specifications for w. Figure 3a) sets each weight to 0.25; Figure 3b) sets w = [0.3, 0.4, 0.3, 0.0]; and Figure 3c) sets w = [0.2, 0.06, 0.0, 0.74]. Notice that each weight specification provides a different spatialization of the data.

Figure 3. Three weighted-MDS views of the Iris data set given different weight vectors w.
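To see how a weight vector changes the distance that weighted-MDS tries to preserve, consider the following minimal sketch. It assumes a weighted Euclidean distance, the square root of the weighted sum of squared differences, which is one common choice and not necessarily the exact form used to produce Figure 3. The two observations are illustrative, unscaled values that resemble Iris measurements; only the three weight vectors come from the text.

```python
import math


def weighted_distance(a, b, w):
    """Weighted Euclidean distance (assumed form): sqrt(sum_k w_k * (a_k - b_k)^2)."""
    return math.sqrt(sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b)))


# Illustrative observations: (Sepal Length, Sepal Width, Petal Length, Petal Width).
obs_a = (5.1, 3.5, 1.4, 0.2)
obs_b = (7.0, 3.2, 4.7, 1.4)

# Weight vectors quoted for the three panels of Figure 3.
weights = {
    "3a": [0.25, 0.25, 0.25, 0.25],
    "3b": [0.3, 0.4, 0.3, 0.0],
    "3c": [0.2, 0.06, 0.0, 0.74],
}

distances = {panel: weighted_distance(obs_a, obs_b, w) for panel, w in weights.items()}

for panel, dist in distances.items():
    print(f"Figure {panel}: weighted distance = {dist:.3f}")
```

Under this assumed form, equal weights multiply every pairwise distance by the same constant, so the relative distances between observations match those of unweighted MDS. Unequal weights change which pairs look near or far.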

Visual to Parametric Interaction (V2PI)

Visualizations of high-dimensional data sets often rely on parametric algorithms or models to map observations to two- or three-dimensional coordinates. To change the visualization, users, such as students, may interact with the data visualization and take advantage of V2PI. For example, when given a display with V2PI capabilities, students may drag observations in a low-dimensional data display to new locations and watch how the display updates in response.

V2PI is the process of quantifying display interactions to adjust parameters. Here, we describe the mathematics of V2PI within the context of weighted-MDS. Let di and ri represent high- and low-dimensional coordinates for observation i. Given a specification for w, values for R = [r1, …, rn]ʹ are solved using weighted-MDS.

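Suppose a student selects K observations and rearranges them. Let S represent the set of K observations selected; $\tilde{r}_k$ represent the new coordinates for observation k (k ∈ S); $\tilde{R} = \{\tilde{r}_k\}_{k \in S}$; and $\tilde{D} = \{d_k\}_{k \in S}$. A new weighted-MDS display of the data aims to preserve the rearranged coordinates $\tilde{R}$. To do so, V2PI inverts the stress function in Equation (1) and solves for a new weight vector w* based on $\tilde{R}$ and $\tilde{D}$,

$$w^{*} = \underset{w}{\arg\min} \sum_{i<j;\; i,j \in S} \Big| \, \lVert \tilde{r}_i - \tilde{r}_j \rVert - \delta_{i,j}(w) \Big|,$$

where δi,j(w) denotes the high-dimensional distance between di and dj computed with weights w.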

Given w*, weighted-MDS is applied to the entire data set D and a new display is created.
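The inversion step can be illustrated with a deliberately simple solver. The sketch below is not the authors' optimizer: it assumes a weighted Euclidean distance, assumes the weights are non-negative and sum to one (as the weights quoted for Figure 3 do), and finds w* by brute-force search over a coarse grid. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_distance(a, b, w):
    """Weighted Euclidean distance (assumed form)."""
    return math.sqrt(sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b)))


def inverse_stress(w, D_sel, R_new):
    """Mismatch between the dragged layout R_new and weighted distances in D_sel."""
    return sum(
        abs(euclidean(R_new[i], R_new[j]) - weighted_distance(D_sel[i], D_sel[j], w))
        for i, j in combinations(range(len(D_sel)), 2)
    )


def simplex_grid(p, steps):
    """Yield every length-p weight vector with entries k/steps that sum to 1."""
    def rec(remaining, dims_left):
        if dims_left == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for rest in rec(remaining - k, dims_left - 1):
                yield (k,) + rest
    for counts in rec(steps, p):
        yield tuple(c / steps for c in counts)


def solve_weights(D_sel, R_new, steps=20):
    """Grid search for the weight vector that best explains the new layout."""
    p = len(D_sel[0])
    return min(simplex_grid(p, steps), key=lambda w: inverse_stress(w, D_sel, R_new))


# Four selected observations in three dimensions (made-up values).
D_sel = [(0.0, 5.0, 1.0), (1.0, 0.0, 2.0), (4.0, 4.0, 0.0), (6.0, 1.0, 3.0)]

# The "student" arranges them along a line according to the first variable only.
R_new = [(d[0], 0.0) for d in D_sel]

w_star = solve_weights(D_sel, R_new)
print("recovered weights:", w_star)
```

Because the rearranged layout depends only on the first variable, the search assigns all of the weight to that variable. A grid search becomes impractical as the number of variables grows, so a real implementation would need a proper numerical optimizer.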

To interpret weighted-MDS displays, the important metric is relative distance between observations; arguably, the observation coordinates can be considered irrelevant. Data points that appear close in proximity (or distant) are similar (or different) in the dimensions that are weighted heavily. For example, observations in Figure 3c) separate the species fairly well. This suggests that observations within the same species are comparable in the dimensions that are weighted heavily in the display; these dimensions include Petal Width (w4 = 0.74) and Sepal Length (w1 = 0.2).

Since the interpretation of weighted-MDS displays is straightforward, students do not need to understand the technicalities of weighted-MDS to use the displays and tackle tough data-driven problems. Additionally, if we make weighted-MDS displays interactive, we can let students change the weights, assess the data from different perspectives, and discover weight specifications that result in revealing structure in the data. To change the weights, students could specify them directly. However, for high-dimensional data, manual adjustments to parameters can be cumbersome or confusing. How would a student know which weights to adjust in the presence of, say, 100 variables?

We developed a method to interpret certain data display adjustments as suggestions for reweighting variables. Namely, if students move observations together or apart, the variables for which these observations are similar or different, respectively, are up-weighted. We refer to the process of quantifying display interactions to adjust parameters as visual to parametric interaction (V2PI). The mathematics we use for V2PI within the context of weighted-MDS is provided in “Visual to Parametric Interaction (V2PI).” We refer to a display that relies on an interactive form of weighted-MDS as an “IDV based on weighted-MDS.”

An important point to make is that, by using V2PI methods, students (again) do not need to understand mathematical technicalities to assess data from different perspectives that are based on weighted-MDS. Rather, students can explore the data based on conjectures they make about the similarities and differences among a subset of observations. Additionally, V2PI can serve as a motivator to learn the limitations of static or deterministic data summary methods.

Critical Thinking with Weighted-MDS and IDVs

As we mentioned previously, IDVs enable us to shift the focus from data analytics methodology to solving real-world case studies. In doing so, we can alleviate student frustration, motivate students, and provide realistic practice for students within the classroom. In this section, we develop a four-phase weighted-MDS unit based on the following case study from Endert et al. [2011]:

To construct informed economic, health, and educational policies, the U.S. Census Bureau attempts to survey every individual living within the United States. We have access to a subset of the 1990 census that includes 2.5 million observations and p = 68 features (i.e., variables), including salary, education, marital status, employment status, occupation, family details, driving patterns, etc. The U.S. president (in 1992) would like to implement policy that will help those with low socio-economic status. What would you (the students) recommend? Use census data to support the recommendations.

Across the phases, we address the elements of EoT, in addition to the mathematics and computation of weighted-MDS. We highlight the elements of EoT at the end of each phase description and provide Table 1 to list the phase objectives, categorize the objectives as either quantitative or qualitative aspects of problem-solving, and state (again) which elements of EoT are covered.

Table 1. Lessons to Teach MDS Grouped Into Four Phases That Cover One or More of the Eight Elements of Thought (EoT)

Phase I. Assess and Explore the Data

The way in which the case study is phrased suggests there are multiple recommendations for the president. Thus, Phase I requires that the students 1) state in their words the goal of their endeavors, 2) hypothesize what they will learn from the data, and 3) explore the data. For the exploration, the students may look directly at an Excel file that contains the data, use quantitative methods they currently know to summarize the data, and assess the data visually using weighted-MDS.

Figure 4. a) provides an initial MDS (weighted-MDS with equal weights) view of the census data. The circles were added to draw attention to three clusters. b) shows that students can mark observations based on ranges of salary ('X' and '◻' show observations with salaries that are 'less than $15k' and 'within $30k and $60k,' respectively) and drag observations (denoted by arrows) to inject feedback into visualizations. In response to the feedback, V2PI reconfigures the data. c) displays the reconfiguration. We added a dotted line to show that the marked observations from b) separate. In d), we add circles to reference four clusters of interest.

Figure 4a plots an initial MDS (weighted-MDS with equal weights) display of a random sample (n = 3000) from the census data set. During Phase I, we do not explain the quantitative method used to display the data. Rather, we provide information about how to interpret and use Figure 4a to explore the data. In this case, each data point in Figure 4 represents an individual’s completed survey. Although the axes of the visualization do not have an explicit physical meaning, the distance between any pair of surveys conveys the degree to which they are similar (e.g., surveys that appear in the same cluster are more similar to one another, according to the 68 data features, than surveys that appear in different clusters). However, the display, as currently plotted, does not convey how the surveys differ. That is, the mathematical method (MDS) used to create Figure 4a weighted the data features equally. Thus, to learn the features that differentiate the surveys, the students must explore the data and interact with the display (e.g., students may highlight observations according to requested criteria and/or change the perspective of the visualization by taking advantage of the display’s interactive machinery, V2PI).

For example, suppose some students focus on the word “socioeconomic” in the case study description and want to learn whether there are features that correlate with the variable salary. Given the obvious structure in Figure 4a, these students might first identify three clusters and use highlighting to discover that Group 1 represents surveys from working-class people, Group 2 includes surveys from unemployed adults, and Group 3 includes surveys from adults under 20 years of age. Since none of the clusters are based purely on salary, the students may next highlight surveys based on two salary ranges: “less than $15k” or “within $30k and $60k.” Figure 4b marks the surveys with the respective salary ranges by “X” or “◻.” The marked observations do not present a clear clustering structure. This means the display does not rely heavily on salary to differentiate observations. To change the perspective of the display and up-weight the role of salary in the display, the students may drag the marked observations from each group apart. (The arrows in Figure 4b depict dragging.) Using V2PI, the visualization reconfigures, as shown in Figure 4c. Now, the data appear in several small clusters and salary, in part, explains the spatialization of the clusters. We add a line to Figure 4c to show that the marked observations from Figure 4b separate perfectly; those above and below the line have surveys with salaries within $30k and $60k and less than $15k, respectively.

One advantage of using the IDVs based on weighted-MDS is that, unlike Figure 4a, Figure 4c weights some data features more than others in response to the students’ feedback in Figure 4b. The data features with the highest weights are the following: Salary (0.29), Have a reliable form of transportation to work (0.20), Whether or not employed (0.25), and Years of education (0.10). With this information, students may assess which variables work jointly with salary to create the observed cluster structure in Figure 4c. In particular, students may mark observations in Figure 4c to discover that 1) all observations for which r1 < −0.2 represent employed individuals, 2) clusters 1 and 2 include individuals who make within $30k and $60k, but do or do not have reliable modes of transportation to work, and 3) clusters 3 and 4 include individuals who make less than $15k and either drive themselves to work or take public transportation, respectively. Now, students may conjecture that people with low incomes need transportation assistance.

We expect students to make several conjectures about the data based on their visual explorations. The students report their findings in journals and, at the end of Phase I, during oral presentations. In the next phase, the students learn the mathematical and computational methods driving the visualization. An understanding of these methods may (or may not) affect their interpretations of the data.

EoT #1-5: The students assess their points of view, state the goal, and ask questions; gain an appreciation for the need of information/data to address questions; and interpret data visualizations to infer relationships in the data.

Phase II. Learn Mathematical Methods

Phase I does not require students to master mathematical concepts for data exploration. Now, in the second phase, students learn the mathematical theory of MDS and its constraints. Students complete standard problem sets to reinforce the mathematical concepts. At the conclusion of the phase, students conjecture and formulate mathematically how displays based on MDS may change, given changes in its theory.

EoT #5,6,7: The students learn the mathematical formulations of visualizations that rely on assumptions and result in interpretations that may lead to inference.

Phase III. Implement Computation

In Phase I, students use software that implements the V2PI machinery based on the mathematics of Phase II. Now, the students program one or more modules within the software to reimplement V2PI. The software is coded in a way that includes self-contained modules which, when removed, can be replaced by code created by students. By replacing modules, students are shielded from high-level coding. The modules that the students will replace include those that 1) read large high-dimensional data sets and 2) solve for coordinates R using a variety of techniques.
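One way to picture such a modular design is a small pipeline whose steps are stored as replaceable functions. The sketch below is hypothetical and is not the software used in the course: the class name, module names, and the placeholder solver are all assumptions chosen to show how a student-written function could be swapped in without touching the rest of the code.

```python
import csv
import io


def default_reader(text):
    """Default 'read' module: parse CSV text into float tuples, skipping the header."""
    rows = list(csv.reader(io.StringIO(text)))
    return [tuple(float(v) for v in row) for row in rows[1:]]


def default_solver(D):
    """Default 'solve' module (placeholder): keep the first two dimensions as R."""
    return [(d[0], d[1]) for d in D]


class Pipeline:
    """Holds named modules so any one of them can be replaced by student code."""

    def __init__(self):
        self.modules = {"read": default_reader, "solve": default_solver}

    def replace(self, name, func):
        if name not in self.modules:
            raise KeyError(f"unknown module: {name}")
        self.modules[name] = func

    def run(self, text):
        D = self.modules["read"](text)   # module 1: read the data
        return self.modules["solve"](D)  # module 2: solve for coordinates R


def student_solver(D):
    """A student's replacement module: keep the last two dimensions instead."""
    return [(d[-2], d[-1]) for d in D]


csv_text = "a,b,c\n1,2,3\n4,5,6\n"
pipeline = Pipeline()
before = pipeline.run(csv_text)

pipeline.replace("solve", student_solver)
after = pipeline.run(csv_text)

print("default solver:", before)
print("student solver:", after)
```

In this arrangement a student only needs to write a function with the same inputs and outputs as the module being replaced, which is one way to shield beginners from the surrounding code.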

Since some students may not have computer programming in their backgrounds, computer lab assignments are important and Phase III may last longer than other phases. Note that those experiencing programming for the first time have the benefit of a clear motivation to learn (arguably) tedious but fundamental concepts, including variable initialization, if/then statements, and loops.

EoT #5,6: Phase III reinforces the importance of summarizing and interpreting data using mathematical and computational concepts and models.

Phase IV. Reflect

Now that the students have explored the data, learned the mathematics of MDS, and programmed it, they have an opportunity to assess both the technical methods used to visualize the data and their personal thoughts while assessing and interpreting information in the data.

In regard to methods, the students experience in Phase I the need to adjust data displays, but only learn during Phases II and III a deterministic approach for summarizing data. Thus, in Phase IV, the students hypothesize, formulate mathematically, and implement how the visualization can adjust to their data interactions. Effectively, the students construct an understanding of weighted-MDS and implement it by replacing the appropriate module. With the right guidance from professors, students may realize that, when they drag observations together or apart, the dimensions for which the observations are similar or different, respectively, are more important than the remaining dimensions; the weights of the important dimensions (as determined by the dragging) should be higher than the remaining weights.

During Phase IV, students also reflect upon what they gained from the data. They address the goals of the case study, state whether they validated their hypotheses or corrected any misconceptions, and discuss any personal or analytical constraints. At the conclusion of Phase IV, students share their reflections and present their findings during an oral presentation and within a paper.

EoT #6,7,8,1: The students 1) evaluate the model and its interpretation given certain assumptions and 2) reflect upon implications (based on their points of view) of what they learned from the data and the role data served in making recommendations to the president.

Discussion

At the end of the fourth phase, students will have obtained not only the mathematical skills emphasized by traditional (or Miyagi) methods for teaching, but also practice in applying weighted-MDS in both contrived and realistic scenarios. For some students, this approach for teaching data analytics will dramatically affect their understanding of weighted-MDS.

Of course, data analytics classes should include other technical methods, in addition to weighted-MDS. We envision teaching at least four modules (with the same phases) during one semester-long, undergraduate data analytics course. The additional modules may rely on data analytics techniques that are preferable to the instructor, but crucially, an IDV is needed for each technique chosen. In work from Leman et al. [2011] and House et al. [2011], V2PI has been developed for principal component analysis (PCA), mixture PCA, and isomap. If none of these data analytics approaches is ideal for an instructor, we encourage instructors to develop their own V2PI method; V2PI is not specific to the data analytics techniques mentioned.

V2PI is, broadly, a process to consider for quantifying feedback in visualizations that may update model parameters. In fact, a mapping of the process is included within Figure 5: Step 1) Model or summarize the data quantitatively based on estimates of unknowns θ; Step 2) Display the summary in a visualization v; Step 3) Prompt students (or users) to assess v and adjust it as desired; Step 4) Parameterize the adjustments; and Step 5) Update the original data summary to repeat the loop. We refer to the adjustments in Step 3) as “cognitive feedback” ƒ(c) in that it represents visually what the students think, whereas “parametric feedback” ƒ(p) is a quantified version of ƒ(c) that transforms it to the parametric space of the data and enables model updating in Step 5).

Figure 5. The V2PI or BaVA process
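The five steps of the loop can be sketched as a short driver function. This is a toy illustration under stated assumptions, not the authors' implementation: the data summary is a weighted one-dimensional score rather than weighted-MDS, the student is replaced by a scripted function, and Step 4 uses a simple heuristic (up-weight the variables on which a dragged-apart pair differs) in place of the V2PI mathematics. Every function name is hypothetical.

```python
def summarize(data, theta):
    """Step 1: weighted one-dimensional summary (toy stand-in for weighted-MDS)."""
    return [sum(w * x for w, x in zip(theta, obs)) for obs in data]


def display(summary):
    """Step 2: the 'visualization' v is just the list of coordinates here."""
    return list(summary)


def scripted_feedback(v):
    """Step 3: scripted stand-in for a student dragging observations 0 and 1 apart."""
    return {"pair": (0, 1), "action": "apart"}


def parameterize(f_c, data):
    """Step 4: turn cognitive feedback f_c into new weights (heuristic, assumed)."""
    i, j = f_c["pair"]
    diffs = [abs(a - b) for a, b in zip(data[i], data[j])]
    if f_c["action"] == "together":
        top = max(diffs)
        diffs = [top - d for d in diffs]  # favor variables where the pair agrees
    total = sum(diffs)
    if total == 0:
        return [1 / len(diffs)] * len(diffs)
    return [d / total for d in diffs]


def v2pi_loop(data, theta, get_feedback, rounds=1):
    """Run Steps 1 to 5 for a fixed number of rounds and return the final weights."""
    for _ in range(rounds):
        summary = summarize(data, theta)  # Step 1: summarize the data given theta
        v = display(summary)              # Step 2: display the summary
        f_c = get_feedback(v)             # Step 3: cognitive feedback
        f_p = parameterize(f_c, data)     # Step 4: parametric feedback
        theta = f_p                       # Step 5: update and repeat
    return theta


data = [(1.0, 0.0), (1.0, 4.0), (2.0, 1.0)]
theta = v2pi_loop(data, [0.5, 0.5], scripted_feedback)
print("updated weights:", theta)
```

Observations 0 and 1 differ only in the second variable, so dragging them apart moves all of the weight to that variable in this toy version.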

Additionally, a data analytics course with IDVs may include probabilistic data analysis techniques as well. If the original data summarizing method in Step 1) is probabilistic, it is possible to parameterize feedback and update the model while maintaining the model’s probabilistic integrity. To differentiate probabilistic from deterministic versions of V2PI, House et al. in their 2011 technical report refer to the former as Bayesian visual analytic (BaVA) methods.

Conclusion

Using data analytics (e.g., statistics) as a platform to emphasize critical thinking is not a new idea. However, the way by which we propose to integrate critical thinking with complex mathematical and computational methods is new and similar in spirit to ideas from CATALST. This article discusses a way to reconsider the “wax on, wax off” teaching style invoked in many data analytics courses so that students may practice and develop skills in the classroom that are directly applicable to realistic scenarios. Students have opportunities in the data analytics course that we propose to tackle tough problems while they develop insight and master mathematical data summarizing techniques. In particular, we use IDVs as instructional tools for students to construct their understanding of 1) how to think critically, 2) the role of data in critical thinking, and 3) the mathematical and computational methods needed to summarize high-dimensional data. Based on the Census case study, we exemplified one approach for students to construct their understanding of thinking critically with data and the utility of weighted-MDS.

Because each course module begins with a case study, students have a clear purpose for learning data analytics. Unlike Daniel in The Karate Kid, students have the potential, from the beginning, to assess how each lesson fits into a larger scheme of learning from data. They are motivated by an interesting problem and may avoid the frustration that Daniel experienced when he did not understand the purpose of the household chores. With this in mind, we hope that students not only conclude our data analytics course enlightened by technical methods and critical thinking skills, but also with a level of satisfaction that will inspire them to continue their education in data analytics.