Pedantic rants about the use and misuse of language are a lot of fun. We all have our soap boxes, and I strongly encourage everyone to hop on theirs from time to time. But when we enter into conversations around the use and misuse of jargon, we must always keep two things in mind: (1) conceptual boundaries are fuzzy, particularly when common terms are used across different disciplines, and (2) our conceptual commitments have serious consequences for how we perceive the world.

Tim McKay recently wrote a blog post called Hey vendors! Stop calling what you’re selling colleges and universities “Predictive Analytics”. In this piece, McKay does two things. First, he tries to sharply distinguish the kind of ‘predictive analytics’ work done by vendors from the kind of ‘real’ prediction that is done within his own native discipline, astronomy. Second, on the basis of this distinction, he asserts that what analytics companies are calling ‘predictive analytics’ is actually not predictive at all. All of this is to imply what he later says explicitly in a tweet to Mike Sharkey: the language of prediction in higher ed analytics is less about helpfully describing the function of a particular tool, and more about marketing.

Thanks for pitching in Mike. We agree about most things, just not about the term – good marketing, not what I call it when I do this work…

What I’d like to do here is unpack Tim’s claims and, in so doing, soften the strong antagonism that he erects between vendors and the rest of the academy. That antagonism is not particularly productive at a time when vendors, higher education institutions, government, and others seek to work together to promote student success, both in the US and abroad.

What is predictive analytics?

A hermeneutic approach

Let’s begin by defining analytics. Analytics is simply the visual display of quantitative information in support of human decision-making. That’s it. In practice, we see the broad category of analytics sub-divided in a wide variety of ways: by domain (e.g., website analytics), by content area (e.g., learning analytics, supply chain analytics), and by intent (e.g., the common distinction between descriptive, predictive, and prescriptive analytics).

Looking specifically at predictive analytics, it is important not to take the term out of context. In the world of analytics, the term ‘predictive’ always refers to intent. Since analytics is always in the service of human decision-making, it always involves factors that are subject to change on the basis of human activity. Hence, ‘predictive analytics’ involves the desire to anticipate and represent some likely future outcome that is subject to change on the basis of human intervention. When considering the term ‘predictive analytics,’ then, it is important not to consider ‘predictive’ in a vacuum, separate from related terms (descriptive and prescriptive) and the concept of analytics, of which predictive analytics is a type. Pulling a specialized term out of one domain and evaluating it on the terms of another is unfair, and is only possible under the presumption that language is static and ontologically bound to specific things.

So, when Tim McKay talks about scientific prediction and complains that predictive analytics do not live up to the rigorous standards of the former, he is absolutely right. But he is right because the language of prediction is deployed in two very different ways. In McKay’s view, scientific prediction involves applying one’s knowledge of governing rules to determine some future state of affairs with a high degree of confidence. In contrast, predictive analytics involves creating a mathematical model that anticipates a likely state of affairs based on observable quantitative patterns in a way that makes no claim to understanding how the world works. Scientific prediction, in McKay’s view, involves an effort to anticipate events that cannot be changed. Predictive analytics involves events that can be changed, and in many cases should be changed.

The distinction that McKay notes is indeed incredibly important. But, unlike McKay, I’m not particularly bothered by the existence of this kind of ambiguity in language. I’m also not particularly prone to lay blame for this kind of ambiguity at the feet of marketers, but I’ll address this later.

An Epistemological Approach

One approach to dealing with the disconnect between scientific prediction and predictive analytics is to admit that there is a degree of ambiguity in the term ‘prediction,’ to adopt a hermeneutic approach, and to be clear that the term is simply being deployed relative to a different set of assumptions. In other words, science and analytics are both right.

Another approach, however, might involve looking more carefully at the term ‘prediction’ itself and reconciling science and analytics by acknowledging that the difference is a matter of degree, and that they are both equally legitimate (and illegitimate) in their respective claims to the term.

McKay is actually really careful in the way that he describes scientific prediction. To paraphrase, scientific prediction involves (1) accurate information about a state of affairs (e.g., the solar system), and (2) an understanding of the rules that govern changes in that state of affairs (e.g., the laws of gravity). As McKay acknowledges, both our measurements and our understanding of the rules of the universe are imperfect and subject to error, but when it comes to something like predicting an eclipse, the information we have is good enough that he is willing to “bet you literally anything in my control that this will happen – my car, my house, my life savings, even my cat. Really. And I’m prepared to settle up on August 22nd.”

Scientific prediction is inductive. It involves the creation of models that adequately describe past states of affairs, an assumption that the future will behave in very much the same way as the past, and some claim about a future event. It’s a systematic way of learning from experience. McKay implies that explanatory scientific models are the same as the ‘rules that govern,’ but his admission that ‘Newton’s law of gravity is imperfect but quite adequate’ concedes that they are not in fact the same. Our models might adequately approximate the rules, but the rules themselves are eternally out of our reach (a philosophical point that has been borne out time and time again in the history of science).

Scientific prediction involves the creation of a good enough model that, in spite of errors in measurement and assuming that the patterns of the past will persist into the future, we are able to predict something like a solar eclipse with an incredibly high degree of probability. What if I hated eclipses? What if they really ground my gears? If I had enough time, money, and expertise, might it not be possible for me to…

…wait for it…

…build a GIANT LASER and DESTROY THE MOON?!

Based on my experience as an armchair science fiction movie buff, I think the answer is yes.

How is this fundamentally different from how predictive analytics works? Predictive analytics involves the creation of mathematical models based on past states of affairs, an admission that models are inherently incomplete and subject to error in measurement, an assumption that the future will behave in ways very similar to the past, and an acknowledgement that predicted future states of affairs might change with human (or extraterrestrial) intervention. Are the models used to power predictive analytics in higher education as accurate as those we have to predict an eclipse? Certainly not. Is the data collected to produce predictive models of student success free from error? Hardly. But these are differences in degree rather than differences in kind. By this logic, both predictive analytics and scientific prediction function in exactly the same way. The only difference is that the social world is far more complex than the astronomical world.
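To make the parallel concrete, here is a deliberately minimal sketch of the pattern described above: build a model from past observations, assume the future will resemble the past, and emit a probability that human intervention can still change. All figures and the GPA threshold are hypothetical, and a real student success model would use far richer features and methods.

```python
# A toy "predictive analytics" model: forecast retention as the historical
# retention rate of similar students. The data below is entirely made up.
past = [
    # (prior_gpa, retained)
    (3.8, True), (3.5, True), (3.2, True), (2.9, False),
    (2.7, True), (2.4, False), (2.1, False), (1.9, False),
]

def retention_forecast(gpa: float, threshold: float = 3.0) -> float:
    """Forecast retention as the past rate among students in the same GPA band."""
    similar = [retained for g, retained in past
               if (g >= threshold) == (gpa >= threshold)]
    return sum(similar) / len(similar)

print(retention_forecast(3.4))  # high-GPA band: 3 of 3 retained -> 1.0
print(retention_forecast(2.2))  # low-GPA band: 1 of 5 retained -> 0.2
```

The point of the forecast is precisely that it is actionable: a 0.2 probability is an invitation to intervene, not a verdict.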

So, if scientific predictions are predictive, then student risk predictions are predictive as well. The latter might not be as accurate as the former, but the process and assumptions are identical for both.

An admission

It is unfortunate that, even as he grumbles about how the term ‘predictive’ is used in higher education analytics, McKay doesn’t offer a better alternative.

I’ll admit at this point that, with McKay, I don’t love the term ‘predictive.’ It is either too strong (in that it assumes some kind of god-like vision into the future) or too weak (in that it is used so widely in common speech and across disciplines that it ceases to have a specific meaning). With Nate Silver, I much prefer the term ‘forecast,’ especially in higher education.

In The Signal and the Noise, Silver notes that the terms ‘prediction’ and ‘forecast’ are used differently in different fields of study, and often interchangeably. In seismology, however, the two terms have very specific meanings: “A prediction is a definitive and specific statement about when and where an earthquake will strike: a major earthquake will hit Kyoto, Japan on June 28…whereas a forecast is a probabilistic statement usually over a longer time scale: there is a 60 percent chance of an earthquake in Southern California over the next thirty years.”

There are two things to highlight in Silver’s discussion. First, the term ‘prediction’ is used differently and with varying degrees of rigor depending on the discipline. Second, if we really want to make a distinction, then what we call prediction in higher ed analytics should really be called forecasting. In principle, I like this a lot. When we produce a predictive model of student success, we are forecasting, because we are anticipating an outcome with a known degree of probability. When we take these forecasts and visualize them for the purpose of informing decisions, are we doing ‘forecasting analytics’? ‘forecastive analytics’? ‘forecast analytics’? I can’t actually think of a related term that I’d like to use on a regular basis. Acknowledging that no discipline owns the definition of ‘prediction,’ I’d far rather preserve the term ‘predictive analytics’ in higher education since it both rolls off the tongue, and already has significant momentum within the domain.

Is ‘predictive analytics’ a marketing gimmick?

Those who have read my book will know that I like conceptual history. When we look at the history of the concept of prediction, we find that it has Latin roots and significantly predates the scientific revolution. Quoting Silver again:

The words predict and forecast are largely used interchangeably today, but in Shakespeare’s time, they meant different things. A prediction was what a soothsayer told you […]

The term forecast came from English’s Germanic roots, unlike predict, which is from Latin. Forecasting reflected the new Protestant worldliness rather than the otherworldliness of the Holy Roman Empire. Making a forecast typically implied planning under conditions of uncertainty. It suggested having prudence, wisdom, and industriousness, more like the way we currently use the word foresight.

The term ‘prediction’ has a long and varied history. Its meaning is slippery. But what I like about Silver’s summary of the term’s origins is that it essentially takes it off the table for everyone except those who presume a kind of privileged access to the divine. In other words, using the language of prediction might actually be pretty arrogant, regardless of your field of study, since it presumes both complete information and an accurate understanding of the rules that govern the universe. Prediction is an activity reserved for gods, not men.

Digressions aside, the greatest issue that I have with McKay’s piece is that it uses the term ‘prediction’ as a site of antagonism between vendors and the academy. If we bracket all that has been said, and for a second accept McKay’s strong definition of ‘prediction,’ it is easy to demonstrate that vendors are not the only ones misusing the term ‘predictive analytics’ in higher education. Siemens and Baker deploy the term in their preface to the Cambridge Handbook of the Learning Sciences. Manuela Ekowo and Iris Palmer from New America comfortably make use of the term in their recent policy paper on The Promise and Peril of Predictive Analytics in Higher Education. EDUCAUSE actively encourages the adoption of the term ‘predictive analytics’ through a large number of publications, including the Sept/Oct 2016 edition of the EDUCAUSE Review, which was dedicated entirely to the topic. The term appears in the ‘Journal of Learning Analytics,’ and is used in the first edition of the Handbook of Learning Analytics published by the Society of Learning Analytics Research (SoLAR). University administrators use the term. Government officials use the term. The examples are too numerous to cite (a search for “predictive analytics in higher education” in Google Scholar yields about 58,700 results). If we want to establish the true definition of ‘prediction’ and judge every use by this gold standard, then it is not simply educational technology vendors who should be charged with misuse. If there is a problem with how people are using the term, it is not a vendor problem: it is a problem of language, and of culture.

I began this essay by stating that we need to keep two things in mind when we enter into conversations about conceptual distinctions: (1) conceptual boundaries are fuzzy, particularly when common terms are used across different disciplines, and (2) our conceptual commitments have serious consequences for how we perceive the world. By now, I hope that I have demonstrated that the term ‘prediction’ is used in a wide variety of ways depending on context and intention. That’s not a bad thing. That’s just language. A serious consequence of McKay’s discussion of how ed tech vendors use the term ‘predictive analytics’ is that it tacitly pits vendors against the interests of higher education — and of students — more generally. Not only is such a sweeping implication unfair, but it is also unproductive. It is the shared task of colleges, universities, vendors, government, not-for-profits, and others to work together in support of the success of students in the 21st century. The language of student success is coalescing in such a way as to make possible a common vision and concerted action around a set of shared goals. The term ‘predictive analytics’ is just one of many evolving terms that make up our contemporary student success vocabulary, and is evidence of an important paradigm shift in how we view higher education in the US. Instead of quibbling about the ‘right’ use of language, we should instead recognize that language is shaped by values, and so work together to ensure that the words we use reflect the kinds of outcomes we collectively wish to bring about.

Of these students, only 11% earn a bachelor’s degree in under six years. That’s compared to the rest of the population, which sees students graduate at a national rate of 55%. What this means is that 89% of first-generation, low-income students stop out, perpetuating a widespread pattern of socio-economic inequality.

I’ve been thinking a lot recently about product as praxis. Without putting too much conceptual weight behind the term ‘praxis,’ what I mean is merely that educational technologies are not just developed in order to change behavior. Ed tech embodies values and beliefs (often latent) about what humans are and should be, about what teaching and learning are, and about the role that institutions should play in guiding the development of subjectivity. Because it is value-laden, educational technology also has the power to shape not just our practices, but also how we think.

When thought of as praxis, product development carries with it a huge burden. Acknowledging that technology has the power (and the intention) to shape thought and action, the task of creating an academic technology becomes a fundamentally ethical exercise.

Vendors are not merely responsible for meeting the demands of the market. ‘The market’ is famously bad at understanding what is best for it. Instead, vendors are responsible for meeting the needs of educators. It is important for vendors to think carefully about their own pedagogical assumptions. It is important for them to be explicit about how those assumptions shape product development. The product team at Blackboard (of which I am a part), for example, is committed to values like transparency and interoperability. We are committed to an approach to learning analytics that seeks to amplify existing human capabilities rather than exclude them from the process (the value of augmentation over automation). These values are not shared by everyone in educational technology. They are audacious in that they fly in the face of some taken-for-granted assumptions about what constitute good business models in higher education.

Business models should not determine pedagogy. It is the task of vendors in the educational technology space to begin with strong commitments to a set of well-defined values about education, and to ensure that business models are consistent with those fundamental beliefs. It will always be a challenge to develop sustainable business models that do not conflict with core values. But that’s not a bad thing.

When it comes to the market for data in education, let’s face it: analytics are a commodity. Every analytics vendor is applying the same basic set of proven techniques to the same kinds of data. Given this, it is silly (and even dangerous) to talk about proprietary algorithms. Data science is not a market differentiator.

What DOES differentiate products are the ways in which information is exposed. It is easy to forget that analytics is a rhetorical activity. The visual display of information is an important interpretive layer. The decisions that product designers make about WHAT and HOW information is displayed prompt different ranges of interpretation and nudge people to take different types of action. Dashboards are the front line between information and practice. It is here where values become most apparent, and it is here where products are truly differentiated.

EDUCAUSE is big. Really big. With so much to take in, conference-goers (myself included) are easily faced with the paradox of choice: a sense of paralysis in the face of too many options. To help myself and others, I have scanned this year’s conference agenda and selected five presentations that I think will be individually strong, and that as a group offer a good overview of the themes, issues, and state of analytics in higher education today.

Moderated by Michael Feldstein (e-Literate), and featuring John Whitmer (Blackboard), Russ Little (PAR), Jeff Gold (California State University), and Avi Yashchin (IBM), this session promises to provide an engaging and insightful overview of why analytics are important for higher education, the biggest challenges currently facing the field, and opportunities for the future. Although most of the speakers are strongly affiliated with vendors in the analytics space, they are strong data scientists in their own right and have demonstrated time and time again that they do not shy from critical honesty. Attend this session for a raw glimpse into what analytics mean for higher education today.

Jisc is a non-profit company that aims to create and maintain a set of shared services in support of higher education in the UK. The Effective Learning Analytics project that Michael Webb will discuss in this session has aimed to provide a centralized learning analytics solution in addition to a library of shared resources. The outputs of this project to date have provided valuable resources to the international educational analytics community in general, including Code of practice for learning analytics and Learning Analytics in Higher Education. Jisc’s work is being watched carefully by governments and non-governmental organizations worldwide and represents an approach that we may wish to consider emulating in the US (current laws notwithstanding). Attend this session to learn about the costs and opportunities involved in the development of a centralized approach to collecting and distributing educational data.

The higher education community is abuzz with talk of how data and analytics can improve student success. But data and analytics are worthless unless they are put in the hands of the right people and in the right ways. I am really interested to see how Ivy Tech has worked to successfully democratize access to information, and also about the ways that access to data has driven the kind of institutional and cultural change necessary to see the most significant results from data-driven initiatives.

Everyone’s talking about analytics, and every institution seemingly has the will to invest. Attention paid to analytics in media and by vendors can lead to the impression that everybody’s doing it, and that everyone who’s doing it is seeing great results. But the truth is far from it.

I’m not the greatest fan of benchmarking in general. Too often, benchmarking is productized by vendors and sold to universities despite providing very little actionable value. Worse yet, it can exacerbate feelings of institutional insecurity and drive imprudent investments. But when it comes to analytics, benchmarking done right can provide important evidence to counteract misperceptions about the general state of analytics in the US, and provide institutions with valuable information to inform prudent investment, planning, and policy decisions. In this presentation, I look forward to hearing Christopher Brooks and Jeffery Pomerantz from EDUCAUSE discuss their work on the analytics and student success benchmarking tools.

I am a huge advocate of open standards in learning analytics. Open standards mean greater amounts of higher quality data. They mean that vendors and data scientists can spend more time innovating and less time just trying to get plumbing to work. In this interactive presentation, Malcolm Brown (EDUCAUSE), Jenn Stringer (University of California, Berkeley), Sean DeMonner (University of Michigan-Ann Arbor), and Virginia Lacefield (University of Kentucky) talk about how open learning standards like IMS Caliper and xAPI are creating the foundation for the emergence of next generation learning environments.

The political environment in the United States has increasingly highlighted huge problems in our education system. These problems, I would argue, are not unrelated to how we as a country conceptualize student success. From the perspective of the student, success is about finding a high-paying job that provides a strong sense of personal fulfillment. From the perspective of colleges and universities, student success is about graduation and retention. From the perspective of government, it’s about making sure that we have a trained workforce capable of meeting labor market demands. For all the recent and growing attention paid to student success, however, what seems woefully absent is any talk about the importance of education to producing a liberal democratic citizenry. In the age of ‘big data,’ of course, part of this absence may be due to the fact that the success of a liberal education is difficult to measure. From this perspective, the success of a country’s education system cannot be measured directly. Instead, it is measured by the extent to which its citizens demonstrate things like active engagement, an interest in and ability to adjudicate truth claims, and a desire to promote social and societal goods. Now, more than at any time in recent history, we are witnessing the failure of American education. In the US, the topic of education has been largely absent from the platforms of individual presidential candidates. This is, perhaps, a testament to the fact that education is bad for politics. Where it has been discussed, we hear Trump talk about cutting funding to the Department of Education, if not eliminating it entirely. We hear Clinton talk about early childhood education, free/debt-free college, and more computer science training in K-12, but in each of these cases the tenor tends to be about work and jobs rather than promoting societal goods more generally.

But I don’t want to make this post about politics. Our political climate is merely a reflection of the values that inform our conceptions of student success. These values — work, personal fulfillment, etc — inform policy decisions and university programs, but they also inform the development of educational technologies. The values that make up our nation’s conception of ‘student success’ produce the market demand that educational technology companies then try to meet. It is for this reason that we see a recent surge (some would say glut) of student retention products on the market, and relatively few that are meant to support liberal democratic values. It’s easy to forget that our technologies are not value-neutral. It’s easy to forget that, especially when it comes to communication technologies, the ‘medium is the message.’

What can educational technology companies do to meet market demands (something necessary to survival) while at the same time being attuned to the larger needs of society? I would suggest three things:

Struggle. Keeping ethical considerations and the needs of society top of mind is hard. For educational technologies to acknowledge the extent to which they both shape and are shaped by cultural movements produces a heavy burden of responsibility. The easy thing to do is to abdicate responsibility, citing the fact that ‘we are just a technology company.’ But technologies always promote particular sets of values. Accepting the need to meet market demand at the same time as the need to support liberal democratic education can be hard. These values WILL and DO come into conflict. But that’s not a reason to abandon either one or the other. It means constantly struggling in the knowledge that educational technologies have a real impact on the lives of people. Educational technology development is an inherently ethical enterprise. Ethics are hard.

Augment human judgment. Educational technologies should not create opportunities for human beings to avoid taking responsibility for their decisions. With more data, more analytics, and more artificial intelligence, it is tempting to lean on technology to make decisions for us. But liberal democracy is not about eliminating human responsibility, and it is not about making critical thinking unnecessary. To the contrary, personal responsibility and critical thinking are hallmarks of a liberal democratic citizen — and are essential to what it means to be human. As tempting as it may be to create technologies that make decisions for us because they can, I feel like it is vitally important that we design technologies that increase our ability to participate in those activities that are the most human.

Focus on community and critical thinking. Creating technologies that foster engagement with complex ideas is hard. Very much in line with the ‘augmented’ approach to educational technology development, I look to people like Alyssa Wise and Bodong Chen, who are looking at ways that a combination of embedded analytics and thoughtful teaching practices can produce reflective moments for students, and foster critical thinking in the context of community. And it is for this reason that I am excited about tools like X-Ray Learning Analytics, a product for Moodle that makes use of social network analysis and natural language processing in a way that empowers teachers to promote critical thinking and community engagement.

My wife’s coach once told her that “experience is what you get the moment after you needed it.” Too often the same can be said for data literacy. Colleges and universities looking to wisely invest in analytics to support the success of their students and to optimize operational efficiency are confronted with the daunting task of having to evaluate a growing number of options before selecting the products and approaches that are right for them. What products and services are most likely to see the greatest returns on investment? What approaches have other institutions taken that have already seen high rates of success? On the one hand, institutions that are just now getting started with analytics have the great advantage of being able to look to many who have gone before and who are beginning to see promising results. On the other hand, the analytics space is still immature, and there is little long-term, high-quality evidence to support the effectiveness of many products and interventions.

Institutions and vendors who have invested heavily in analytics have a vested interest in representing promising results (and they ARE promising!) in the best light possible. This makes sense. This is a good thing. The marketing tactics that both institutions of higher education and educational technology vendors employ as they represent their results are typically honest and in good faith as they earnestly work in support of student success. But the representation of information is always a rhetorical act. Consequently, the ways in which results are presented too often obscure the actual impact of technologies and interventions. The way that results are promoted can make it difficult for less mature institutions to adjudicate the quality of claims and make well-informed decisions about the products, services, and practices that will be best for them.

Perhaps the most common tactic that is used to make results appear more impressive than they are involves changing the scale used on the y-axis of bar and line charts. A relatively small difference can famously be made to appear dramatic if the range is small enough. But there are other common tactics that are not as easily spotted that are nonetheless just as important when it comes to evaluating the impact of interventions. Here are three:

There is a difference between a percentage increase and an increase in percentage points. For example, an increase in retention from 50% to 55% may be represented as either an increase of 5 points or 10%. It is also important to note that the same number of points will translate into a different percentage increase depending on the starting rate. For example, a 5-point increase from a retention rate of 25% represents an increase of 20%. A 5-point increase from a starting retention rate of 75%, on the other hand, is only an increase of 7%. Marketing literature will tend to choose metrics based on what sounds most impressive, even if it obscures the real impact.
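The arithmetic above can be captured in a short snippet that reports the same gain both ways. The retention figures are the illustrative ones from the paragraph, not real data:

```python
def describe_gain(before: float, after: float) -> dict:
    """Express the same retention gain two ways: points vs. percent."""
    points = (after - before) * 100            # absolute gain in percentage points
    percent = (after - before) / before * 100  # relative gain as a percent increase
    return {"points": round(points, 1), "percent": round(percent, 1)}

# The same 5-point gain sounds very different depending on the baseline:
print(describe_gain(0.50, 0.55))  # {'points': 5.0, 'percent': 10.0}
print(describe_gain(0.25, 0.30))  # {'points': 5.0, 'percent': 20.0}
print(describe_gain(0.75, 0.80))  # {'points': 5.0, 'percent': 6.7}
```

Whichever figure a marketer quotes, always ask for the other one.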

A single data point does not equal a trend. Context and history are important. When a vendor or institution claims that an intervention saw a significant increase in retention/graduation in only a year, it is possible that such an increase was due to chance, an existing trend, or else was the result of other initiatives or shifts in student demographics. For example, one college recently reported a 10% increase in its retention rate after only one year of using a student retention product. Looking back at historical retention rates, however, one finds that the year prior to tool adoption marked a significant and uncharacteristic drop in retention, which means that any increase could just as easily have been due to chance or other factors. In the same case, close inspection finds that the retention rate following tool adoption was still low from an historical perspective, and part of an emerging downward trend rather than the reverse.
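The dip-then-rebound pattern described above is easy to see with a few lines of code. The rates below are invented for illustration and are not the figures from the college mentioned; the shape of the series is what matters:

```python
# Hypothetical retention rates by year; the tool is adopted after year 4,
# which happens to be an uncharacteristic dip.
rates = [0.68, 0.67, 0.66, 0.58, 0.64]

adoption_year = 4
post = rates[adoption_year]       # 0.64, first year with the tool
pre = rates[adoption_year - 1]    # 0.58, the dip year

year_over_year_gain = (post - pre) / pre  # ~10% "increase" vs. the dip
historical_mean = sum(rates[:3]) / 3      # 0.67, the pre-dip baseline

print(f"Year-over-year gain: {year_over_year_gain:.0%}")  # looks impressive
print(f"But {post:.0%} is still below the {historical_mean:.0%} historical mean")
```

A single before/after comparison credits the tool with a 10% gain; the longer series shows a rate that has merely rebounded partway toward its old baseline.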

It’s not the tool. It’s the intervention. One will often hear vendors take credit for significant increases in retention/graduation rates when there are actually other, far more significant causal factors. One school, for example, is praised for using a particular analytics system to double its graduation rates. What tends not to be mentioned, however, is that the same school also radically reduced its student-to-advisor ratio, centralized its administration, and made additional significant programmatic changes that contributed to the school’s success over and above any impact that the analytics system might have made by itself. The effective use of an analytics solution can definitely play a major role in facilitating efforts to increase retention and graduation rates. In fact, all things being equal, it is reasonable to expect a 1 to 3 point increase in student retention as a result of using early alerts powered by predictive analytics. Significant gains above this, however, are only possible as a result of significant cultural change, strategic policy decisions, and well-designed interventions. It can be tempting for a vendor, especially, to at least implicitly take credit for more than is due, but doing so can be misleading and can obscure the tireless efforts of the institutions and people who are working to support their students. More than this, overemphasizing products over institutional change can impede progress. It can lead institutions to falsely believe that a product will do all the work, and encourage them to embark naively on analytics projects and initiatives without fully understanding the changes in culture, policy, and practice required to make them successful.