Big Data in Education in 2025: A Thought Experiment

all this talk of petabytes and exabytes
is making me confused .... and hungry

Each January, about 85 government ministers -- together with some members of their staffs, leaders of the education departments in international organizations, large NGOs and multinational companies, and other 'high level decision makers' -- gather in London to speak informally about topics of common interest during the Education World Forum, which bills itself as the 'world's largest gathering of education and skills ministers'. It's a rather unique and impressive collection of people with the power to make decisions affecting hundreds of millions of students and teachers around the world. This annual meeting was previously called the 'Learning and Technology World Forum'; despite dropping the word 'technology' from its official title a few years ago, talk of tech was inescapable during this year's Forum, whether onstage or in the hallways. If I were asked to identify three general themes that permeated discussions throughout this year's three-day event, they would be 'technology', 'systems' and 'data'.

For many groups, the Education World Forum offers a high profile venue to announce new initiatives, launch new publications, and present findings from recent research. My boss at the World Bank, Elizabeth King, for example, officially launched a new 'SABER' education data technology tool during her keynote speech on the second day ("When it comes to learning, education systems matter"). While the links between these three themes were perhaps not always explicit in Beth's speech, the important role that new technologies will play in helping education systems to collect and analyze key data about the health of the education system, especially as pertains to whether or not students are learning (and, if so, how), was echoed and amplified by many of the other speakers in both EWF plenary sessions and related side events.

---

While the Forum has become increasingly open over the years, embracing the use of social media throughout much of the agenda, for example, and quickly making available on YouTube key speeches and presentations, the off-the-record ministerial exchange sessions that happen on the second day are, as per the EWF social media policy, meant to be a largely Twitter-free zone. The hope is that, if/when/where given space to ask the 'dumb' questions of their peers, and freed from having it reported that someone, in response, provided some 'dumb' answers, Forum participants might feel comfortable enough to have what turn out to be some rather smart conversations about topics for which they had not been prepped, and about which no formal position papers had been prepared back home.

At one of the informal Forum ministerial exchange sessions a few years ago, rather exasperated that much of the conversation was concentrated on discussions of the lowest costs that various countries had paid for student laptops, I posed the following scenario, and question, as a sort of 'thought experiment':

Let's assume that, by 2025, *all* hardware and software costs related to the use of information and communication technologies to support learning were zero.

How might this change the way you consider the use of ICTs to support the goals of your education system?

If we removed considerations of cost from the equation, how might we conceive of the use of technologies in education? Would our approach then be consistent with our approach today?

Many times I find that what is 'urgent' at a particular moment in time can crowd out consideration of what is truly important. In posing this thought experiment, my hope was to challenge folks to consider their current approaches to the use of technology, and the related challenges, over a longer period of time.

With that experience in mind, and given all of the talk about 'systems', 'data' and 'technology' at this year's Education World Forum, an additional thought experiment comes to mind:

Let's assume that, by 2025, and for better or for worse, the use of digital technologies across all aspects of your education system will be pervasive.

Let's also assume that, by 2025, most of the technical constraints and costs you are mentioning today related to the collection of 'data' of all sorts in your education system will effectively fall to zero, because this data collection will be a natural by-product of the widespread use of technology in the lives of your students, teachers, administrators, and the communities in which they live.

What would you do then?
What would be your key related concerns, priorities and opportunities?
Where would you, should you, be directing your resources and energies?

If this future is inevitable (I am not saying that it is, or that this future is a good or bad thing, just that such a scenario is plausible -- these are the parameters of the 'thought experiment'), what are the important related things you need to be thinking about and doing *today* that we are *not* talking about right now, because we are discussing more immediately 'urgent' issues and priorities?

---

Partisan ideological debates define (some would use stronger words, like 'plague', perhaps, or 'paralyze') academic and policy discussions about education in much of the world. One of the featured speakers at this year's Education World Forum (as in many past years) was the OECD's Andreas Schleicher, who rather famously likes to conclude many of his presentations about the latest PISA results by saying that, "without data, you are just another person with an opinion." Initiatives that rigorously collect learning- and policy-relevant data so as to inform more evidence-based decision making in the education sector offer the potential to steer at least some of the related discussions away from philosophical discussions of what 'should' work, based on whatever learning and organizational theories predominate in a particular place, to what appears to work (and not work) in practice. (Whether or not such initiatives can help with the politics that permeates education decision making in most countries is perhaps another issue ...)

International assessment schemes, surveys, benchmarking efforts and analytical tools (examples include things like PISA, TIMSS, PIRLS, TALIS, the World Bank's SABER program) are really still in their infancy, and evolving. While undeniably useful in various contexts, viewing them collectively underscores the fact that we really still don't have all that much useful hard data, whether at a macro or system level or at the micro level of an individual learner or teacher, about what works best, and what doesn't, in various contexts. We certainly have more data to help in this regard than we did twenty (or even five) years ago, of course! But we still have a long way to go. Spend some time poking around what is probably the world's most comprehensive repository of globally comparable education data, the World Bank's EdStats database, for example, and you'll find troves of data of many sorts that simply were not collected and made easily available to large numbers of people a generation ago. Useful stuff, to be sure. But viewed from another perspective, what is available through data repositories like EdStats only underscores how much data we still don't have.

One of the reasons for this is that it is often quite expensive to collect such data. As someone who has commissioned many data collection efforts around the world related to ICT use in education, and as someone who has advised on countless others, I have firsthand experience with just how expensive such efforts can be. However, as more of the actions, activities, and communications taking place inside and outside of schools are enabled or mediated through the use of information and communication technologies, one result may be a massive (and passive) collection of huge amounts of data (and metadata) that was previously not possible without incurring high costs, as such data collection required dedicated actions (surveys and interviews, high stakes standardized tests, classroom observations) by lots of trained people. If/when this eventually occurs, huge waves of data will wash through and across our education systems, increasingly quickly and in ways unimaginable only a few years ago.

What will education systems need to do to be able to take advantage of these waves of 'big data'?
Who can use these data, and for what purposes?
What won't we need to do that we currently do today -- and what might we need to start doing?
What will need to be done to protect the rights of the people affected by decisions made as a result?

As with other uses of digital technologies to benefit learning, we should perhaps not be too surprised if those best able to benefit from the coming data explosion in education are those already considered 'the best' -- the best education systems (with the best data scientists), the best schools, the best teachers, the best students, and the best communities (and, it is perhaps worth noting, perhaps the 'best' companies too). If a sort of Matthew Effect of Big Data in Education may soon be upon us, what should we be doing now to prepare to help ensure learning for all students, and not just those who already enjoy the greatest advantages?

---

Having access to lots of data is one thing. Being able to make sense of so-called Big Data is quite another. As the cost of collecting certain types of data approaches zero, and costs of storing these data fall precipitously, what will be the costs of analyzing these data -- and are we prepared to pay these costs? As we go from a period of time marked by data scarcity in education to (at least in some regards) data abundance, one hopes that the quotation above from Andreas Schleicher will not be modified to read, "with too much data, you may be just another overwhelmed person with a poorly developed opinion".

In emails following up on a number of discussions that took place at this year's Education World Forum, a number of people asked me if I had seen news reports about a study by researchers at Princeton that, as a result of epidemiological modeling of online social network dynamics, predicted "a rapid decline in Facebook activity in the next few years". Before too long, a number of folks took issue with this conclusion. (A data scientist at Facebook even put up a rather cheeky blog post using "the same robust methodology featured in the paper" to conclude that "Princeton may be in danger of disappearing entirely".) My point in mentioning this here is not to debate the merits of one particular paper (there are other places for that sort of thing), but rather to highlight the fact that any impending era of 'big data in education' will open up education systems to all sorts of new opportunities for analysis and action -- and potentially to all sorts of abuses as well.

In my experience, the education sector is far from a leader on issues related to so-called 'big data'. Related laws, cultural mores and public debates are in their infancy, especially in the middle and low income countries where I work, where there is a real Wild West quality to much of this stuff. To the extent that anyone is 'in charge' in such places on issues related to big data, I am often struck by the notion that it might in practice be many of the vendors selling and maintaining various information systems through which such data travel.

In 2014 the most pressing concern of many education policymakers related to the use of technology in schools may (regrettably) be, 'what tablet should we buy?' This is no doubt a rather challenging question currently confronting many education systems around the world. As complicated as this question may appear today, however, one expects that, however it is answered, and however much this answer costs, and whatever the result, much more challenging follow-on questions (and much bigger related costs) loom not too far in the future.