Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief

U.S. Department of Education

Office of Educational Technology

Prepared by:
Marie Bienkowski
Mingyu Feng
Barbara Means
Center for Technology in Learning
SRI International

October 2012

This report was prepared for the U.S. Department of Education under Contract Number ED-04CO-0040, Task 0010, with SRI International. The views expressed herein do not necessarily represent the positions or policies of the Department of Education. No official endorsement by the U.S. Department of Education is intended or should be inferred.

U.S. Department of Education
Arne Duncan
Secretary

Office of Educational Technology
Karen Cator
Director

October 2012

This report is in the public domain. Authorization to reproduce this report in whole or in part is granted. While permission to reprint this publication is not necessary, the suggested citation is: U.S. Department of Education, Office of Educational Technology, Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief, Washington, D.C., 2012.

This report is available on the Department's Web site at http://www.ed.gov/technology.

On request, this publication is available in alternate formats, such as Braille, large print, or compact disc. For more information, please contact the Department's Alternate Format Center at (202) 260-0852 or (202) 260-0818.

Technical Contact
Bernadette Adams
bernadette.adams@ed.gov

Executive Summary

In data mining and data analytics, tools and techniques once confined to research laboratories are being adopted by forward-looking industries to generate business intelligence for improving decision making. Higher education institutions are beginning to use analytics for improving the services they provide and for increasing student grades and retention. The U.S. Department of Education's National Education Technology Plan, as one part of its model for 21st-century learning powered by technology, envisions ways of using data from online learning systems to improve instruction.

With analytics and data mining experiments in education starting to proliferate, sorting out fact from fiction and identifying research possibilities and practical applications are not easy. This issue brief is intended to help policymakers and administrators understand how analytics and data mining have been, and can be, applied for educational improvement.

At present, educational data mining tends to focus on developing new tools for discovering patterns in data. These patterns are generally about the microconcepts involved in learning: one-digit multiplication, subtraction with carries, and so on. Learning analytics, at least as it is currently contrasted with data mining, focuses on applying tools and techniques at larger scales, such as in courses and at schools and postsecondary institutions. But both disciplines work with patterns and prediction: If we can discern the pattern in the data and make sense of what is happening, we can predict what should come next and take the appropriate action.

Educational data mining and learning analytics are used to research and build models in several areas that can influence online learning systems. One area is user modeling, which encompasses what a learner knows, what a learner's behavior and motivation are, what the user experience is like, and how satisfied users are with online learning. At the simplest level, analytics can detect when a student in an online course is going astray and nudge him or her on to a course correction. At the most complex, they hold promise of detecting boredom from patterns of key clicks and redirecting the student's attention. Because these data are gathered in real time, there is a real possibility of continuous improvement via multiple feedback loops that operate at different time scales: immediate to the student for the next problem, daily to the teacher for the next day's teaching, monthly to the principal for judging progress, and annually to the district and state administrators for overall school improvement.

The same kinds of data that inform user or learner models can be used to profile users. Profiling as used here means grouping similar users into categories using salient characteristics. These categories then can be used to offer experiences to groups of users or to make recommendations to the users and adaptations to how a system performs.

User modeling and profiling are suggestive of real-time adaptations. In contrast, some applications of data mining and analytics are for more experimental purposes. Domain modeling is largely experimental, with the goal of understanding how to present a topic and at what level of detail. The study of learning components and instructional principles also uses experimentation to understand what is effective at promoting learning.

These examples may suggest that the actions arising from data mining and analytics are always automatic, but that is often not the case. Visual data analytics closely involve humans to help make sense of data, from initial pattern detection and model building to sophisticated data dashboards that present data in a way that humans can act upon. K-12 schools and school districts are starting to adopt such institution-level analyses for detecting areas for instructional improvement, setting policies, and measuring results. Making students' learning and assessment activities visible opens up the possibility for students to develop skills in monitoring their own learning and to see directly how their effort improves their success. Teachers gain views into students' performance that help them adapt their teaching or initiate tutoring, tailored assignments, and the like.

Robust applications of educational data mining and learning analytics techniques come with costs and challenges. Information technology (IT) departments will understand the costs associated with collecting and storing logged data, while algorithm developers will recognize the computational costs these techniques still require. Another technical challenge is that educational data systems are not interoperable, so bringing together administrative data and classroom-level data remains difficult. Yet combining these data can give algorithms better predictive power. Combining data about student performance (online tracking, standardized tests, teacher-generated tests) to form one simplified picture of what a student knows can be difficult and must meet acceptable standards for validity. It also requires careful attention to student and teacher privacy and the ethical obligations associated with knowing and acting on student data.


Educational data mining and learning analytics have the potential to make visible data that have heretofore gone unseen, unnoticed, and therefore unactionable. To help further the fields and gain value from their practical applications, the recommendations are that educators and administrators:

Develop a culture of using data for making instructional decisions.

Involve IT departments in planning for data collection and use.

Be smart data consumers who ask critical questions about commercial offerings and create demand for the most useful features and uses.

Start with focused areas where data will help, show success, and then expand to new areas.

Communicate with students and parents about where data come from and how the data are used.

Help align state policies with technical requirements for online learning systems.

Researchers and software developers are encouraged to:

Conduct research on usability and effectiveness of data displays.

Help instructors be more effective in the classroom with more real-time and data-based decision support tools, including recommendation services.

Continue to research methods for using identified student information where it will help most, anonymizing data when required, and understanding how to align data across different systems.

Understand how to adapt predictive models developed in one context for use in another.

A final recommendation is to create and continue strong collaboration across research, commercial, and educational sectors. Commercial companies operate on fast development cycles and can produce data useful for research. Districts and schools want properly vetted learning environments. Effective partnerships can help these organizations codesign the best tools.


Introduction

As more of our commerce, entertainment, communication, and learning are occurring over the Web, the amount of data online activities generate is skyrocketing. Commercial entities have led the way in developing techniques for harvesting insights from this mass of data for use in identifying likely consumers of their products, in refining their products to better fit consumer needs, and in tailoring their marketing and user experiences to the preferences of the individual. More recently, researchers and developers of online learning systems have begun to explore analogous techniques for gaining insights from learners' activities online.

This issue brief describes data analytics and data mining in the commercial world and how similar techniques (learning analytics and educational data mining) are starting to be applied in education. The brief examines the challenges being encountered and the potential of such efforts for improving student outcomes and the productivity of K-12 education systems. The goal is to help education policymakers and administrators understand how data mining and analytics work and how they can be applied within online learning systems to support education-related decision making. Specifically, this issue brief addresses the following questions:

What is educational data mining, and how is it applied? What kinds of questions can it answer, and what kinds of data are needed to answer these questions?

How does learning analytics differ from data mining? Does it answer different questions and use different data?

What are the broad application areas for which educational data mining and learning analytics are used?

What are the benefits of educational data mining and learning analytics, and what factors have enabled these new approaches to be adopted?

Online Learning Systems and Adaptive Learning Environments

Online learning systems refer to online courses or to learning software or interactive learning environments that use intelligent tutoring systems, virtual labs, or simulations. Online courses may be offered through a learning or course management system (such as Blackboard, Moodle, or Sakai) or a learning platform (such as Knewton and DreamBox Learning). Examples of learning software and interactive learning environments are those from Kaplan, Khan Academy, and Agile Mind. When online learning systems use data to change in response to student performance, they become adaptive learning environments.

What are the challenges and barriers to successful application of educational data mining and learning analytics?

What new practices have to be adopted in order to successfully employ educational data mining and learning analytics for improving teaching and learning?

Sources of information for this brief consisted of:

A review of selected publications and fugitive or gray literature (Web pages and unpublished documents) on educational data mining and learning analytics;

Interviews of 15 data mining/analytics experts from learning software and learning management system companies and from companies offering other kinds of Web-based services; and

Deliberations of a technical working group of eight academic experts in data mining and learning analytics.

Learning management systems (LMS)

LMS are suites of software tools that provide comprehensive course-delivery functions: administration, documentation, content assembly and delivery, tracking and reporting of progress, user management and self-services, etc. LMS are Web based and are considered a platform on which to build and deliver modules and courses. Open-source examples include Moodle, Sakai, and ILIAS.

This issue brief was inspired by the vision of personalized learning and embedded assessment in the U.S. Department of Education's National Education Technology Plan (NETP) (U.S. Department of Education 2010a). As described in the plan, increasing use of online learning offers opportunities to integrate assessment and learning so that information needed to improve future instruction can be gathered in nearly real time:

When students are learning online, there are multiple opportunities to exploit the power of technology for formative assessment. The same technology that supports learning activities gathers data in the course of learning that can be used for assessment. An online system can collect much more and much more detailed information about how students are learning than manual methods. As students work, the system can capture their inputs and collect evidence of their problem-solving sequences, knowledge, and strategy use, as reflected by the information each student selects or inputs, the number of attempts the student makes, the number of hints and feedback given, and the time allocation across parts of the problem. (U.S. Department of Education 2010a, p. 30)

While students can clearly benefit from this detailed learning data, the NETP also describes the potential value for the broader education community through the concept of an interconnected feedback system:

The goal of creating an interconnected feedback system would be to ensure that key decisions about learning are informed by data and that data are aggregated and made accessible at all levels of the education system for continuous improvement. (U.S. Department of Education 2010a, p. 35)

The interconnected feedback systems envisioned by the NETP rely on online learning systems collecting, aggregating, and analyzing large amounts of data and making the data available to many stakeholders. These online or adaptive learning systems will be able to exploit detailed learner activity data not only to recommend what the next learning activity for a particular student should be, but also to predict how that student will perform with future learning content, including high-stakes examinations. Data-rich systems will be able to provide informative and actionable feedback to the learner, to the instructor, and to administrators. These learning systems also will provide software developers with feedback that is tremendously helpful in rapidly refining and improving their products. Finally, researchers will be able to use data from experimentation with adaptive learning systems to test and improve theories of teaching and learning.

In the remainder of this report, we:

1. Present scenarios that motivate research, development, and application efforts to collect and use data for personalization and adaptation.

2. Define the research base of educational data mining and learning analytics and describe the research goals researchers pursue and the questions they seek to answer about learning at all levels of the educational system.

3. Present an abstracted adaptive learning system to show how data are obtained and used, what major components are involved, and how various stakeholders use such systems.

4. Examine the major application areas for the tools and techniques in data mining and analytics, encompassing user and domain modeling.

5. Discuss the implementation and technical challenges and give recommendations for overcoming them.

Personalized Learning Scenarios

Online consumer experiences provide strong evidence that computer scientists are developing methods to exploit user activity data and adapt accordingly. Consider the experience a consumer has when using Netflix to choose a movie. Members can browse Netflix offerings by category (e.g., Comedy) or search by a specific actor, director, or title. On choosing a movie, the member can see a brief description of it and compare its average rating by Netflix users with that of other films in the same category. After watching a film, the member is asked to provide a simple rating of how much he or she enjoyed it. The next time the member returns to Netflix, his or her browsing, watching, and rating activity data are used as a basis for recommending more films. The more a person uses Netflix, the more Netflix learns about his or her preferences and the more accurate the predicted enjoyment. But that is not all the data that are used. Because many other members are browsing, watching, and rating the same movies, the Netflix recommendation algorithm is able to group members based on their activity data. Once members are matched, activities by some group members can be used to recommend movies to other group members. Such customization is not unique to Netflix, of course. Companies such as Amazon, Overstock, and Pandora keep track of users' online activities and provide personalized recommendations in a similar way.

Education is getting very close to a time when personalization will become commonplace in learning. Imagine an introductory biology course. The instructor is responsible for supporting student learning, but her role has changed to one of designing, orchestrating, and supporting learning experiences rather than telling. Working within whatever parameters are set by the institution within which the course is offered, the instructor elaborates and communicates the course's learning objectives and identifies resources and experiences through which those learning goals can be attained. Rather than requiring all students to listen to the same lectures and complete the same homework in the same sequence and at the same pace, the instructor points students toward a rich set of resources, some of which are online, and some of which are provided within classrooms and laboratories. Thus, students learn the required material by building and following their own learning maps.
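The member-grouping approach in the Netflix example is, at its core, collaborative filtering: estimate how much a user will like an unseen item from the ratings of similar users. Netflix's production algorithm is proprietary, so the Python sketch below is only a minimal illustration of the idea; the users, items, and ratings are invented.

```python
# Minimal user-based collaborative filtering: recommend items that
# similar users rated highly. All users, items, and ratings are invented.
from math import sqrt

ratings = {  # user -> {item: rating on a 1-5 scale}
    "ana":   {"intro_bio": 5, "genetics_sim": 4, "stats_quiz": 2},
    "ben":   {"intro_bio": 5, "genetics_sim": 5, "lab_video": 4},
    "carla": {"intro_bio": 2, "stats_quiz": 5},
}

def similarity(u, v):
    """Cosine similarity over the items two users both rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user):
    """Score unseen items, weighting each neighbor's rating by similarity."""
    scores = {}
    for other in ratings:
        if other == user:
            continue
        w = similarity(user, other)
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ana"))  # ['lab_video']
```

A learning system could apply the same pattern to recommend learning resources rather than films, as in the scenario that continues below.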

Suppose a student has reached a place where the next unit is population genetics. In an online learning system, the student's dashboard shows a set of 20 different population genetics learning resources, including lectures by a master teacher, sophisticated video productions emphasizing visual images related to the genetics concepts, interactive population genetics simulation games, an online collaborative group project, and combinations of text and practice exercises. Each resource comes with a rating of how much of the population genetics portion of the learning map it covers, the size and range of learning gains attained by students who have used it in the past, and student ratings of the resource for ease and enjoyment of use. These ratings are derived from past activities of all students, such as "like" indicators, assessment results, and correlations between student activity and assessment results. The student chooses a resource to work with, and his or her interactions with it are used to continuously update the system's model of how much he or she knows about population genetics. After the student has worked with the resource, the dashboard shows updated ratings for each population genetics learning resource; these ratings indicate how much of the unit content the student has not yet mastered is covered by each resource. At any time, the student may choose to take an online practice assessment for the population genetics unit. Student responses to this assessment give the system, and the student, an even better idea of what he or she has already mastered, how helpful different resources have been in achieving that mastery, and what still needs to be addressed. The teacher and the institution have access to the online learning data, which they can use to certify the student's accomplishments.

Capturing the Moment of Learning by Tracking Game Players' Behaviors

Wheeling Jesuit University's Cyber-enabled Teaching and Learning through Game-based, Metaphor-Enhanced Learning Objects (CyGaMEs) project was successful in measuring learning using assessments embedded in games. CyGaMEs quantifies game play activity to track timed progress toward the game's goal and uses this progress as a measure of player learning. CyGaMEs also captures a self-report on the game player's engagement or flow, i.e., feelings of skill and challenge, as these feelings vary throughout the game play. In addition to timed progress and self-report of engagement, CyGaMEs captures behaviors the player uses during play. Reese et al. (in press) showed that this behavior data exposed a prototypical moment of learning that was confirmed by the timed progress report. Research using the flow data to determine how user experience interacts with learning is ongoing.

This scenario shows the possibility of leveraging data for improving student performance; another example of data use for sensing student learning and engagement is described in the sidebar on the moment of learning and illustrates how using detailed behavior data can pinpoint cognitive events.

The increased ability to use data in these ways is due in part to developments in several fields of computer science and statistics. To support the understanding of what kinds of analyses are possible, the next section defines educational data mining, learning analytics, and visual data analytics, and describes the techniques they use to answer questions relevant to teaching and learning.

Data Mining and Analytics: The Research Base

Using data for making decisions is not new; companies use complex computations on customer data for business intelligence or analytics. Business intelligence techniques can discern historical patterns and trends from data and can create models that predict future trends and patterns. Analytics, broadly defined, comprises applied techniques from computer science, mathematics, and statistics for extracting usable information from very large datasets.

An early example of using data to explore online behavior is Web analytics, using tools that log and report Web page visits, countries or domains where the visit was from, and the links that were clicked through. Web analytics are still used to understand and improve how people use the Web, but companies now have developed more sophisticated techniques to track more complex user interactions with their websites. Examples of such tracking include changes in buying habits in response to disruptive technology (e.g., e-readers), most-highlighted passages in e-books, browsing history for predicting likely Web pages of interest, and changes in game players' habits over time. Across the Web, social actions, such as bookmarking to social sites, posting to Twitter or blogs, and commenting on stories, can be tracked and analyzed.

Unstructured Data and Machine Learning

Data are often put into a structured format, as in a relational database. Structured data are easy for computers to manipulate. In contrast, unstructured data have a semantic structure that is difficult to discern computationally (as in text or image analysis) without human aid. As a simple example, an email message has some structured parts (To, From, and Date Sent) and some unstructured parts (the Subject and the Body). Machine learning approaches to data mining deal with unstructured data, finding patterns and regularities in the data or extracting semantically meaningful information.

Analyzing these new logged events requires new techniques to work with unstructured text and image data, data from multiple sources, and vast amounts of data (big data). Big data does not have a fixed size; any number assigned to define it would change as computing technology advances to handle more data. So big data is defined relative to current or typical capabilities. For example, Manyika et al. (2011) defines big data as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." Big data captured from users' online behaviors enables algorithms to infer the users' knowledge, intentions, and interests and to create models for predicting future behavior and interest.

Research on machine learning has yielded techniques for knowledge discovery (see sidebar for a definition) or data mining that discover novel and potentially useful information in large amounts of unstructured data. These techniques find patterns in data and then build predictive models that probabilistically predict an outcome. Applications of these models can then be used in computing analytics over large datasets.

Knowledge Discovery in Databases (KDD)

KDD is an interdisciplinary area focusing on methodologies for extracting useful knowledge from data. Extracting knowledge from data draws on research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, and high-performance computing to deliver advanced business intelligence and Web discovery solutions.

http://researcher.ibm.com/view_pic.php?id=144

Two areas that are specific to the use of big data in education are educational data mining and learning analytics. Although there is no hard and fast distinction between these two fields, they have had somewhat different research histories and are developing as distinct research areas. Generally, educational data mining looks for new patterns in data and develops new algorithms and/or new models, while learning analytics applies known predictive models in instructional systems. Discussion on each follows below.

Educational Data Mining

Educational data mining is emerging as a research area with a suite of computational and psychological methods and research approaches for understanding how students learn. New computer-supported interactive learning methods and tools (intelligent tutoring systems, simulations, games) have opened up opportunities to collect and analyze student data, to discover patterns and trends in those data, and to make new discoveries and test hypotheses about how students learn. Data collected from online learning systems can be aggregated over large numbers of students and can contain many variables that data mining algorithms can explore for model building.

Just as with early efforts to understand online behaviors, early efforts at educational data mining involved mining website log data (Baker and Yacef 2009), but now more integrated, instrumented, and sophisticated online learning systems provide more kinds of data. Educational data mining generally emphasizes reducing learning into small components that can be analyzed and then influenced by software that adapts to the student (Siemens and Baker 2012). Student learning data collected by online learning systems are being explored to develop predictive models by applying educational data mining methods that classify data or find relationships. These models play a key role in building adaptive learning systems in which adaptations or interventions based on the model's predictions can be used to change what students experience next or even to recommend outside academic services to support their learning.

An important and unique feature of educational data is that they are hierarchical. Data at the keystroke level, the answer level, the session level, the student level, the classroom level, the teacher level, and the school level are nested inside one another (Baker 2011; Romero and Ventura 2010). Other important features are time, sequence, and context. Time is important to capture data, such as length of practice sessions or time to learn. Sequence represents how concepts build on one another and how practice and tutoring should be ordered. Context is important for explaining results and knowing where a model may or may not work. Methods for hierarchical data mining and longitudinal data modeling have been important developments in mining educational data.
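To make the hierarchy concrete, the sketch below shows one possible way to represent nested, time-stamped learning events so that analyses can roll up from keystrokes to schools. The field names are illustrative, not a standard schema.

```python
# One way to represent the nested structure of educational data: each
# fine-grained event carries its full context, so analyses can aggregate
# at the answer, session, student, class, teacher, or school level.
# All field names are illustrative, not a standard schema.
from dataclasses import dataclass
from collections import Counter

@dataclass
class LearningEvent:
    school_id: str
    teacher_id: str
    class_id: str
    student_id: str
    session_id: str
    answer_id: str     # the attempt this event belongs to
    timestamp: float   # supports time-on-task and sequence analyses
    action: str        # e.g., "keystroke", "hint_request", "submit"
    context: str       # e.g., unit or problem identifier

events = [
    LearningEvent("sch1", "t04", "alg1-p2", "s123", "sess9",
                  "ans42", 1349103622.0, "hint_request", "quadratics-03"),
]

# Aggregating upward through the hierarchy is then a grouping operation:
hints_per_student = Counter(
    e.student_id for e in events if e.action == "hint_request"
)
print(hints_per_student)  # Counter({'s123': 1})
```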

Educational data mining researchers (e.g., Baker 2011; Baker and Yacef 2009) view the following as the goals for their research:

1. Predicting students' future learning behavior by creating student models that incorporate such detailed information as students' knowledge, motivation, metacognition, and attitudes;

2. Discovering or improving domain models that characterize the content to be learned and optimal instructional sequences;

3. Studying the effects of different kinds of pedagogical support that can be provided by learning software; and

4. Advancing scientific knowledge about learning and learners through building computational models that incorporate models of the student, the domain, and the software's pedagogy.

To accomplish these four goals, educational data mining research uses the five categories of technical methods (Baker 2011) described below.

Whether educational data is taken from students' use of interactive learning environments, computer-supported collaborative learning, or administrative data from schools and universities, it often has multiple levels of meaningful hierarchy, which often need to be determined by properties in the data itself, rather than in advance. Issues of time, sequence, and context also play important roles in the study of educational data.

http://www.educationaldatamining.org

1. Prediction entails developing a model that can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables). Examples of using prediction include detecting such student behaviors as when they are gaming the system, engaging in off-task behavior, or failing to answer a question correctly despite having a skill. Predictive models have been used for understanding what behaviors in an online learning environment (participation in discussion forums, taking practice tests, and the like) will predict which students might fail a class. Prediction shows promise in developing domain models, such as connecting procedures or facts with the specific sequence and amount of practice items that best teach them, and forecasting and understanding student educational outcomes, such as success on posttests after tutoring (Baker, Gowda, and Corbett 2011).
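As a deliberately tiny illustration of this prediction category, the sketch below fits a decision tree that infers a predicted variable (passing or failing a class) from predictor variables describing online behavior. The features and data are invented; real models are trained on far larger samples and carefully validated.

```python
# A minimal sketch of prediction: infer a predicted variable (pass/fail)
# from predictor variables (online behaviors). Data are invented.
from sklearn.tree import DecisionTreeClassifier

# Predictor variables per student: [forum posts, practice tests taken,
# average hints requested per problem]
X = [
    [12, 5, 0.4],
    [ 1, 0, 3.1],
    [ 8, 3, 0.9],
    [ 0, 1, 2.5],
    [15, 6, 0.2],
    [ 2, 0, 2.8],
]
y = [1, 0, 1, 0, 1, 0]  # predicted variable: 1 = passed, 0 = failed

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Estimated class probabilities for a new student with few posts and
# heavy hint use
print(model.predict_proba([[3, 1, 2.0]])[0])
```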


2. Clustering refers to finding data points that naturally group together and can be used to split a full dataset into categories. Examples of clustering applications are grouping students based on their learning difficulties and interaction patterns, such as how and how much they use tools in a learning management system (Amershi and Conati 2009), and grouping users for purposes of recommending actions and resources to similar users. Data as varied as online learning resources, student cognitive interviews, and postings in discussion forums can be analyzed using techniques for working with unstructured data to extract characteristics of the data and then clustering the results. Clustering can be used in any domain that involves classifying, even to determine how much collaboration users exhibit based on postings in discussion forums (Anaya and Boticario 2009).
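A minimal sketch of clustering in this spirit, using k-means over invented tool-usage features, is shown below; the resulting cluster labels could feed the grouping and recommendation applications described above.

```python
# A minimal clustering sketch: group students by how they use tools in a
# learning management system. Features and values are invented.
import numpy as np
from sklearn.cluster import KMeans

# Per-student usage: [forum minutes/week, quiz attempts, video views]
usage = np.array([
    [120, 9,  2],
    [110, 8,  3],
    [ 10, 1, 30],
    [ 15, 2, 25],
    [ 60, 5, 12],
    [ 70, 4, 10],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(usage)
print(kmeans.labels_)  # cluster assignment for each student
```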

3. Relationship mining involves discovering relationships between variables in a dataset and encoding them as rules for later use. For example, relationship mining can identify the relationships among products purchased in online shopping (Romero and Ventura 2010).

Association rule mining can be used for finding student mistakes that co-occur, associating content with user types to build recommendations for content that is likely to be interesting, or for making changes to teaching approaches (e.g., Merceron and Yacef 2010). These techniques can be used to associate student activity, in a learning management system or discussion forums, with student grades or to investigate such questions as why students' use of practice tests decreases over a semester of study.

Sequential pattern mining builds rules that capture the connections between occurrences of sequential events, for example, finding temporal sequences, such as student mistakes followed by help seeking. This could be used to detect events, such as students regressing to making errors in mechanics when they are writing with more complex and critical thinking techniques, and to analyze interactions in online discussion forums.
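A minimal sketch of the association-rule idea appears below: count co-occurring mistake types across sessions and keep pairs that clear support and confidence thresholds. The data and thresholds are invented, and production systems use full algorithms such as Apriori rather than this pairwise shortcut.

```python
# Count co-occurring student mistake types, then report rules "a -> b"
# whose support and confidence clear a threshold. Data are invented.
from itertools import combinations
from collections import Counter

sessions = [  # mistake types observed in each student's session
    {"sign_error", "drop_negative"},
    {"sign_error", "drop_negative", "units"},
    {"units"},
    {"sign_error", "drop_negative"},
    {"drop_negative"},
]

pair_counts = Counter()
item_counts = Counter()
for s in sessions:
    item_counts.update(s)
    pair_counts.update(combinations(sorted(s), 2))

n = len(sessions)
for (a, b), c in pair_counts.items():
    support = c / n                   # fraction of sessions with both
    confidence = c / item_counts[a]   # estimate of P(b | a)
    if support >= 0.4 and confidence >= 0.7:
        print(f"{a} -> {b}  support={support:.2f} confidence={confidence:.2f}")
```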

Key educational applications of relationship mining include discovery of associations between student performance and course sequences and discovering which pedagogical strategies lead to more effective or robust learning. This latter area, called teaching analytics, is of growing importance and is intended to help researchers build automated systems that model how effective teachers operate by mining their use of educational systems.

4. Distillation for human judgment is a technique that involves depicting data in a way that enables a human to quickly identify or classify features of the data. This area of educational data mining improves machine-learning models because humans can identify patterns in, or features of, student learning actions, student behaviors, or data involving collaboration among students. This approach overlaps with visual data analytics (described in the third part of this section).

5. Discovery with models is a technique that involves using a validated model of a phenomenon (developed through prediction, clustering, or manual knowledge engineering) as a component in further analysis. For example, Jeong and Biswas (2008) built models that categorized student activity from basic behavior data: students' interactions with a game-like learning environment that uses learning by teaching. A sample student activity discerned from the data was map probing. A model of map probing then was used within a second model of learning strategies and helped researchers study how the strategy varied across different experimental states. Discovery with models supports discovery of relationships between student behaviors and student characteristics or contextual variables, and analysis of research questions across a wide variety of contexts.

Learning Analytics

Learning analytics is becoming defined as an area of research and application and is related to academic analytics, action analytics, and predictive analytics.1 Learning analytics draws on a broader array of academic disciplines than educational data mining, incorporating concepts and techniques from information science and sociology, in addition to computer science, statistics, psychology, and the learning sciences. Unlike educational data mining, learning analytics generally does not emphasize reducing learning into components but instead seeks to understand entire systems and to support human decision making.

Learning analytics emphasizes measurement and data collection as activities that institutions need to undertake and understand, and focuses on the analysis and reporting of the data. Unlike educational data mining, learning analytics does not generally address the development of new computational methods for data analysis but instead addresses the application of known methods and models to answer important questions that affect student learning and organizational learning systems. The Horizon Report: 2011 Edition describes the goal of learning analytics as enabling "teachers and schools to tailor educational opportunities to each student's level of need and ability" (Johnson et al. 2011). Unlike educational data mining, which emphasizes system-generated and automated responses to students, learning analytics enables human tailoring of responses, such as through adapting instructional content, intervening with at-risk students, and providing feedback.

Defining Learning Analytics

Learning analytics refers to the interpretation of a wide range of data produced by and gathered on behalf of students in order to assess academic progress, predict future performance, and spot potential issues. Data are collected from explicit student actions, such as completing assignments and taking exams, and from tacit actions, including online social interactions, extracurricular activities, posts on discussion forums, and other activities that are not directly assessed as part of the student's educational progress. Analysis models that process and display the data assist faculty members and school personnel in interpretation. The goal of learning analytics is to enable teachers and schools to tailor educational opportunities to each student's level of need and ability.

Learning analytics need not simply focus on student performance. It might be used as well to assess curricula, programs, and institutions. It could contribute to existing assessment efforts on a campus, helping provide a deeper analysis, or it might be used to transform pedagogy in a more radical manner. It might also be used by students themselves, creating opportunities for holistic synthesis across both formal and informal learning activities.

Johnson et al. 2011, p. 28

1 Academic analytics is described in Goldstein (2005). The term learning analytics began to be used in 2009. Differences among these terms are not important for purposes of this brief. The interested reader may wish to consult Elias (2011) or Long and Siemens (2011).


Technical methods used in learning analytics are varied and draw from those used in educational data mining. Additionally, learning analytics may employ methods from the other disciplines it draws on, such as social network analysis.

Sharing Learning Resource Data

The Learning Registry is being developed to take advantage of metadata and social metadata generated as educators and learners interact with online learning resources. Data published to the Learning Registry can serve as the basis for learning resource analytics to help recommend resources, detect trends in resource usage, and judge user experience.

http://www.learningregistry.org

As with educational data mining, providing a visual representation of analytics is critical to generate actionable analyses; information is often represented as dashboards that show data in an easily digestible form.

A key application of learning analytics is monitoring and predicting students' learning performance and spotting potential issues early so that interventions can be provided to identify students at risk of failing a course or program of study (EDUCAUSE 2010; Johnson et al. 2011). Several learning analytics models have been developed to identify student risk level in real time to increase the students' likelihood of success. Examples of such systems include Purdue University's Course Signals system (Arnold 2010) and the Moodog system being used at the course level at the University of California, Santa Barbara, and at the institutional level at the University of Alabama (EDUCAUSE 2010). Higher education institutions have shown increased interest in learning analytics as they face calls for more transparency and greater scrutiny of their student recruitment and retention practices.

Data mining of student behavior in online courses has revealed differences between successful and unsuccessful students (as measured by final course grades) in terms of such variables as level of participation in discussion boards, number of emails sent, and number of quizzes completed (Macfadyen and Dawson 2010). Analytics based on these student behavior variables can be used in feedback loops to provide more fluid and flexible curricula and to support immediate course alterations (e.g., sequencing of examples, exercises, and self-assessments) based on analyses of real-time learning data (Graf and Kinshuk in press).
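An early-warning model in the spirit of these systems can be sketched as a logistic regression over the behavior variables named above. The dataset, threshold, and flagging rule here are invented for illustration; deployed systems such as Course Signals use their own models and many more variables.

```python
# A hedged sketch of an early-warning model: logistic regression over
# LMS activity variables, flagging students with a low predicted
# probability of passing. All data and thresholds are invented.
from sklearn.linear_model import LogisticRegression

# Per student: [discussion board posts, emails sent, quizzes completed]
X = [
    [25, 10, 12],
    [30, 14, 11],
    [ 2,  1,  3],
    [ 4,  0,  2],
    [18,  6,  9],
    [ 1,  2,  1],
]
y = [1, 1, 0, 0, 1, 0]  # 1 = passed the course, 0 = failed

model = LogisticRegression().fit(X, y)

# Flag current students whose predicted probability of passing is low
current = [[3, 1, 2], [20, 8, 10], [6, 2, 4]]
for student, p_pass in zip(["s1", "s2", "s3"],
                           model.predict_proba(current)[:, 1]):
    if p_pass < 0.5:
        print(f"{student}: at risk (p_pass={p_pass:.2f})")
```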

In summary, learning analytics systems apply models to answer such questions as:

When are students ready to move on to the next topic?

When are students falling behind in a course?

When is a student at risk for not completing a course?

What grade is a student likely to get without intervention?

What is the best next course for a given student?

Should a student be referred to a counselor for help?

Visual Data Analytics

Visual Data Analysis

Visual data analysis is a way of discovering and understanding patterns in large datasets via visual interpretation. It is used in the scientific analysis of complex processes. As the tools to interpret and display data have become more sophisticated, models can be manipulated in real time, and researchers are able to navigate and explore data in ways that were not possible previously. Visual data analysis is an emerging field, a blend of statistics, data mining, and visualization that promises to make it possible for anyone to sift through, display, and understand complex concepts and relationships.

Johnson et al. 2010, p. 7

Visual data analysis blends highly advanced computational methods with sophisticated graphics engines to tap the ability of humans to see patterns and structure in complex visual presentations (Johnson et al. 2010). Visual data analysis is designed to help expose patterns, trends, and exceptions in very large heterogeneous and dynamic datasets collected from complex systems. A variety of techniques and tools are emerging to enable analysts to easily interpret all sorts of data. For instance, visual interactive principal components analysis (finding the components of a dataset that reduce many variables into few) is a technique once available only to statisticians that is now commonly used to detect trends and data correlations in multidimensional data sets. Gapminder (http://www.gapminder.org/), for example, uses this approach in its analysis of multivariate datasets over time. Websites, such as Many Eyes (http://www-958.ibm.com/software/data/cognos/manyeyes/), offer tools for any user to create visualizations (map-based, text-based clouds and networks, and charts and graphs) of personal datasets. Early in its release, the creators of Many Eyes discovered that it was being used for visual analytics, to check for data quality, to characterize social trends, and to reveal personal and collective sentiments or advocate for a position (Viégas et al. 2008). Like Many Eyes, other online services, such as Wordle and FlowingData, accept uploaded data and allow the user to configure the output to varying degrees. To facilitate the development of this field, the National Visualization and Analytics Center was established by the U.S. Department of Homeland Security to provide strategic leadership and coordination for visual analytics technology and tools nationwide, and this has broadened into a visual analytics community (http://vacommunity.org).
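As a small illustration of the principal components analysis mentioned above, the sketch below projects invented learner data onto two components and plots them for visual inspection.

```python
# A minimal sketch of PCA for visual data analysis: reduce many learner
# variables to two components and plot them so a human can look for
# clusters and trends. Data are randomly generated for illustration.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 100 students x 6 behavior variables (e.g., time on task, hints, posts)
data = rng.normal(size=(100, 6))

pca = PCA(n_components=2)
projected = pca.fit_transform(data)

plt.scatter(projected[:, 0], projected[:, 1])
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.title("Students projected onto two principal components")
plt.show()
```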

The Horizon Report: 2010 Edition (Johnson et al. 2010) describes the promise of visual data analysis (in the four- to five-year time frame) for teaching undergraduates to model complex processes in such subjects as quantum physics. Visual data analysis also may help expand our understanding of learning because of its ability to support the search for patterns. It may be applied, for example, to illustrate the relationship among the variables that influence informal learning and to see the social networking processes at work in the formation of learning communities.

Currently, the tools, techniques, and high-resolution displays that enable people to interactively manipulate variables or zoom through the analysis results are still found mostly in research settings. Because interpreting data generated for visual data analysis requires analytical knowledge, researchers have thus far been the major population to use this method. Nevertheless, such sites as Gapminder offer data aimed at educators and provide teacher professional development to help educators interpret the data. Social Explorer, for example, offers tools for exploring map-based census and demographic data visualizations and is used by both researchers and educators. In the future, advances in visual data analytics and human-computer interface design may well make it feasible to create tools, such as Many Eyes, that policymakers, administrators, and teachers can use.

This section has described the promise of educational data mining (seeking patterns in data across many student actions), learning analytics (applying predictive models that provide actionable information), and visual data analytics (interactive displays of analyzed data) and how they might serve the future of personalized learning and the development and continuous improvement of adaptive systems. How might they operate in an adaptive learning system? What inputs and outputs are to be expected? In the next section, these questions are addressed by giving a system-level view of how data mining and analytics could improve teaching and learning by creating feedback loops.


Data Use in Adaptive Learning Systems

Online learning systems (learning management systems, learning platforms, and learning software) have the ability to capture streams of fine-grained learner behaviors, and the tools and techniques described above can operate on the data to provide a variety of stakeholders with feedback to improve teaching, learning, and educational decision making. To demonstrate how such adaptive systems operate, using the predictive models created by educational data mining and the system-level view of learning analytics, this section describes a prototypical learning system with six components (Exhibit 1):

A content management, maintenance, and delivery component interacts with students to deliver individualized subject content and assessments to support student learning.

A student learning database (or other big data repository) stores time-stamped student input and behaviors captured as students work within the system.

A predictive model combines demographic data (from an external student information system) and learning/behavior data from the student learning database to track a student's progress and make predictions about his or her future behaviors or performance, such as future course outcomes and dropouts.

A reporting server uses the output of the predictive model to produce dashboards that provide visible feedback for various users.

An adaptation engine regulates the content delivery component based on the output of the predictive model to deliver material according to a student's performance level and interests, thus ensuring continuous learning improvement.

An intervention engine allows teachers, administrators, or system developers to intervene and override the automated system to better serve a student's learning.
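To make the interplay of these components concrete, here is a schematic sketch of the feedback loop. The class interfaces are hypothetical; a production system would put a real model, database, and content store behind each one.

```python
# A schematic sketch of the feedback loop among the components above.
# All interfaces and the toy "risk" rule are invented for illustration.
class StudentLearningDatabase:
    def __init__(self):
        self.events = []          # time-stamped learner inputs (Step 2)

    def log(self, event):
        self.events.append(event)

class PredictiveModel:
    def predict(self, events, sis_profile):
        # Combine SIS profile data with logged behavior (Step 3).
        hints = sum(1 for e in events if e["action"] == "hint")
        return {"risk": "high" if hints > 3 else "low"}

class AdaptationEngine:
    def choose_next(self, prediction):
        # Adjust what the content component delivers next (Step 4).
        return "review_unit" if prediction["risk"] == "high" else "next_unit"

db = StudentLearningDatabase()
db.log({"student": "s1", "action": "hint", "time": 0})      # Step 1
prediction = PredictiveModel().predict(db.events, {"grade": 9})
print(AdaptationEngine().choose_next(prediction))            # next_unit
```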


Exhibit 1. The Components and Data Flow Through a Typical Adaptive Learning System

Exhibit reads: The data flow is shown through a box and arrows diagram with a content box on the top with an arrow to a student and two engines underneath shown as boxes: an adaptation engine and an intervention engine, with arrows for each up to the content box. Another arrow connects a predictive model box to the adaptation engine. The predictive model is connected to two databases with incoming arrows. On the right is the student learning database and on the left is the student information system. Below the predictive model and connected with an incoming arrow is a dashboard that is shown connected with arrows to faculty and educators and administrators.

In addition to these six internal components, an adaptive learning system often uses the student information system (SIS) that is maintained by a school, district, or institution as an external data source. Student profiles from the SIS are usually downloaded in batch mode, as they do not change often, and then are linked with performance data in the student learning database using student identifiers in compliance with applicable law. Student profiles contain background information on students that can be used to group them into specific categories or to provide more variables that might suggest a particular student is at risk.


The numbers in Exhibit 1 signify the data flow that creates feedback loops between the users and the adaptive learning system. The data flow starts with Step 1, students generating inputs when interacting with the content delivery component. (In the future, a student may have a portable learning record that contains information from all past interactions with online learning systems.) The inputs are time-stamped and cleaned as necessary and stored in the student learning database according to a predefined structure (Step 2). At certain times (not synchronized with student learning activities), the predictive model fetches data for analysis from both the student learning database and the SIS (Step 3). At this stage, different data mining and analytics tools and models might be applied depending on the purpose of the analysis. Once the analysis is completed, the results are used by the adaptation engine (Step 4) to adjust what should be done for a particular student. The content delivery component presents these adjusted computer tutoring and teaching strategies (Step 4) to the student. The findings also may flow to the dashboard (Step 5), and, in the last step in the data flow, various users of the system examine the reports for feedback and respond (using the intervention engine) in ways appropriate for their role.

These last steps complete feedback loops as stakeholders receive information to inform their future choices and activities. Students receive feedback on their interactions with the content they are learning through the adaptive learning system. The feedback typically includes the percentage correct on embedded assessments and lists of concepts they have demonstrated mastery on (Exhibit 2), but it also can include detailed learning activity information (e.g., hints requested and problems attempted). Detailed learning information for one student can be compared with that for students who earned high grades so that students can adjust their learning with the system accordingly.

Exhibit 2. Student Dashboard Showing Recommended Next Activities


Measuring Student Effort

Learning software collects such data as minutes spent on a unit, hints used, and common errors, and aggregates these data across many students in a school or schools in a district (Feng, Heffernan, and Koedinger 2009). Using these measures, teachers can distinguish between students who are not trying and those who are trying but still struggling and then differentiate instruction for each group.

Teachers receive feedback on the performance of each individual student and of the class as a whole and adjust their instructional actions to influence student learning. By examining the feedback data, instructors can spot students who may need additional help or encouragement to spend more time on the content and identify areas where the class as a whole is struggling. The latter area can be addressed during class time when the instructor can respond to questions and address student misconceptions and lack of comprehension. For the former areas, teachers may choose to intervene with the system to adjust student learning pace or may assign additional learning materials targeting the skills that are not yet mastered (see Case Study 1). Learning systems typically track the state of student mastery at the skill or topic level (e.g., the quadratic equation) and can provide this information to students so they know what to study and to teachers so they know the areas where they should concentrate further instruction (Exhibit 3). Researchers involved with the Open Learning Initiative at Carnegie Mellon University have a similar vision of student and teacher feedback systems that is guiding their work in developing online courses (Bajzek et al. 2008) and is described in Case Study 2.

Exhibit 3. Teacher Dashboard With Skill Meter for Math Class


Administrators can look at detailed data across different classes to examine progress for all students at a school, to see what works and what does not in a particular classroom, and to do so with less effort. District administrators can use data from this kind of dashboard as a basis for determining whether a particular learning intervention is effective at promoting student learning, even at the level of individual concepts (Exhibit 4). Typically, the detailed learning data the system provides can be disaggregated by student subgroup (for example, to see how students without a course prerequisite perform or to compare males' and females' progress in the course), by instructor, or by year. Learning system data can support analyses of how well students learn with particular interventions and how implementation of the intervention could be improved. Using the data, administrators can set policies, implement programs, and adapt the policies and programs to improve teaching, learning, and completion/retention/graduation rates.

Exhibit 4. Administrator Dashboard Showing Concept Proficiency for a Grade Level
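Mechanically, the disaggregation described above is a grouping operation. A small sketch with invented columns and values:

```python
# A minimal sketch of disaggregating mastery data by student subgroup
# and by instructor using pandas. Columns and values are invented.
import pandas as pd

df = pd.DataFrame({
    "student":         ["s1", "s2", "s3", "s4", "s5", "s6"],
    "prerequisite":    [True, False, True, False, True, False],
    "instructor":      ["A", "A", "B", "B", "A", "B"],
    "concept_mastery": [0.91, 0.62, 0.84, 0.58, 0.88, 0.70],
})

# Compare mastery for students with and without the course prerequisite
print(df.groupby("prerequisite")["concept_mastery"].mean())

# ...or break the same measure out by instructor
print(df.groupby("instructor")["concept_mastery"].agg(["mean", "count"]))
```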

Researchers can use fine-grained learner data to experiment with learning theories and to examine the effectiveness of different types of instructional practices and different course design elements. Learning system developers can conduct rapid testing with large numbers of users to improve online learning systems to better serve students, instructors, and administrators. Researchers using online learning systems can do experiments in which many students are assigned at random to receive different teaching or learning approaches, and learning system developers can show alternative versions of the software to many users: version A or version B. This so-called A/B testing process can answer research questions about student learning such as: Do students learn more quickly if they receive a lot of practice on a given type of problem all at once (massed practice) or if practice on that type of problem is spaced out over time (spaced practice)? What about students' retention of this skill? Which kind of practice schedule is superior for fostering retention? For what kind of students, and in what contexts?
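A minimal sketch of analyzing such an A/B test appears below, comparing invented posttest scores for the two practice schedules with a two-sample t-test; a real study would also examine effect sizes, retention measures, and subgroups.

```python
# Compare posttest scores of students randomly assigned to massed vs.
# spaced practice. Scores are invented for illustration.
from scipy import stats

massed = [71, 68, 75, 70, 66, 73, 69, 72]  # version A posttest scores
spaced = [78, 74, 80, 77, 73, 79, 76, 81]  # version B posttest scores

t_stat, p_value = stats.ttest_ind(massed, spaced)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The difference between practice schedules is statistically significant.")
```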

Case Study 1. Fine-grained Data Collection and Use: ASSISTments

Fine-grained student data can be structured into meaningful chunks to provide evidence of student problem-solving sequences, knowledge state, and strategy. An example of this use of fine-grained data that is in wide-scale use is the ASSISTments tutoring system, currently used by more than 20,000 students in the New England area. Designed by researchers at Worcester Polytechnic Institute and Carnegie Mellon University, ASSISTments combines online learning assistance and assessment activities. ASSISTments tutors students on concepts while they practice on problems, and provides educators with a detailed assessment of students' developing skills. While ASSISTments is widely used in fourth- to 10th-grade mathematics and science, it is also finding use in English and social studies. This wider adoption across subjects is due in part to teachers' ability to write their own questions.

When students respond to ASSISTments problems, they receive hints and tutoring to the extent they need them. At the same time, ASSISTments uses information on how individual students respond to the problems and how much support they need from the system to generate correct responses as assessment information. Each week, when students work on ASSISTments, it learns more about their abilities and, thus, can provide increasingly appropriate tutoring for each student and can generate increasingly accurate predictions of how well the students will do on the end-of-year standardized tests. In fact, the ASSISTments system, taking into account information on the quantity and quality of help that students request, has been found to be more accurate at predicting students' performance on the state examinations than the number of items students get correct on benchmark assessments (Feng, Heffernan, and Koedinger 2009).

The ASSISTments system gives educators detailed reports of students' mastery of 147 math skills from fourth grade to 10th grade, as well as their accuracy, speed, help-seeking behavior, and number of problem-solving attempts. The system can identify the difficulties that individual students are having and the weaknesses demonstrated by the class as a whole so that educators can tailor the focus of their upcoming instruction or tailor ASSISTments to adjust its instruction.


Case Study 2. Meshing Learning and Assessment in Online and Blended Instruction

The online learning systems being developed through the Open Learning Initiative (OLI) at Carnegie Mellon University illustrate the new advances that allow integration of learning and assessment systems. The OLI team set out to design learning systems incorporating the learning science principle of providing practice with feedback. In the OLI courses, feedback mechanisms are woven into a wide variety of activities. A biology course, for example, has the following components:

Interactive simulations of biological processes that students can manipulate; the student's interaction with the simulation is interspersed with probes to gauge his or her understanding of how it works.

"Did I Get This?" quizzes after presentation of new material so that students can check for themselves whether or not they understood, without any risk of hurting their course grade.

Short essay questions embedded throughout the course material that call on students to make connections across concepts.

"Muddiest Point" requests that ask students what they thought was confusing.

Tutored problem solving gives students a chance to work through complex problems and get scaffolds (e.g., showing how similar problems are solved) and hints to help them. The students receive feedback on their solution success after doing each problem, and the system keeps track of how much assistance students needed for each problem as well as whether or not they successfully solved it.

When OLI courses are implemented in a blended instruction mode that combines online and classroom learning, the instructors can use the data the learning system collects as students work online to identify the topics students most need help on so that they can plan upcoming classroom activities on those misconceptions and errors (Brown et al. 2006). OLI is now doing R&D on a digital dashboard to give instructors an easy-to-read summary of the online learning data from students taking their course. OLI has developed learning systems for postsecondary classes in engineering statics, statistics, causal reasoning, economics, French, logic and proofs, biology, chemistry, physics, and calculus. A study contrasting the performance of students randomly assigned to the OLI statistics course with those in conventional classroom instruction found that the former achieved better learning outcomes in half the time (Lovett, Meyer, and Thille 2008).

These case studies demonstrate practical applications of data-rich feedback loops in adaptive learning systems. But they do not represent the full range of potential applications of educational data mining and learning analytics. To show this larger potential, the next section outlines broad areas where educational data mining and learning analytics can be applied, many inspired by industry practices.


Educational Data Mining and Learning Analytics Applications

Educational data mining and learning analytics research are beginning to answer increasingly complex questions about what a student knows and whether a student is engaged. For example, questions may concern what a short-term boost in performance in reading a word says about overall learning of that word, and whether gaze-tracking machinery can learn to detect student engagement. Researchers have experimented with new techniques for model building and also with new kinds of learning system data that have shown promise for predicting student outcomes. Previous sections presented the research goals and techniques used for educational data mining and learning/visual analytics. This section presents broad areas of applications that are found in practice, especially in emerging companies. These application areas were discerned from the review of the published and gray literature and were used to frame the interviews with industry experts. These areas represent the broad categories in which data mining and analytics can be applied to online activity, especially as it relates to learning online. This is in contrast to the more general areas for big data use, such as health care, manufacturing, and retail (see Manyika et al. 2011).

These application areas are (1) modeling of user knowledge, user behavior, and user experience; (2) user profiling; (3) modeling of key concepts in a domain and modeling a domain's knowledge components; and (4) trend analysis. Another application area concerns how analytics are used to adapt to or personalize the user's experience. Each of these application areas uses different sources of data, and Exhibit 5 briefly describes questions that these categories answer and lists data sources that have been used thus far in these applications. In the remainder of this section, each area is explored in more detail along with examples from both industry practice and academic research.

Exhibit 5. Application areas, questions answered, and type of data needed for analysis

User knowledge modeling
Type of data needed for analysis: Students' responses (correct, incorrect, partially correct); time spent before responding to a prompt or question; hints requested; repetitions of wrong answers; and errors made. The skills that a student practiced and total opportunities for practice.

User behavior modeling
Questions answered: What do patterns of student behavior mean for their learning? Are students motivated?
Type of data needed for analysis: Students' performance level inferred from system work or collected from other sources, such as standardized tests. Students' responses (correct, incorrect, partially correct); time spent before responding to a prompt or question; hints requested; repetitions of wrong answers; and errors made. Any changes in the classroom/school context during the investigation period.

User experience modeling
Questions answered: Are users satisfied with their experience?
Type of data needed for analysis: Responses to surveys or questionnaires. Choices, behaviors, or performance in subsequent learning units or courses.

User profiling
Questions answered: What groups do users cluster into?
Type of data needed for analysis: Students' responses (correct, incorrect, partially correct); time spent before responding to a prompt or question; hints requested; repetitions of wrong answers; and errors made.

Domain modeling
Questions answered: What is the correct level at which to divide topics into modules, and how should these modules be sequenced?
Type of data needed for analysis: Students' responses (correct, incorrect, partially correct) and performance on modules at different grain sizes compared to an external measure. A domain model taxonomy. Associations among problems and between skills and problems.

Learning component analysis and instructional principle analysis
Questions answered: Which components are effective at promoting learning? What learning principles work well? How effective are whole curricula?
Type of data needed for analysis: Students' responses (correct, incorrect, partially correct) and performance on modules at different levels of detail compared to an external measure. A domain model taxonomy. Association structure among problems and between skills and problems.

Trend analysis
Questions answered: What changes over time, and how?
Type of data needed for analysis: Varies depending on what information is of interest; typically at least three longitudinal data points are needed to discern a trend. Data collected include enrollment records, degrees, completion, student source, and high school data in consecutive years.

Adaptation and personalization
Questions answered: How should the user experience be changed for this user? How can the user experience be altered, most often in real time?
Type of data needed for analysis: May need to collect historical data about the user and also related information on the product or service to be recommended. Students' academic performance record.


User Knowledge Modeling

Researchers and developers build and tune user models that represent a collection of user-specific data, especially skills and knowledge. User models are used to customize and adapt the systems' behaviors to users' specific needs so that the systems "say" the "right" thing at the "right" time in the "right" way (Gerhard 2001). Inferring what a user knows, i.e., user knowledge modeling, requires looking at accumulated data that represent the interactions between students and the learning system.2 Knowledge can be inferred from such interactions as correctness of student responses alone or in a series, time spent on practice before attempting to answer a question, number and nature of hints requested, repetitions of wrong answers, and errors made. Such inferences can be made by a predictive computer model or by a teacher looking at student data on a dashboard.

Tailoring Learner Feedback
Adaptive learning systems can provide tailored feedback that gives guidance based on analysis of fine-grained data. The Knewton Math Readiness system uses analytics to deliver only the content each student needs and skips concepts the student has already shown he or she understands.

User knowledge modeling has been adopted to build adaptive hypermedia, recommender systems, expert systems, and intelligent tutoring systems. In intelligent tutoring systems, user knowledge models direct key operations, such as deciding which problems to give students. A popular method for estimating students' knowledge is Corbett and Anderson's knowledge tracing model (Corbett and Anderson 1994), an approach that uses a Bayesian-network-based model for estimating the probability that a student knows a skill based on observations of him or her attempting to perform the skill. More recently, Baker and colleagues proposed a new method for knowledge tracing using a machine learning approach to make contextual estimations of the probability that a student has guessed or slipped. Incorporating models of guessing and slipping into predictions of student future performance was shown to increase the accuracy of the predictions by up to 48 percent (Baker, Corbett, and Aleven 2008).

Advancing Instruction
Many learning technology experts are enthusiastic about the possibility of data completely driving the student's experience. By tracking a student's mastery of each skill, a learning system can give just the right amount of instruction. Other experts caution against allowing analytics to completely determine what problems or skills students practice next or whether they advance to the next topic. Automatically holding a student back on the assumption that difficulty with one topic will preclude making progress on another may not be the best course of action (Means, Chelemer, and Knapp 1991).

2 Even though one could envision that continuous knowledge modeling could supplant traditional assessments, the technical working group still saw a need for end-of-course or state-level assessments as a check on this more local and possibly more formative type of assessment.


Student knowledge modeling is a common component of commercial learning software. How these models are used to adapt instruction varies. For example, one company builds dynamic student models for determining a student's readiness to move to new learning content and then advances the student automatically. Other companies resist automatic advancement, and instead their systems offer suggestions to teachers after detecting a student's placement. Still other companies are trying a middle approach: If students are performing above average, they receive suggestions to move on to new content; otherwise, they are encouraged to consolidate current skills and work on prerequisites.

As an example of using student modeling, learning software can collect such data as how many minutes are spent on a unit, how many hints were used, and common errors. The data for an individual student can then be compared against a model built from a large number of students. The industry expert we interviewed from Agile Mind, a learning software company, explained that these data enable teachers to distinguish between students who are not trying and those who are trying but still struggling. This information then helps teachers use different instructional strategies for these two groups of students. Agile Mind, however, cautions against allowing the data to drive what a student sees next or allowing the data to prevent a student from advancing because, according to the data, he or she has not achieved mastery. Not enough is known about the dependencies among topics to make these decisions in a completely automated manner.

In contrast, the Onsophic Inc. online learning platform collects data at a very granular level (per topic) for each student and detects student mastery at this topic level (e.g., quadratic equations) rather than at the course level. Plans are to provide students with detailed feedback, such as, "A week ago, you were yellow on a prerequisite, but now you are struggling on this topic. We suggest that you make sure you have a solid foundation on this topic through practicing on the prerequisite."

User Behavior Modeling

User behavior modeling in education often characterizes student actions as on- or off-task and can be used as a proxy for student engagement. It relies on the same kinds of learning data used in predicting user knowledge plus other measures, such as how much time a student has spent online (or on the system), whether a student has completed a course, documented changes in the classroom or school context, attendance, tardiness, and sometimes a student's level of knowledge as inferred from his or her work with the learning system or from other such data sources as standardized test scores. Baker and colleagues have conducted a series of studies on detecting and adapting to students' off-task behaviors (called "gaming the system") in adaptive learning systems that teach algebra (Baker et al. 2004, 2006). They found that gaming behaviors (such as clicking until the system provides a correct answer and advancing within the curriculum by systematically taking advantage of regularities in the software's feedback and help) were strongly associated with less learning for students with below-average academic achievement levels. In response, they modified the system to detect and respond to these students and provide them with supplementary exercises, which led to considerably better learning. Similar research has been done in unscripted environments that are more open-ended than the well-defined domain of mathematics. For instance, Blikstein (2011) has presented an automated technique and a case study to assess, analyze, and visualize behaviors of students learning computer programming.

Online learning systems log student data that can be mined to detect student behaviors that correlate with learning. Macfayden and Dawson (2010) analyzed learning management system tracking data from a Blackboard Vista-supported course and found variables that correlated with students' final grades. Fewer than five variables were found to account for 30 percent of the variation in students' final grades, and their model could correctly identify 81 percent of the students who failed the course.

Not all learning software companies have adopted user behavior modeling. Those that have collect and provide data to teachers to help them diagnose student learning issues. Carnegie Learning reported that its user behavior modeling was able to detect shifts in the classroom, such as the use of a substitute teacher, a teacher's lack of attention to an online learning system, or a classroom visit by a trainer for the learning system. Social gaming companies, such as Zynga, try to predict what users want and will do next in a game to find out how to make games more fun and get users more engaged. Other companies, such as Onsophic, Inc., are testing whether capturing on- and off-task behaviors can help them understand online learning through addressing such questions as: Does more interaction between the student and the system lead to increased learning? Do people learn more from items they show interest in? What patterns of interactions are associated with more learning?
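A minimal sketch of the prediction style reported by Macfayden and Dawson appears below. The variables and data are synthetic, not the published Blackboard Vista variables: counts of online activity feed a logistic regression that flags students at risk of failing a course.

    # Sketch of predicting course failure from LMS activity counts, in the
    # spirit of Macfayden and Dawson (2010). Variables and data are synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    n = 500
    forum_posts = rng.poisson(8, n)
    mail_messages = rng.poisson(5, n)
    quizzes_done = rng.integers(0, 12, n)

    X = np.column_stack([forum_posts, mail_messages, quizzes_done])
    # Synthetic ground truth: low engagement raises the odds of failing.
    logit = 2.0 - 0.15 * forum_posts - 0.1 * mail_messages - 0.2 * quizzes_done
    failed = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, failed, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", round(clf.score(X_te, y_te), 3))

In practice the payoff is early warning: a model of this form can be scored weekly during a course, long before final grades exist.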

User Experience Modeling

User experience modeling (ascertaining whether a student is satisfied with the learning experience) can be judged by students' responses to follow-up surveys or questionnaires and by their choices, behaviors, performance, and retention in subsequent learning units or courses. User experience modeling has been most popular in such Web-based applications as online shopping. Some of the interviewees' companies model user experience through methods other than data mining. Zynga explicitly asks users for their reactions via a survey, conducts user studies, or has humans conduct postmortem analyses (much like Google's researchers who look at failed searches). Zynga described an extended approach to user experience modeling: A sample of users can be surveyed about their experience, and then their behavior can be correlated with their survey results as a way to confirm what they said. Zynga also is experimenting with a more leading-edge approach: analyzing free-text responses given by users in responding to a survey (this is most useful when the sample of users is large, e.g., 250,000 users).

Compared with commercial applications of user experience modeling, less work has been done in education to use analytics to improve students' learning experience and foster their success and retention. Dawson, Heathcote, and Poole (2010) examined how effective higher education institutions have been in harnessing the data-capture mechanisms from their student information systems, learning management systems, and communication tools for improving student learning experiences and informing practitioners of the achievement of specific learning outcomes. They found that if the multiple means through which students engage with university systems are considered, individual activity can be tracked throughout the entire student lifecycle, from initial admission through course progression to graduation and employment transitions. The combined data captured by various systems build a detailed picture of the activities that students, instructors, service areas, and the institution as a whole undertake and can be used to improve relevance, efficiency, and effectiveness in a higher education institution.

User experience, as measured by retention, is important for companies offering commercial online courses. Kaplan, Inc. uses retention to judge whether its product is meeting customer needs. Kaplan has experimented with course redesigns using analytics. In one redesign experiment, it changed courses on topics such as nutrition, interpersonal communication, and medical terminology. The old courses had students follow a classic online learning sequence of "Read, Write, Discuss," and the new courses were more active, using a "Prepare, Practice, Perform" learning sequence. The new courses were carefully designed to make them easy to use and to give them a clean, simple look with good production values, paying attention to research on how media, audio, and text best reinforce, as opposed to distract from, learning. The redesigned versions offered opportunities for students to get help when they need it, as well as built-in assessments and quick surveys of self-efficacy and perceived value, and provided much more structured support for faculty as well.

Kaplan's analytics group collected time spent on redesigned course components, periodic surveys of students' motivational state during the course, and learning performance. Kaplan then looked at instructor satisfaction, student satisfaction, performance on embedded learning assessments, whether the student passed the course, and whether the student was retained until the next semester. Through A/B testing Kaplan was able to ascertain that the new course was better overall. But this was visible only via multiple measures: Instructors preferred the redesign; students did better on the assessments, spent more time on the materials, and were more likely to pass and take the next course. Of interest, however, is that students reported liking the old version more.
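A/B comparisons of the kind described above can rest on very simple statistics. Below is a minimal sketch with hypothetical counts, not Kaplan's data, of testing whether a redesigned course's pass rate differs reliably from the original's, using a two-proportion z-test.

    # Sketch of an A/B comparison of pass rates between an old and a
    # redesigned course, using a two-proportion z-test. Counts are
    # hypothetical.
    from math import sqrt
    from statistics import NormalDist

    passed_a, n_a = 312, 450   # old "Read, Write, Discuss" design
    passed_b, n_b = 355, 460   # new "Prepare, Practice, Perform" design

    p_a, p_b = passed_a / n_a, passed_b / n_b
    pooled = (passed_a + passed_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    print(f"pass rate A = {p_a:.3f}, pass rate B = {p_b:.3f}")
    print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")

As the Kaplan example shows, a single outcome measure is rarely enough; the same test would be run on retention, assessment scores, and satisfaction before declaring a redesign better overall.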


User Profiling

A user profile is a collection of personal data describing the essential characteristics of a user. User profiling refers to the process of constructing and applying student or group profiles using data mining and machine learning algorithms. Because students differ in their preferences, interests, backgrounds, and even goals for learning, the long-term objective of user profiling is often to provide adapted and personalized learning environments for individuals or groups of students to maximize learning effectiveness and efficiency.

Profiling technologies can be applied in a variety of domains and for a variety of purposes. Knowledge about customer behavior and preferences is of great interest to the commercial sector. With profiling technologies, companies can predict the behavior of different types of customers. Marketing strategies, such as personalized advertising, then can be tailored to the people fitting these types.

In education, data mining techniques, such as classification and clustering, are often used to categorize (or profile) students based on the kinds of personal learning data described in the section on the research base, on student demographic data, or both. Kardan and Conati (2011) proposed a user modeling framework that relies on interaction logs to identify different types of learners, as well as their characteristic interactions with the learning system. This information would then be used to classify new learners, with the long-term goal of providing adaptive interaction support when behaviors detrimental to learning are detected, or to learn ways to support engaged behavior. Classification also can group students together into study groups or other joint learning activities.

Gaming companies automatically cluster users into groups using behavioral data and use different strategies with each group to increase engagement and reduce drop-offs in playing. These groups emerge from the data and often are named based on human interpretations of the emergent patterns, for example, casual players, weekenders, social players, big spenders, decorators, and the like. In practice, these user groups may not always be informative or actionable, although groupings based on purchasing habits have proven useful for recommendation services. Representatives of one of the learning companies interviewed were hesitant to provide automatic recommendations for students based on profiles, believing that evidence for the effectiveness of such adaptations is not sufficient. Instead, this company has found that concentrating on assignments, concept strands, standards, and students who do or do not have mastery of the concepts in a standard is more fruitful than classifying students into groups based on learner types. In contrast, those of another company interviewed for this report are working to classify users based on understandings, learning trajectories, motivation, and possibly even cultural background. They are researching how this helps teachers differentiate instruction.
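A minimal sketch of the clustering step follows, assuming hypothetical behavioral features; a real profiling system would use far richer data, and, as the gaming examples suggest, a human analyst would inspect and name the resulting groups.

    # Sketch of clustering learners into behavioral groups with k-means.
    # Features, data, and the number of groups are hypothetical.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(2)
    n = 300
    minutes_per_week = rng.gamma(4.0, 30.0, n)
    sessions_per_week = rng.poisson(4, n)
    pct_items_correct = rng.uniform(0.3, 1.0, n)

    # Standardize so no single feature dominates the distance measure.
    X = StandardScaler().fit_transform(
        np.column_stack([minutes_per_week, sessions_per_week, pct_items_correct]))

    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    for label in range(4):
        print(f"cluster {label}: {(km.labels_ == label).sum()} students")
    # A human analyst would then inspect each cluster's averages and attach
    # an interpretation ("casual", "cramming", "steady", ...), much as the
    # gaming companies described above do.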

Domain Modeling

A domain model is often created to represent the key concepts that make up a subject or topic area such as mathematics or art history (i.e., domains). The domain model also identifies the relationships among all the key concepts or units of study. Research in domain modeling in educational data mining and learning analytics investigates how learning is affected by differences in how a topic is divided into key concepts at a particular level of generalization. For example, a state may specify that students in eighth grade must learn data analysis, statistics, and probability. A finer level requires teaching students to understand data presentation techniques; that is, students learn that data can be represented as number lines, bar graphs, circle graphs, stem-and-leaf plots, and so on. For a learning environment, it may be sufficient to test student performance and adapt at the data presentation level. However, there may be advantages to presenting sequences of related concepts (such as graph types) in a specific order. Researchers who use data mining to study differences in approaches to domain modeling use a taxonomy of the domain, associations among skills (such as prerequisites), user responses (including correctness), and actions over time on individual learning resources (such as a unit concept like multiplication of whole numbers).

Domain modeling has been adopted as an approach to fine-tune learning systems to better serve learning and instruction. For instance, Martin et al. (2011) described three studies that demonstrate how learning curves can be used to drive changes in the user model for personalized learning environments. Learning curves (i.e., some measure of performance against opportunities to learn and practice) for subsets of the domain model were shown to yield insight into the appropriateness of the model's structure and granularity. Martin et al. also used learning curves to analyze large amounts of user data to fine-tune a system's domain model.

In the education industry, some learning software companies have the goal of collecting data on atomic learning objects (i.e., objects that teach one concept that cannot be decomposed) and creating linking relationships among topics based on user tags or other actions. They intend to pair this technique with a feature that enables users to improve on any automatically built relationships or to create their own taxonomies.
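A minimal sketch of the learning-curve technique follows, with synthetic data: fit a power-law curve of error rate against practice opportunity and inspect the fit quality. Comparing fit quality under alternative skill breakdowns is one way to probe a domain model's granularity, in the spirit of the Martin et al. (2011) studies.

    # Sketch of fitting a power-law learning curve (error rate vs. practice
    # opportunity). Data are synthetic.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_curve(opportunity, a, b):
        # Classic learning-curve form: error rate decays with practice.
        return a * opportunity ** (-b)

    opportunities = np.arange(1, 11)
    error_rate = np.array([0.62, 0.45, 0.38, 0.30, 0.27,
                           0.24, 0.21, 0.20, 0.18, 0.17])

    (a, b), _ = curve_fit(power_curve, opportunities, error_rate, p0=(0.6, 0.5))
    residuals = error_rate - power_curve(opportunities, a, b)
    print(f"fit: error = {a:.2f} * opportunity^(-{b:.2f}), "
          f"RMSE = {np.sqrt((residuals ** 2).mean()):.4f}")
    # A "skill" whose data refuse to fit a smooth curve may really be two
    # skills lumped together, suggesting the domain model should be split.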


Learning System Components and Instructional Principle Analysis

Instructional principle analysis examines components of a learning system and types of instructional practices adopted at various time points or for various student groups to address such questions as:

- Which learning components are effective at promoting learning?
- Does a newly developed curriculum enable more learning than an alternative?
- What types of instructional practice are more effective in promoting learning (e.g., massed practice vs. spaced practice)?

Answering these questions entails collecting such data as student input and response correctness, student actions on learning system components over time, when and to which group a specific instructional strategy was applied, and students' performance on pre/posttests and/or delayed tests or their standardized test results.

Because studying the effectiveness of different learning system components and instructional practices can contribute to the design of better learning systems and has strong implications for student learning, it has been a key area of interest for educational data mining and analytics researchers, as evidenced by widely cited papers that reported using educational data mining to study and improve online courses (Baker and Yacef 2009). For example, researchers and educators from Carnegie Learning, Inc. and Carnegie Mellon University have been working to build cognitive models of mathematics, which have become the basis for middle school and high school curricula incorporating the Cognitive Tutor, an intelligent tutoring system. In these systems, complex tasks are decomposed into individual knowledge components, and a model is used to follow students' actions and diagnose their strategy in solving a problem. Each action that the student takes is associated with one or more skills. In this way researchers have been able to use Cognitive Tutor data to dynamically evaluate the effectiveness of instruction at a more detailed level. Evaluations and improvements have been conducted over the past 15 years (Ritter et al. 2007).

To discover which pedagogical support is most effective, Beck and Mostow (2008) proposed learning decomposition as an alternative to traditional A/B testing methods. As a type of relationship mining, learning decomposition involves fitting exponential learning curves to performance data and relating student success to the amount of each type of pedagogical support a student has received (with a weight for each type of support). The weights indicate how effective each type of pedagogical support is for improving learning.

One company uses data from many teachers to identify the pedagogical patterns of effective teachers, i.e., teachers whose students learn the most or are most engaged. The company is training other teachers in the same techniques and studying what happens in the learning system when these other teachers adopt those patterns.
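A minimal sketch of the learning decomposition idea follows, with synthetic data and hypothetical support types; it follows the general form Beck and Mostow describe rather than their published model. Each type of support gets its own weight inside an exponential learning curve; a weight near 1 means that type of support is worth about as much as an ordinary practice opportunity.

    # Sketch of learning decomposition (after Beck and Mostow 2008): fit an
    # exponential learning curve in which each type of pedagogical support
    # earns its own weight. Data and support types are synthetic.
    import numpy as np
    from scipy.optimize import curve_fit

    def decomposed_curve(X, a, b, w_hint, w_example):
        plain, hints, examples = X
        effective_practice = plain + w_hint * hints + w_example * examples
        return a * np.exp(-b * effective_practice)

    # Per-trial counts of prior practice of each type, plus observed errors.
    plain = np.array([0, 1, 2, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    hints = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype=float)
    examples = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3], dtype=float)
    error = np.array([0.60, 0.52, 0.40, 0.36, 0.30,
                      0.22, 0.18, 0.13, 0.11, 0.08])

    params, _ = curve_fit(decomposed_curve, (plain, hints, examples), error,
                          p0=(0.6, 0.1, 1.0, 1.0))
    a, b, w_hint, w_example = params
    print(f"hint weight = {w_hint:.2f}, worked-example weight = {w_example:.2f}")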

Trend Analysis

Trend analysis in general refers to the practice of collecting information and attempting to spot a sequential pattern, or trend, in the information over time. Web-based companies use trend analysis to predict what users might be searching for or be interested in, or how user participation ramps up or falls off. In education, trend analysis helps answer such questions as what changes have occurred in student learning over time and how learning has changed. At the school level, trend analysis can be used to examine test scores and other student indicators over time to help administrators determine the impact of policies. In educational data mining, trend analysis often refers to techniques for extracting an underlying pattern, which might be partly or nearly completely hidden by data that do not contribute to the pattern (i.e., noise). Although the actual data needed for trend analysis vary depending on what information is of interest, typically longitudinal data from at least three points in time are required.

As an example of trend analysis, the Postsecondary Education Commission of California provides a trend analysis tool at http://www.cpec.ca.gov/OnLineData/Mining.asp. This tool can be used to examine the commission's database tables to identify trends. It also can be used to discover anomalies in the data, such as large numerical differences between consecutive years and gaps when no data were reported. Visitors can generate customized reports on enrollment, degree completion, student home school, and high school data.
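A minimal sketch of the simplest case follows, with hypothetical enrollment counts: a least-squares slope over yearly observations (at least three points, per the rule of thumb above), with residuals inspected for anomalies such as reporting gaps.

    # Sketch of a minimal trend analysis: a least-squares slope over yearly
    # enrollment counts. Numbers are hypothetical.
    import numpy as np

    years = np.array([2008, 2009, 2010, 2011, 2012])
    enrollment = np.array([1180, 1225, 1243, 1310, 1355])

    slope, intercept = np.polyfit(years, enrollment, 1)
    fitted = slope * years + intercept
    print(f"trend: {slope:+.1f} students per year")
    # Large residuals flag anomalies worth checking, such as a reporting
    # gap or an unusual one-year jump.
    print("residuals:", (enrollment - fitted).round(1))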

Adaptation and Personalization

Personalization, as defined in the NETP (U.S. Department of Education, 2010a), indicates adaptive pacing, styling instruction to learning preferences, and tailoring content to learners' interests. We use adaptation to indicate the changes a system (interface or behavior) or instructor makes in response to students, thereby personalizing their experience. Adaptation and personalization address such questions as: How should the user experience be changed for this user? How can the user experience be altered to best serve individual users in real time? User classification techniques and trend or sequence analyses are often applied to create models for adapting instruction to students' needs. These adaptations may include recommendations or feedback to students about their best next actions and changes to their experience with an online learning system (such as different content, more practice, or signals about their progress through a course).


To adapt instruction or personalize student learning experiences, such data as sequences of student activity, information on the problems or steps a user has attempted, and student demographic information are often collected and used to create a personal profile for each system user. Researchers from Austria (Köck and Paramythis 2011) investigated the monitoring and interpretation of sequential learning activities to improve adaptation and personalize educational environments. They analyzed student problem-solving data from a physics tutoring system (VanLehn et al. 2005) by first converting activity sequences in the raw data into chain-like models and then clustering the sequences to detect problem-solving styles. These models are used to adapt the tutoring system to students' preferred learning methods.

This section has described broad categories of applications that exploit educational data mining and learning analytics techniques to adapt and personalize learning and improve teaching. These represent the promise of educational data mining and learning analytics, with the caveat that some are still in the research stage. The next section examines challenges and considerations in bringing these techniques into K-12 and higher education.
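Before moving on, here is a minimal sketch of the chain-like modeling idea just described; the activity codes and data are hypothetical, and this illustrates the general approach rather than the Köck and Paramythis implementation. Each student's activity sequence is summarized as a first-order transition matrix, and the matrices are clustered to suggest problem-solving styles.

    # Sketch: summarize each student's activity sequence as a transition
    # matrix, then cluster the matrices to find problem-solving styles.
    # Activity codes and data are hypothetical.
    import numpy as np
    from sklearn.cluster import KMeans

    ACTIONS = ["read", "attempt", "hint", "solve"]
    INDEX = {a: i for i, a in enumerate(ACTIONS)}

    def transition_matrix(sequence):
        m = np.zeros((len(ACTIONS), len(ACTIONS)))
        for prev, nxt in zip(sequence, sequence[1:]):
            m[INDEX[prev], INDEX[nxt]] += 1
        row_sums = m.sum(axis=1, keepdims=True)
        # Normalize rows to probabilities, leaving unseen actions at zero.
        return np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums > 0)

    sequences = [
        ["read", "attempt", "solve", "read", "attempt", "solve"],
        ["attempt", "hint", "hint", "attempt", "solve"],
        ["read", "read", "attempt", "hint", "attempt", "solve"],
        ["attempt", "hint", "hint", "hint", "attempt", "solve"],
    ]

    features = np.array([transition_matrix(s).ravel() for s in sequences])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print("style cluster per student:", labels)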


Implementation Challenges and Considerations

New technology start-ups founded on big data (e.g., Knewton, Desire2Learn) are optimistic about applying data mining and analytics (user and domain modeling and trend analysis) to adapt their online learning systems to offer users a personalized experience. Companies that own personal data (e.g., Yahoo!, Google, LinkedIn, Facebook) have supported open-source developments of big data software (e.g., the Apache Foundation's Hadoop) and encourage collective learning through public gatherings of developers to train them on the use of these tools (called hackdays or hackathons). The big data community is, in general, more tolerant of public trial-and-error efforts as it pushes data mining and analytics technology to maturity.3 What is the gap between the big data applications in the commerce, social, and service sectors and K-20 education? The 2012 Horizon Report's short list of projects to watch in higher education shows learning analytics in the two- to three-year range for widespread adoption (New Media Consortium 2012). Given that learning analytics practices have been applied primarily in higher education thus far, the time to full adoption may be longer in different educational settings, such as K-12 institutions.

This section describes the challenges in implementing data mining and learning analytics within K-20 settings. Experts pose a range of implementation considerations and potential barriers to adopting educational data mining and learning analytics, including technical challenges, institutional capacity, and legal and ethical issues. Successful application of educational data mining and learning analytics will not come without effort, cost, and a change in educational culture toward more frequent use of data to make decisions (U.S. Department of Education 2010b).

3 As an example, consider the contrasting cases described for user profiling. Representatives of one learning company believed it was ineffective, while representatives of another were willing to experiment with it as a differentiator for their company.


Technical Challenges

Online learning technologies offer researchers and developers opportunities for creating personalized learning environments based on large datasets that can be analyzed to support continuous improvement. However, these benefits depend on managing all the data that can now be captured in real time across many students. A challenge for successful implementation of educational data mining and learning analytics techniques is having sufficient technical resources for using big data and incurring the expenses associated with software services and storage on either remote servers provided by a company or local servers. Although data mining and analytics are used in some courses and institutions, computer scientists are still working on reducing the computer memory requirements needed to support advanced algorithms, and some experts are not optimistic about the near-term resolution of this issue.

In response to this big data challenge, a few key issues must be considered in each case when implementing data mining and analytics. These include choosing what data to collect, focusing on the questions to be answered, and making sure that the data align with the questions. Developers must be strategic about what data to collect and study the analytic techniques needed to answer the most pressing questions. One expert interviewed stressed the importance of starting out by understanding what questions data mining and analytics can answer: "If you have 100 people working, I would allocate 99 for identifying what questions to answer and one for [the technical process of] data mining."

Lack of data interoperability4 among different data systems imposes a challenge to data mining and analytics that rely on diverse and distributed data. Over time, piecemeal purchases of software can lead to significant decentralization of the sources of education data, such as student information systems, teachers' online grade books, homework submission systems, and publishers' online assignments, homework help, and assessments. The National Center for Education Statistics (NCES) is supporting efforts to create interoperability for state longitudinal data (early learning through the workforce) that include, in some cases, grades, standardized test scores, attendance, enrollment, and other administrative and demographic data. The Common Education Data Standards (https://ceds.ed.gov/) is an NCES-supported effort to create and encourage the use of voluntary standards for student data. Adoption of these standards is an important first step toward moving data across disparate data systems, and across institutions, education levels, and school years.

4 Data interoperability refers to a property of a system whose input/output data flow and formats are completely understood by other systems so that data from such systems can be integrated or exchanged seamlessly for analysis.


Researchers in educational data mining and learning analytics seek to make claims about a student's learning of topics or concepts based on the student's interaction with an online learning system. These claims can be validated by comparing them against scores on assessments and course grades. Going beyond one dataset to combine multiple sources of data (e.g., multiple tests, both teacher-made and standardized; behavioral assessments; or online behavior tracking) in order to provide an integrated view of a student's progress is not a straightforward task. Existing datasets may not have been designed to support creating profiles of student behaviors and, for example, may leave out data that could be an important variable in a model. Combining disparate data sources to make claims about student learning is known to be fraught with difficulties in assessment and, when used for high-stakes actions, must meet appropriate standards for valid student assessment.
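As a concrete illustration of why combining sources is not straightforward, below is a minimal sketch, with hypothetical field names and values, of joining tutoring-system aggregates with standardized test scores by a shared student identifier. Even this simple join immediately surfaces the usual integration problem: students present in one system but missing from the other.

    # Sketch of combining two of the disparate sources mentioned above:
    # online behavior aggregates joined with standardized test scores by a
    # shared student identifier. Field names and values are hypothetical.
    import pandas as pd

    lms = pd.DataFrame({
        "student_id": [101, 102, 103, 104],
        "minutes_online": [320, 45, 210, 510],
        "hints_requested": [12, 3, 30, 8],
    })
    tests = pd.DataFrame({
        "student_id": [101, 102, 104, 105],
        "state_test_score": [612, 540, 655, 498],
    })

    # An outer join exposes students present in one system but not the other.
    merged = lms.merge(tests, on="student_id", how="outer")
    print(merged)
    print("students missing a test score:",
          merged["state_test_score"].isna().sum())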


Limitations in Institutional Capacity

Open Research Questions

What is the right amount of data to collect?
Experts from the learning analytics field tend to favor a top-down approach: Meaningful questions should be posed to drive the data collection and analysis. They advocate a targeted strategy of collecting the right data in the right form at the outset. In contrast, data mining researchers favor a bottom-up approach supported by a more inclusive data collection strategy. They believe that collecting more data allows for exploratory data mining approaches in which a main question drives analysis, but the large amount of data collected supports finding unexpected patterns. Solutions from commercial companies have also shown promise in a middle ground, such as collecting dense usage data from a randomly selected sample of users to inform product improvement.

What is the right data structure?
Given the heterogeneous (many data sources) and hierarchical (multiple levels) nature of educational data, determining data structures and data formats that accurately represent an event under consideration becomes key. A basic data format may be a learning transaction generated by the system, the student, or the interactions between the two. The best data structure and analytic techniques are determined by the types of problems to be solved. Answering a focused question takes extensive data cleaning and extraction, and it is very important to have the best analytic algorithm. Pattern-seeking approaches, such as outlier detection (e.g., to detect atypical student behavior, such as novice mistakes or irregular learning), on the other hand, require less data cleaning and can employ a coarser algorithm.

Technical challenges can be overcome through research, development, and testing; computing and storage can be budgeted as part of an institution's infrastructure costs. However, implementing data mining and learning analytics in K-20 institutions has costs that go beyond simply computing and storage. Significant human resources also are needed for data preparation, processing, and analysis. Integrating existing data systems, such as grade books, with student information systems can be expensive, and the requirements can exceed the capabilities of the information technology department of a single institution. Our experts reported that at least 70 percent and often 80 to 85 percent of the effort in data analytics is devoted to data cleaning, formatting, and alignment, and suggested that education has the further complication of needing to move data across different levels of the system, back and forth between classroom, school, district, and state databases.

If technical challenges can be overcome and data can be prepared and analyzed, smart consumers are needed to use the data. Today, teachers and school leaders are surrounded by many data reports and often are frustrated by how much work is required to sort the useful from the useless. Data dashboards need to be adapted to everyday users. Education researchers and software developers must obtain a good understanding of the challenges from the users' perspective for adoption and implementation of data mining and analytics in classrooms, schools, districts, and other institutions to be successful. This will enable them to pose questions that matter to teachers and other users and to frame findings in a thoughtful, informative way that highlights and recommends clear actions.

In reports about the newest technologies for adaptation, personalization, and recommendation, the role of human judgment is sometimes underemphasized (with the exception of visual data analytics). All the experts consulted for this issue brief emphasized the key role that people play in many steps of the data mining and analytics process. Smart data consumers can help determine what questions to address, what data to collect, and how to make reports meaningful and actionable. They can also help interpret data, discern and label patterns, and guide model building. Data mining and analytics technology play a supporting role in the essentially human and social effort of making meaning out of experience. One expert interviewed stressed that data mining and analytics do not give answers when just unleashed on a big data warehouse. Instead, the recommendation was to approach the problem in an informed way, considering what can be acted on, what evidence can come from data analysis, and what early pilots of the data mining and analytics applications reveal.

Smart data consumers must learn to keep an open mind to what the data say. Data mining and analytics techniques can confirm or disconfirm teachers' and students' beliefs about student knowledge, abilities, and effort. Sometimes, these beliefs are not consistent with the data: Teachers may believe particular students are more or less capable than they are, and students may report spending more time and effort on learning than they actually do. For example, one company found in an A/B study it conducted on the use of visualizations that students were more engaged when complex visualizations were included in the software. Students identified complexity as a source of their engagement, but teachers thought the visualizations were too complex, underestimating what the students were capable of understanding.

Privacy and Ethics Issues

It has been acknowledged for many years (e.g., Kobsa 1990) that personalized interaction and user modeling have significant privacy implications because personal information about users needs to be collected to customize software to individuals. Press coverage and recent Federal Trade Commission rulings have highlighted online companies' privacy protection lapses. Data mining researchers have exposed obvious weaknesses, e.g., querying a social network for registered email addresses on a large scale (Balduzzi et al. 2010).5 Consumer surveys (ChoiceStream 2005) often show that while online users value personalized content, they are also concerned about their privacy on the Internet. At the same time, privacy versus personalization is not a simple trade-off: A more complete set of factors includes personal and community attitudes, how far the disclosed information differs from the norm, and even how much users know about what was disclosed and how much control they have over it (Kobsa 2007).

5 Starting with a list of about 10.4 million email addresses, Balduzzi et al. (2010) were able to automatically identify more than 1.2 million user profiles associated with the addresses. By searching through these profiles, they collected publicly available personal information about each user. After being exposed, this social network's vulnerability was repaired.


Education institutions must consider privacy, policy, and legal issues when collecting, storing, analyzing, and disclosing personally identifiable information from students' education records to third parties for data mining and analytics. The Family Educational Rights and Privacy Act (FERPA) is a federal law that protects the privacy of students' education records. However, FERPA generally allows for the disclosure of personally identifiable information from a student's education record without consent to school officials if there is a legitimate education interest.6 When a school controls learning software on its own hardware, or hosting is provided by a district or county computing facility, its IT department standards are in force as they would be for any student data, such as a grade book and attendance records. If the institution purchases an externally hosted analytics-based solution from a third party, de-identified student and teacher data will need to be released to fine-tune predictive models or be used in models to generate actionable intelligence. As with other kinds of analyses on large sets of longitudinal data, analyses that result in disclosure may be hard to foresee. In such cases, the more features of the data that are released (e.g., time of day homework was done simultaneously), the more valuable predictions can be (e.g., hours of operation for school-based homework centers) and the higher the likelihood of unintended disclosure (e.g., by pinpointing students who work after school).

A full discussion of privacy and confidentiality is beyond the scope of this document. The move to build statewide longitudinal data systems has raised similar concerns, and, in response, resources are available that address data management for education and research purposes, such as the technical brief series from the Department's National Center for Education Statistics (e.g., U.S. Department of Education, 2010c). Recent guidance on FERPA (U.S. Department of Education, 2012a) has helped clarify how institutions may use detailed and longitudinal student data for research, accountability, and school improvement under certain conditions in compliance with FERPA. These revisions to the existing FERPA regulations increase access to data for research and evaluation (including sharing across levels, such as from high school to college) while maintaining student privacy and parents' rights (U.S. Department of Education, 2012b).

Educational data mining and learning analytics make predictions and recommend actions based on increased visibility into student actions, and these give rise to a number of social and ethical concerns. Experts cited the ethical obligation to act on the knowledge about students gained through data mining. Educational data analysts should share their insights with those who can benefit from them (for example, students, teachers, and school districts), and what is shared must be framed in a way that benefits rather than harms. For example, is it useful to share with a particular student that he has only a 20 percent chance of success in a course given his past performance? What is the impact of this finding on the classroom and on the teacher's practices? What will happen to the student-teacher relationship once such results are released?

Policymakers bear an ethical responsibility to investigate the validity of any predictive model that is used to make consequential decisions about students. Policymakers must be able to explain the evidence for predictions and the actions taken by the computer system on the basis of learning analytics. Analysts conducting data mining may discover patterns or associations that were previously unknown and that involve sensitive information (e.g., teacher performance or students' family situations), and validating them with external observations and further data collection will be needed.

6 Pursuant to 34 CFR 99.31(a)(1) of the FERPA regulations, prior consent is not required to disclose education records to "school officials" with "legitimate educational interests" so long as the disclosing education institution or agency provides annual notification to its students regarding who constitutes a school official and what constitutes a legitimate education interest.


Recommendations

Education institutions pioneering the use of data mining and learning analytics are starting to see a payoff in improved learning and student retention (Koedinger, McLaughlin, and Heffernan 2010). As described in a practice guide from the Department of Education's Institute of Education Sciences (Hamilton et al. 2009), working from student data can help educators both track academic progress and understand which instructional practices are effective. The guide also describes how students can examine their own assessment data to identify their strengths and weaknesses and set learning goals for themselves. Recommendations from this guide are that K-12 schools should have a clear strategy for developing a data-driven culture and a concentrated focus on building the infrastructure required to aggregate and visualize data trends in timely and meaningful ways, a strategy that builds in privacy and ethical considerations at the beginning. The vision that data can be used by educators to drive instructional improvement and by students to help monitor their own learning is not new (e.g., Wayman 2005). However, the feasibility of implementing a data-driven approach to learning is greater with the more detailed learning microdata generated when students learn online, with newly available tools for data mining and analytics, with more awareness of how these data and tools can be used for product improvement and in commercial applications, and with growing evidence of their practical application and utility in K-12 and higher education. There is also substantial evidence of effectiveness in other areas, such as energy and health care (Manyika et al. 2011).

Internet businesses, both providers of general commodities and services and learning software companies, have discovered the power of using data for rapid improvement of their practices through experimentation and measurement of change that is understandable and that leads to actionable next steps. The key for data analysis consumers, such as students, parents, teachers, and administrators, is that the data are presented in such a way that they clearly answer a question being asked and point toward an action that is within the data consumer's repertoire.

In the remainder of this section, in addition to these existing recommendations, specific ones for educators, researchers, and developers using educational data mining and learning analytics are provided. Possible collaborations across sectors, and the role of states in supporting the adoption of analytics applications, also are addressed.


Educators

Stakeholders in the K-12 and higher education sectors should increase the use of educational data mining and learning analytics to improve student learning. The experts' and TWG's recommendations to facilitate adoption, including the role of states, are as follows.

Educators should develop a culture of using data for making instructional decisions. This brief builds on the recommendations of the U.S. Department of Education (2010b) report calling for development of the mind-set that using data more strategically can drive school improvement. Educators need to experience having student data that tell them something useful and actionable about teaching and learning. This means that instructors must have near-real-time access to easy-to-understand visual representations of student learning data at a level of detail that can inform their instructional decisions. Scores on an achievement test taken six months ago do not tell a teacher how to help a particular student tomorrow. The kinds of data provided to instructors need to be truly helpful in making instructional decisions, and instructors will need to come to these learning data with a different mind-set than that engendered by data systems geared to serving purposes of accountability.

Districts and institutions of higher education need to understand that their information technology department is part of the effort to improve instruction but is not the only responsible department. Establishing a data-driven culture requires much more than simply buying a computer system. District staff from the information technology department need to join with assessment, curriculum, and instruction staff, as well as top decision makers, and work together to iteratively develop and improve data collection, processing, analysis, and dissemination. A U.S. Department of Education report (Hamilton et al. 2009) suggests that districts foster a culture of using data by beginning with such questions as: Which instructional materials or approaches have been most effective in promoting student learning of this area of math content? Are there differences in course success rates for students coming in to our high schools from different feeder schools? Are there teachers who are particularly successful in terms of their students' learning gains whose practice might serve as a model for others?

Understand all details of a proposed solution. When purchasing learning software or learning management systems, districts should demand details about the kinds of learning analytics the system will generate and make sure the system will provide teachers and school leaders with information they can use to improve teaching and learning: What are the analytics based on? Have these measures been validated? Who gets to see the analytic data and in what format, and what do they have to do to gain access? If students, teachers, and district administrators will use visualizations or other reports from a data mining or an analytics package, they should evaluate the solution to make sure the data are presented in a comprehensible way. Give teachers the opportunity to ask questions about data mining and analytics that go beyond the numbers, colors, or charts and instead probe the value that the analytics system will bring to them and the steps they can take in response to the data the system will give them. Any predictive models proposed for consequential use (such as assigning students to services or qualifying them for advanced courses) should be transparent and backed up by solid empirical evidence based on data from similar institutions.

Start small and leverage the work of others. It can be tempting to latch on to a solution that promises to integrate all data systems to support powerful learning analytics. But the experience of districts pioneering the use of data-driven decision making suggests that there are no easy turnkey solutions (Hamilton et al. 2009). Districts and higher education institutions typically have much more data than they actually use to inform their actions. Part of the problem is that data reside in multiple systems in different formats. The development of standards for education information systems, software to facilitate data integration from multiple systems, and easy-to-use data dashboards that sit on top of different data systems are all active areas of technology development. At the present time, however, districts typically incur significant costs when trying to integrate data across different systems. In addition to technology and user interface development costs are the costs involved in developing staff capacity for using data in smart ways. Adoption should be conceptualized as a set of processes and ongoing investments rather than a one-time purchase of a single product or technology. Data mining and analytics can be done on a small scale. In fact, starting with a small-scale application can be a strategy for building a receptive culture for data use and continuous improvement that can prepare a district to make the best use of more powerful, economical systems as they become available. Starting small can mean looking at data from assessments embedded in low-cost or open learning systems and correlating those data with student grades and achievement test scores. Some open educational software systems that provide analytics are listed in the Selected Websites: Online Learning Systems with Analytics section at the end of this report.

Help students and parents understand the source and usefulness of learning data. As colleges and schools move toward the use of fine-grained data from learning systems and student data aggregated from multiple sources, they need to help students understand where the data come from, how the data are used by learning systems, and how they can use the data to inform their own choices and actions. Feedback is an important variable in changing behavior, and research on systems like Purdue's Signals suggests that many students will respond appropriately in the face of feedback that they understand. Similarly, parents can help their children make smarter choices if they have access to student data and understand how the data are generated and what they mean.

Align state policy to support the move to online learning. State policy plays an important leadership role in the changes required to adopt an analytics-focused approach to education. To support adoption of online and digital learning at the district and school level, state-level organizations must advocate for and set policies that follow road maps to implement change. Efforts to support reform implementations, such as Digital Learning Now (http://digitallearningnow.com) and the Data Quality Campaign (http://www.dataqualitycampaign.org), highlight requirements for better data systems, broader and faster Internet connections, one-to-one Internet access for all students, online assessments tuned to measure mastery, interoperable and portable electronic student records, and professional development for educators and administrators. Leadership across the state is required from governors, education chiefs, legislators, and education boards.

Taking advantage of this kind of digital data infrastructure also will require research to develop and validate new techniques for efficiently extracting evidence of the effectiveness of specific instructional interventions or approaches. Learning and education research provide a basis for identifying key variables to examine as potential predictors of students' learning and educational attainment. If these variables are captured in connected data systems, data analytics techniques can determine the extent to which there are relationships between them and desired outcomes, providing evidence both for improving and for choosing among instructional products and practices. While such analyses would not meet the current gold standard of evidence from random-assignment experiments, they would prove convincing to many educational practitioners, particularly when they are replicated across multiple data sets by multiple researchers. An ongoing effort, sponsored by the Office of Educational Technology, is examining the issue of an appropriate evidence framework for digital learning, and a draft report is expected by the end of 2012.


Researchers and Developers

R&D in educational data mining and learning analytics occurs in both academic and commercial organizations. Research and development are tightly linked, as the field seeks to understand basic processes of data interpretation, decision making, and learning and to use those insights to develop better systems. We encourage the R&D community to consider these recommendations, as well as continued experimentation that shows evidence of the impact of these approaches on student learning.

Conduct research on the usability and impact of alternative ways of presenting fine-grained learning data to instructors, students, and parents. Data visualizations provide an important bridge between technology systems and data analytics, and determining how to design visualizations that practitioners can easily interpret is an active area of research. Solving this problem will require identifying the kinds of choices or decisions that teachers, students, and parents want to make with fine-grained learning data, as well as the time pressure and cognitive load factors present when different kinds of decisions are made.

Develop decision supports and recommendation engines that minimize the extent to which instructors need to actively analyze data. The teacher in a truly instrumented classroom would have much more than access to student scores on state and district tests. Diagnostic real-time assessment tools and decision support systems would enable the instructor to work with automated systems to make decisions on the fly to improve instruction for all students (Crawford et al. 2008). But conscious, labor-intensive processing of data is not possible under the time constraints of efficient classroom management. To support teachers in the act of instruction, we need decision supports and recommendation systems that link student learning profiles to recommended instructional actions and learning resources (a minimal sketch appears after these recommendations). We give such tools to physicians and military decision makers; education is no less complex and no less important.

Continue to perfect the anonymization of data and tools for data aggregation and disaggregation that protect individual privacy yet ensure advancements in the use of educational data. Recent amendments to the FERPA regulations have clarified the legality of states and school districts disclosing student data for audit, evaluation, or study purposes. Much remains to be done, however, in figuring out how to support aggregation and disaggregation of student data at different levels of the education system (classroom, school, district, state) in ways that make it possible to combine data from different sources yet protect student privacy in compliance with applicable law.

Develop models for how learning analytics and recommendation systems developed in one context can be adapted and repurposed efficiently for other contexts. Differences in educational contexts have made it a challenge to transfer predictive models across educational settings. Because students, administrative policies, course programs (e.g., four-year vs. community colleges), and adopted learning systems often vary among institutions, the student learning data that can be collected vary, too. Thus, a model developed for one institution usually cannot be applied directly and efficiently to another without research into whether it must be changed for the new context (Lauría and Baron 2011). Understanding how this process can become more efficient will be key to scaling up the use of learning analytics.
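To illustrate the kind of decision support described above, here is a minimal, hypothetical sketch of a rule-based step that turns a student's skill-mastery profile into suggested instructional actions. It is not any real system's design; the skill names, threshold, and resource catalog are invented for illustration, and a production system would draw on a validated student model.

```python
# Hypothetical rule-based recommendation step: map low mastery estimates
# to instructional actions so the teacher sees a suggestion, not raw data.
from dataclasses import dataclass

@dataclass
class Recommendation:
    skill: str
    action: str

# Invented catalog mapping weak skills to remediation resources.
RESOURCES = {
    "fractions": "Assign the fraction-models practice set",
    "subtraction_with_carries": "Schedule a 10-minute tutoring session",
}

MASTERY_THRESHOLD = 0.6  # assumed cutoff below which a skill needs attention

def recommend(profile: dict[str, float]) -> list[Recommendation]:
    """Return one suggested action per skill estimated below threshold."""
    return [
        Recommendation(skill, RESOURCES.get(skill, "Review this skill with the student"))
        for skill, mastery in profile.items()
        if mastery < MASTERY_THRESHOLD
    ]

# Example: mastery estimates as produced by some (unspecified) student model.
print(recommend({"fractions": 0.45, "subtraction_with_carries": 0.9}))
```

The design point is that the analysis and the pedagogical mapping are done by the system; the teacher's attention is spent only on accepting or adapting the suggestion.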

Collaborations Across Sectors

As noted above, building the capacity of education organizations to use data mining and analytics meaningfully is a major undertaking. This section addresses R&D collaborations that can aid the process. The advisors consulted recommended collaboration among learning system designers (often commercial entities), learning scientists, and educators. Learning product designers want access to the knowledge base built by academic researchers. Policymakers want findings about student learning and clear-cut guidelines for practice (e.g., O'Neil 2005). As the education system moves from print to digital classrooms, learning products will change rapidly, and academic institutions and policies must respond accordingly. It is anticipated that the next five years will bring an increase in models for collaboration among learning system designers, researchers, and educators. Possibilities for such collaborations include the following:

• Learning labs where commercial designers can make data from their learning systems available to the research community, as is being done through the Pittsburgh Science of Learning Center's DataShop (Koedinger et al. 2010)

• Partnerships between research organizations and education organizations to improve research-based products. For example, the Strategic Education Research Partnership (SERP) is an organization that stimulates innovation in education through sustained collaboration among distinguished researchers, educators, and designers. Under SERP, researchers built a set of in-depth partnerships with large school systems and developed tools and interventions in Boston and San Francisco to help middle and high school teachers, particularly those in science, social studies, and other content areas, incorporate academic vocabulary into their teaching.

• Organizational structures that bring together people with the requisite expertise from industry, academia, and school systems in sustained interaction to improve learning systems. The recent program called Digital Promise (http://www.digitalpromise.org/) has the goal of fostering sustained investments in such partnerships, which are much more likely to have an impact than simply publishing research and expecting the commercial sector to incorporate it into products.


Conclusion

Working with big data using data mining and analytics is rapidly becoming common in the commercial sector. Tools and techniques once confined to research laboratories are being adopted by forward-looking industries, most notably those serving end users through online systems. Higher education institutions are applying learning analytics to improve the services they provide and to improve visible and measurable targets such as grades and retention. K-12 schools and school districts are starting to adopt such institution-level analyses for detecting areas for improvement, setting policies, and measuring results.

Now, with advances in adaptive learning systems, possibilities exist to harness the power of feedback loops at the level of individual teachers and students. Measuring and making visible students' learning and assessment activities open up the possibility for students to develop skills in monitoring their own learning and to see directly how their effort improves their success. Teachers gain views into students' performance that help them adapt their teaching or initiate interventions in the form of tutoring, tailored assignments, and the like. Adaptive learning systems enable educators to quickly see the effectiveness of their adaptations and interventions, providing feedback for continuous improvement. Researchers and developers can more rapidly compare versions A and B of designs, products, and approaches to teaching and learning (a minimal sketch of such a comparison appears below), enabling the state of the art and the state of the practice to keep up with the rapid adoption of online and blended learning environments.

Open source tools for adaptive learning systems, commercial offerings, and increased understanding of what data reveal are leading to fundamental shifts in teaching and learning systems. As content moves online and mobile devices for interacting with content enable teaching to be always on, educational data mining and learning analytics will enable learning to be always assessed. Educators at all levels will benefit from understanding the possibilities of the developments in the use of big data described herein.
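As one illustration of the A/B comparisons mentioned above, the sketch below runs a two-proportion z-test on completion rates for two versions of a learning activity, using only the Python standard library. The counts are made-up illustrative data, not results from any study.

```python
# Two-proportion z-test comparing completion rates for versions A and B.
# The counts below are invented for illustration.
from math import sqrt
from statistics import NormalDist

completed_a, n_a = 180, 400   # version A: completions / students assigned
completed_b, n_b = 210, 400   # version B

p_a, p_b = completed_a / n_a, completed_b / n_b
p_pool = (completed_a + completed_b) / (n_a + n_b)    # pooled completion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided test

print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```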

Selected Reading

Coley, T. 2010. "Defining IT's Role in Mission-Critical Retention Initiatives." EDUCAUSE Quarterly 33 (4). Presents a method for adoption of a data culture with leadership from institutional information technology departments. Gives examples of early indicators, early alerts, and aligning separate data systems and people. http://www.educause.edu/EDUCAUSE+Quarterly/EDUCAUSEQuarterlyMagazineVolum/DefiningITsRoleinMissionCritic/219108

Goetz, T. 2011, June. "Harnessing the Power of Feedback Loops." Wired Magazine. Gives explanations and examples of simple feedback loops to improve human behavior, stressing real-time feedback. http://www.wired.com/magazine/2011/06/ff_feedbackloop/all/1

Ferguson, R. 2012. "The State of Learning Analytics in 2012: A Review and Future Challenges." Technical Report KMI-12-01. Knowledge Media Institute, The Open University, UK. Reviews the last decade of work on learning analytics, including factors that influenced its development, and looks at future challenges. http://kmi.open.ac.uk/publications/techreport/kmi-12-01

Johnson, L., A. Levine, R. Smith, and S. Stone. 2010. The 2010 Horizon Report. Austin, TX: The New Media Consortium. Horizon reports identify and describe emerging technologies likely to have an impact on college and university campuses within the next five years. This issue includes visual data analysis as an emerging technology. http://www.nmc.org/pdf/2010-Horizon-Report.pdf

Johnson, L., R. Smith, H. Willis, A. Levine, and K. Haywood. 2011. The 2011 Horizon Report. Austin, TX: The New Media Consortium. Horizon reports identify and describe emerging technologies likely to have an impact on college and university campuses within the next five years. This issue includes learning analytics as an emerging technology. http://www.nmc.org/pdf/2011-Horizon-Report.pdf


Manyika, J., M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers. 2011. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute. Looks at innovation and competitive advantages for industries using big data, including health care, retail, and use of personal location data. http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation

National Research Council. 2009. Protecting Student Records and Facilitating Education Research: A Workshop Summary. Margaret Hilton, rapporteur. Committee on National Statistics and Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press. Reports on a workshop on how researchers can access data and protect confidentiality in compliance with FERPA and with the Common Rule for the Protection of Human Subjects.

Patil, D. J. 2011, September. Building Data Science Teams. @dpatil shares his advice on what data scientists add to an organization, how they fit in, and how to hire and build effective data science teams. He also presents highlights of how Internet companies use big data. http://radar.oreilly.com/2011/09/building-data-science-teams.html

Romero, C., and S. Ventura. 2010. "Educational Data Mining: A Review of the State of the Art." IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 40 (6): 601–618. In the introduction, Romero and Ventura describe different types of data mining techniques, both classical and emergent, used for educational tasks by different stakeholders.

Romero, C., S. Ventura, M. Pechenizkiy, and R. S. J. d. Baker (eds.). 2010. Handbook of Educational Data Mining. Boca Raton, FL: CRC Press. This book provides a technical overview of the current state of knowledge in educational data mining. It helps education experts understand what types of questions data mining can address and helps data miners understand what types of questions are important in education decision making.

Siemens, G., and R. S. J. d. Baker. 2012. "Learning Analytics and Educational Data Mining: Towards Communication and Collaboration." LAK12: Second International Conference on Learning Analytics & Knowledge, April 29–May 2, Vancouver, BC, Canada. This paper presents an updated distinction between the fields of learning analytics and educational data mining.

Siemens, G., and P. Long. 2011. "Penetrating the Fog: Analytics in Learning and Education." EDUCAUSE Review 46 (5). Gives a broad discussion of how analytics can be used to direct learning and education. http://www.educause.edu/EDUCAUSE+Review/EDUCAUSEReviewMagazineVolume46/PenetratingtheFogAnalyticsinLe/235017

Siemens, G., D. Gašević, C. Haythornthwaite, S. Dawson, S. Buckingham Shum, R. Ferguson, E. Duval, K. Verbert, and R. S. J. d. Baker. 2011. Open Learning Analytics: An Integrated & Modularized Platform. Society for Learning Analytics Research (SoLAR). Concept paper on an open learning analytics architecture that raises the need for openness in learning algorithms so that different school settings (cultural or otherwise) can adjust how content is personalized. http://solaresearch.org/OpenLearningAnalytics.pdf



Selected Websites

Visualization and Data Exploration

http://www-958.ibm.com/software/data/cognos/manyeyes/. Many Eyes lets users explore existing visualized datasets and upload their own for exploration. Users can comment on visualizations or create topic areas for discussion. Visualization types are organized by how they show the data (e.g., "See the parts of a whole" for data laid out in pie charts and "See the world" for data laid out on maps), and datasets can be numerical, textual, or spatial.

http://hint.fm/. Data visualization meets art in this site showing work by Fernanda Viégas and Martin Wattenberg.

http://research.uow.edu.au/learningnetworks/seeing/snapp/index.html. Social Networks Adapting Pedagogical Practice (SNAPP) is a tool for visualizing the networks that result from posts and replies in discussion forums, as a measure of student interaction.

http://www.socialexplorer.com/. Social Explorer is an online tool that allows map- and report-based visualizations of census data and demographic information. It is flexible enough for use in sectors ranging from education to journalism.

http://www.tableausoftware.com/products/public. Tableau Software offers a free data visualization tool used by companies, individuals, and journalists. Visualizations are stored on the Tableau Public site but can be embedded in blogs or websites.

Online Learning Systems With Analytics

http://www.assistments.org. The ASSISTments online platform helps teachers write questions for assessments and then see reports on how their students performed. Students can get immediate tutoring while they are being assessed.

http://wayangoutpost.com/. Wayang Outpost is an intelligent tutoring system that helps middle and high school students study for standardized tests and adjusts instruction as they progress.

http://oli.web.cmu.edu/openlearning/forstudents/freecourses. The Open Learning Initiative (OLI) offers open and free courses on such subjects as biology, programming, chemistry, and statistics. Both students and instructors get timely and targeted feedback.

http://www.khanacademy.org/. Khan Academy provides a library of videos, worked examples, and practice exercises, organized into knowledge maps, for self-paced learning in many topic areas. Khan Academy keeps track of students' progress and shows at-a-glance displays for students, parents, and educators.

Professional Organizations

http://www.educationaldatamining.org. Educational data mining researchers have been organizing yearly international conferences since 2008. The Journal of Educational Data Mining was launched in 2009, and in 2011 the International Educational Data Mining Society was founded by the International Working Group on Educational Data Mining.

http://www.solaresearch.org. In 2011, the Society for Learning Analytics Research was founded as a professional society for exploring analytics in teaching, learning, and training and development systems. A yearly conference, the International Conference on Learning Analytics and Knowledge, has been held since 2011.