Panel: Privacy and Learning Analytics

Interactive activity: Fill in a warm up quiz online, using Socrative.com. Questions about students and privacy. There were kicker questions – early ‘I read the privacy policy’ and then later ‘I read the policy for this tool’.

They have a panel looking at the results, going through it together.

“When I use an ed tech product, I read the privacy policy” – 81% false, 19% true. Thanks to 81% of you who were honest!

Justin can believe that 1 in 5 of this audience might. But we know from research, anecdata, that it’s hard to make your way through those dense difficult privacy policies. K12 school districts survey, Berkman Student Privacy Initiative. Lots of folks aren’t getting through the policies. They are long and dense. For educators on the front line, grading yesterday’s, planning tomorrow’s, hard to look at what they’re saying they’ll do.

April: Many say they can change whenever they want.

Yes, Alice in Wonderland situation. Good luck with third party policies.

Garron: I have high concern. As we develop, responsive systems to student data. What does it mean, how can you opt out, anonymise, don’t collect, you’re disconnected from the system. Does that put me at a learning disadvantage?

Justin: That’s high level of autonomy for students, most don’t feel like that, they feel compelled because teacher or district told them to. It’s concerning enough if they have autonomy.

Xavier: I don’t think the students care so much. They know university is collecting data, but they don’t pay attention. For research maybe that’s a problem. Evaluation linked to that data, maybe there’s a problem.

Justin: Two things. One, students expect educators collect data about them and use that to improve things – if you’re not doing that as a teacher, we’re concerned. Another part, students are growing up knowing that large corporations have large data about them, universities are probably minor compared to Google and the NSA. Realism and hopelessness.

Someone else: Mix of people K12 and university. At K12, and elementary.

We’re aware of that. Student privacy initiative is K12, Justin is HE realm.

Someone else: Students know? Not at elementary level.

Their rights belong to the parents, they have decision-making power. Excellent point, not only do students not know, but parents don’t know. Schools use more cloud-based ed tech, rely on a certain legal framework to say they don’t need parental consent. May have to tell them, but we can make the decisions. I am a lawyer, also Assoc Prof of law, used to practice education law.

Piotr: Around a year ago, interesting experiment. Capture expert interactions for students. Recorded in a classroom. Nice sheets describing how we use the data, consent forms. Of the forms that said, zero were picked up by students. Students cut off explanations and handed it back. Zero of the students cared, and that was MIT, very tech-savvy. Researcher and savvy students.

Justin: We have a research infrastructure built around gaining consent, research suggests that doesn’t work, esp with ToS and privacy policy. Even if you sit down and try to read it to them and explain, people who don’t come from a research context – and

Piotr: They did understand, they just didn’t care.

Justin: Should we accommodate students who take that position to their data, or do we treat that as problematic and design the systems to get them to care more about the use of the data, which may have the effect of not wanting to share it with us.

Someone: Element of trust from student to tutor. Not that they don’t care, they trust the tutor. In educational context, expect things to be on topic, not personal stuff, so not concerned.

Chris Brooks: Implicit trust, if you say I’m from MITx, you’d know they’re not going to get IRB approval if they’re going to sell data to CocaCola.

Phil Winne: Reasonable expectation of privacy in these circumstances. At a mall, can be photographed.

Justin: Interact with tech change. At this moment, data collected by stuff we’re aware of. But if kept permanently, can’t have a reasonable expectation, because we don’t know what possibilities will emerge. Data that appear harmless may become re-identifiable or have big consequences later on. Students have enormous trust for us, have reasonable expectations that data will be used, part of the process. Pro-actively protecting that trust is a central consideration.

Leah: Districts under-resourced. 20% of districts had no IT policies in place. Not that they’re not worthy of trust, they haven’t had the resources or bandwidth to assess this.

“What are your concerns about student privacy” … many, different. The field is in the 1.0 space here, nailing down what’s happening. 2.0 concerns, a bit harder to talk about yet, have to do with,

Someone: Students don’t know about and don’t benefit from algorithms, products developed.

Parents are legally required to send students to school. Serving up a captive market. That falls in to privacy 2.0.

Did you read the Socrative terms of use and privacy policy before using this tool?

100% no.

Someone: Time constraint.

Piotr: They can’t enforce a contract since they had no identifying information.

Family Educational Rights and Privacy Act, Children’s Online Privacy Protection Act. Protection of Pupil Rights Act. Applies in K12 space. FERPA law that says, personally-identifiable from ed records, need consent from student or parent for minors. Exceptions, most often, legitimate school official exception. Schools say, 3rd party provider – of anything, lawyers, consultant – going to them for something that’d otherwise be done in house, under our direct control, and is not going to re-share the info. Schools can extend reach of their services. If info sharing in to school official exception, don’t need to get consent ahead of time. We’re seeing a lot of actors, to the extent they’re thinking it through, it’s not clear they have the time to do that, the school official exception is the one they’re using. Tricky though. Ed tech tools, their policies, ToS, they have rabbit holes saying they can reshare and change terms at any time. Would like to see ed tech providers as a reform, bottom up or top down, see nutrition-style labelling for privacy policies.

Someone: Former bilingual teacher. Many inaccessible because not reading in English, or at all.

Special ed, low income, very much resonates with my experience as an attorney. Also don’t have internet access. They’re not going to read the policies and come up with answers. COPPA applies not to school, but to the provider if commercial.

April: What constitutes personally-identifiable – take name off, but if have birthday and home town that’s enough to re-identify. That keeps changing.

FERPA has statutory definitions. Catch-all category, stuff that alone that would allow a reasonable person to ID with reasonable certainty.

Justin: MIT wrestling with this, took data, de-identified them to make it publicly accessible. FERPA was written in 1974, if make it anonymous can release it. Now talk about 2 categories – identifiers, and quasi-identifiers, things which in combo could re-identify someone. Challenge with MOOCs vs ITS, we have students who write in discussion forums, public online. In a forum, have to introduce yourself, write all the stuff about you. If you say hey, I’m from Latvia, and you’re the only Latvian, that’s identifying you. Had a whole team, law school, technologists, figure out what guidance exists about them. Clear that no set guidance about what should be quasi-identifiers. Dept of Ed, the bite of FERPA – hasn’t bitten yet – if you get Federal funding, FERPA can take it away. Harvard and MIT may be too big to FERPA. We want to be in compliance. k-anonymity – each row is in common with k-1 others. So what’s k? DoE doesn’t say hey, k=7. We looked through a series of guidance statements, settled on k=5. Gender, birth, country, number of forum posts. Why would you release a set? First, for replication, transparency. Second, for analyses from others. But de-identification process, where we don’t have k-anonymity. Can blur data – e.g. change Latvian to East European. Or we can delete them. Used both. Deleted 24% of the rows. In many of the characteristics, the results are the same. In some, very very different. In the analyses already done, we can say how different they should be. Super-troubling: for any novel analysis, we can’t tell you if the finding is because of de-identification or because it’s a new thing. If two people found something interesting, we can do that. But if 1000, we have other things to do as well. Whole bunch I’ve packaged in there. We’re writing an article for CACM. Underlying problem, FERPA conflates privacy with anonymity. Those don’t have to be conflated. Federalist papers were anonymous but not private, your vote is private but not anonymous.
(I think that way round.)
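
A minimal sketch of the k-anonymity check Justin describes, in Python. The column names, the toy rows and the country-to-region mapping are invented for illustration; only k=5 and the two strategies (delete rows, or blur a value such as Latvian to East European) come from the talk. This is not the Harvard/MIT pipeline.

```python
from collections import Counter

# Assumed quasi-identifier columns (hypothetical names).
QUASI_IDENTIFIERS = ("gender", "birth_year", "country")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def is_k_anonymous(rows, k, keys=QUASI_IDENTIFIERS):
    """True if every row shares its quasi-identifiers with at least k-1 others."""
    return all(size >= k for size in group_sizes(rows, keys).values())


def suppress(rows, k, keys=QUASI_IDENTIFIERS):
    """Suppression: delete every row whose group has fewer than k members."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[key] for key in keys)] >= k]


def blur_country(rows, regions):
    """Blurring: replace a country with a coarser region where one is given."""
    return [
        {**row, "country": regions.get(row["country"], row["country"])}
        for row in rows
    ]


# Invented toy data: four Estonians, one Latvian, five people from the US.
rows = (
    [{"gender": "F", "birth_year": 1990, "country": "Estonia"}] * 4
    + [{"gender": "F", "birth_year": 1990, "country": "Latvia"}]
    + [{"gender": "M", "birth_year": 1985, "country": "US"}] * 5
)
REGIONS = {"Estonia": "East Europe", "Latvia": "East Europe"}  # assumed mapping

suppressed = suppress(rows, k=5)
blurred = blur_country(rows, REGIONS)

print("original 5-anonymous:", is_k_anonymous(rows, 5))
print("rows kept after suppression:", len(suppressed), "of", len(rows))
print("rows kept after blurring:", len(blurred), "of", len(rows))
```

In this toy case suppression keeps precise values but drops every row from the small groups, while blurring keeps every row at the cost of a coarser country field. That is the tradeoff discussed in the exchange that follows.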

Chris Brooks: Dropped off some of this long tail, the small groups, that’s interesting in terms of embracing diversity. Small populations, by definition are going to get dropped.

Justin: Shall I tell you who is a small population in our course? People who complete our course. [laughter] And we’re super interested in that group. It’s 2%. We lost half the people who completed the course. Lost at differential rates on different courses, we don’t know why.

Chris: Latvia. How did you deal with forums?

Justin: We released only person-level summary stats. We have billions of event logs. Didn’t even try to do that. We did person-course dataset. Each row is one person on one course. Already binning and summarising. We favoured suppression strategies, deleted rows. Release data that’s precise but more anonymous. Now we’re thinking that may have been wrong and we should’ve blurred. After blurring, or binning, the correlations are further apart because you’ve reduced the variance. There are tradeoffs. We can quantify for the analyses we did. But any novel analysis, we can’t tell you.
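
The point that blurring or binning shifts correlations can be seen on synthetic data. The sketch below is a toy under stated assumptions (made-up variables called activity and grade, Gaussian noise, a very coarse bin width), not the person-course dataset. It compares the Pearson correlation before and after one variable is binned.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def blur(values, width):
    """Binning: replace each value by the lower edge of its bin."""
    return [math.floor(v / width) * width for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activity = [rng.gauss(0, 1) for _ in range(5000)]   # hypothetical variable
grade = [a + rng.gauss(0, 1) for a in activity]      # hypothetical variable

raw_r = pearson(activity, grade)
blurred_r = pearson(blur(activity, 4.0), grade)       # almost a two-bin split

print(f"correlation on precise values: {raw_r:.2f}")
print(f"correlation after binning:     {blurred_r:.2f}")
```

Here the effect of binning can be quantified because the precise values are still to hand. Someone running a novel analysis on the released data alone has no such comparison, which is the problem Justin raises.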

Capacity for us to lose the trust, get outrage, is very high. InBloom – non-profit, third-party. The alternative was Pearson was going to collect it. But student privacy advocates lobbied to withdraw from it. Facebook thing this time. Voting manipulation was much more concerning, making people a bit happier or sadder is not such a big deal. We know the government is subpoena-ing this data, that’s changed things. HE should be at the forefront of these conversations. There’s 50 people met at Asilomar, released the Asilomar Convention, issues from Belmont report, FERPA, starting a conversation about this. Most of the guidelines were written with biomedical research in analogue world. Our context different. Dept of Education is not going to come up with the best solution. After the FB privacy revelation, a load of us got together to respond at the Berkman Center. Make it clear we’re pro-actively reconsidering this. Show we’re taking it seriously.

Leah: Plug to invite you to continue this conversation with us. Student Privacy Initiative at Berkman. Has a framing paper. Handy guides on things like FERPA.

Panel: Learning Analytics and Learning at Scale

Piotr Mitros: Learning Analytics@SCALE

We’re all familiar with LA. At scale, thousands or tens of thousands of students. At edX, best known for MOOCs, do blended learning too. K12. Picked up recently because can deliver content. Where we’re going now, LA will become holistic across the whole course, rather than just individual learner. Further future, assessment of complex skills.

Example from Monday workshops, discourse in a cMOOC – seeing Wikipedia talk appeared and disappeared. Different.

Example data, looking at how much bias in circuits and electronics. X axis the courses, y is the difference from the US on various metrics, for bias. Most extreme problem had 25% bias in favour of US. Very complex language, nothing obviously biased but there were things. Different from what you see in a classroom where you could see a student confused by e.g. a reference to baseball.

Big courses are quasi-static. Given again, with minor tweaks. Transition from ephemeral. Online big is also more asynchronous, and more diverse. Almost all MOOC platforms have almost complete data, longitudinal; trad have limited, course-long data. In residential setting, students do things offline. Tools to look at things we couldn’t look at before. One problem to one student in one course, that’s just a single data point. But if have many, can look at it more.

John Stamper

Which represents the classroom of the future – kids in circle with tablets, or in rows with paper. Answer is both! Depending on how much money you have.

Rich vs poor – poor kids will be forced to rely on cheap tech, rich kids will get expensive teachers. Gates studies show that’s important. Already see this today – Waldorf school in Silicon Valley with no tech. NGLC Wave III Grants, degree for $5k, mostly online schools. Big MOOC proliferation. Adaptive, data-driven tech companies. And every school, college, university has online instruction of some sort.

What does this mean? It’s going to happen. Driven by economics. It’s cheaper to do this. Tech gets cheaper. We should focus on improving learning tech, new ways to improve Teacher-Student access, more adaptive features. So – adaptive learning, at scale, with data.

What is scale? How big is big? Number of students, data collected. We’ve had data from Dept of Ed on millions of students. Or is it what level of data we’re collecting? I’d say lots of students but lots of data at granularity where we can track learning.

What is learning? It’s a verb. Tracked over time. Photo of a learning curve from DataShop. Red jagged line from student data, blue line predictive curve. Error rate on y axis, x axis shows attempts. Error rate falls over time.
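
As a rough sketch of how an empirical curve like the jagged red line is computed: group logged attempts by attempt number and take the fraction of errors at each. The log format and values below are invented. This is not DataShop’s export format, and it does not reproduce the predictive (blue) curve.

```python
from collections import defaultdict

# (student, attempt number on the skill, answered correctly?) - invented toy log
log = [
    ("s1", 1, False), ("s1", 2, False), ("s1", 3, True), ("s1", 4, True),
    ("s2", 1, False), ("s2", 2, True), ("s2", 3, True), ("s2", 4, True),
    ("s3", 1, True), ("s3", 2, False), ("s3", 3, True), ("s3", 4, True),
    ("s4", 1, False), ("s4", 2, True), ("s4", 3, False), ("s4", 4, True),
]


def error_rate_by_attempt(entries):
    """Return {attempt number: fraction of students who got it wrong}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for _student, attempt, correct in entries:
        totals[attempt] += 1
        if not correct:
            errors[attempt] += 1
    return {attempt: errors[attempt] / totals[attempt] for attempt in sorted(totals)}


curve = error_rate_by_attempt(log)
for attempt, rate in curve.items():
    print(f"attempt {attempt}: error rate {rate:.2f}")
```

The x axis is the attempt number and the y axis the error rate, matching the plot described above. With real data each skill would get its own curve, which is why the skills have to be identified first.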

But what do we need to create these learning curves? Observations over time. And we need to track identified skills. Not easy, need ML, data mining.

DataShop! [Most people have heard of it in the room] Largest open repository of educational datasets. Any of you can analyse the data, very fine-grained data. Currently >150m student actions, 350,000 hours of student work. Today’s age, more data collected. Pearson probably have way more data than this. But nothing like this that’s available to researchers.

Risks of data at scale. This keeps me awake at night. It’s hard to understand the data. Our data from lots of areas, online learning, games, ITS, simulations. I don’t know all the data. Possible to misinterpret it. The context is very important – some collected from particular students, and if you don’t know, that might not be apparent. Privacy and security – here it’s anonymised at a dataset level. Likely we have same student in different datasets, but we can’t tell that. If you cut off the long tail, that’s the part that’s really interesting. And in the case of MOOCs, the long tail is the people who are finishing.

In the future. Datashop has a particular data model, log-based tutor system, if it doesn’t fit that it’s hard to get the data in. Looking to track students across multiple experiences. And see students controlling their data – I agree with the person who says students don’t care. They’re used to saying the app prefs to say access phone records to play a game. They do know what they’re doing and they just don’t care. They’re going to be given the control, they’ll give consent. We’ll see how that goes.

Eric Cooper

Intel Education. Been in education for about 9y. We’re coming from a different perspective. We are mostly dealing outside the US, not mature markets. Mostly in Latin America, India, China and Asia, in their K12. Seeing lots at odds with cloud solutions, because often we don’t have connectivity. Our mission is to provide excellent education for children worldwide.

We’re involved with all stages of deployments. Talk to government, MoEs, understand what needs to be in place for tech solutions. Primarily on student success, prof devt, publishers, research – ethnography, followup afterwards. Funny to see tablets used as hard surfaces to write with paper on top of them.

Joe Burkhart, Oracle

One particular solution that has applicability in this space. Want to get feedback from you.

Oracle Education and Research Industry Business Unit. We serve 400,000 customers, have 130,000 employees. I focus on education.

Lawyer slide – don’t hold me to anything I say today.

Looking for – 3M commercial, we don’t make the surfboard, we make it better. We deliver enabling tech that enables you to do your job better. Seldom are you going to pick this up and say it’s an Oracle solution, you may have Oracle underneath and not even know that.

Focusing on a lot of different things, but new ways to engage with students, faculty, staff, researchers. YouTube is the second largest search engine in the world. Where does all this information come from, how are people standardised in finding it.

Think about student experience, student success. In Washington DC, student success is the primary focus area for HE. Institutions still want to change debate about what student success means, but at federal level that train has left the station. Institutional excellence, research, personalised learning.

Oracle, education is primary foundational activity, $3bn donated through several modes. Applications, we’ll donate so you can teach off of it, and we’ll help you set it up. Curriculum and content, for Java and HTML coding. Oracle Academy. That program in 108 countries, 3200 institutions.

What we’re really focused on, learner success and experience. It’s an infinity loop. [It’s a circle loop but with a visual twist so it looks like an infinity sign.] Previously continuous improvement process. Identify, engage, empower, intervene, succeed, analyze, enhance, predict, then back to identify again.

Can we do this in a classroom, 30 people? Yeah. 10,000 in a MOOC, or an entire university, or university system?

To do that, we have a solution – based on R. We embedded that in Oracle database. Instead of bringing data to the algorithm, we bring the algorithm to the data. Fast, at scale. Oracle R Enterprise, PL/SQL to go through text. Massive amounts of data. So not loads of extraction, flattening, reducing time to knowledge. So we can do that in near real-time. Focusing a lot of our efforts on this.

Questions

Zach Pardos: HE data, Coursera, edX, it can be onerous to deal with the format. For ITS, maybe Datashop. Data export formats, shared data model, replicable research that affords. Nice standards in Datashop. But in HE, wide variety, hard to make progress in coming to a shared standard. What have you seen as the biggest challenge to that data? What promising design patterns, generalisations, that make you optimistic?

Piotr: As MITx, that was a very rapid development process, aim to collect all of the data. edX has an analytics team making the formats better. It’ll never be entirely solved. Because tool designed to be easy to build interactives, and getting everyone to standardise is onerous. Can’t get same standardisation. There’ll always be challenges, but you can solve them, you have good coders.

John: Groups coalesce around different data models. I’ve been invited to events where different groups talk about bringing together people in a community and creating a data model. Submitted a proposal with MIT and Stanford, Memphis, Datashop-like repository to propose models and use their own. The Science of Learning Center was a funded NSF initiative, we put together a model and forcefully nicely let people use that model if they wanted the money. Another way, if the tools are built a certain way, people’ll get the data that way to fit the tools. It can be done. Groups are looking to do more.

Piotr: Datashop problem is easier. The types of activities are standardised. But when might submit video to review, it’s a few hundred meg. Or ML course, building large ML models. Videoconferencing. Problem gets more massive. We’ll standardise what we can.

John: You have to decide what are the important features to collect. People are moving toward that direction. Can’t track learning if not tracking everything. At some point, figure out how to collect interactions.

Piotr: We can track a heck of a lot.

John: Performance space, we’ll get there.

Joe: We see IMS Global coming there. Herding cats is their job. Half-life of learning content, can be fairly short. Who knows how we’re delivering next year. What is it we’re trying to track? That’s still up in the air. Measuring how far read in a book, re-watch a video. If can understand the important pieces, then we can get in to standardisation and data models. 2y from now there’ll be some new tool.

Eric: K12 publishers in the US. My son in 8th grade might have books from multiple, siloed publishers. Not good for teacher. General API, need some commonality. Easiest is race to the bottom – A in this, B in that. Don’t see a lot of movement.

Mykola: Collect data in a standard way is one important aspect. How to bring analytics tools and link it to MOOC platforms, enhance digital learning platforms. R comes to Oracle databases. But more for learning analysts?

Piotr: In terms of EDM, excited about real-time analytics. Plug-ins can go in to code base, so the analytics is immediate, can act. Get it within 100 ms. Standardisation, there’ll be common events where we all do the same thing. But another set, very broad – e.g. a chem or circuit simulator. Students can right now design circuits, write code, develop VSEPR diagrams in chem.

John: Interoperability between systems, standards. Not there yet. But definitely people looking at how to make that happen.

Piotr: Experience API, TinCan?

John: For what it is, xAPI is very high level, needs people to do stuff to make it useful as a research tool. It is a standard. Potential to have some carrot and stick behind it when the US Government is funding it. It’s on the radar. My feeling is, more people who work with it, try to be compatible. With both of those tools.
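
For readers who haven’t seen it, an xAPI (Tin Can) statement is essentially an actor-verb-object record in JSON. A minimal sketch follows. The learner, email address and activity URL are made up, and the helper function is illustrative rather than part of any xAPI library. It gives a feel for why John calls it very high level: nothing in the core statement says which skill was practised or how the attempt went unless someone adds that on top.

```python
import json


def make_statement(name, email, verb_id, verb_label, activity_id):
    """Build a bare actor-verb-object statement as a plain dict (hypothetical helper)."""
    return {
        "actor": {"objectType": "Agent", "name": name, "mbox": f"mailto:{email}"},
        "verb": {"id": verb_id, "display": {"en-US": verb_label}},
        "object": {"objectType": "Activity", "id": activity_id},
    }


statement = make_statement(
    "Example Learner",                                   # made-up learner
    "learner@example.com",                               # made-up address
    "http://adlnet.gov/expapi/verbs/completed",          # an ADL vocabulary verb
    "completed",
    "http://example.com/activities/circuits-problem-1",  # made-up activity id
)

print(json.dumps(statement, indent=2))
```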

Joe: Real challenge is changing nature of education right now. State of HE, some doomsayer in a magazine 2y ago, saying half of the institutions aren’t going to be here in 10y. Big push for differentiation. This school in this system specialises in this mode, or domain. Still up in the air.

Piotr: Standardisation, even if you fix the formats, a lot of them don’t make sense. Mastery learning built in to it, and matrix falls on its face. Multiple choice questions, trying to narrow that place as much as possible. Much psychometrics make a difference. Small changes to pedagogy cause a lot of algorithms to need to be adapted. That’s also really hard.

Caroline: Looks like a lot is an individual focus. Where is there evaluating social learning, like communication, argumentation, teamwork, collaboration. Not sure if text data in there, or just results of tests. Individual focus, whether this is one perspective on learning, that there are facts to be learned – US Government defined that. But from workplace, it’s all communication. Where’s the social learning in learning at scale? Are we looking at the next kind of learning.

Joe: From toolset perspective, we’re looking at how to measure individual engagement from a social scale. R&D in Oracle, acquisitions, to understand active inside an LMS, but also outside, say Google+. To some extent, they’re used to that. But when we do a social learning demo, it does freak out students. “Duh, we didn’t think you could see all that!” Classroom activity, learning activity, if they don’t engage that’s a big piece of what we need.

John: Military is interested in non-cognitive skills, C21st skills, other attributes that aren’t cognitively focused. We’re looking at that as well. Carolyn Rosé is looking at that, text mining. I still don’t think we do the cog skills very well, plenty of work to get trad ed tech up to speed.

Eric: We look globally, surprised to see how many countries are interested in critical thinking skills, collaboration – good for kids and economy. Within our platform, not tracking that. There is social aspect, see that as valuable and necessary. Worked in the past in knowledge forum.

Piotr: In edX, platform focused on STEM, but not closed-ended questions. Many questions, make a program, design a circuit, design a quadcopter, task in a sim. Very constructive. Social aspects, targeted to humanities, somewhat more limited. Not yet closed the loop. Tremendous potential. We teach teamwork by throwing them in to a team. We don’t have any way to measure outcomes, have they developed teamwork skills. Scale means I can see whether team projects work or not over 10,000 students. Same goes for C21st skills. Still not closed loop. Second, do capture social interactions within the platform. That gives us tools to do ML. And essay is a poor proxy for how people communicate. Email threads you can analyse. Doing it many thousands of times. The future is super-bright.

Closing

Dragan: This is the end of our event. Thanks to everyone who participated. Many interesting discussions. Thanks helpers for organising, Charles, Grace and Garron. Lot of work to enable Mykola and me do the program. George for the sponsorship. Simon Buckingham Shum who coordinated LASI Locals. Open the floor for quick impressions of the event. I’ve sent you request for a survey. Feedback to improve the event.

Caroline: I enjoyed the workshops. Not just my own! One with Carolyn Rosé, and all the panels.

Stian: Would love next year to have more opportunities for small group discussions, and more hands-on. More open space to define within the conference. Like unconferences. You talk to people over dinner, all interested in one topic. Have a space later where you can get together within the formal program.

Dragan: I agree, we also want to have open space in the program. Didn’t have it this year because reduced it to 3d, so many things to fit in. Last year, 5d and more space for those informal interactions. Useful to get the duration of the event.

Eric: 4d seems better. The panels, hard to speak quickly in an hour. Hard to get what you want to get across, and have a discussion. Maybe an hour and a half. Or perhaps the small group idea.

Someone: Liked the PhD students presenting one at a time. Not met as many people as I’d hoped. Opportunities for speed dating, find out what people are doing.

Piotr: Problem with that is limited physical space. If we went out there, it was crowded. Not sufficient for deep conversations like at last year’s LASI.

Xavier: Idea, having some kind of memory of the discussions. Everyone going home richer. A lot produced in the questions, would be useful for people trying to understand what learning analytics is. What we think it is now in 2014. How do we get some memory of that. Not video!

Dragan: Perhaps analysis of video transcripts. Often discussing how we can make an easy way to understand the lay of the land. Lots of literature review.

Stian: I agree. The main purpose is to make connections. Can watch videos at home. This is time when we can chat with each other. The last day, we should share participation list, does anyone mind? You even had huge forms to fill in about LASI. Don’t have to release all that data, think about it ahead, map of where people are physically, what data they have. At dinner, generate maximum entropy tables so you don’t sit with the people we know.

Dragan: We have limited time, just 1 h, this year, to facilitate informal conversations. Perhaps 3d is too short.

Caroline: Would like participants list at the beginning of conference. Also need pictures. Send those so I know who’s who. Which Stephen was that?

Someone: Maybe add Twitter handle.

Martyn Cooper: Lacking from the field, rather than how this is organised. I’ve been craving some social science research that validates the metrics, that isn’t just LA validating LA, taking a different research approach to see that what we think we’re measuring is what we’re measuring.

Dragan: There will be a tour to HarvardX.

Justin: HarvardX is part of X consortium, creates materials mostly for edX. See where the sausage is made, see our studio. We’re 5 minute walk from here. Might be folks around. Then our video editor will take you over to a new studio space. There are some public access issues we may be able to overcome. It may be you can’t get in and just get a nice view of Harvard Yard. Not dinosaurs or space Martians, but looking at how we make MOOCs.

Dragan: Grace and Charles have boxes for name tags. CFP for special issues in JLA. Thanks everyone!

–
This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.


Author: dougclow

Experienced project leader, data scientist, researcher, analyst, teacher, developer, educational technologist and manager. I particularly enjoy rapidly appraising new-to-me contexts, and mediating between highly technical specialisms and others, from ordinary users to senior management.
After 20 years at the OU as an academic, I am now a self-employed consultant, building on my skills and experience in working with people, technology, data science, and artificial intelligence, in the education field and beyond.