Students confront the messiness of data

No matter their skill level when they enter the class, students will leave the Data Challenge Lab at Stanford University after 10 weeks with the confidence to tackle even the most unwieldy data.

“The class has no prerequisites. We have students from a very wide variety of majors and all seven schools,” said Bill Behrman, PhD ’98, course instructor and director of the Stanford Data Lab. “By the end of the quarter, we give them the same raw data and challenges as Pulitzer Prize-winning journalists and a Nobel Prize-winning economist.”

Assignments in the Challenge Lab address diverse topics to help students get accustomed to all kinds of data. The datasets detail famous historical events, research findings and everyday life, with the 2016 presidential election among the examples. Students work with data science tools created by co-instructor Hadley Wickham, an adjunct professor in the Institute for Computational and Mathematical Engineering at Stanford, which allow them to manipulate the data and quickly create high-quality visualizations. Eventually students replicate – and improve upon – the same data wrangling, visualization and detective work done by professionals.

No lectures

The 20-student course meets every weekday to review homework, participate in breakout sessions covering new skills, and work in groups on the next assignments. Behrman estimates that around 70 percent of his time in class, and that of his teaching assistant, is spent working one-on-one with students or with groups of two or three students. The class has no lectures.

“There are inherent limitations to what lectures can do. You develop skills by practice and feedback,” Behrman said. “We cover an enormous amount of ground, and we don’t need lectures to do that.”

Behrman’s teaching follows findings from a relatively new academic field called learning science, which focuses on learning and teaching advances backed by interdisciplinary, scientific research. One example of such a technique is mastery learning, which requires students to reach a certain proficiency in each skill the course covers before moving on to new skills. In the Challenge Lab course, skill mastery means tackling nearly 100 homework assignments and receiving detailed feedback from Behrman and his teaching assistant within a day of submission.

All of the assignments are also examples of project-based learning, which requires solving problems exactly as someone would in the real world. Behrman has championed project-based learning for years, and it is also well regarded by learning scientists.

“It seemed like Professor Behrman was very thoughtful about his teaching and that he was putting a lot of effort into how the class was run. That was part of what drew me to take the Challenge Lab,” said Sara Altman, a graduate student in the Symbolic Systems Program and former Challenge Lab student who worked in the extracurricular offshoot of this course, the Data Impact Lab.

Combining the barrage of skill-building assignments with intense individual attention helps students learn at a much faster pace with less struggle, Behrman said. Many of the students compare the course to an immersive language class, but for developing data skills.

Removing barriers

Although other classes give students some data-wrangling experience, this one aims to boost their confidence in their data skills so they can focus less on technical barriers and more on what data can communicate.

“We are faced with real-world messiness and have to figure out how to shape it into a form that communicates an idea to other people,” said Jennifer Ren, ’18, a current student and a human biology major. “Before, I was concerned that I wasn’t fluent in doing data science outside of the classroom. Now, we’re using the same datasets that researchers are using. That’s so empowering.”

An internship at a data analytics company showed Ren how data science could fuel social good and motivated her to learn more. She has taken other computer science classes, but this course stands out for its level of individual attention and its emphasis on real-world application.

The common desire among Challenge Lab students for meaningful data projects led to the establishment of the Data Impact Lab, which makes up the other half of the Data Lab that Behrman directs. The Data Impact Lab has completed two projects: the Poverty Alleviation Project in Kenya and the Data Journalism Project. Two more projects are underway: one to assist outreach efforts to families who qualify for an anti-poverty program in California and another to help community health workers who are aiming to eliminate malaria in Zambia by 2021.

If Behrman has his way, the Challenge Lab course will scale up, matching the class sizes of other computer science classes on campus, and the Data Lab could play a role in the education of every student.

“We are teaching the foundational data skills that enable students to make better decisions and solve problems using available data,” Behrman said. “These are increasingly important skills our students will need, regardless of the careers they pursue.”

Behrman, an early member of the Stanford d.school, believes the Data Lab could have a similarly widespread influence, adding special insight to anyone’s learning, research or work.