Posted by John Cox and Pavel Simakov, Course Builder Team, Google Research

Course Builder is an experimental, open source platform for delivering massive online open courses. When you run Course Builder, you own everything from the production instance to the student data that builds up while your course is running.

Part of being open is making it easy for you to access and work with your data. Earlier this year we shipped a tool called ETL (short for extract-transform-load) that you can use to pull your data out of Course Builder, run arbitrary computations on it, and load it back. We wrote a post that goes into detail on how you can use ETL to get copies of your data in an open, easy-to-read format, as well as write custom jobs for processing that data offline.

Now we’ve taken the next step and added richer data processing tools to ETL. With them, you can build data processing pipelines that analyze large datasets with MapReduce. Inside Google we’ve used these tools to learn from the courses we’ve run. We provide example pipelines ranging from the simple to the complex, along with formatters to convert your data into open formats (CSV, JSON, plain text, and XML) that play nice with third-party data analysis tools.

We hope that adding robust data processing features to Course Builder will not only provide direct utility to organizations that need to process data to meet their internal business goals, but also make it easier for educators and researchers to gauge the efficacy of the massive online open courses run on the Course Builder platform.