Category: Student Work

Dear Students,

There is a new Kaggle Visualization Competition in our honor! I encourage you all to enter it! I received this email from Will Cukierski at Kaggle; it was sent to both me and Chris Mulligan. (See the p.s. for the Legend of Chris Mulligan.)

Yours,
Rachel

Chris and Rachel, Thanks to your blog […]

Congratulations to Maura Fitzgerald for taking first place in our in-class Kaggle competition! First, a couple of comments, and then the final results are below. Were these Kaggle-competitive scores? The top scores were comparable to the winning scores in the external version of this competition. The students in the class were given slightly different […]

Each week, Cathy O’Neil blogs about the class. Cross-posted from mathbabe.org. Thank you, Cathy, for doing such a wonderful job this semester capturing the course in this way, and also for being a respected voice in the classroom, a question-asker, and a role model for the students. Here’s our class photo, and Cathy’s blog post follows. Cathy’s post captures the presentation given by a subset of students, which represented a collaboration among many, if not most, of the students in this course as part of their work on a think piece. More on this to come at a later date. It also captures my synthesis of the semester.

In the final week of Rachel Schutt’s Columbia Data Science course, we heard from two groups of students as well as from Rachel herself. […]

This is another part of the students’ final project. A small group designed a survey that assesses data scientists along several dimensions of skill, and administered it to their classmates. The questions were of the form “Do you know what ___ means?” or “Have you ever implemented ____?”. The students were well aware of potential biases in their questions, the limitations of self-reporting, and so on. The survey was a great first pass.

This is an innovative way of describing and visualizing data scientists. It captures the variability among data scientists, and it makes it possible to assemble effective data science teams, either by forming “constellations” of these stars or by overlaying the stars on top of each other to create “complete” data science teams. The visualization and survey represented an improvement over the data science profiles I gave the students at the beginning of the semester. This was a collaborative effort among many students, including Adam Obeng, Eurry Kim, Christina Gutierrez, Kaz Sakamoto, and Vaibhav Bhandari. A full report of the last lecture is still to come.

Last night, the students gave their guest lecture. It was awesome! We’ll have a more detailed report tomorrow, but this image was already posted on Twitter, so I thought I’d put it up here as a sneak preview of the rest of the lecture. Part of the students’ design concept was constellations and stars, so they created another nice visualization of “data science profiles” as stars. It will make more sense when you see it. Kaz Sakamoto, Eurry Kim, and Vaibhav Bhandari created this as part of a larger class collaboration.

Dear Students,

I want to let you know about the following: (1) Institute for Data Sciences and Engineering: Columbia University’s new Institute for Data Sciences and Engineering has launched a website: http://idse.columbia.edu. In case there is any confusion, the word “Data” modifies “Engineering” as well: Data Sciences and Data Engineering. (2) Kaggle Competition: As you know, our class […]

Each Tuesday, Eurry Kim, a student in our class, picks one example of data visualization to share with us. This week is a little different: Eurry didn’t pick it; she created it! I asked if we could feature it. Eurry and Kaz Sakamoto, also a student in the class, submitted a visualization to the Hubway Challenge. Here is their submission. You can view the public leaderboard here and vote for Kaz and Eurry’s submission! I’m excited to see students in our class collaborating in this way. Below, I asked them to describe their collaborative process:

Doing Data Science

Introduction to Data Science is a class at Columbia University in the Department of Statistics. The course was designed and taught by Dr. Rachel Schutt in the Fall of 2012. In the Fall of 2013, it was team-taught by Dr. Schutt and Dr. Kayur Patel.