Our 3 Tracks

by Friederike Schüür and Iva Horel

01/31/17

This post has been adapted from the original, which we first wrote for the Healthcare Data Dive.

We’ve been to hackathons, and one problem we’ve seen is that some data skills, like loading and cleaning data, are required early on in the pipeline. While some people are scrubbing the data, others with different skills, ones more suited to analyzing or modeling, have to wait.

Now, that can be frustrating. Who likes to show up to an event early (imagine your precious Saturday), pumped up and ready to get going, only to find out that you can’t contribute for at least a couple of hours?

Right, that’s no fun! This is why we decided to introduce STRUCTURE.

Yes, yes, we know... the beauty of hackathons is their egalitarian nature. Their spontaneity. People hacking away doing cool stuff. That’s what we love about hackathons, too, and that’s why we’re introducing just the right kind and the right amount of structure (so we hope). Our aim is to help everyone be productive right from the first minute of the hackathon while ensuring we’re all working, roughly, towards a common goal.

Now, how exactly are we going to do that?

First of all, we ask people to apply. To be clear, we’re not planning to let only the best people through. This isn’t a job application. We want to make sure people attending the event have the minimal skills required to be productive, while ensuring a good mix of different types of skills so that people can have fun learning from each other. That’s why we ask you to fill in the application form. So please, bear with us!

Second, we’d like to introduce to you our three data tracks:

Clean the Clutter

Describe the Disorder

Model the Mess

People within a track will share the same set of skills (or are interested in getting better at that set of skills).

People in “Clean the Clutter” will focus on getting and cleaning the data, a crucial step in the data processing pipeline. We’ve got a couple of really interesting but hard-to-work-with datasets. People in “Clean the Clutter” will help us all get access to the data, from scraping to setting up databases to query from. Their objective will be to unify our sources and build clean datasets for everyone’s use. Outputs from our wranglers will be available for download after the event. We’ll need resourceful people, so come join “Clean the Clutter”.

People in “Describe the Disorder” will work on getting some really good descriptive statistics (think cool visualization) of the data sets we’ll be working with. A good understanding of our data will be crucial; good descriptive stats are the first building block in getting to that understanding. The descriptive stats groups will begin their day with some pre-cleaned data (different from what’s being cleaned by “Clean the Clutter”) and dig into it in search of actionable insights. Join “Describe the Disorder” to get intimately familiar with our data sets.

Finally, people in “Model the Mess” will construct machine learning models to predict, classify, or cluster, for example. They’ll work on problems similar to those of the “Clean the Clutter” and “Describe the Disorder” tracks, just from a different perspective and with slightly different tools. And that’s really the beauty of it all: people working together towards a common goal using the skills they have.

Each track will have 2-3 groups of 4-5 people. Plenty of people to learn from, small enough for each one of you to have an impact.

Now, should you only consider a track if you already have all the skills you need to contribute?

Well, if you’re an expert in cleaning, describing, or modeling, we’d love to see you at the event. But if you’re interested in learning about cleaning, describing, or modeling, that’s great, too. We ask for some basic knowledge of Python (or R), but otherwise we’d love to foster curiosity and provide an environment for you to learn in. So, if you spend your days setting up databases and you’d like to give visualization a try, consider joining the descriptive stats track! If, on the other hand, you already know something about predictive modeling but want to apply your skills to the data we provide, join our modeling team!

Towards the end of our hack day, we will have a series of lightning talks for everybody to present their results. The goal is to learn from each other. There’ll be no prizes for the best project. Why compete? Let’s work together.