Monthly Archives: January 2015

DataQuest is a recently launched online platform for learning data science with Python. The site consists of a gamified series of missions that increase in difficulty as your skills progress. Here are a few other features of the site.

Sample Code

Live, Interactive Browser-based Coding Environment

Step by Step Instructions

Instant Feedback

Helpful Forums for Q&A

The site is still under development and the founder, Vik Paruchuri, is looking for help developing more content and missions for the site. If that is something of interest to you, get in touch with Vik via the DataQuest website.


If you are based near San Francisco and interested in machine learning, the Next.ML conference takes place this weekend, on January 17, 2015. The conference is a series of workshops covering the latest trends in:

DEEP LEARNING

PROBABILISTIC PROGRAMMING

PARALLEL LEARNING

JULIA

OTHER MACHINE LEARNING TOPICS AND TOOLS

The lineup of speakers is great, coming from places like MIT, Facebook, Stanford, Domino Data Labs, and others. Bring your laptop because all participants will leave the conference with lots of great software and datasets.

Note: If you would like to attend the conference, you can use the coupon code “media” to get 30% off the conference admission.


The primary output of data science is data products. A data product can be anything from a list of recommendations to a dashboard to a single chart, or any other product that aids in making a more informed decision. In the end, data science should produce some usable results, and those results are the data product. The process used to create those data products needs a bit more formalization. Call it a methodology, process, lifecycle, or workflow, but it needs to exist.

Data Science is not Software Engineering

First, data science is often treated as software engineering because code is written. However, they are not the same thing. Agile methods, waterfall, and scrum are not pluggable methodologies that can be used with data science. Data science is more science and less engineering; therefore it should follow a more scientific method.

Existing Data Science Workflows

Luckily, some options already exist for data science. Much like software engineering, there is not a magic workflow that fits every project. The goal is to find a workflow that best fits the needs of the current project.

CRISP-DM

The oldest and most popular method is CRISP-DM. CRISP-DM was designed for data mining projects, which are closer to data science than to software engineering, but still not an exact match. The 6 steps of CRISP-DM are:

Business Understanding

Data Understanding

Data Preparation

Modeling

Evaluation

Deployment

Data Science Workflow

Those are 3 workflow options for data science. They are not the only options. Feel free to modify the workflows to best suit the project. It will be exciting to see the new data science workflows that are created in the near future. It will also be fun to see which ones turn out to be the most beneficial.

One thing a data product must do is help answer a question. Thus, a logical starting point for data science is a good question. Just don’t let the workflow become focused on the process itself, which is often the case in software engineering. Let the focus be on data products.

Note:
I have previously written 2 posts on this topic, and I don’t think either post gets the methodology exactly correct.


Again this summer, the University of Chicago is hosting the Data Science for Social Good (DSSG) Fellowship Program. DSSG is a 12-week training program for aspiring data scientists interested in working on problems in the non-profit and government sectors. The program is collaborative, creative, and project-based. All the fellows work on real problems from organizations seeking a social impact. And yes, fellows are paid, and they receive a housing stipend.

DSSG looks for the following characteristics in applicants:

Passion for doing social good

Preferably a graduate student or a recent grad

Some programming, stats, and data analysis skills (don’t have to be an expert)