Data Scientist in Van Wyck, SC | Taking it one day at a time

Programming for data analysts

By Chip Oglesby December 26, 2016

When you stop and think about it, the entire data science industry is kind
of like how our country is laid out. Bare with me here as I try to explain:

You have some people who are in the know and they use deep learning, and
neural networks to solve problems. That’s kind of like living in NY or SF,
right? That’s where the best and most advanced work is going on.

Then you have those people who are trapped in the “flyover states” and
wouldn’t even know where to begin to build a neural net.

Most of the time there’s not a lot of consideration given to those people in
the middle. There are plenty of sites that will explain the what and the how
of programmatic data analysis, but not always the ‘why’.

Why would I want to learn to program??

Established analysts who don’t work for large companies or start-ups will
usually ask this question and I will attempt to answer it.

Let’s start for example of what your “workflow” might look like:

You get a request from a coworker about an analysis that needs to be ran.

You run your query in a SQL interface or even through Access (shudder).

You download your data as a CSV file, which you open in Excel and do some
light analysis.

You draw your conclusions write up your analysis in a Word document and
send it off.

Let’s say for example that this is a report that you have to pull and run
weekly. Maybe it’s related to web traffic, or maybe it’s related to something
like revenue or sales if you’re in ecommerce.

Let’s look at the physical steps again:

Write a sql query

Download the information

Open and analyze it in Excel

Write up a summary in Word and send it out.

If it’s a simple request, it might take you between one and two hours to get
this done. At the most, that’s 104 hours per year that you’ll spend getting
this done.

Let’s look again specifically at steps three and four to explain why you would
want to learn to program. Let’s say that you are pulling revenue metrics.

weekOne == 200,000
weekTwo == 300,000

Classically you might go into excel and use a function like =(B2/A2)-1 and
then write in Word, Revenue increased over the previous week by 50%. Now
that’s all well and good if you never make a mistake, but we’re all human
so someone is going to make a mistake eventually.

But what if there were a way to automate some of the descriptive statistics
work you do? What if you could write a small bit of code to help you with some
of the higher-level work you end up doing?

This small bit of code does two things. First it calculates the difference
between weekTwo and weekOne. Then it uses a simple if statement to create
part of a sentence for me.

Using a program called R markdown I can create dynamic documents that can be
shared. Those documents are dynamic because they can be updated each time I run
them when the data has changed.

For example using r result in a sentence would produce: Over this past week,
revenue increased by 50%. Now each week I rerun my document, a new number and
part of a new sentence is produced and I don’t have to worry about copying or
pasting the wrong metric.

Okay, so??

So when you learn to program what you’re doing, you’ll find that analysis
actually becomes easy/fun again. Instead of four different stages, you can use
code that connects to a SQL database, pulls the information, performs your
analysis and writes the results, all in one neat package making it reproducible.

This is a very basic high-level overview of what is possible once you’ve decided
that you want to do things differently. The hardest part is deciding you want
to make this change.

You can take this type of work as far as you want, but it’s up to you to get
there.