Issue #179

April 27 2017

Editor Picks

Raising Good Robots
Intelligent machines, long promised and never delivered, are finally on the horizon. Sufficiently intelligent robots will be able to operate autonomously from human control. They will be able to make genuine choices. And if a robot can make choices, there is a real question about whether it will make moral choices. But what is moral for a robot? Is this the same as what’s moral for a human?...

Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
In this post we [Dropbox] will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bi-directional Long Short Term Memory (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural nets (CNNs), and more. In addition, we will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale...
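CTC is the least intuitive piece of a pipeline like this one. As a rough illustration (not Dropbox's code), here is best-path, or greedy, CTC decoding: take the most likely class at each time step, collapse consecutive repeats, then drop the blank symbol. The alphabet and the convention that class 0 is the blank are assumptions for the sketch.

```python
import numpy as np

BLANK = 0  # assumption: class index 0 is the CTC "blank" symbol


def ctc_greedy_decode(probs: np.ndarray, alphabet: str) -> str:
    """Best-path CTC decoding.

    probs: shape (time_steps, num_classes), per-frame class probabilities.
    Class 0 is blank; class i >= 1 maps to alphabet[i - 1].
    """
    best = probs.argmax(axis=1)  # most likely class at each frame
    out = []
    prev = None
    for k in best:
        # keep a symbol only if it differs from the previous frame and is not blank
        if k != prev and k != BLANK:
            out.append(alphabet[k - 1])
        prev = k
    return "".join(out)
```

The blank is what lets CTC emit doubled letters: a frame path of `a a _ a b b _` decodes to "aab", because the blank separates the two runs of `a`.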

A Message from this week's Sponsor:

How far could you go with the right experience and education? Find out at Capitol Technology University. Earn your PhD in Management & Decision Sciences in as little as three years, in convenient online classes. Banking, healthcare, energy and business all rely on insightful analysis, and business analytics spending will grow to $89.6 billion in 2018. This is a tremendous opportunity, and Capitol’s PhD program will prepare you for it. Learn more now.

Data Science Articles & Videos

Why good data scientists make good product managers (and why they’ll be a little uncomfortable)
When I was transitioning my career from data scientist to product manager, I solicited a lot of feedback from current data scientists and product managers about getting in touch with others who had attempted such a transition...and I’d like to offer some perspective on what the transition is like for the benefit of others who may be thinking of either making this transition themselves or for hiring managers who are considering hiring a data scientist as a product manager...

Our Machines Now Have Knowledge We’ll Never Understand
We are increasingly relying on machines that derive conclusions from models that they themselves have created, models that are often beyond human comprehension, models that “think” about the world differently than we do...But this comes with a price. This infusion of alien intelligence is bringing into question the assumptions embedded in our long Western tradition. We thought knowledge was about finding the order...It looks like we were wrong. Knowing the world may require giving up on understanding it...

Banning Exploration In My InfoVis Class
I’ve banned the word “explore” from all project proposals in my InfoVis class. No explore. No exploration. No exploratory. No, you may not create a tool to “allow an analyst to explore the bird strike data.” No, you can’t build a system for “exploration of microarray data.” And, no, you can’t make a framework for “exploratory network analysis.” Just no...The line that I use on my students is that: No one is paid to explore, they’re paid to find...

Examining the arc of 100,000 stories: a tidy analysis
I recently came across a great natural language dataset from Mark Riedel: 112,000 plots of stories downloaded from English language Wikipedia. This includes books, movies, TV episodes, video games, anything that has a Plot section on a Wikipedia page...This offers a great opportunity to analyze story structure quantitatively. In this post I’ll do a simple analysis, examining what words tend to occur at particular points within a story, including words that characterize the beginning, middle, or end...
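The positional idea fits in a few lines. This is an independent Python sketch, not the post's code (which uses R and tidy tools), and it assumes a naive whitespace tokenizer: scale each word's index to its relative position within its own plot, bucket it into one of `n_bins` bins, and count.

```python
from collections import Counter


def words_by_position(plots: list[str], n_bins: int = 10) -> list[Counter]:
    """Count word occurrences in each relative-position bin across plots.

    Positions are scaled within each plot, so bin 0 collects story
    beginnings and bin n_bins - 1 collects endings, whatever the plot length.
    """
    counts = [Counter() for _ in range(n_bins)]
    for plot in plots:
        words = plot.lower().split()  # naive whitespace tokenizer (assumption)
        n = len(words)
        for i, word in enumerate(words):
            b = i * n_bins // n  # relative position i/n mapped to a bin index
            counts[b][word] += 1
    return counts
```

Comparing each word's share in the first and last bins against its overall frequency then surfaces the "beginning" and "ending" words the post describes.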

Building and Exploring a Map of Reddit with Python
The goal of this notebook is to build and analyse a map of the 10,000 most popular subreddits on Reddit. To do this we need a means to measure the similarity of two subreddits...we want to map out and visualize the space of subreddits, and attempt to cluster subreddits into their natural groups. With that done we can then explore some of the clusters and find interesting stories to tell...
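One common way to measure subreddit similarity is the cosine similarity of vectors that count how often each user comments in each subreddit. This measure is an assumption for illustration and is not taken from the notebook. Here is a minimal sketch with sparse `Counter` vectors keyed by (hypothetical) usernames:

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())  # only shared users contribute
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty subreddit is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Cosine similarity ignores overall vector length, so a huge subreddit and a small one with the same commenter mix still score as close. That property is useful when subreddit sizes span orders of magnitude.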

Data Sciencing Motorcycles: Lean Assist
Motorcycling is life and learning to ride is hard. Cornering, in particular, was the bane of my existence for a long time in the beginning. How do you know if you are leaning too much or too little? Thus… introducing… the Motorcycle Lean Assist, MLA for short! Let’s use data science to solve this common motorcycle challenge...

The Myth of a Superhuman AI
Recently at a conference convened to discuss these AI issues, a panel of nine of the most informed gurus on AI all agreed this superhuman intelligence was inevitable and not far away...Yet buried in this scenario of a takeover of superhuman artificial intelligence are five assumptions which, when examined closely, are not based on any evidence. These claims might be true in the future, but there is no evidence to date to support them. The assumptions behind a superhuman intelligence arising soon are...

Utilising epidemic modelling to improve advertising click rates on Facebook
It is clear that Facebook calculates a likelihood of responding to an ad per user, and only targets those with high likelihoods...What is unclear is whether Facebook utilizes friendship information, more specifically, the assumption that: If a user clicks on an ad, the probability their connections will also click on the ad increases...Based on this assumption, can the user network be explored to ‘spread’ the ad in a viral manner to achieve a similar (or superior) number of clicks, while showing the ad to fewer people?...
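The click-spreading assumption maps naturally onto the independent cascade model from the epidemic and influence-spread literature. The sketch below is that swapped-in model, not the article's code. Each user who clicks gets exactly one chance to "infect" each friend with an assumed probability `p`, and the friendship graph is a hypothetical adjacency list.

```python
import random


def independent_cascade(graph: dict[str, list[str]], seeds: set[str],
                        p: float, rng: random.Random) -> set[str]:
    """Simulate one independent-cascade run.

    graph: adjacency list of a hypothetical friendship network.
    seeds: users who were shown the ad and clicked.
    p: assumed probability that a clicker's friend also clicks.
    Returns every user who clicked by the end of the cascade.
    """
    clicked = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for friend in graph.get(user, []):
                # each newly clicking user tries each not-yet-clicked friend once
                if friend not in clicked and rng.random() < p:
                    clicked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return clicked
```

Averaging `len(clicked)` over many runs estimates the expected clicks for a given seed set. You can then compare that estimate against showing the ad directly to more people.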

Jobs

Knotch is a marketing intelligence company that enables CMOs to understand how their marketing efforts are impacting their audiences emotionally across every content distribution channel or geography in real time. Marketers use this unprecedented, real-time intelligence to optimize the creative and distribution of their marketing. We're based in New York and our client verticals include financial institutions, entertainment, and CPG.

We’re a small team right now, so we’re looking for a teammate with a breadth of skills and a passion for learning. From ensuring data fidelity in our pipelines to building out full product features, and from preparing internal reports to giving polished client presentations, our ideal team member enjoys doing it all. In a typical day you may find yourself building out a deep net to classify web content, debugging a data discrepancy in our pipeline, or working with our Customer Success team on a custom analysis for a client. So, yeah, as our ideal team member, you’re as comfortable writing a simple SQL query to pull descriptive stats and making a beautiful bar chart as you are implementing a deep neural network, and you’d love to do both...

Training & Resources

How to Label Images Quickly
I’ve found collecting great data is a lot more important than using the latest architecture when you’re trying to get good results in deep learning, so ever since my Jetpac days I’ve spent a lot of time trying to come up with good ways to refine my training sets. I’ve written or used a lot of different user interfaces custom designed for this, but surprisingly I’ve found that the stock Finder window in OS X has been the most productive!...

Dance Dance Convolution
Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players perform steps on a dance platform in synchronization with music as directed by on-screen step charts...We introduce the task of learning to choreograph. Given a raw audio track, the goal is to produce a new step chart. This task decomposes naturally into two subtasks: deciding when to place steps and deciding which steps to select...