Issue #182

May 18 2017

Editor Picks

Applying Artificial Intelligence in Medicine: Our Early Results
Picture a world where your heart can be monitored continuously using a device you could purchase at a Best Buy or Target. Algorithms transform the raw data coming from your watch into diagnoses, and your doctor will be notified when a problem is detected. Today, Cardiogram is taking the first step down that path. We’ve developed an algorithm to use the Apple Watch to detect atrial fibrillation...

How Our Company Learned to Make Better Predictions About Everything
If an individual can gain a predictive edge, so can a company. At Twitch, a subsidiary of Amazon, we created a program that teaches all our employees to become better forecasters regardless of their quantitative background, organizational role, or area of expertise...

The Cost of Doing Data Science on Laptops
At the heart of the data science process are the resource-intensive tasks of modeling and validation. During these tasks, data scientists will try and discard thousands of temporary models to find the optimal configuration. Even for small data sets, this could take hours to process. Because of this, data scientists who rely on their laptops or departmental servers for processing power must choose between fast processing time and model complexity. In either case, performance and revenue suffer...

A Message from this week's Sponsor:

This is a white paper about data science teams and how companies apply their insights to the real world. You’ll learn how successful data science teams are composed and operate and which tools and technologies they are using.

Data Science Articles & Videos

Are Pop Lyrics Getting More Repetitive?
In 1977, the great computer scientist Donald Knuth published a paper called The Complexity of Songs, which is basically one long joke about the repetitive lyrics of newfangled music (example quote: "the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced"). I'm going to try to test this hypothesis with data. I'll be analyzing the repetitiveness of a dataset of 15,000 songs that charted on the Billboard Hot 100 between 1958 and 2017...
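
For readers curious how "repetitiveness" might be turned into a number, here is a minimal sketch that uses compressibility as a proxy: text that repeats itself shrinks more under a general-purpose compressor. This is an assumption for illustration, not necessarily the method used in the linked article. The function name compression_reduction and both sample strings are hypothetical and are not taken from the article's dataset.

```python
import zlib


def compression_reduction(text: str) -> float:
    """Return the fraction of bytes saved by DEFLATE compression.

    Higher values suggest more repetition. This is only a rough proxy
    (an assumption for illustration), and very short inputs can yield
    negative values because of the compressor's fixed overhead.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    return 1.0 - len(compressed) / len(raw)


# Hypothetical sample texts, made up for this sketch.
repetitive = "na na na na hey hey goodbye " * 20
varied = (
    "Quiet rivers carve through granite while distant lanterns flicker, "
    "merchants haggle over saffron, violins tune beneath copper awnings, "
    "and a sleepy ferryman counts eleven strange coins by moonlight."
)

print(f"repetitive: {compression_reduction(repetitive):.2f}")
print(f"varied:     {compression_reduction(varied):.2f}")
```

Scores from a sketch like this are only comparable between texts of similar length, since longer inputs give the compressor more opportunity to find repeats.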

I don’t know Fisher’s exact test, but I know Stan
A few days ago, I watched a terrific lecture by Bob Carpenter on Bayesian models. He started with a Bayesian approach to Fisher’s exact test. I had never heard of this classical procedure, so I was curious to play with the example. In this post, I use the same data that he used in the lecture and in an earlier, pre-Stan blog post. I show how I would go about fitting the model in Stan and inspecting the results in R...
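
As a rough illustration of what a Bayesian take on a two-group comparison can look like, here is a minimal sketch in plain Python rather than Stan. It assumes independent binomial outcomes with uniform Beta(1, 1) priors on each group's success rate, so each posterior is a Beta distribution that can be sampled directly. The counts below are hypothetical and are not the data from the lecture or the post, and the function name prob_a_beats_b is made up for this example.

```python
import random


def prob_a_beats_b(succ_a, n_a, succ_b, n_b, draws=20000, seed=0):
    """Estimate P(theta_a > theta_b) by sampling both posteriors.

    Assumes binomial data and uniform Beta(1, 1) priors, so each
    posterior is Beta(successes + 1, failures + 1).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        theta_a = rng.betavariate(succ_a + 1, n_a - succ_a + 1)
        theta_b = rng.betavariate(succ_b + 1, n_b - succ_b + 1)
        if theta_a > theta_b:
            wins += 1
    return wins / draws


# Hypothetical 2x2 table: 18 of 20 successes in group A, 5 of 20 in group B.
p = prob_a_beats_b(18, 20, 5, 20)
print(f"Estimated P(theta_a > theta_b) = {p:.3f}")
```

Unlike a p-value from the classical test, the number printed here is a direct posterior probability statement about the two rates, conditional on the assumed priors.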

Google’s neural network-generated custom face stickers are like Bitmoji that aren’t horrible
So let me just say really quick that I really dislike Bitmoji. Pretty much everything about it/them. The one thing that’s good about Bitmoji is that the user really can easily customize a representative avatar, which is good for inclusion, even if the results are universally terrible in every way. Fortunately Google has just blown Bitmoji out of the water with a genuinely excellent alternative...

The Making of the Weighted Pivot Scatter Plot
I recently published a story that tried to answer the question: what city is the microbrew capital of the US? One of the graphics in the story allows the user to adjust some parameters to change the rankings, and see how the data can be manipulated to yield different results. It looks like this...

Understanding deep learning requires re-thinking generalization
This paper has a wonderful combination of properties: the results are easy to understand, somewhat surprising, and then leave you pondering over what it all might mean for a long while afterwards! The question the authors set out to answer was this: What is it that distinguishes neural networks that generalize well from those that don’t?...

From Physics to Finance: My First Year in [Data Science] Industry
As the first data science hire in a startup, the opportunities to interact with and learn from other groups are limitless. I communicate with engineers, business development, marketing, customer success and design teams on a regular basis: a unique opportunity to learn about the other fields, how different departments operate, and the financial technology as a whole. Here are some lessons I have learned along the way...

Jobs

Sprout’s data team uses code, statistics, and machine learning to inform our products and organization. We’re looking for an experienced Data Engineer to join our team of data scientists and engineers.

Our team uses Python and its data stack (pandas, scikit-learn, Apache Airflow), along with a little Apache Spark and a lot of Amazon Redshift to drive decisions and power software that is used by more than 17,000 brands around the world. Companies like Microsoft, Zipcar, Hyatt, Google, and Zendesk rely on Sprout to create stronger relationships with their customers through social media.

We’re looking for curious, analytical, and creative people to help utilize the vast amount of data we have. If you love finding ways of using data to build better products and solve problems, we’d love to talk with you...

The Hitchhiker’s Guide to d3.js
This guide is meant to prepare you mentally as well as give you some fruitful directions to pursue. There is a lot to learn besides the d3.js API, both technical knowledge around web standards like HTML, SVG, CSS and JavaScript as well as communication concepts and data visualization principles. Chances are you know something about some of those things, so this guide will attempt to give you good starting points for the things you want to learn more about...

Books

"Critical book for anyone interested in the way technology is shaping our cities. Great collection of many historical and current endeavors, and a strong perspective on the different approaches to solving urban issues with technology."...