Issue #146

Sep 08 2016

Editor Picks

A Technical Primer On Causality
What does “causality” mean, and how can you represent it mathematically? How can you encode causal assumptions, and what bearing do they have on data analysis? These types of questions are at the core of the practice of data science, but deep knowledge about them is surprisingly uncommon...

The Palettes of Earth
Take a satellite image, and extract the pixels into a uniform 3-D color space. Then run a clustering algorithm on those pixels, to extract a number of clusters. The centroids of those clusters then make a representative palette of the image...
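
The recipe above can be sketched in a few lines of Python. This is only an illustrative sketch, not the linked article's code: it assumes plain k-means as the clustering algorithm, raw RGB tuples instead of a perceptually uniform color space, and synthetic pixels standing in for a satellite image. The function name `kmeans_palette` and all parameters are hypothetical.

```python
import random


def kmeans_palette(pixels, k, iterations=20, seed=0):
    """Cluster (r, g, b) tuples with plain k-means and return k centroids."""
    rng = random.Random(seed)
    # Assumption: initialise centroids from randomly chosen pixels.
    centroids = rng.sample(pixels, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: attach each pixel to its nearest centroid.
        for p in pixels:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(ch) / len(members) for ch in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        centroids = new_centroids
    return centroids


# Synthetic "image": noisy pixels around three made-up base colors.
noise = random.Random(1)
base_colors = [(30, 60, 120), (200, 180, 140), (40, 110, 50)]
pixels = [
    tuple(min(255, max(0, c + noise.randint(-10, 10))) for c in base)
    for base in base_colors
    for _ in range(50)
]

palette = kmeans_palette(pixels, k=3)
for color in palette:
    print(tuple(round(c) for c in color))
```

With a real image, the pixel list would come from an image library and would probably be converted to a uniform color space before clustering; both steps are left out here to keep the sketch self-contained.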

Deep Neural Networks for YouTube Recommendations
YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning...

A Message from this week's Sponsor:

Join Yhat cofounder and CTO Greg Lamp & Rodeo Product Manager Colin Ristig for a live product tour of Yhat's open-source Python IDE, Rodeo, and enterprise model deployment platform, ScienceOps. Greg and Colin will walk through a demo of both products using a beer recommender algorithm and web app as an example. The webinar will take place on Wednesday, September 21 at 2 PM EST.

Data Science Articles & Videos

Artificial Intelligence Swarms Silicon Valley on Wings and Wheels
For more than a decade, Silicon Valley’s technology investors and entrepreneurs obsessed over social media and mobile apps that helped people do things like find new friends, fetch a ride home or crowdsource a review of a product or a movie. Now Silicon Valley has found its next shiny new thing. And it does not have a “Like” button...

How a Japanese Cucumber Farmer is using Deep Learning and TensorFlow
It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes...

Experimentation in a Ridesharing Marketplace
Technology companies strive to make data-driven product decisions, and Lyft is no exception. Because of that, online experimentation, or A/B testing, has become ubiquitous. The way it’s bandied about, you’d be excused for thinking that online experimentation is a completely solved problem. In this post, we’ll illustrate why that’s far from the case for systems, like a ridesharing marketplace, that evolve according to network dynamics. As we’ll see, naively partitioning users into treatment and control groups can bias the effect estimates you care about...

A Decomposable Attention Model for Natural Language Inference
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters than previous work and without relying on any word-order information...

Hierarchical Multiscale Recurrent Neural Networks
Learning both hierarchical and temporal representation has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of models can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural networks...

A Survival Guide to a PhD
Now that my PhD has come to an end I wanted to compile a similar retrospective document in hopes that it might be helpful to some. Unlike the undergraduate guide, this one was much more difficult to write because there is significantly more variation in how one can traverse the PhD experience. Therefore, many things are likely contentious and a good fraction will be specific to what I’m familiar with (Computer Science / Machine Learning / Computer Vision research). But disclaimers are boring, let's get to it!...

Jobs

A million people a year die in car collisions around the world. That number should be zero. You can help us create a new insurance company that uses the latest technology and data science methods to save lives by preventing car collisions before they happen. The field is rich with data and we will be pushing the boundaries of what is possible...