Issue #238

June 14 2018

Editor Picks

Why did the Neural Network cross the road?
Can a machine learning algorithm learn to tell a joke? I’ve experimented with neural networks and jokes before, teaching them to tell knock-knock jokes, or to generate April Fools pranks. In each case, the results were underwhelming. However, that could have been because the algorithm didn’t have much data to work with, just a couple of hundred examples of each type of joke. What happens when I give a neural network a LOT of examples to copy?...

Why the Future of Machine Learning is Tiny
I’m convinced that machine learning can run on tiny, low-power chips, and that this combination will solve a massive number of problems we have no solutions for right now. That’s what I’ll be talking about at CogX, and in this post I’ll explain more about why I’m so sure...

A Message from this week's Sponsor:

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Improving Language Understanding with Unsupervised Learning
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well...

Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model...

Physicist Max Tegmark on the promise and pitfalls of artificial intelligence
Tegmark recently spoke about AI’s potential, and its dangers, at IPsoft’s Digital Workforce Summit in New York City. After the keynote address, we spoke via phone about the challenges around AI, especially as they relate to autonomous weapons and defense systems like the Pentagon’s controversial Project Maven program. Here’s an edited transcript of the interview...

The Trouble with D3
Recently there were a couple of threads on Twitter discussing the difficulties associated with learning d3.js. I’ve also seen this come up in many similar conversations I’ve had at meetups, conferences, workshops, mailing list threads and slack chats. While I agree that many of the difficulties are real, the threads highlight a common misconception that needs to be cleared up if we want to help people getting into data visualization...

Jobs

PepsiCo’s eCommerce Data Science and Analytics group is a team of data scientists, technology specialists, and business innovators who operate within eCommerce to build industry-leading systems and solutions. By focusing on machine learning and automation, the Data Science & Analytics group is pushing the bounds of possibility for PepsiCo and its strategic partners...

Training & Resources

Visualize TensorFlow Graph In TensorBoard
Learn how to use TensorFlow Summary File Writer (tf.summary.FileWriter) and the TensorBoard command line utility to visualize a TensorFlow Graph in the TensorBoard web service, via a screencast video and full tutorial transcript...

Data Cleaning with Python and Pandas: Detecting Missing Values
According to IBM Data Analytics you can expect to spend up to 80% of your time cleaning data. In this post we’ll walk through a number of different data cleaning tasks using Python’s Pandas library. Specifically, we’ll focus on probably the biggest data cleaning task, missing values...
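
For a quick taste of what detecting missing values can look like in Pandas, here is a minimal sketch. It is not taken from the linked post: the toy data, the column names, and the placeholder strings ("n/a", "--") are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and placeholder strings are assumptions.
df = pd.DataFrame({
    "price": [100.0, np.nan, 250.0, 300.0],
    "rooms": ["3", "n/a", "2", "--"],
})

# Standard missing values (NaN/None) are detected directly by isnull().
standard_missing = df.isnull().sum()

# Non-standard placeholders are ordinary strings, so map them to NaN first.
cleaned = df.replace(["n/a", "--"], np.nan)
all_missing = cleaned.isnull().sum()

print(standard_missing)
print(all_missing)
```

The point of the second step is that isnull() only sees values Pandas already treats as missing, so hand-typed placeholders have to be converted before they can be counted.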

Books

The book begins with an introduction to test-driven machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression...