Web Picks (week of 24 July 2017)

Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.

Jefferies gives IBM Watson a Wall Street reality check
IBM’s Watson unit is receiving heat today in the form of a scathing equity research report from Jefferies’ James Kisner. The group believes that IBM’s investment into Watson will struggle to return value to shareholders. The discussion on Hacker News is also worth reading for quotes such as “IBM vastly over promises with their marketing. It is so frustrating to have to answer questions from the CEO about why we don’t solve all our problems with magic beans from IBM’s Watson.”

Technical Debt in Machine Learning
“Experienced teams know when to back up seeing a piling debt, but technical debt in machine learning piles extremely fast. You can create months worth of debt in a matter of one working day and even the most experienced teams can miss a moment when the debt is so huge that it sets them back for half a year, which is often enough to kill a fast-pacing project.”

Robust Adversarial Examples
Impressive (and scary) research from OpenAI: “We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives. This challenges a claim from last week that self-driving cars would be hard to trick maliciously since they capture images from multiple scales, angles, perspectives, and the like.”

Facets: An Open Source Visualization Tool for Machine Learning Training Data
From Google Research: “Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data which, in machine learning, is as important as debugging your model.”

How to make a racist AI without really trying
“My purpose with this tutorial is to show that you can follow an extremely typical NLP pipeline, using popular data and popular techniques, and end up with a racist classifier that should never be deployed.”

CatBoost is an open-source gradient boosting library with categorical features support
CatBoost is an algorithm for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is the successor of the MatrixNet algorithm that is widely used within the company for ranking, forecasting, and recommendation tasks. It is general-purpose and can be applied to a wide range of problem domains.
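To see the core idea behind libraries like CatBoost, here is a minimal, illustrative sketch of gradient boosting in plain Python: an ensemble of decision stumps, each fitted to the residuals of the ensemble so far. This is the general technique only, not CatBoost's actual implementation (which adds ordered boosting and native categorical-feature handling); all names below are our own.

```python
# Illustrative gradient boosting with decision stumps on a single
# numeric feature. Each round fits a stump to the current residuals
# and adds it to the ensemble, scaled by a learning rate.

def fit_stump(x, residuals):
    """Find the threshold split on x that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Fit n_rounds stumps, each on the previous ensemble's residuals."""
    base = sum(y) / len(y)          # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

# Toy data: two clusters of target values.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
model = gradient_boost(x, y)
```

Real libraries replace the stumps with full trees, use gradients of an arbitrary loss rather than raw residuals, and add regularization, but the additive fit-the-residuals loop above is the essence.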

Using Deep Learning to Create Professional-Level Photographs
In some areas, objective evaluation is not available. Whether a photograph is beautiful, for example, is measured by its aesthetic value, a highly subjective concept. To explore how ML can learn such subjective concepts, Google Research introduces an experimental deep-learning system for artistic content creation.

Our quest for robust time series forecasting at scale
“We were part of a team of data scientists in Search Infrastructure at Google that took on the task of developing robust and automatic large-scale time series forecasting for our organization. In this post, we recount how we approached the task, describing initial stakeholder needs, the business and engineering contexts in which the challenge arose, and theoretical and pragmatic choices we made to implement our solution.”

Two Decades of Recommender Systems at Amazon.com
Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this paper, Amazon traces the evolution of its recommender system over the past two decades.

Predictive learning vs. representation learning
“When you take a machine learning class, there’s a good chance it’s divided into a unit on supervised learning and a unit on unsupervised learning. I’d argue that this is deceptive. I think the real division in machine learning isn’t between supervised and unsupervised, but what I’ll term predictive learning and representation learning.”

keras: Deep Learning in R
With the rise in popularity of deep learning, CRAN has gained a number of R deep learning packages, keras among them.

Introducing tidygraph
“I’m very pleased to announce that my new package tidygraph is now available on CRAN. As the name suggests, tidygraph is an entry into the tidyverse that provides a tidy framework for all things relational (networks/graphs, trees, etc.).”

Modeling documents with Generative Adversarial Networks
“I presented some preliminary work on using Generative Adversarial Networks to learn distributed representations of documents at the recent NIPS workshop on Adversarial Training. In this post I provide a brief overview of the paper and walk through some of the code.”

Alibaba Cloud
A new cloud player enters the market. “ChinaConnect” looks particularly interesting: Alibaba Cloud’s one-stop solution to help international companies do business in China, including ICP license support.