Issue #52

Nov 20 2014

Special Message: It's been a year! (This is Issue #52!) Many thanks to everyone who's subscribed, enjoyed, sent messages, etc. Looking forward to the next 52 issues! ...

Editor Picks

Personalized Recommendations at Etsy
Providing personalized recommendations is important to our online marketplace. It benefits both buyers and sellers: buyers are shown interesting products that they might not have found on their own, and products get more exposure beyond the seller’s own marketing efforts. In this post we review some of the methods we use for making recommendations at Etsy...

Deep Visual-Semantic Alignments for Generating Image Descriptions
We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding...

Data Science Articles & Videos

Flock: Hybrid Crowd-Machine Learning Classifiers
We present hybrid crowd-machine learning classifiers: classification models that start with a written description of a learning goal, use the crowd to suggest predictive features and label data, and then weigh these features using machine learning to produce models that are accurate and use human-understandable features...

Show and Tell: A Neural Image Caption Generator
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image...

Putting the Magic in Data Science
I argue that magic in data science often comes from combining various “tricks” in novel ways. I describe four common tricks we use at Facebook, as well as a grab bag of others that I’ve found useful...

Scaling Language Understanding via Joint Multilingual Learning
In this talk, Andreea Bodnari (Chief Data Scientist at Movable Ink) presents two probabilistic models that systematically model both the depth and the breadth of natural languages for two different linguistic tasks: syntactic parsing and joint learning of named entity recognition and coreference resolution...

Artificial Intelligence is a tool, not a threat
Recently there has been a spate of articles in the mainstream press, and a spate of high profile people who are in tech but not AI, speculating about the dangers of malevolent AI being developed, and how we should be worried about that possibility. I say relax. Chill...

Jobs

Work on building state-of-the-art data systems that ingest, model, and analyze massive flows of data from online, social, mobile, and offline commerce/user activity to create models that achieve business objectives. Use cutting-edge machine learning, data mining, and optimization algorithms underneath it all to analyze all this data on top of Hadoop/HBase/Hive/Pig. Have the aptitude to see through and analyze the complexity of intricate statistical and learning algorithms, and design roadmaps for efficient and scalable implementation while using numerical routines tailored to the problem...

Seaborn v0.5.0 release
This is a major release from 0.4. Highlights include new functions for plotting heatmaps, possibly while applying clustering algorithms to discover structured relationships. These functions are complemented by new custom colormap functions and a full set of IPython widgets that allow interactive selection of colormap parameters. The palette tutorial has been rewritten to cover these new tools and more generally provide guidance on how to use color in visualizations. There are also a number of smaller changes and bugfixes...