Issue #107

December 10, 2015

Many rules of statistics are wrong
There are two kinds of people who violate the rules of statistical inference: people who don't know them and people who don't agree with them. I'm the second kind...

Hidden Technical Debt in Machine Learning Systems
Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems...

Data Science Articles & Videos

Google and Facebook Race to Solve the Ancient Game of Go with AI
Over the last 20 years, machines have topped the best humans at so many games of intellectual skill, we now assume computers can beat us at just about anything. But Go, the Eastern version of chess in which two players compete with polished stones on a 19-by-19-line grid, remains the exception...

Analyzing San Francisco Crime Data to Determine When Arrests Occur
The SF OpenData portal is a good source for detailed statistics about San Francisco. One of the most popular datasets on the portal is the SFPD Incidents dataset, which contains a tabular list of 1,842,050 reports (at time of writing) from 2003 to present. For this article, I’m going to do something different and illustrate the data processing step-by-step, both as a teaching tool, and to show that I am not using vague methodology to generate a narratively-convenient conclusion...

Visualising your hiking trails and photos with My Tracks, R and Leaflet
After a hiking vacation, it is nice to have some sort of visual record. While there are likely professional solutions to record and visualise your trails, as a recreational hiker you can already get a lot of mileage from your smartphone in combination with the R data-analysis ecosystem...

Learning to Generate Chairs, Tables and Cars with Convolutional Networks
We train generative 'up-convolutional' neural networks which are able to generate images of objects given object style, viewpoint, and color. Our experiments show that the networks do not merely learn all images by heart, but rather find a meaningful representation of 3D models allowing them to assess the similarity of different models, interpolate between given views to generate the missing ones, extrapolate views, and invent new objects not present in the training set by recombining training instances, or even two different object classes...

Exploring Virtual Reality Data Visualization with Gear VR
With the release of the Gear VR virtual reality headset by Samsung and Oculus, it feels like the future is here. It’s easy to see how a number of industries are going to be disrupted by this new media format over the next few years... But what about data science? The applications are much less clear than in entertainment and marketing, but it’s likely that virtual reality will enable some interesting new data visualizations that 2D images, even interactive ones, don’t provide...

How Much Memory Does A Data Scientist Need?
Recently, I discovered an interesting blog post, “Big RAM is eating big data – Size of datasets used for analytics”, from Szilard Pafka. He says that “Big RAM is eating big data”. This phrase means that memory sizes are growing much faster than the data sets that typical data scientists process. So, data scientists do not need as much data as the industry offers to them. Would you agree?...

Jobs

At eBay, our systems scale to billions of transactions per day, and we run our sites 24x7 with 99.99% reliability. We pride ourselves on being the leader in cloud computing, Big Data, search, and many other leading-edge technologies. We are seeking highly talented, creative, and passionate applied researchers to help us create the most relevant recommendations, machine translation and search experiences...

Books

"As a data scientist lead, I found this to be a book that is full of clear explanations on the important topics you need to master predictive modeling... The code samples provided with the book are very well organized and make it trivial to pick up and execute examples from anywhere in the book. I'm currently recommending this to everyone who joins my team..."