Issue #192

July 27 2017

Editor Picks

You Say Data, I Say System
It’s not enough for us to be critical of Uber’s booking algorithm, or FOX News’ most recent infographic. We need to expand our attention to the systems that these mechanisms support; systems in which our participation is often both transparent and involuntary. By taking a systems approach to data, I believe we can make better things. And we might also find deeper and more meaningful questions, questions that are as much about how these things work (or don’t work) as why they exist in the first place...

Hill For The Data Scientist: An xkcd Story
Sir Austin Bradford Hill, a statistician and epidemiologist, created a list of guidelines for evaluating whether there is evidence of a causal relationship...He determined the following aspects of associations ought to be considered when assessing causality. When thinking about this problem, an xkcd comic I have seen in every lecture on this topic came to mind...This inspired me to attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun, and also to motivate causal inference instructors to have some variety in which xkcd comic they include in lectures (bear with me, some of these are a stretch 🙈💁🏻)...

Please Prove You’re Not a Robot
When science fiction writers first imagined robot invasions, the idea was that bots would become smart and powerful enough to take over the world by force, whether on their own or as directed by some evildoer. In reality, something only slightly less scary is happening. Robots are getting better, every day, at impersonating humans. When directed by opportunists, malefactors and sometimes even nation-states, they pose a particular threat to democratic societies, which are premised on being open to the people...

A Message from this week's Sponsor:

STPF (the AAAS Science & Technology Policy Fellowships) is the premier opportunity for outstanding scientists and engineers to learn first-hand about policymaking while contributing their knowledge and analytical skills to address some of today’s most pressing societal challenges. Enhance your career while engaging with policy administrators and thought leaders.

For over 43 years, doctoral level scientists, social scientists, engineers, and health/medical professionals have applied their knowledge and technical expertise to policymaking at the national and international levels. Fellows serve yearlong assignments in all three branches of the federal government and represent a broad range of backgrounds, disciplines and career stages.

Data Science Articles & Videos

Inside Facebook’s AI Workshop
A few years ago the company’s machine learning group numbered just a few and needed days to run an experiment. Now, Joaquin Candela [head of Facebook’s Applied Machine Learning group] says, several hundred employees run thousands of experiments a day. AI is woven so intricately into the platform that it would be impossible to separate the products (your feed, your chat, your kid’s finsta) from the algorithms. Nearly everything users see and do is informed by AI and machine learning...

The DeepMind Debacle Demands Dialogue On Data
This month, the UK Information Commissioner's Office declared that the hospital operator had broken civil law when it gave health data to Google's London-based subsidiary DeepMind...But the arrangement failed to consider how patients expect their data to be used, and by whom. (There was no ruling against DeepMind)...This episode is disheartening for groups that promote the power of data for the public interest, such as the Royal Statistical Society, which I lead. I hope for a world where data is at the heart of understanding and decision-making. To achieve this we need better public dialogue...

Some Advice For Journalists Writing About Artificial Intelligence
Dear Journalists...I'd like to offer some advice on how to write better and more truthfully when you write articles about artificial intelligence. The reason I'm writing this is that there are a whole lot of very bad articles on AI (news articles and public interest articles) being published in newspapers and magazines. Some of them are utter nonsense, bordering on misinformation; some of them capture the gist of what goes on but are riddled with misunderstandings. No, I will not provide examples, but anyone working in AI and following the news can provide plenty. There are of course also many good articles about AI, but the good/bad ratio could certainly be improved...

Targeting Disaster Relief From Space
I used machine learning to better target disaster relief efforts. I focused on Typhoon Haiyan, which hit the Philippines in November 2013. It broke records for having the highest wind speeds upon landfall and destroyed over 1 million homes...After natural disasters, it’s important to understand which areas suffered the most damage in order to prioritize relief efforts. Oftentimes, damage assessment maps are created by volunteers with the Humanitarian OpenStreetMap Team, who compare satellite imagery from before and after the disaster and manually label each building with their evaluation of damage. However, these maps are time- and labor-intensive to create, and not always accurate...Using satellite imagery from before and after Typhoon Haiyan in the Philippines, I built a neural network to detect damaged buildings. Using the predictions from the model, I then created density maps of damage, illustrating priority areas for relief efforts...
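The excerpt doesn't say how the density maps were built, so here is a minimal sketch of one simple way to do it, assuming the model's output has already been reduced to a list of (lat, lon) points for buildings it flagged as damaged. The function names, the square-degree grid, and the cell size are all hypothetical illustration choices, not the author's pipeline. Note that degree cells are not equal-area, which is usually tolerable over a small region but worth keeping in mind.

```python
import math
from collections import Counter


def damage_density_grid(points, cell_size):
    """Count predicted-damaged buildings per square grid cell.

    points: iterable of (lat, lon) pairs the model flagged as damaged (assumed input).
    cell_size: edge length of each grid cell, in degrees (hypothetical choice).
    Returns a Counter mapping (row, col) cell indices to building counts.
    """
    counts = Counter()
    for lat, lon in points:
        # Snap each point to the cell whose lower-left corner lies at or below it.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


def priority_cells(counts, top_n):
    """Return the top_n cells with the most damaged buildings, densest first."""
    return [cell for cell, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Toy coordinates for illustration only; these are not real Haiyan predictions.
    damaged = [(11.25, 125.05), (11.26, 125.04), (11.55, 125.35)]
    grid = damage_density_grid(damaged, cell_size=0.1)
    print(priority_cells(grid, top_n=1))
```

Ranking cells by count is the crudest form of prioritization; a real map would likely normalize by the number of buildings per cell or smooth the counts before plotting.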

Building a Music Recommender with Deep Learning
I’ve spent a lot of money on music over the years and one website that I have purchased MP3s from is JunoDownload. It’s a digital download website predominantly used by DJs and has a huge back catalogue of tracks for sale on its platform...Wouldn’t it be cool if you could discover music that was released a few years ago that sounds similar to a new song that you like? Surely Juno are missing out on potential sales by not offering this type of feature on their website...After being inspired by a blog post I’d read recently from somebody who had classified music genres for songs in their own music library, I decided to see if I could adapt that methodology to build a music recommender...
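The "sounds similar" step of a recommender like this typically comes down to a nearest-neighbour lookup in some feature space. The sketch below assumes each track has already been mapped to a fixed-length vector (for example, features taken from a trained genre-classification network); the excerpt doesn't say how the author did this, so the `embeddings` dictionary, the track IDs, and the choice of cosine similarity are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id, embeddings, k=3):
    """Return the k track IDs whose vectors are most similar to query_id's vector.

    embeddings: dict mapping track ID -> feature vector (assumed precomputed).
    """
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), track_id)
        for track_id, vec in embeddings.items()
        if track_id != query_id  # don't recommend the query track to itself
    ]
    # Sort by similarity only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track_id for _, track_id in scored[:k]]
```

Cosine similarity ignores vector magnitude, which is convenient when the overall scale of learned features varies from track to track; a brute-force scan like this is fine for a sketch, but a large back catalogue would call for an approximate nearest-neighbour index.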

A Jupyter Notebook Magic For Browser Notifications Of Cell Completion
This package provides a Jupyter notebook cell magic %%notify that notifies the user upon completion of a potentially long-running cell via a browser push notification. Use cases include long-running machine learning models, grid searches, or Spark computations. This magic allows you to navigate away to other work (or even another Mac desktop entirely) and still get a notification when your cell completes...

What's So Hard About Histograms?
Histograms are a way to summarize a numeric variable. They use counts to aggregate similar values together and show you the overall distribution. However, they can be sensitive to parameter choices! We're going to take you step by step through the considerations with lots of data visualizations...
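To make "sensitive to parameter choices" concrete, here is a minimal, dependency-free sketch that bins the same eight values three ways. The `histogram` helper is hypothetical and assumes a non-empty list of numbers; it exposes the two parameters that most often change a histogram's shape, the bin width and the bin origin (where the first bin edge sits).

```python
import math
from collections import Counter


def histogram(values, bin_width, origin=0.0):
    """Count values into half-open bins [origin + i*w, origin + (i+1)*w).

    Returns (bin_start, count) pairs in order, including empty bins
    between the lowest and highest occupied bin. Assumes values is non-empty.
    """
    counts = Counter(math.floor((v - origin) / bin_width) for v in values)
    lo, hi = min(counts), max(counts)
    return [(origin + i * bin_width, counts.get(i, 0)) for i in range(lo, hi + 1)]


if __name__ == "__main__":
    data = [1, 2, 2, 3, 7, 8, 8, 9]
    print(histogram(data, bin_width=5))              # two equal bars
    print(histogram(data, bin_width=5, origin=2.5))  # same width, shifted edges
    print(histogram(data, bin_width=2))              # narrower bins reveal the gap
```

With width 5 the data look like two equal bars, shifting the edges by 2.5 turns that into three bars with a dip in the middle, and width 2 exposes the empty stretch between the two clusters. Same data, three different stories.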

Jobs

The Data Science & Analytics group at Penguin Random House is seeking a Junior Data Scientist/Data Scientist...In this role, you will have an opportunity to work on a variety of high-profile projects under the mentorship of Senior Data Scientists and in collaboration with key decision makers across the organization...We are an agile team of data scientists and software engineers. The team has a wide mandate encompassing pricing systems, recommendation/personalization systems, title segmentation, supply chain, as well as ad-hoc analysis and data exploration...

Training & Resources

Neuroscience-Inspired Artificial Intelligence
The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields has become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields...

mxnet-the-straight-dope - An Interactive Book On Deep Learning, In Concept And In MXNet
This repo contains an incremental sequence of notebooks designed to teach deep learning, MXNet, and the gluon interface. Our goal is to leverage the strengths of Jupyter notebooks to present prose, graphics, equations, and code together in one place. If we're successful, the result will be a resource that could be simultaneously a book, course material, a prop for live tutorials, and a resource for plagiarizing (with our blessing) useful code. To our knowledge there's no source out there that either (1) teaches the full breadth of concepts in modern deep learning or (2) interleaves an engaging textbook with runnable code. We'll find out by the end of this venture whether or not that void exists for a good reason...

GeoSpatial Data Visualization in R
The goal of user2017.geodataviz is to provide a comprehensive overview of the options available in the R language for geospatial data visualization. This was presented at useR! 2017 as a tutorial titled Geospatial visualization using R...This tutorial covers: a) R packages for spatial analysis, b) data structures for spatial data in R, c) operations supported on spatial data, and d) visualizing spatial data [using base graphics, using ggplot2 and helper packages, using shiny for dynamic mapping, using leaflet and related packages for interactive maps, and using specialized packages such as tmap, choroplethr, ggmap, plotly, highcharter, etc.]...

Books

Text Mining with R: A Tidy Approach
Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you’ll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You’ll learn how tidytext and other tidy tools in R can make text analysis easier and more effective...
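The core idea behind the tidy text approach is one token per row: once text is split into (document, word) rows, ordinary filtering and counting tools apply. The book does this in R with tidytext; the sketch below is only a rough Python analogue of that idea, not tidytext's API. The helper names, the lowercase-letters-and-apostrophes tokenizer, and the five-word stop list are hypothetical simplifications.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (real lists contain hundreds of words).
STOP_WORDS = {"the", "a", "and", "of", "is"}


def unnest_tokens(docs):
    """Turn {doc_id: text} into a list of (doc_id, word) rows, one token per row."""
    rows = []
    for doc_id, text in docs.items():
        # Naive tokenizer: lowercase, then keep runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append((doc_id, word))
    return rows


def count_words(rows, stop_words=STOP_WORDS):
    """Count words across all rows after dropping stop words."""
    return Counter(word for _, word in rows if word not in stop_words)


if __name__ == "__main__":
    docs = {"d1": "The cat and the hat.", "d2": "A cat is a cat."}
    rows = unnest_tokens(docs)
    print(count_words(rows).most_common(2))
```

Keeping the document ID on every row is what makes the format useful: the same rows can be grouped per document for term frequencies or joined against a sentiment lexicon without reshaping.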