Issue #19

April 3, 2014

Editor Picks

Jeff Hawkins has bet his reputation, fortune, and entire intellectual life on one idea: that he understands the brain well enough to create machines with an intelligence we recognize as our own. If his bet is correct, the Palm Pilot inventor will father a new technology, one that becomes the crucible in which a general AI is one day forged. If his bet is wrong, then Hawkins will have wasted his life. At 56 years old that might sting a little...

Imagine if a company’s three highly valued data scientists can happily work together without duplicating each other’s efforts and can easily call up the ingredients and results of each other’s previous work. That day has come...

Thanks to dwindling research budgets and the rising cost of science software, "open science" advocates may be succeeding at getting science to go open source. And it's thanks in part to a language called R...

Data Science Articles & Videos

META: What Data Scientists are reading. And why.
We recently posted an analysis of the most-read articles on this newsletter for the past two quarters. We were curious to understand what was getting the most clicks and if there were any consistent areas of interest...

Forget the Algorithms and Start Cleaning Your Data
The idea that the combination of predictive algorithms and big data will change the world is a tempting one. And it may end up being true. But for now, the industry is facing a reality check when it comes to big data analytics. Instead of focusing on what algorithms to use, your big data success depends more on how well you cleaned, integrated, and transformed your data...

The Sexiest Job of the 21st Century is Tedious, and that Needs to Change
As organizations collect increasingly large and diverse data sets, the demand for skilled data scientists will continue to rise. In fact, it was dubbed “The Sexiest Job of the 21st Century” by HBR. Unfortunately, the day-to-day reality of the role doesn’t quite match the romanticized version...

Big Data: Are we making a Big Mistake?
Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends... Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge University, they can be “complete bollocks. Absolute nonsense”...

Data Science + Crime Prevention = Predictive Policing
We recently caught up with George Mohler, Chief Scientist at PredPol, Inc and Assistant Professor of Mathematics and Computer Science at Santa Clara University. We were keen to learn more about his background, the theory and technology behind predictive policing and the impact PredPol is achieving...

SelfieCity might be the Ultimate Data-Driven Exploration of the Selfie
Understanding what keeps customers engaged is incredibly valuable, as it is a logical foundation from which to develop retention strategies and roll out operational practices aimed to keep customers from walking out the door. Consequently, there's growing interest among companies to develop better churn-detection techniques, leading many to look to data mining and machine learning for new and creative approaches...

Differential Equations in Data Science
The ordinary differential equation (ODE) is a tool often overlooked in data science... However, it's a tool that's been in use for centuries, for everything from predicting optimal pharmaceutical dosing schedules to estimating options pricing. Here at URX we feel no tool should be left behind. We've re-surfaced the ODE and, as a gentle introduction, would like to show how it relates to a very common data science tool, Markov chains...

How the NSA can use Metadata to predict your Personality
The president and congressional leaders want to end NSA bulk metadata collection, but not the use of metadata, which may even be expanded. From a technical perspective, the question of what your metadata can reveal about you, or potential enemies, remains as important as it has been since the Edward Snowden scandal. The answer is more than you might think...

Swish Analytics: Algorithmic Sports, Predictions & Betting Recommendations
Swish Analytics Inc. is a sports technology startup based in San Francisco that developed algorithmic sports predictions and betting recommendations. The three founders raised $300K in March from a group of private angel investors to deliver algorithmic sports predictions to bettors and fans in the underserved data science field...

Jobs

The Maps Data Insights team has an opening for a craftsman skilled in Large Scale Data Mining and Machine Learning for making significant contributions in improving Apple Maps. The role involves developing models for identifying patterns and anomalies and for mining structured, semi-structured and unstructured data. The person will get an opportunity to contribute to projects ranging from the ones involving massive datasets to the ones solving small scale but very complex problems using machine learning and probabilistic modeling techniques...

Training & Resources

Datasets: Webscope from Yahoo!
We have various types of data available to share. They are categorized into Ratings, Language, Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant data and resources available...

If you’re a practicing scientist, you probably use statistics to analyze your data. From basic t tests and standard error calculations to Cox proportional hazards models and geospatial kriging systems, we rely on statistics to give answers to scientific problems. This is unfortunate, because most of us don’t know how to do statistics...

Introduction to Artificial Neural Networks Part 2 - Learning
In part 1 we were introduced to what artificial neural networks are and we learnt the basics on how they can be used to solve problems. In this tutorial we will begin to find out how artificial neural networks can learn, why learning is so useful and what the different types of learning are...

Books

This book from John Foreman (Chief Data Scientist at Mailchimp) makes Data Science extremely practical and accessible - using Excel as a primary means for exploring Data Science concepts. The book introduces the major Data Science techniques, how they work, how to use them, and how they benefit your business, large or small. It's not about coding or database technologies. It's about turning raw data into insight you can act upon, and doing it as quickly and painlessly as possible...