The problem with the data science language wars
Like many other data tool creators, I've been annoyed by the assorted "Python vs R" click-bait articles and Hacker News posts by folks who in all likelihood might not survive an interview panel with me on it. The worst part of the superficial "R vs Python" articles is that they're adding noise where there ought to be more signal about some of the real problems facing the data science community. Let me say some very brief words about my present perspective on this...

Computer, respond to this email.
What I love about working at Google is the opportunity to harness cutting-edge machine intelligence for users’ benefit. Two recent Research Blog posts talked about how we’ve used machine learning in the form of deep neural networks to improve voice search and YouTube thumbnails. Today we can share something even wilder -- Smart Reply, a deep neural network that writes email...

A Message from this week's Sponsor:

Distribute Processing on Your Cluster with Anaconda
Using Python on distributed computing technologies like Hadoop and Spark makes it easier to create and deploy advanced analytics in production. But managing packages on your cluster can be a full-time job. And that's why we created the cluster features of Anaconda. Learn how to manage Python packages across an entire cluster with one line of code in our webcast on November 12th. Sign Up Today.

Data Science Articles & Videos

Why there's not one "best way" to land a data science job
Being the pragmatic and thoughtful person you are, one of the first questions you asked yourself was "What is the best way to land a data science job?" Which was all well and good until you started asking people the question and got so many different answers that the whole data science job search process seemed more and more mysterious with each additional answer...

Data mining Instagram feeds can point to teenage drinking patterns
Using photos and text from Instagram, a team of researchers from the University of Rochester has shown that this data can not only expose patterns of underage drinking more cheaply and faster than conventional surveys, but also find new patterns, such as what alcohol brands or types are favored by different demographic groups. The researchers say they hope exposing these patterns could help develop effective intervention...

Recently Watched: A Data Story (from Twitch)
Recently watched came up a couple of times in the past as a “nice to have” project, a.k.a. another one for the “maybe never” pile. After all, recency is everywhere. Netflix makes finishing all of House of Cards the default experience through recency. Sony sorts my game library by last played. I’ve been happily opening recent documents since Office 95. But it wasn’t clear it’d be valuable for Twitch until I asked our data the right question. How much of our viewership is already on recently watched channels?...

Visualizing Chess with ggplot
There are nice visualizations from chess data: piece movement, piece survivability, square usage by player, etc. Sadly, the authors do not always share the code and data needed to replicate the final result. So I wrote some code to show how to do some of these great visualizations entirely in R. Just for fun...

Artificial Intelligence and the Future of Work
Artificial intelligence seems like it might work the same way, creating jobs for artificial intelligence researchers and slowly displacing all other kinds of knowledge work. And while this might be where we end up a century from now, the path to get there won’t quite look the way people think...

Understanding the Bayesian approach to false discovery rates (using baseball statistics)
Sometimes, rather than estimating a value, we’re looking to answer a yes or no question about each hypothesis, and thus classify them into two groups...To solve this, we’re going to apply a Bayesian approach to a method usually associated with frequentist statistics, namely false discovery rate control...This approach is very useful outside of baseball, and even outside of beta/binomial problems...Knowing how to work with posterior predictions for many individuals, and come up with a set of candidates for further study, is an essential skill in data science...
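As a rough illustration of the kind of classification step the excerpt describes, here is a minimal Python sketch. It assumes one common formulation, which may differ from the linked post: each hypothesis already has a posterior error probability (PEP), and candidates are added in order of increasing PEP for as long as the running average PEP, the estimated false discovery rate, stays within a chosen limit. The function name `select_candidates`, the parameter `max_fdr`, and the PEP values are hypothetical.

```python
def select_candidates(peps, max_fdr):
    """Return indices of hypotheses to keep, most confident first.

    peps    -- posterior error probability for each hypothesis (assumed given)
    max_fdr -- largest acceptable estimated false discovery rate
    """
    # Visit hypotheses from most to least confident.
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    selected = []
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += peps[i]
        # The mean PEP of the selected set estimates its false discovery rate.
        # It never decreases as less confident items are added, so we can stop.
        if running_sum / rank > max_fdr:
            break
        selected.append(i)
    return selected


# Made-up PEPs for five hypotheses, for illustration only.
example_peps = [0.01, 0.40, 0.03, 0.90, 0.08]
chosen = select_candidates(example_peps, max_fdr=0.05)
print(chosen)
```

With these made-up values the three most confident hypotheses are kept, because their average PEP is 0.04, while adding the fourth would raise the average to 0.13.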

How To Figure Out The Gaps In Your Data Science Skill Set
With your unique mixture of academic and non-academic projects, you will feel like there are gaps in your current background. You've searched around the web to see if you can find some insight into your situation, but so far no recommendations on what to do and what to learn have been personal enough for you. Although you feel like you meet the qualifications for a number of data science jobs, you worry that others are more qualified and they'll get the job instead of you...

Jobs

The Strategy and Innovation team leverages the power of data and analytics to shape strategic decisions at Memorial Sloan Kettering Cancer Center, a world renowned organization dedicated to the progressive control and cure of cancer through programs of patient care, research, and education. We are seeking a Data Scientist who will develop computational tools and lead complex analyses that provide insight into the delivery of cancer care. This is a high visibility role with frequent exposure to executive leadership and senior clinicians...

Training & Resources

Machine Learning Isn’t Data Science
Too often, Machine Learning is used synonymously with Data Science. Before I knew what both of these terms were, I simply thought that Data Science was just some new faddish word for Machine Learning. Over time though, I’ve come to appreciate the real differences in these terms...So, for those too afraid of asking, I’m going to pretend that you asked...

Advanced Jupyter Notebook Tricks — Part I
Jupyter is so great for interactive exploratory analysis that it's easy to overlook some of its other powerful features and use cases. I wanted to write a blog post on some of the lesser known ways of using Jupyter, but there are so many that I broke the post into two parts...In Part 1, today, I describe how to use Jupyter to create pipelines and reports. In the next post, I will describe how to use Jupyter to create interactive dashboards...

Cosines and correlation
This post will explain a connection between probability and geometry. Standard deviations for independent random variables add according to the Pythagorean theorem. Standard deviations for correlated random variables add like the law of cosines. This is because correlation is a cosine...
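The identity behind this excerpt can be checked numerically. The sketch below is not taken from the linked post. It uses only the Python standard library and simulated data (the seed, sample size, and coefficients are arbitrary choices) to compute the sample correlation as the cosine of the angle between two centered data vectors. It then confirms that sd(X+Y)^2 = sd(X)^2 + sd(Y)^2 + 2 * rho * sd(X) * sd(Y). The plus sign appears because rho is the cosine of the angle between the vectors, which is the supplement of the interior triangle angle used in the usual statement of the law of cosines.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    # Population standard deviation (divides by n), so the identity is exact.
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def correlation(xs, ys):
    # Cosine of the angle between the centered vectors: dot / (norm * norm).
    mx, my = mean(xs), mean(ys)
    dot = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return dot / (norm_x * norm_y)


random.seed(0)  # arbitrary seed, only for repeatability
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x]  # correlated with x
total = [a + b for a, b in zip(x, y)]

rho = correlation(x, y)
sx, sy = std(x), std(y)

lhs = std(total) ** 2                          # squared "third side"
rhs = sx ** 2 + sy ** 2 + 2 * rho * sx * sy    # law-of-cosines form

print(f"rho = {rho:.3f}")
print(f"difference = {abs(lhs - rhs):.2e}")
```

When rho is zero the last term vanishes and the formula reduces to the Pythagorean case described for independent variables.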

Books

"As someone who's done over two decades of research and development on visualization technology, I highly recommend "Now You See It" for everybody - novice to expert. Stephen Few explains visual analysis clearly and conversationally. His examples are accessible, appropriate, and beautiful..."