Issue #213

Dec 21 2017

Google Map's Moat
A write-up of all the behind-the-scenes work that goes into taking lots of kinds of raw data and turning it all into high quality Google Maps...

Earth to exoplanet: Hunting for planets with machine learning
Though technology has aided the hunt, finding exoplanets isn’t easy. Compared to their host stars, exoplanets are cold, small and dark, about as tricky to spot as a firefly flying next to a searchlight … from thousands of miles away. But with the help of machine learning, we’ve recently made some progress...

A Message from this week's Sponsor:

Join BaseCamp 2018 and become a Data Scientist in 8 weeks. Our bootcamp will fast-track your start in the field by training you on real-life problems. BaseCamp’s curriculum includes Programming, Data Engineering, Machine Learning, and business aspects of Data Science. With data problems from real companies and a hiring day, we offer the most hands-on approach to this career path. The 2018 spring edition will take place in Bratislava, Slovakia. Apply now!

Data Science Articles & Videos

Thoughts on David Donoho’s "Fifty Years of Data Science"
Looking back at the efforts of people like Tukey, Cleveland, and Chambers to broaden the meaning of statistics, I would argue that to some extent their efforts have failed. If you look at a textbook for a course in a typical PhD program in statistics today, I believe it would look much like the textbooks used by Cleveland, Chambers, and Tukey in their own studies. In fact, it might even be the same textbook! Progress has been slow, in my opinion. But why is that?...

Keras and deep learning on the Raspberry Pi
In keeping with the Christmas and Holiday season, I’ll be demonstrating how to take a deep learning model (trained with Keras) and then deploy it to the Raspberry Pi. But this isn’t any machine learning model… This image classifier has been specifically trained to detect if Santa Claus is in our video stream...

The Guinness Brewer Who Revolutionized Statistics
One of the greatest minds in 20th Century statistics was not a scholar. He brewed beer. Guinness brewer William S. Gosset’s work is responsible for inspiring the concept of statistical significance, industrial quality control, efficient design of experiments and, not least of all, consistently great tasting beer...

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook’s machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack...

Simulating Chutes & Ladders in Python
On the approximately twenty-third game of the morning, as we found ourselves in a near endless cycle of climbing ladders and sliding down chutes, never quite reaching that final square to end the game, I started wondering how much longer the game could last: what is the expected length of a game? And then, at some point, it clicked: Chutes and Ladders is memoryless (the effect of a roll depends only on where you are, not where you've been) and so it can be modeled as a Markov process! By the time we (finally) hit square 100, I basically had this blog post written, at least in my head...
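To make the Markov-process idea concrete, here is a minimal sketch of how an expected game length could be computed. It is not the linked post's code: the function name `expected_rolls`, the 10-square toy board, its one ladder and one chute, and the "overshoot means stay put" rule are all assumptions chosen for illustration. Because the game is memoryless, it is enough to track the probability of standing on each square and add up the chance that the game is still running before each roll.

```python
def expected_rolls(last_square, jumps, die_sides=6, tol=1e-12, max_rolls=100_000):
    """Expected number of rolls to go from square 0 to last_square.

    jumps maps the foot of a ladder (or top of a chute) to where it leads.
    Assumed rule: a roll that would overshoot last_square leaves you in place.
    """
    # dist[s] = probability that an unfinished game is currently on square s
    dist = [0.0] * (last_square + 1)
    dist[0] = 1.0
    remaining = 1.0  # probability that the game is still going
    expected = 0.0

    for _ in range(max_rolls):  # safety cap in case the end is unreachable
        if remaining <= tol:
            break
        # Every game that is still going needs at least one more roll.
        expected += remaining
        new_dist = [0.0] * (last_square + 1)
        for square in range(last_square):  # finished games no longer move
            p = dist[square]
            if p == 0.0:
                continue
            for roll in range(1, die_sides + 1):
                target = square + roll
                if target > last_square:
                    target = square  # overshoot: stay put (assumed rule)
                target = jumps.get(target, target)  # follow a ladder or chute
                new_dist[target] += p / die_sides
        dist = new_dist
        remaining = sum(dist[:last_square])
    return expected


# Hypothetical toy board: ladder from 3 up to 7, chute from 8 down to 2.
toy_jumps = {3: 7, 8: 2}
print(expected_rolls(10, toy_jumps))
```

The same function would handle a full 100-square board given that board's real ladder and chute positions, which are not listed here.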

How many images do you need to train a neural network?
Today I got an email with a question I’ve heard many times – “How many images do I need to train my classifier?“. In the early days I would reply with the technically most correct, but also useless answer of “it depends”, but over the last couple of years I’ve realized that just having a very approximate rule of thumb is useful, so here it is for posterity...

Exploring the ChestXray14 dataset: problems
A couple of weeks ago, I mentioned I had some concerns about the ChestXray14 dataset. I said I would come back when I had more info, and since then I have been digging into the data. I’ve talked with Dr Summers via email a few times as well. Unfortunately, this exploration has only increased my concerns about the dataset...

Jobs

eCommerce is one of the fastest-growing areas within the consumer products industry and represents a significant opportunity to accelerate growth for PepsiCo going forward.

To ensure we win in this space we have established a dedicated eCommerce group, bringing together world class talent across F&B, digital, and key customers. While tied closely to broader PepsiCo, the eCommerce group has a unique start-up feel and defined values that embrace a more entrepreneurial mindset: bias for action; results oriented; community-focused; prioritization of people.

Maintaining the pace needed to meet our growth targets and compete effectively against start-ups and technology competitors requires a step-change in our thinking and in our traditional approaches to data analytics and utilization. Accordingly, we are seeking a Data Scientist to manage data entry and integrity, data clean-up, query-based analysis, and project management with business users...

Colorized Math Equations
Why aren't more math concepts introduced this way? I colorized a few of my favorite math topics below. Making the colorizations was surprisingly fun. Like writing a haiku, there's a game to trimming down a concept to its essence...

P.S. Want to reach our audience and fellow readers? Consider sponsoring. We just opened up booking for 2018. Grab a spot now; first come, first served! Email us for more details.

All the best,
Hannah & Sebastian

Sign up to receive the Data Science Weekly Newsletter every Thursday
