Issue #206

Nov 2 2017

**Special Notice**: We’ve recently been contacted by the eCommerce group of a Fortune 50 company that’s looking to build a top caliber data science team ideally in NYC, but with some flexibility on location. They are looking for both senior and junior roles. If this might be of interest, please hit reply to this week’s newsletter with your relevant details and we will help make the right connections.

What causes wildfires in the US?
Recent events were my motivation for this project, where I aim to create a classification model to predict the cause of a wildfire given its features, and create a tool in the form of a Flask application that could help authorities determine the cause of a fire when reasons are unknown...

A Message from this week's Sponsor:

Springboard offers you your own data science expert and career coach with the first online course to offer you a data science job or your money back. They'll help you tailor a personalized skills training and career search strategy that will get you into a data science career. Springboard graduates have been placed at Ford, Verizon, Nielsen, Kaiser Permanente, and the Federal Reserve.

Data Science Articles & Videos

2017 - The State of Data Science & Machine Learning
This year, for the first time, we conducted an industry-wide survey to establish a comprehensive view of the state of data science and machine learning. We received over 16,000 responses and learned a ton about who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field...

O.K. Computer - Tell Me What This Smells Like
Over the years, biologists who specialize in the psychophysics of smell have continued to work away at the problem. Earlier this year, Vosshall and her collaborators published a new take on it, this time using computer algorithms...

From Data to Deployment – Full Stack Data Science
In this talk, we walked through an actual Indeed data science full-stack model building process: labeling data, performing analysis, generating features, building the model, validating the model, building infrastructure, deploying the model, and monitoring the solution. We discussed how these techniques are applicable across a broad set of domains...

How do CNNs Deal with Position Differences?
An engineer who’s learning about using convolutional neural networks for image classification just asked me an interesting question: how does a model know how to recognize objects in different positions in an image? Since this actually requires quite a lot of explanation, I decided to write up my notes here in case they help some other people too...
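For readers who want a quick feel for the question, here is a minimal sketch of one commonly cited ingredient: a filter whose weights are shared across positions, followed by pooling. It is our own toy illustration in one dimension, not code or reasoning taken from the linked article, and the names `conv1d` and `global_max_pool` are hypothetical helpers.

```python
def conv1d(signal, kernel):
    """Slide the same kernel over every position of the signal (no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(feature_map):
    """Keep only the strongest response, discarding where it occurred."""
    return max(feature_map)


# A toy "object" (the pattern 1, -1, 1) placed at two different positions.
kernel = [1, -1, 1]
left = [1, -1, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 1, -1, 1, 0]

left_map = conv1d(left, kernel)
right_map = conv1d(right, kernel)

# The peak response moves with the pattern, but its pooled value is unchanged.
print(left_map.index(max(left_map)), global_max_pool(left_map))
print(right_map.index(max(right_map)), global_max_pool(right_map))
```

The feature map shifts along with the input, while the pooled value stays the same, which is one reason a later layer can respond to the pattern regardless of where it sits.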

Jobs

At Datadog, we’re on a mission to build the best monitoring platform in the world. We operate at high scale—trillions of data points per day—and high availability, providing always-on alerting, visualization, and tracing for our customers' infrastructure and applications around the globe.

Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way. We need you to design and build machine learning-powered products that help our customers learn from their data and make better decisions in real-time.

You will have a fantastic team of data engineers to support you, a collaborative environment to encourage your work, and the best technologies for performing data science at high scale in your toolkit...

Training & Resources

Eager Execution: An imperative, define-by-run interface to TensorFlow
Today, we introduce eager execution for TensorFlow. Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive...

The often-overlooked random forest kernel
Here, we’ll discuss a type of kernel called the random forest kernel, which takes advantage of a pre-trained random forest in order to provide a custom-tailored kernel...

Bounter -- Counter for large datasets
Bounter is a Python library, written in C, for extremely fast probabilistic counting of item frequencies in massive datasets, using only a small fixed memory footprint...
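To give a sense of what probabilistic counting in fixed memory can look like, here is a minimal sketch of a count-min sketch, one well-known technique for this kind of problem. It is an illustration only: it does not use Bounter's API or describe its internals, and the class name `CountMinSketch` and its `width` and `depth` parameters are our own assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter with a fixed-size table (illustrative only)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # Memory use is width * depth counters, regardless of how many items are added.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by mixing the row number into the hashed bytes.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in "a b a c a b".split():
    sketch.add(word)

print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

The trade-off is that estimates may overcount when different items collide in every row; a wider table makes that less likely at the cost of more memory.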

P.S. Want to reach our audience / fellow readers? Consider sponsoring. We have just a few slots left in 2017, so grab a spot now; first come, first served! Email us for more details.

All the best,
Hannah & Sebastian
