Issue #133

June 9 2016

Editor Picks

What should we learn from past AI forecasts?
To inform the Open Philanthropy Project’s investigation of potential risks from advanced artificial intelligence, and in particular to improve our thinking about AI timelines, I (Luke Muehlhauser) conducted a short study of what we should learn from past AI forecasts and seasons of optimism and pessimism in the field...

Marijuana through the lens of the New York Times
The legality of marijuana and the public’s view of it are rapidly changing as more states decriminalize and legalize the drug. As such, how have the words associated with marijuana in news articles changed over time?...

10 Data Acquisition Strategies for Startups
The “unreasonable effectiveness” of data for machine-learning applications has been widely debated over the years. It has also been suggested that many major breakthroughs in the field of Artificial Intelligence have not been constrained by algorithmic advances but by the availability of high-quality datasets. The common thread running through these discussions is that data is a vital component in doing state-of-the-art machine learning...

A Message from this week's Sponsor:

“The Science of Data-Driven Storytelling”
DataScience Inc. and the National Science Foundation’s West Big Data Innovation Hub have brought together leaders in academia, the non-profit sector, government, data science and publishing to discuss best practices for creating impactful data-driven stories. Click here to register for the live-streamed workshop, “The Science of Data-Driven Storytelling”.

Data Science Articles & Videos

Why did you choose Python for Machine Learning?
Oh god, another one of those subjective, pointedly opinionated click-bait headlines? Yes! Why did I bother writing this? Well, here is one of the most trivial yet life-changing insights and worldly wisdoms from my former professor that has become my mantra ever since: "If you have to do this task more than 3 times just write a script and automate it."...

The Ugly Little Bits Of The Data Science Process
This morning there was a great conversation on Twitter, kicked off by Hadley Wickham, about one of the ugly little bits of the data science process. Hadley: "During a data analysis, you'll often create lots of models. How do you name them? Who has written good advice on the subject?"... I found the question incredibly interesting. It’s one of those weird dark corners where data science and the reality of computers and file systems bump up. Almost anyone who has ever done data science has found themselves with a folder looking something like this...

There is still only one test
In 2011 I wrote an article called "There is Only One Test", where I explained that all hypothesis tests are based on the same framework, which looks like this... Here are the elements of this framework...

Bayesian Deep Learning
There are currently three big trends in machine learning: Probabilistic Programming, Deep Learning and "Big Data". Inside of PP, a lot of innovation is in making things scale using Variational Inference. In this blog post, I will show how to use Variational Inference in PyMC3 to fit a simple Bayesian Neural Network...

Jobs

You will apply your mathematical or scientific training to analyze large volumes of diverse data, model complex human-scale problems, and develop algorithms to serve various needs... You will work in collaboration with other mathematicians and scientists in Research and with data engineers and design technologists across Analytics to imagine and build creative solutions to challenging questions, most often with a clear line of sight from your work to real-world impact...

Training & Resources

TPOT: A Python tool for automating data science
In this article, we’re going to go over three aspects of machine learning pipeline design that tend to be tedious but nonetheless important. After that, we’re going to step through a demo for a tool that intelligently automates the process of machine learning pipeline design, so we can spend our time working on the more interesting aspects of data science...

A Gentle Introduction To The Basics Of Machine Learning
Machine Learning is one of the most important technologies nowadays. It is the key element that allows a computer to learn. It is in fact a subfield of Artificial Intelligence (AI). Right now AI is affecting all industries, from finance to medicine, from aerospace to e-commerce. All technological giants such as Microsoft, Google, Amazon, etc. are investing large amounts of money in the area. This post intends to explain the main concepts of Machine Learning to the general public...

A Gentle Introduction to Bloom Filter
Bloom filters are probabilistic, space-efficient data structures. They are very similar to hash tables, but they are used exclusively to test whether an item is a member of a set. However, they have a very powerful property: they allow a trade-off between space and false-positive rate when testing membership. Because a Bloom filter can make this trade-off between space and false-positive rate, it is called a probabilistic data structure...
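For readers who want to see the space versus false-positive trade-off concretely, here is a minimal Python sketch. It is not taken from the linked article: the class name `BloomFilter`, its parameters, and the use of SHA-256 with double hashing are all illustrative assumptions, and the sizing uses the standard textbook formulas for bit count and number of hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical names, not from the article)."""

    def __init__(self, expected_items, false_positive_rate):
        n, p = expected_items, false_positive_rate
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest via double hashing
        # (an assumption made here to keep the sketch short).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    seen.add("alice@example.com")
    print("alice@example.com" in seen)  # added items are always reported present
    # A stricter false-positive target costs more bits for the same item count.
    print(BloomFilter(1000, 0.001).num_bits, BloomFilter(1000, 0.1).num_bits)
```

Added items are never reported absent, while items that were never added may occasionally be reported present; lowering the target false-positive rate reduces that chance at the cost of a larger bit array.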

Books

Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You’ll explore data visualization, graph databases, the use of NoSQL, and the data science process. You’ll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels...