Issue #116

February 11, 2016

Editor Picks

Text Mining South Park
South Park, an adult animated television series spanning nearly 20 years, follows four main characters (Stan, Kyle, Cartman and Kenny) and an extensive ensemble cast of recurring characters. This analysis reviews their speech to determine which words and phrases are distinct for each character...

What has Kaggle learned from 2 million machine learning models?
Kaggle is a community of almost 450K data scientists who have built nearly 2MM machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons on winning techniques we have learned from the Kaggle community...

Two Minute Papers - How Do Genetic Algorithms Work?
Genetic algorithms are in the class of evolutionary algorithms that build on the principle of "survival of the fittest". By recombining the best solutions of a population and every now and then mutating them, one can solve remarkably difficult problems that would otherwise be hopelessly difficult to write programs for...
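
For readers who want to see the loop described above in code, here is a minimal sketch in Python. It is not taken from the video: the toy objective (maximize the number of 1 bits in a list, often called OneMax), the function names, and every parameter value are assumptions chosen for illustration.

```python
import random


def fitness(bits):
    # Toy objective (an assumption for this sketch): count the 1 bits.
    return sum(bits)


def crossover(a, b, rng):
    # Single-point recombination: head of one parent, tail of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    # Flip each bit independently with a small probability.
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)
    ]
    initial_best = max(fitness(ind) for ind in population)

    for _ in range(generations):
        # "Survival of the fittest": only the better half may reproduce.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]

        # Elitism: carry the current best over unchanged, so the best
        # fitness in the population can never decrease.
        children = [population[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children

    best = max(population, key=fitness)
    return initial_best, best


if __name__ == "__main__":
    start, best = evolve()
    print("best fitness at start:", start)
    print("best fitness at end:  ", fitness(best))
```

Keeping the top half as parents and copying the single best individual forward are just two simple design choices among many; tournament or roulette-wheel selection would slot into the same loop.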

The happiness paradox: your friends are happier than you
Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread social network media use leads to reduced happiness. However the relation between popularity and happiness is poorly understood...
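
The Friendship Paradox itself is easy to check on a small example. The sketch below uses a made-up five-person network (the names and friendships are hypothetical, not data from the paper) and compares each person's friend count with the average friend count of their friends. It only illustrates the popularity effect; it says nothing about happiness.

```python
from statistics import mean

# Hypothetical toy network: "A" is a hub, everyone else has few friends.
friends = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A"},
}


def degree(person):
    # Popularity here simply means number of friends.
    return len(friends[person])


def mean_friend_degree(person):
    # Average popularity of this person's friends.
    return mean(degree(f) for f in friends[person])


# Average number of friends per person.
average_degree = mean(degree(p) for p in friends)

# People who have fewer friends than their friends do on average.
less_popular = sorted(p for p in friends if degree(p) < mean_friend_degree(p))

if __name__ == "__main__":
    print("average number of friends:", average_degree)
    for p in sorted(friends):
        print(p, "has", degree(p), "friends; their friends average",
              mean_friend_degree(p))
    print("less popular than their friends:", less_popular)
```

In this toy network four of the five people have fewer friends than their friends do on average, because the hub "A" shows up in everyone else's friend list.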

AI is Transforming Google Search. The Rest of the Web is Next
Yesterday the Google veteran who oversees the company’s search engine, Amit Singhal, announced his retirement. And in short order, Google revealed that Singhal’s rather enormous shoes would be filled by a man named John Giannandrea. On one level, these are just two guys doing something new with their lives. But you can also view the pair as the ideal metaphor for a momentous shift in the way things work inside Google and across the tech world as a whole. Giannandrea, you see, oversees Google’s work in artificial intelligence...

Introducing Vector Networks
The pen tool as we know it today was originally introduced in 1987 and has remained largely unchanged since then. We decided to try something new when we set out to build the vector editing toolset for Figma. Instead of using paths like other tools, Figma is built on something we’re calling vector networks which are backwards-compatible with paths but which offer much more flexibility and control...

Class visualization with bilateral filters
A while ago I played with style visualizations and bilateral filters. The latter have the nice property of filtering out noise but preserving edges. Here are some example classes from GoogLeNet (Inception network)...

Interviewing Data Science Interns at Analytical Flavor Systems
We could exclusively hire interns who have previous experience with machine learning or “data science”…but we’d miss out on great candidates who are smart and driven to learn.
So how can we structure a data science interview for students who may not know data splitting, feature engineering, pre-processing, model building and hyperparameter optimization, model stacking, and withholding set validation?...

Jobs

Murmuration seeks massively improved education outcomes for kids by providing information, infrastructure, and support for education-related public advocacy and community building efforts. We are looking for an experienced, innovative Data Scientist to join our rapidly growing internal analytics team. The ideal candidate is more than a number cruncher. The role calls for strong expertise in predictive modeling, statistical analysis, and data visualization as well as the ability to clearly communicate complex analysis to non-technical audiences...

Training & Resources

Auto-scaling scikit-learn with Spark
We are excited to release a scikit-learn integration package for Spark that dramatically simplifies the life of data scientists using Python. This package, published as databricks:spark-sklearn (or spark-sklearn for short), automatically distributes the most repetitive tasks of model tuning on a Spark cluster, without impacting the workflow of data scientists:...

Data Science Deep Dive: Using the RevoScaleR Packages
This tutorial is an introduction to the enhanced R packages provided in SQL Server R Services. You will learn how to use the scalable enterprise framework for execution of R packages in Microsoft SQL Server 2016. A data scientist can use this new service to build custom R solutions that run in either local or server contexts, to support high-performance big data analytics...