Issue #120

March 10, 2016

Editor Picks

Understanding Popularity on Reddit
There are lots of theories about what makes something go viral: content should be emotionally compelling, it should appeal to a broad audience, etc. Some people say they can’t define it but they know a great viral image when they see one. How accurate are these theories? Does our daily exposure to Reddit make us experts at predicting what posts will rise to internet fame? We have no idea, but we built a game to find out...

After 150 Years the ASA Says "No" To p-values
And yet, wonder of wonders, the American Statistical Association has finally taken a position against p-values. I never thought this would happen in my lifetime, or in anyone else’s, for that matter, but I say, Hooray for the ASA!...

A Message from this week's Sponsor:

Are you throwing models over the wall to engineering?
At Yhat, we've developed a better way for data science teams and application developers to get analytical code into production. ScienceOps eases the handoff from data scientist to developer by making predictive models accessible via REST API requests.

Start shipping analytics fast, frequently and reliably. Sign up for a demo today!

Detecting Emotion in Faces Using Geometric Features
Recognizing emotions in facial expressions is relatively straightforward for humans, and in recent times machines are getting better at it too. The applications of emotion-detecting computers are numerous: from improving advertising to treating depression, the possibilities are limitless. Motivated mainly by the impact that such technology can have on mental health, I started building my own emotion recognition technology...

Deep Learning for Robots: Learning from Large-Scale Interaction
While we’ve recently seen great strides in robotic capability, the gap between human and robot motor skills remains vast. Machines still have a very long way to go to match human proficiency even at basic sensorimotor skills like grasping. However, by linking learning with continuous feedback and control, we might begin to bridge that gap, and in so doing make it possible for robots to intelligently and reliably handle the complexities of the real world...

Bayesian Estimation of G Train Wait Times
In the hope of not letting decent data go to waste, even if it's only 19 rows, this post is about pulling useful information from a weird, tiny dataset that I collected during the summer of 2014: inter-arrival times of the G train, New York City's least respected and (possibly) most misunderstood subway line...

What I Use to Visualize Data
“What tool should I learn? What's the best?” I hesitate to answer, because I use what works best for me, which isn't necessarily the best for someone else or the “best” overall...

BallR: Interactive NBA Shot Charts with R and Shiny
The NBA’s Stats API provides data for every single shot attempted during an NBA game since 1996, including location coordinates on the court. I built a tool called BallR, using R’s Shiny framework, to explore NBA shot data at the player level...

Evaluating Container Platforms at Scale
This article addresses three questions about scaling Docker Swarm and Kubernetes. What is their performance at scale? Can they operate at scale? What does it take to support them at scale?...

Jobs

Quartet Health is committed to enabling every person in our society to thrive by building a collaborative mental and physical health ecosystem. We are looking for a Data Scientist with a passion for data visualization, strong coding skills, and a history of working closely with big data analytics. You must have experience with SQL, Python, R, Java or other programming languages. If you are looking to make a positive impact on thousands of patients, and share a passion for improving mental health, we have an opportunity for you...

Training & Resources

D3.js Screencasts
You need to create a D3.js data visualization to communicate your insights. But... #d3BrokeAndMadeArt! This time, your data join appears to have broken and the JavaScript console shows an error you don't recognize. Last time, you got stuck trying to figure out how to make axes that didn't look like a 3rd grader made them. It makes you want to strangle D3 with your bare hands. Just how steep does the D3 learning curve need to be?! What if you could learn and master D3 quickly and deeply? Introducing DashingD3js.com screencasts...

7 Datasets You've Likely Never Seen Before
I have a special place in my heart for funny, random data that you don't stumble across every day. It can be a little bit harder to find (I suppose this is sort of a self-fulfilling property), but in my experience, it's well worth the extra digging. In this post, I'm going to go over 7 datasets that I've found over the years that I think are worth sharing. They may be a little bit obscure, but I can assure you they are quite a bit of fun!...

A (small) introduction to Boosting
Boosting is a machine learning meta-algorithm that aims to iteratively build an ensemble of weak learners, in an attempt to generate a strong overall model. Let's look at the highlighted parts one-by-one...
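
For readers who want a concrete picture of "iteratively build an ensemble of weak learners", here is a minimal sketch. It is not taken from the linked post: it assumes an AdaBoost-style scheme with one-dimensional decision stumps as the weak learners, and the function names (`fit_stump`, `adaboost`, `predict`) and the toy data are made up for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` at or below the threshold, else its opposite."""
    return polarity if x <= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Pick the (threshold, polarity) pair with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w
                for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Build a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: no single threshold separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy data a single stump misclassifies at least three points, while the three-stump ensemble reproduces all ten labels, which is the "weak learners into a strong model" idea in miniature.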