Issue #18

March 27 2014

Editor Picks

Recently Drew Conway gave a great talk to the NYC Data Science Meetup group, examining the field of data science through the lens of the social scientist. Drew knows the intersection of those fields well—he trained as a social scientist, receiving a PhD in political science from NYU in 2013, and is now Head of Data at Project Florida...

Call it the LinkedIn for the 1 percent...the point of RelSci has nothing to do with expanding your circle of friends. The point is to use the people you do know to find a pathway to the rich and powerful ones you don't. It might just be the magic bullet for business development...

Julia is marketed as a super fast high performance scientific computing language that can reach speeds close to native C code...I decided to put Julia up against the python/numpy stack. This isn’t so much a boxing match between Julia and Python but rather an exercise in solving the same problem in two languages and comparing them...

5 Qs for StumbleUpon Principal Data Scientist Debora Donato
The Center for Data Innovation spoke with Debora Donato, the Principal Data Scientist at San Francisco-based content recommendation company StumbleUpon. Donato spoke about some of StumbleUpon’s insights into different demographics’ interests, as well as the unique opportunities and challenges the mobile environment brings for data scientists...

Difference between Data Scientist and Data Analyst
Jobs related to Data Science have topped the charts in job portals. There are job openings for various job titles like Data Scientists, Data Analysts, and Data Engineers. Though all these job titles deal with data and sound similar, they do have a number of detailed differences. Ever wondered how different they are from each other? I did! And here are the differences I found between a Data Scientist and a Data Analyst...

Different Customers, Different Prices, Thanks To Big Data
In a traditional bazaar a seller might charge a well-dressed buyer twice as much as another based on visual clues or accents. Big data allows for a far more scientific approach to selling at different prices, depending on an individual’s willingness to pay...

Predicting Customer Churn with scikit-learn
Understanding what keeps customers engaged is incredibly valuable, as it is a logical foundation from which to develop retention strategies and roll out operational practices aimed to keep customers from walking out the door. Consequently, there's growing interest among companies to develop better churn-detection techniques, leading many to look to data mining and machine learning for new and creative approaches...

Learning Deep Face Representation
Face representation is a crucial step of face recognition systems. An optimal face representation should be discriminative, robust, compact, and very easy-to-implement. While numerous hand-crafted and learning-based representations have been proposed, considerable room for improvement is still present. In this paper, we present a very easy-to-implement deep learning framework for face representation...

Warning: Clusters May Appear More Separated in Textbooks than in Practice
Clustering is the search for discontinuity achieved by sorting all the similar entities into the same piles and thus maximizing the separation between different piles. The latent class assumption makes the process explicit. What is the source of variation among the objects? Heterogeneity arises because entities come in different types. [However] ... we are more likely to acknowledge that our clusters overlap early on and then forget because it is so easy to see type as the root cause of all variation...

Saffron Gets $7M to Build Brain-Like Learning Machine
Machine learning is all the rage today in the analytics space, but it's not the right big data tool for all circumstances. A company called Saffron Technology today announced it received a $7 million investment to continue building its big data analytic software that mimics how the human brain learns...

Jobs

The Applied Machine Learning team works on problems across Amazon. From automated targeting to threat detection, our problem space is deep and diverse and we have access to the planet-scale computing resources and datasets required to solve these difficult problems. As a Machine Learning Engineer (MLE) on the team, you'll be part of a highly motivated and independent team that deals with ambiguity every day. It'll be your job to make sense of the inherent ambiguity of the problems you're solving by using your experience, intuition and grit...

Training & Resources

Earlier we covered OLS regression. In this posting we will build upon this foundation and introduce an important extension to linear regression, regularization, that makes it applicable for ill-posed problems (e.g. number of predictors >> number of samples) and helps to prevent overfitting...
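
To make the idea in the excerpt above concrete, here is a minimal sketch of one common form of regularization, ridge (L2) regression, on a problem with many more predictors than samples. It is not taken from the linked post: the function name `ridge_fit`, the random data, and the penalty values are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y.

    `lam` is the regularization strength (hypothetical name); any lam > 0
    makes the matrix invertible even when X^T X alone is singular.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Ill-posed setup: 50 predictors but only 10 samples (synthetic data).
rng = np.random.default_rng(0)
n_samples, n_features = 10, 50
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# X^T X is 50x50 but has rank at most 10, so plain OLS has no unique solution.
gram_rank = np.linalg.matrix_rank(X.T @ X)

# Ridge still yields a unique weight vector; a larger penalty shrinks it more.
w_small = ridge_fit(X, y, lam=0.1)
w_large = ridge_fit(X, y, lam=100.0)

print("rank of X^T X:", gram_rank)
print("norm of weights, lam=0.1:  ", np.linalg.norm(w_small))
print("norm of weights, lam=100.0:", np.linalg.norm(w_large))
```

The shrinking weight norm is what limits overfitting: the penalty trades a little bias for a large reduction in variance. In practice the penalty strength would usually be chosen by cross-validation rather than fixed by hand.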

Free eBook: Practical Machine Learning: Innovations in Recommendations
Building a simple but powerful recommendation system is much easier than you think. This guide explains innovations that make machine learning practical for business production settings and demonstrates how even a small-scale development team can design an effective large-scale recommender...