Issue #11

February 6, 2014

Editor Picks

The ability to instrument and interrogate data as it moves through a processing pipeline is fundamental to effective machine learning at scale. It can be challenging, however, to understand where in a workflow to employ data visualization; and once committed to doing so, developing revealing visualizations that suggest clear next steps can be equally daunting. In this talk we'll describe the role that information visualization technologies play in the LinkedIn data science ecosystem, and explore best practices for understanding the structure of large-scale data in a production environment...

We recently caught up with Abe Gong, Data Scientist at Jawbone and a thought leader in the Data Science community. We were keen to learn more about his background, his work at Jawbone and his latest side projects, including thought-provoking insights on how the ROI on Science is evolving...

At most companies, advanced analytics expertise is contained in a lab environment: a small team of analysts sitting at their computers and churning out reports and insights to support business decisions. But the real impact from advanced analytics comes from building models that make real-time decisions within production workflows. In this talk, Josh Wills (Senior Director of Data Science at Cloudera) discusses how to use the ecosystem of technologies around Hadoop to support bringing models out of the lab and into the factory, with a focus on strategies for data integration, large-scale machine learning, and experimentation...

Data Science Articles & Videos

Why Machine Learning and Big Data need Behavioral Economists
Researchers from Princeton University received mass media attention when they recently predicted the demise of Facebook. Data scientists at Facebook soon hit back with their own ‘study:’ “In keeping with the scientific principle (used by Princeton) ‘correlation equals causation,’ their research unequivocally demonstrated that Princeton may be in danger of disappearing entirely.” ... Causality can be very complex. This is where Machine Learning and Behavioral Economics meet Big Data...

Astrophysics: The Icing on the Big Data Cake
According to Dr. Kirk Borne, Professor of Astrophysics and Computational Science at George Mason University, astrophysics data challenges represent some of the wider problems in big data. Dr. Borne embraces data science with boundless enthusiasm. Dr. Borne recently shared details about his astrophysics work and offered a unique perspective on the current state of big data...

Ten Years of NFL Plays Analyzed, Visualized, Quizzified
It’s third-and-3 and you desperately need a first down. What do you do, run or pass? We’ve structured ten years of NFL play-by-play data (raw data courtesy of Advanced NFL Stats), then uploaded it into Statwing for analysis. Now you can test your coaching instincts against the data...

Data Science and Football, together at Facebook
One of the most exciting opportunities created by the introduction of data science to football is the ability to analyze massive amounts of non-traditional data to learn more about the sport. Social media data is often noisy and unstructured, but presents a great snapshot of what fans are thinking and saying in real time. Although a great deal of existing work tries, with varying levels of success, to predict things with Twitter, Sean Taylor has taken a different tack and is trying to learn about the fans themselves...

Modeling the Evolution of User Expertise through Online Reviews
Recommending products to consumers means not only understanding their tastes, but also understanding their level of experience. For example, it would be a mistake to recommend the iconic film Seven Samurai simply because a user enjoys other action movies; rather, we might conclude that they will eventually enjoy it, once they are ready. The same is true for beers, wines, and gourmet foods, or any products where users have acquired tastes: the 'best' products may not be the most 'accessible'. Thus our goal in this paper is to recommend products that a user will enjoy now, while acknowledging that their tastes may have changed over time, and may change again in the future...
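The idea that a user's experience only moves forward can be made concrete with a small dynamic program. The following is our own simplified illustration and not the paper's implementation. It assumes that some hypothetical model has already produced a predicted rating for each of a user's reviews (in time order) at each experience level k. From those predictions, it picks the non-decreasing sequence of levels that minimizes squared error against the observed ratings. The function name `fit_monotone_levels` and its inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_monotone_levels(
    ratings: Sequence[float],
    level_preds: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Assign each review (in time order) an experience level that never decreases.

    ratings[t]        -- observed rating of the t-th review
    level_preds[k][t] -- hypothetical model's predicted rating for review t
                         if the user were at experience level k
    Returns the level sequence with the lowest total squared error, and that error.
    """
    n, n_levels = len(ratings), len(level_preds)
    if any(len(p) != n for p in level_preds):
        raise ValueError("every level needs one prediction per review")
    if n == 0:
        return [], 0.0

    inf = float("inf")
    # cost[t][k]: best total error for reviews 0..t with review t at level k
    cost = [[inf] * n_levels for _ in range(n)]
    # back[t][k]: level of review t-1 on that best path
    back = [[0] * n_levels for _ in range(n)]

    for k in range(n_levels):
        cost[0][k] = (ratings[0] - level_preds[k][0]) ** 2

    for t in range(1, n):
        best_prev, best_j = inf, 0
        for k in range(n_levels):
            # running minimum over levels j <= k keeps the sequence non-decreasing
            if cost[t - 1][k] < best_prev:
                best_prev, best_j = cost[t - 1][k], k
            cost[t][k] = best_prev + (ratings[t] - level_preds[k][t]) ** 2
            back[t][k] = best_j

    # choose the cheapest final level, then follow back-pointers to the start
    k = min(range(n_levels), key=lambda j: cost[n - 1][j])
    total = cost[n - 1][k]
    levels = [0] * n
    for t in range(n - 1, -1, -1):
        levels[t] = k
        k = back[t][k]
    return levels, total


# Toy usage: three made-up levels whose models predict 2, 4 and 5 stars respectively.
levels, total = fit_monotone_levels(
    ratings=[2.0, 2.0, 4.0, 4.0, 5.0],
    level_preds=[[2.0] * 5, [4.0] * 5, [5.0] * 5],
)
print(levels, total)  # → [0, 0, 1, 1, 2] 0.0
```

The monotonicity constraint is what separates this from simply picking the best level per review. Without it, a later review could be assigned a lower level than an earlier one. Because of it, the search costs O(reviews × levels) rather than being exponential.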

High-Reproducibility & High-Accuracy Method for Auto Topic Classification
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. We propose a new algorithm which displays high reproducibility and high accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable...

Jobs

Movable Ink is leading the charge to make email marketing deliver real value to consumers. We’re one of the fastest growing start-ups in New York City, and welcome your interest in joining our amazing team. If you are passionate about innovation, and want to work somewhere where everyone takes a leadership role, Movable Ink might be the right place for you!...

Training & Resources

Sharing data is hard. Emails have size limits, and setting up servers is too much work. We've designed a distributed system for sharing enormous datasets - for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds...

Are you fascinated by robots? These 'toys' may astonish many people, but they are complex structures on the inside and take a lot of hard work to build. If you are planning to kick-start your career in robotics, or are already studying it and need some resources, here is some help: 6 video tutorials and playlists on Machine Learning! Happy Robotics!...

Introduction to Python for Statistical Learning
The first session in our statistical learning with Python series will briefly touch on some of the core components of Python’s scientific computing stack that we will use extensively later in the course. We will not only introduce two important libraries for data wrangling, numpy and pandas, but also show how to create plots using matplotlib...
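Below is a minimal sketch of the kind of numpy/pandas workflow such a session covers. It is not taken from the course itself. The data are synthetic and generated on the spot, the variable names are our own, and plotting with matplotlib is left out so the example runs with only numpy and pandas installed.

```python
import numpy as np
import pandas as pd

# numpy: generate arrays and do vectorized arithmetic without explicit loops
rng = np.random.default_rng(seed=0)          # fixed seed so results are repeatable
x = rng.normal(loc=0.0, scale=1.0, size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)  # synthetic linear relationship plus noise

# pandas: put the arrays into a labeled table and summarize it
df = pd.DataFrame({"x": x, "y": y})
summary = df.describe()        # count, mean, std, min, quartiles, max for each column
corr = df["x"].corr(df["y"])   # Pearson correlation between the two columns

# numpy again: least-squares line fit (a degree-1 polynomial)
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)

print(summary)
print(f"correlation={corr:.3f} slope={slope:.3f} intercept={intercept:.3f}")
```

The fitted slope should land close to the 2.0 used to generate the data. Comparing a recovered parameter against a known true value is a useful sanity check before applying the same steps to real data.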

Data Science Use Cases
A fairly comprehensive and interesting list of current data science use cases, compiled by Kaggle...

P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)

Sign up to receive the Data Science Weekly Newsletter every Thursday
