Featured Blog Posts – March 2014 Archive (38)

Since they made marijuana legal in Washington state, I went to one of my favorite restaurants in Seattle, and discovered that they now use marijuana as an ingredient in their famous recipes, in particular in their smoky macarons.

It has absolutely no mind-altering effect on me, but it reminded me of when I was young and tried a few drugs: none produced anything other than a mechanical effect (increased heart rate) on me. Even…

prepare an analysis and visualization of an unknown data set, while impatient stakeholders watch over your shoulder and ask pointed questions; be prepared to make quantitative arguments about the confidence of the results

This is data science from the trenches: both a case study and a tutorial for data scientist candidates. Here I illustrate how gut feelings, carefully selected data (rather than granular data), a full understanding of the business (horizontal knowledge), high-level vision, and outsourcing (to make data science almost free) combine to make a data science project…

Data empowers business: it gives us the information we need to make the decisions that drive enterprises, industries, and economies. Big data enables us to collect a massive amount of information (that we can store, search, share, and analyze) to bring us closer to the goal of finding trends that lead to smarter business decisions. Big data has a big impact on businesses, governments, and societies, and its impact is continuously growing. And it gets even bigger than that.

The Data Scientist’s Four-Step Discovery Process

Published in The Economist. It shows the difference in cost-of-living between 2003 and 2013. However, I see two issues:

Setting the index to 100 for New York in both 2003 and 2013 is wrong. The reader will think New York prices stayed flat over 10 years, and it makes all 2003-2013 comparisons for other cities meaningless, as the index might not have evolved the same way outside New York.

The choice of cities listed below is questionable. Why is Mexico City not…
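The first issue above (New York fixed at 100 in both years) can be illustrated with a minimal Python sketch. All numbers here are hypothetical and are not taken from the chart: `NY_PRICE_GROWTH`, "City A", and both index values are assumptions chosen only to show how a city's index can fall while its prices rise.

```python
# Hypothetical numbers for illustration only; none come from the published chart.
NY_PRICE_GROWTH = 1.30  # assumption: New York prices rose 30% from 2003 to 2013

# Indices in the style of the chart, with New York pinned at 100 in each year.
index_2003 = {"New York": 100.0, "City A": 120.0}
index_2013 = {"New York": 100.0, "City A": 110.0}


def real_change(city):
    """Relative price change for `city`, measured in 2003 New York price units."""
    level_2003 = index_2003[city]
    # Rescale the 2013 index by how much the New York baseline itself moved.
    level_2013 = index_2013[city] * NY_PRICE_GROWTH
    return level_2013 / level_2003 - 1.0


changes = {city: real_change(city) for city in index_2003}
for city, change in changes.items():
    print(f"{city}: {change:+.1%}")
```

Under these assumed numbers, City A's index drops from 120 to 110, which reads as "cheaper", yet its prices rise by roughly 19% once the growth of the New York baseline is taken into account.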

The field of data science continues to grow, and with it come thought leaders who contribute to the industry through outreach and education. Many of the data science professors teaching today are leaders in the big-data field, speaking at conferences, writing books, and even creating groundbreaking big-data developments themselves. Find out which schools boast the most influential leaders in the data science industry.

With all of the discussion about Big Data these days, there is frequent reference to the 3 V’s that represent the top big data challenges: Volume, Velocity, and Variety. These 3 V’s generally refer to the size of the dataset (Volume), the rate at which data is flowing into (or out of) your systems (Velocity), and the complexity (dimensionality) of the data (Variety). Most practitioners agree that…

This article discusses a far more general version of the technique described in our article The best kept secret about regression. Here we adapt our methodology so that it applies to data sets with a more complex structure, in particular with highly correlated independent variables.…

Introduction

This article describes methods for machine learning using bootstrap samples and parallel processing to model very large volumes of data in short periods of time. The R programming language includes many packages for machine learning on different types of data. Three of these packages implement Support Vector Machines (SVM) [1], Generalized Linear Models (GLM) [2], and Adaptive Boosting (AdaBoost) [3]. While all three packages can be highly accurate for…
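
The general idea of fitting models on bootstrap samples in parallel can be sketched as follows. This is not the article's R code, and it does not use SVM, GLM, or AdaBoost: it is a minimal Python illustration that swaps in a plain least-squares slope as the model, runs on synthetic data, and uses hypothetical helper names (`fit_slope`, `fit_on_bootstrap`). A thread pool is used so the sketch runs anywhere; for CPU-bound models a process pool would be the more usual choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_slope(points):
    """Ordinary least-squares slope for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return sxy / sxx


def fit_on_bootstrap(args):
    """Draw one bootstrap sample (with replacement) and fit the model on it."""
    data, seed = args
    rng = random.Random(seed)  # per-task generator keeps results reproducible
    sample = rng.choices(data, k=len(data))
    return fit_slope(sample)


# Synthetic data (an assumption for this sketch): y = 2x + Gaussian noise.
data_rng = random.Random(0)
data = [(i / 10, 2.0 * (i / 10) + data_rng.gauss(0.0, 1.0)) for i in range(200)]

# Fit 20 bootstrap models concurrently, then combine them by averaging.
tasks = [(data, seed) for seed in range(20)]
with ThreadPoolExecutor(max_workers=4) as pool:
    slopes = list(pool.map(fit_on_bootstrap, tasks))

print(f"bootstrap mean slope: {mean(slopes):.3f}")
print(f"spread across samples: {min(slopes):.3f} to {max(slopes):.3f}")
```

Each task is independent of the others, which is what makes the bootstrap step easy to distribute; the spread of the fitted slopes also gives a rough sense of the estimate's variability.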