Brewing Multivariate Beer
I was toying around with the idea of multivariate beer, along the same lines as Data Cuisine. I wanted to represent county demographics with beer ingredients. The higher a value, say, population density, the more hops I use, or the higher the median household income the more of a particular grain...
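
As a rough illustration of the mapping described above (not the author's actual recipe), a demographic value can be scaled linearly into an ingredient amount. The function name, value ranges, and hop quantities below are all hypothetical, chosen only to show the idea.

```python
def scale_ingredient(value, lo, hi, min_amount, max_amount):
    """Linearly map a demographic value in [lo, hi] to an ingredient amount.

    Values outside [lo, hi] are clamped to the nearest bound.
    """
    if hi == lo:
        return min_amount
    frac = (value - lo) / (hi - lo)
    frac = max(0.0, min(1.0, frac))  # clamp to [0, 1]
    return min_amount + frac * (max_amount - min_amount)


# Hypothetical example: population density (people per square mile)
# mapped to between 1 and 3 ounces of hops.
hops_oz = scale_ingredient(5000, lo=0, hi=10000, min_amount=1.0, max_amount=3.0)
print(hops_oz)
```

The same function could be reused for each demographic variable, with one ingredient per variable.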

DopeLearning: A Computational Approach to Rap Lyrics Generation
Writing rap lyrics requires both creativity, to construct a meaningful and interesting story, and lyrical skills, to produce complex rhyme patterns, which are the cornerstone of a good flow. We present a method for capturing both of these aspects. Our approach is based on two machine-learning techniques: the RankSVM algorithm, and a deep neural network model with a novel structure...

Science: Surfing the 4th Largest Stream of Data
Twitch is the 4th largest stream of data on the internet. Conclusion: People really like watching each other play video games. I’ve always been curious about how people find content on Twitch, so I dug into the discovery process in aggregate...

A Week of Mining Seattle's Craigslist Apartment Pricing
Everyone is freaking out over San Francisco's astronomically high rent prices right now when Seattle real estate isn’t that far behind... I quickly went home and started trying to figure out how fast rent was rising in Seattle. Since Craigslist has reliably been the best place to find apartments, I created a script using Scrapy to grab apartment listings from Seattle Craigslist and filtered them for the zip codes within the Seattle boundaries...
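
The filtering step mentioned above might look something like the following sketch. It assumes the listings have already been scraped into dictionaries with `zipcode` and `price` keys; those field names, the short zip code set, and the sample records are illustrative assumptions, not the author's data or code.

```python
from statistics import median

# Illustrative subset only; a real run would use the full list of Seattle zip codes.
SEATTLE_ZIPS = {"98101", "98102", "98103"}


def median_rent(listings, zips):
    """Return the median price of listings whose zip code is in `zips`, or None."""
    prices = [item["price"] for item in listings if item["zipcode"] in zips]
    return median(prices) if prices else None


# Made-up toy records standing in for scraped listings.
sample = [
    {"zipcode": "98101", "price": 1500},
    {"zipcode": "98103", "price": 1900},
    {"zipcode": "98004", "price": 2500},  # outside the set above, so excluded
]
print(median_rent(sample, SEATTLE_ZIPS))
```

Running this on listings grouped by posting date would give one simple way to watch the median move over time.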

Time-lapse Mining from Internet Photos
We introduce an approach for synthesizing time-lapse videos of popular landmarks from large community photo collections. The approach is completely automated and leverages the vast quantity of photos available online...

Why We Need a Statistical Revolution
My father told me the most important thing about solving a problem is to formulate it accurately, and one would think that, as statisticians, most of us would agree with that advice...

A Face in Numbers: Microsoft, Google & Big Data
Maybe we should add Microsoft’s HowOldRobot onto this naughty list. Their playful tool was enjoyed all over the world this month, but does it truly create a better guess as to your age? Here we seek to mathematically answer at what point should we reward only true precision and not what any probabilist can discern as a blooming, random number generator...

Tracking Economic Development with Open Data and Predictive Algorithms
The world runs on data, but all too often, the effort to acquire fresh data, analyze it and deploy a live analysis or model can be a challenge. We here at Algorithmia spend most of our waking hours thinking about how to help you with the latter problem, and our friends at Socrata spend their time thinking about how to make data more available. So, we combined forces to analyze municipal building permit data (using various time series algorithms), as a reasonable proxy for economic development...

Jobs

Bonobos is (proudly) the largest born-on-the-Internet men's apparel company in the US and we're (humbly) seeking to reinvent the traditional eCommerce experience... The Data Science and Engineering team at Bonobos was formed to make Bonobos the most data-savvy apparel company the world has ever seen. We champion the value of intellectual honesty in the company through data-driven decision making. As the largest apparel brand ever built on the web in the United States, we have detailed insight into our customers’ behavior and preferences. The Data Science and Engineering team puts that information to use by building systems that recommend products, personalize the customer experience, and optimize operations...

Books

A pithy, essential guide to statistical blunders in modern science that will show you how to keep your research blunder-free...

"If you're interested in learning about the current debates/ problems/ challenges in statistics, this is a good primer. The book is written for scientists, but it can benefit anyone who deals with data and analysis on a regular basis..."