Yahoo releases largest-ever machine learning dataset for research

Yahoo Labs today announced that it will make available to researchers a dataset that is record-breaking in terms of data density. The Yahoo News Feed dataset contains 110 billion events drawn from samples of anonymised transactions on the portal's news feed, and weighs in at a whopping 1.5 terabytes zipped. The release, part of the company's Webscope initiative and announced on Yahoo's Tumblr blog, is intended to help researchers validate recommender systems, high-scale learning algorithms, user-behaviour models, collaborative filtering techniques and unsupervised learning methods.

The dataset also includes local timestamps and limited information about the device used to access the news feed. These facets will support research on data mining and will also be useful for work on contextual recommender systems, currently one of the hottest…