The dataset is about 13.5 terabytes and holds the interactions of roughly 20 million users from February 2015 through May 2015, including activity on Yahoo's homepage as well as Yahoo News, Yahoo Sports, Yahoo Finance, and Yahoo Real Estate, according to TechCrunch.

The dataset also contains demographic information such as age range, gender, and generalized geographic data. Items in the dataset include title, summary, and key phrases of the news article in question, local timestamps, and some device information, the tech website noted.

"Research scientists at Yahoo Labs have long enjoyed working on large-scale machine learning problems inspired by consumer-facing products," Rajan said in the Yahoo statement. "This has enabled us to advance the thinking in areas such as search ranking, computational advertising, information retrieval, and core machine learning.

"A key aspect of interest to the external research community has been the application of new algorithms and methodologies to production traffic and to large-scale datasets gathered from real products," he continued.

"Access to datasets of this size is essential to design and develop machine learning algorithms and technology that scales to truly 'big' data," Gert Lanckriet, a professor in the department of electrical and computer engineering at the university, said in a statement, according to ZDNet.com.