What could you do with access to the complete — and massive — collection of every public tweet ever tweeted? Earlier this year, Twitter posed this question to academic and research institutions across the globe, ultimately planning to reward some of the best ideas with the first Twitter Data Grants, a pilot initiative that would provide the winning research projects access to that data.

One of those winners is a study of foodborne gastrointestinal illness, taking place at Harvard Medical School under the leadership of Dr. John Brownstein. The group works within the budding field of “digital epidemiology.”

As Brownstein described it, “We know that people use technology to communicate illness. … Our goal is to take that information, organize it and provide a new view of public health.”

Twitter will provide the researchers with data associated with tweets, culled from a set of keywords they have identified, including “diarrhea,” “nausea” and a range of other words related to food and feeling sick.
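To make the keyword-based selection concrete, here is a minimal sketch of how tweets might be matched against a term list. It is an illustration only, not the researchers' or Twitter's actual pipeline: the only keywords the team is reported to use are "diarrhea" and "nausea," and the function names and the dictionary shape of a tweet are assumptions made for the example.

```python
import re

# Hypothetical keyword list: only these two terms are named in the article.
KEYWORDS = ["diarrhea", "nausea"]

# Word-boundary pattern so a keyword does not match inside a longer word.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
    re.IGNORECASE,
)

def matching_keywords(text):
    """Return the sorted, lowercased keywords found in a tweet's text."""
    return sorted({m.group(1).lower() for m in PATTERN.finditer(text)})

def filter_tweets(tweets):
    """Keep only tweets (assumed to be dicts with a "text" key) that mention a keyword."""
    return [t for t in tweets if matching_keywords(t["text"])]
```

A call such as `filter_tweets([{"text": "Nausea after lunch"}, {"text": "great day"}])` would keep only the first tweet.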

While all these tweets are public — in other words, not protected by users — Twitter does not typically allow for such extensive access. Results from keyword searches, for example, are limited to a certain number of tweets, and the real-time streaming function Twitter offers also yields only a subset of the tweets that meet the searcher’s criteria.

The information will enable the HMS team to search through the tweets to identify trends and patterns that are potentially associated with known incidents of foodborne illness, in the hopes of suggesting a method for predicting outbreaks — “a new way of monitoring for any issues at restaurants [and] potentially contaminated products,” according to Brownstein.

Because a tweet referencing post-dinner queasiness, for instance, is not a reliable indication that someone actually experienced a bout of food poisoning, the group plans to reach out to the authors of relevant tweets to gather more information about their gastrointestinal experience. These more precise data should, ideally, support a more accurate predictive model. The researchers are considering partnering with local public health departments for this outreach.

Dr. Brownstein indicated that the primary challenge in working with the data in this way is “a ‘signal and the noise’ issue.”

“We’re talking about smaller clusters of food-related events,” he said. “The question is if that’s enough data to identify an event that’s taking place and to be a useful public health signal.”
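One simple way to picture the signal-versus-noise problem is to ask whether a day's count of matching tweets stands out from the recent baseline. The sketch below is a generic illustration under assumed inputs (a list of daily counts for one location), not the method the Harvard team uses; the function name, the seven-day window and the threshold of three standard deviations are all hypothetical choices.

```python
from statistics import mean, stdev

def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing-window mean
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: any increase at all counts as a spike.
            if daily_counts[i] > mu:
                flagged.append(i)
            continue
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

With small clusters, the baseline counts are low and variable, so a rule like this can either miss real events or flag ordinary fluctuation, which is the difficulty Brownstein describes.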

While other public health researchers and officials may not have access to as complete a body of tweets as this study does, the digital epidemiologists at Harvard Medical School previously did similar work with the more limited set of tweets available through Twitter’s streaming feature. Brownstein noted that Twitter is becoming more open with its data, so this opportunity may become available to others in the future.

In fact, in its initial announcement of the Data Grants, Twitter acknowledged that “it has been challenging for researchers outside the company who are tackling big questions to collaborate with us to access our public, historical data.”

As Twitter moves in the direction of “connecting research institutions and academics with the data they need,” other powerful and perhaps surprising ways this data can be used remain to be seen.