About 1.8 million new scientific papers are published each year, and most are of little consequence to the general public — or even read, really; one study estimates that up to half of all academic studies are only read by their authors, editors, and peer reviewers.

But the papers that are read can change our understanding of the universe — traces of water on Mars! — or impact our lives here on earth — sea levels rising! — and when journalists get called upon to cover these stories, they’re often thrown into complex topics without much background or understanding of the research that led to the breakthrough.

As a result, a group of researchers at Columbia and Stanford is developing Science Surveyor, a tool that algorithmically helps journalists get important context when reporting on scientific papers.

“The idea occurred to me that you could characterize the wealth of scientific literature around the topic of a new paper, and if you could do that in a way that showed the patterns in funding, or the temporal patterns of publishing in that field, or whether this new finding fit with the overall consensus with the field — or even if you could just generate images that show very rapidly what the huge wealth, in millions of articles, in that field have shown — [journalists] could very quickly ask much better questions on deadline, and would be freed to see things in a larger context,” Columbia journalism professor Marguerite Holloway, who is leading the Science Surveyor effort, told me.

Science Surveyor is still being developed, but broadly the idea is that the tool takes the text of an academic paper and searches academic databases for other studies using similar terms. The algorithm will surface relevant articles and show how scientific thinking has changed through its use of language.
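Science Surveyor's actual pipeline isn't public, but the core idea — represent a paper by its vocabulary and rank other papers by how much of that vocabulary they share — is a standard TF-IDF similarity search. Here's a minimal sketch of that general technique in plain Python; the toy corpus, tokenization, and function names are all illustrative, not the team's code:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Turn tokenized documents into {term: tf-idf weight} dicts.
    Rare terms get high weight; terms in every document get zero."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "database": abstracts reduced to token lists.
corpus = [
    "adult neurogenesis hippocampus new neurons".split(),
    "neurogenesis adult human brain growth".split(),
    "sea level rise climate model projection".split(),
]
query = "evidence for adult human neurogenesis".split()

vecs = tf_idf_vectors(corpus + [query])
qvec = vecs[-1]
ranked = sorted(range(len(corpus)),
                key=lambda i: cosine(vecs[i], qvec), reverse=True)
```

Run on the toy corpus, `ranked` puts the two neurogenesis abstracts ahead of the climate one, which is the basic behavior the tool needs before any visualization happens. A production system would add stemming, stop-word removal, and a real index, but the ranking principle is the same.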

For example, look at the evolving research around neurogenesis, or the growth of new brain cells. Neurogenesis occurs primarily while babies are still in the womb, but it continues through adulthood in certain sections of the brain.

Up until a few decades ago, researchers generally thought that neurogenesis didn’t occur in humans — you had a set number of brain cells, and that’s it. But since then, research has shown that neurogenesis does in fact occur in humans.

“This tells you — aha! — this discovery is not an entirely new discovery,” Columbia professor Dennis Tenen, one of the researchers behind Science Surveyor, told me. “There was a period of activity in the ’70s, and now there is a second period of activity today. We hope to produce this interactive visualization, where given a paper on neurogenesis, you can kind of see other related papers on neurogenesis to give you the context for the story you’re telling.”

Development of Science Surveyor began in 2014, and the researchers spent most of its first year examining similar previous efforts and beginning to think about the user interface.

In May, the team won a $500,000 Flagship Magic Grant from the Brown Institute for Media Innovation at Columbia that will fund the development of Science Surveyor through the current academic year as the researchers pursue three goals: work on the algorithms that will be used to determine the context and consensus around the research papers; develop a way to effectively visualize the information that Science Surveyor unearths; and then apply those tools in actual reporting to demonstrate how they can work.

Stanford professors Dan Jurafsky and Dan McFarland joined the team in December 2014 to add some computer science muscle, and they’ve been leading the effort to develop the computational methodologies that power Science Surveyor. Essentially, the researchers’ goal is to develop a mechanism that takes a paper, compares it against a database of previous research, and determines how it relates to prior scientific contributions.

“If you read a paper, often the question is: Are there other similar developments?” Tenen said. “Is there consensus around this topic, or do trees really cause global warming? Is this a crazy paper that’s an outlier, or is it in the middle of some consensus cluster? We’ve been working on an algorithm that identifies consensus clusters, basically. I think we can have a scholarly contribution to the field around this consensus clustering issue.”
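The team's consensus-clustering algorithm hasn't been published, but the outlier-versus-cluster distinction Tenen describes can be illustrated with a very simple single-link clustering over pairwise paper similarities. Everything below — the similarity matrix, the threshold, the function names — is an invented toy, not the actual method:

```python
def cluster_by_threshold(sim, n, threshold=0.5):
    """Single-link clustering via union-find: papers i and j end up in
    the same cluster if a chain of pairwise similarities >= threshold
    connects them. Singleton clusters are the 'outlier' papers."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim(i, j) >= threshold:
                parent[find(i)] = find(j)   # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Toy similarity matrix: papers 0-2 broadly agree; paper 3 is a lone outlier
# (the "trees cause global warming" paper of this example).
S = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.0],
    [0.7, 0.9, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
]
clusters = cluster_by_threshold(lambda i, j: S[i][j], 4, threshold=0.5)
outliers = [c[0] for c in clusters if len(c) == 1]
```

On this toy input the sketch yields one consensus cluster, papers 0–2, and flags paper 3 as an outlier — the signal that, in Tenen's framing, tells a journalist whether a new finding sits "in the middle of some consensus cluster" or far outside it.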

While the Stanford team focuses on the backend, Laura Kurgan, a Columbia architecture professor who leads Columbia’s Spatial Information Design Lab, is directing the effort to figure out how to best showcase the information surfaced in Science Surveyor. Last year, they began by creating network-analysis diagrams, but they’re working on developing a product that would “make very clear where the new findings would fit against the larger context,” Holloway said.

“We had very straightforward and simple graphics in our first iteration of this, and we all feel very strongly that it has to be something that is very simple, very visually engaging, but it can’t be complex,” she said. “What exactly that looks like, we don’t know yet.”

Holloway and her journalism students plan to test the prototype and show how the tool could benefit reporters, focusing on stories in climate science and neuroscience.

A key to the ongoing development of the project, Holloway said, has been its interdisciplinary nature. By combining the work of computer scientists, designers, and journalists, she’s hopeful that they’ll be able to create a product that’s valuable and usable.

“You would just get one piece of it that wouldn’t necessarily be easily used by many different communities,” she said. “You might get the computational side worked out, but that might not necessarily serve the needs of the science journalists, or it might be really gorgeous, but the algorithms aren’t as robust or creative as they could be.”

Everything the Science Surveyor team produces will be open source and will live on GitHub. During this initial year, the team is focusing on producing the code and sharing it so that someone in a newsroom with some coding knowledge can use it.

“There’s a difference between a software tool and like a full-blown [tool] where you don’t have to know anything and you just click buttons and it does stuff for you,” Tenen said. “That sort of tool could be down the line — the one that’s completely consumer-facing, where you don’t need to have any computational knowledge. For this year, we would be happy just to produce more in the order of a software library.”

For now, the project is just an experiment, but the team is hopeful that a library or database might take interest, and that journalists using Web of Science or Google Scholar, for example, might be able to run Science Surveyor on those databases.

Still, Holloway cautioned that the development process will likely be a lengthy one. “It’s a many-year project,” she said. “The fact that something like this doesn’t already exist speaks to the challenges.”