Shakespeare famously suggested that “a rose by any other name would smell as sweet.” Penn engineer Alejandro Ribeiro allows us to consider a similar question about Shakespeare himself: Would the Bard’s plays, if they were written with the help of someone else, still be as great?

Ribeiro, the Rosenbluth Associate Professor of Electrical and Systems Engineering, did not set out to assign credit for Elizabethan-era dramas. His main concern is networks of big data, like mapping how people are connected on Facebook. In 2012 he became interested in a particular question that he realized he could study by comparing word patterns in long pieces of writing. Over the next three years, he developed a method of analysis that allowed him to very accurately determine when two texts were written by the same person—even when the author of one of the texts wasn’t known ahead of time.

“It is possible for me to distinguish who the author is within the order of a two to three percent error rate. This is partly a result of the value of the tool, but mostly a result of the predictability of human beings,” says Ribeiro.

To make such distinctions, Ribeiro and his collaborators, former graduate students Santiago Segarra EAS’09 Gr’16 and Mark Eisen EE’14, first identified the most common connecting words in written English, such as “the,” “and,” “to,” and “with.” These “function” words are used by all writers regardless of what they’re writing about, but in subtly different ways. For instance, one writer might use “the” following “to” more often than another, or be less likely to use the word “with” closely followed by the word “in.” These elements of style are easily overlooked by a reader focused on the meaning of a text but show up clearly when the researchers build what they call the “word adjacency network” of a text.
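For readers who want to see the idea in concrete terms, here is a minimal Python sketch of counting how often one function word turns up shortly after another. It is an illustration only, not the researchers' code: the short word list, the five-word window, and the function name `adjacency_counts` are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative list; the article does not give the real one.
FUNCTION_WORDS = {"the", "and", "to", "with", "in", "a", "of"}

def adjacency_counts(text, window=5):
    """Count how often function word b appears within `window` words after function word a."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, a in enumerate(words):
        if a not in FUNCTION_WORDS:
            continue
        # Look only at the few words that follow `a`.
        for b in words[i + 1 : i + 1 + window]:
            if b in FUNCTION_WORDS:
                counts[(a, b)] += 1
    return counts

# A made-up sentence, used only to exercise the function.
sample = "He went to the market with a basket and came back to the house in the rain."
network = adjacency_counts(sample)
print(network[("to", "the")])  # how often "the" closely follows "to"
```

Each pair of words acts as a link in the network, and the count is the strength of that link. Content words such as "market" or "basket" are ignored entirely, which is why the resulting pattern reflects style rather than subject matter.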

“What we’re looking at are the habits of the author in putting certain words near certain other words. It turns out one can quickly show that these habits are distinctive,” says Gabriel Egan, a Shakespeare scholar at De Montfort University in England, who joined Ribeiro in this research.

In July 2014 Ribeiro, Segarra, and Eisen posted an article to the scientific preprint site arXiv.org on their word adjacency networks. The paper explained how word adjacency networks are akin to a statistical structure called a Markov chain, in which the links between words can be roughly interpreted as how often one word is used following another. By comparing the Markov chain from one text with the Markov chain from a second text, they could make a good estimate of the likelihood they were written by the same person.
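A rough sketch of that comparison, in Python, might look like the following. The article does not say which measure the researchers used to compare two chains, so the relative-entropy-style score here is an assumption, as are the word list, the window size, the smoothing, and the names `markov_chain` and `dissimilarity`.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative list of function words.
FUNCTION_WORDS = ("the", "and", "to", "with", "in", "a", "of")

def markov_chain(text, window=5, smoothing=1.0):
    """Turn adjacency counts into transition probabilities (each row sums to 1)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = defaultdict(Counter)
    for i, a in enumerate(words):
        if a in FUNCTION_WORDS:
            for b in words[i + 1 : i + 1 + window]:
                if b in FUNCTION_WORDS:
                    counts[a][b] += 1
    chain = {}
    for a in FUNCTION_WORDS:
        # Smoothing (an assumption) keeps every probability above zero.
        total = sum(counts[a].values()) + smoothing * len(FUNCTION_WORDS)
        chain[a] = {b: (counts[a][b] + smoothing) / total for b in FUNCTION_WORDS}
    return chain

def dissimilarity(p, q):
    """Average, over rows, of the relative entropy of p's row with respect to q's row."""
    total = 0.0
    for a in FUNCTION_WORDS:
        total += sum(p[a][b] * math.log(p[a][b] / q[a][b]) for b in FUNCTION_WORDS)
    return total / len(FUNCTION_WORDS)

# Made-up sentences standing in for two texts.
chain_a = markov_chain("the cat and the dog went to the park with the ball")
chain_b = markov_chain("to be or not to be in a play of a king and a queen")
print(dissimilarity(chain_a, chain_b))
```

A score of zero means the two chains are identical, and larger scores mean the habits diverge more. Note that this score is not symmetric: swapping the two texts generally gives a different number.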

Only after creating this technique did Ribeiro and his students go looking for ways to test it. Ribeiro says his wife is a big Shakespeare fan and he was aware that there was a longstanding debate about whether Shakespeare really wrote all the plays that bear his name. Competing theories range from the crackpot narrative in the 2011 movie Anonymous, which posits that Shakespeare didn’t write any of his own work, to more nuanced scholarly arguments about whether Shakespeare worked with unrecognized collaborators. With the help of Penn English professor Zachary Lesser, Ribeiro got in touch with Egan, who’d done a little work on authorial attribution of Shakespeare’s plays in the 1990s.

“I was very impressed when I heard Alejandro’s idea. I was thinking, ‘That’s brilliant, that’s very clever,’” Egan says.

Egan helped Ribeiro, Segarra, and Eisen to identify other leading dramatists who might have worked with Shakespeare and select texts to analyze for each playwright. They settled on seven potential collaborators and created word adjacency networks for each one, as well as a word adjacency network for Shakespeare based on 28 plays commonly thought to have been written by him. The word adjacency networks revealed that each author had his own clear grammatical tics. Ben Jonson, for example, was more likely than Shakespeare to use “and” following “a.”

After building profiles for each author, the researchers ran a thought experiment. They took each Shakespearean play and said: Suppose we didn’t know who wrote it; based on the word adjacency networks of the eight candidates, who would we suspect was the author?
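In code, that thought experiment amounts to scoring the unknown text against every candidate profile and picking the closest. The Python sketch below is a toy version under stated assumptions: the function names `profile`, `distance`, and `attribute` are hypothetical, the relative-entropy-style distance is an assumption rather than the researchers' stated method, and the two "authors" are made-up sentences, not real plays.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative list of function words.
FUNCTION_WORDS = ("the", "and", "to", "with", "in", "a")

def profile(text, window=5):
    """Smoothed probabilities that function word b closely follows function word a."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, a in enumerate(words):
        if a in FUNCTION_WORDS:
            counts.update((a, b) for b in words[i + 1 : i + 1 + window]
                          if b in FUNCTION_WORDS)
    chain = {}
    for a in FUNCTION_WORDS:
        # Add-one smoothing (an assumption) avoids zero probabilities.
        row_total = sum(counts[(a, b)] + 1 for b in FUNCTION_WORDS)
        for b in FUNCTION_WORDS:
            chain[(a, b)] = (counts[(a, b)] + 1) / row_total
    return chain

def distance(p, q):
    """Relative-entropy-style gap between two profiles (0 means identical)."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

def attribute(unknown_text, candidate_texts):
    """Return the candidate whose profile is closest to the unknown text, plus all scores."""
    unknown = profile(unknown_text)
    scores = {name: distance(unknown, profile(text))
              for name, text in candidate_texts.items()}
    return min(scores, key=scores.get), scores

# Hypothetical candidates built from made-up sentences.
candidates = {
    "author_a": "the cat and the dog went to the park with the ball",
    "author_b": "a king in a play and a queen in a tale",
}
best, scores = attribute(candidates["author_a"], candidates)
print(best)
```

Here the unknown text is deliberately identical to one candidate's sample so the outcome is obvious. With real texts the scores would all be nonzero, and the size of the gap between the best and second-best candidate would indicate how much weight the attribution deserves.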

To the disappointment of conspiracy theorists, Ribeiro and his coauthors found that most of Shakespeare’s plays, including all of the most famous ones, seem to have been written by, well, Shakespeare. But there were some notable exceptions. One was Two Noble Kinsmen, which was the last play Shakespeare wrote and is generally understood to have been a collaboration between Shakespeare and the younger dramatist John Fletcher. The researchers’ analysis supported that connection.

More surprisingly, the researchers found strong evidence of collaboration in Shakespeare’s Henry VI trilogy. Those plays were written early in Shakespeare’s career, and scholars have long suspected a second hand may have been involved, though they didn’t know whose. Ribeiro’s work points squarely to Christopher Marlowe, a famed contemporary of Shakespeare who is known to have influenced the Bard before being stabbed to death at the age of 29.

“Marlowe was the dominant playwright on the scene already,” says Lesser. “It’s always been interesting to imagine, had Marlowe not died right as Shakespeare was starting out, would Marlowe be the figure we are talking about today rather than Shakespeare?”

Word adjacency analysis suggests we might in fact be talking about Marlowe’s work more than we realize.

“The plays are relatively far from Shakespeare’s profile, which indicates they’re not entirely Shakespeare’s plays,” says Ribeiro. He adds that the “Shakespearean” elements are a telltale sign of the Bard’s work on the plays, but that “what’s remarkable about them is the similarity to Marlowe’s profile.”

The evidence in this new research that Shakespeare was especially collaborative at the beginning (Henry VI) and end (Two Noble Kinsmen) of his career is consistent with what we know about theatrical practice at the time. Established and less-established playwrights commonly worked together in a mentor-apprentice relationship. Still, there are caveats to this new research. The main one is that most plays from Shakespeare’s age have been lost completely. Marlowe may be the best match among the seven playwrights tested, but there could be someone else lurking outside the scope of historical memory.

“It’s entirely possible there could be some [unknown] other figure who is a far better candidate than Marlowe,” says Lesser.

Ribeiro and his collaborators published the full results of their study in Shakespeare Quarterly, and a future edition of the New Oxford Shakespeare will credit Marlowe as a coauthor of the Henry VI plays, in part because of this new work. Already the researchers are looking ahead to other applications of their technique. In particular, they intend to use it to sort among competing editions of Shakespeare’s plays to determine which ones are likely truest to Shakespeare’s own words. To take one example, the first edition of Hamlet renders the agonized prince’s indecision as “to be or not to be, I there’s the point,” while later editions give us the more familiar phrasing, “to be or not to be, that is the question.”

Which way did Shakespeare really write it? Perhaps Ribeiro and his coauthors will be able to provide an answer.