In 1980, a friend gave Martin Amis a new novel by a young American writer — Wild Oats, by Jacob Epstein, then just 24. Right away, Amis noticed certain similarities, “several phrases and similes,” lifted from his own first novel, The Rachel Papers, published a decade before.

Amis’s hero, for example, “could feel, gradually playing on my features, a look of queasy hope.” Epstein has it that his hero “could feel, playing across his face, a look of queasy hope.” Amis writes of legs “at first spastically shooting out in all directions, then coordinating into a groovy shuffle.” Epstein writes of legs “spastically shooting out in all directions at first, then coordinating into a groovy shuffle.” Where Amis invokes “Dear-Marje wisdom with no results,” Epstein conjures “Ann Landers wisdom, but with no result.”

In all, Amis found 50 instances of this kind of theft. “The boundary between influence and plagiarism will always be vague,” Amis wrote of the case in an essay for The Observer. “Reading Wild Oats, it soon became clear to me that the boundary, however hazy, had been decisively breached.”

Epstein, faced with this accusation, was contrite. “It is the most awful mistake, which happened because I made notes from various books as I went along and then lost the notebook telling where they came from,” he explained to a reporter at the time. The offending passages had been excised from subsequent editions. The first edition “should never have been published.”

If Epstein were a student, and Wild Oats not a novel but an essay, he would have been found out the moment he submitted the manuscript. What he’d appropriated from The Rachel Papers, even the material he’d nominally reworked or reworded, would be flagged, immediately, by computer software designed to identify plagiarism in academic work.

A professor responsible for grading 300 term papers no longer needs to sniff out suspect sentences or paragraphs that seem vaguely out of place. Most colleges and universities, and many high schools, use programs such as Turnitin, which detect plagiarized content the way magnetic wands detect metal. Students submit assignments through an online portal, the program scans the text, and when the teacher signs on to look at the batch of work, they can see what percentage of each paper contains recycled material and where every flagged line has been taken from.

Turnitin, the first and most popular plagiarism-detection service, was founded in 1998 by four students at Berkeley, who intended it to be an online peer-review system. In the early 2000s, it launched as a web service designed to help schools curb the growing trend of copying and pasting research from the internet without citation, and this specialized purpose has made it ubiquitous in academia ever since.

Turnitin uses a “proprietary search algorithm” that “crawls and indexes current and archived web pages, and is comparable to major search engines,” as its About page puts it. It aggregates content from scholarly databases that might not be archived by Google, including “periodicals, biographies, brochures, encyclopedias, magazines, journals, books and abstracts,” as well as medical resources, tens of millions of articles from the academic research publisher Gale, and textbooks both new and out-of-print from Pearson and McGraw-Hill. If someone legitimate published it, Turnitin most likely has it on its servers.

Most ingeniously, Turnitin archives every essay students submit. Like the Borg in Star Trek, the Turnitin database gets smarter and more adept over time, growing with every paper fired its way. This instant-archive feature is most useful in preventing collusion: two or more students handing in papers with any appreciable overlap would be flagged. More broadly, it contributes to the vast scale of Turnitin’s resources.

The database has been gathering new material for nearly 20 years now, and the company boasts on its website that its “unparalleled index” contains 929 million archived student papers, a Borgesian library of academic content that makes it extraordinarily difficult for would-be plagiarists to steal anything, anywhere. It’s hard to imagine the kind of obscure content a student would have to unearth for their pilfering to elude the sensors. Finding it would certainly involve more laborious research and drudgery than simply writing an original paper.

Plagiarism seems straightforward enough: a writer uses words that aren’t their own. But Turnitin clarifies how many kinds of theft fall under the plagiarism heading, and how sophisticated, and therefore difficult to catch, some of those kinds of theft can be. Turnitin refers to what it calls the Plagiarism Spectrum, an educational tool which “identifies 10 types of plagiarism based on findings from a worldwide survey of nearly 900 secondary and higher education instructors.”

The Plagiarism Spectrum includes basic forms, such as the Clone (“submitting another’s work, word-for-word, as one’s own”) and the CTRL-C (“contains significant portions of text from a single source without alterations”), as well as more elaborate cons, like the Remix (“paraphrases from multiple sources made to fit together”) and the 404 Error (“includes citation to non-existent or inaccurate information about sources”). Simple clones and CTRL-Cs are easy for humans to root out using the internet — you can plug phrases from an essay into Google and find their original source yourself. But with key words changed and sentence structures altered, it becomes trickier to nail the hybrid-plagiarism fakes. So the Turnitin software scours papers for patterns and structural similarities rather than merely picking out blocks of stolen words.

Read a few college term papers — or just read a few news articles on the web — and you will notice something that looks a lot like plagiarism but isn’t quite. It’s cliché.

Imagine a student in a film studies class assigned to write about Psycho. If they write, at the beginning of their essay, of “director Alfred Hitchcock’s seminal psychological horror movie from 1960,” they will, totally unintentionally, have happened on a sentence strikingly similar to thousands of other film studies essays about Psycho, as well as probably a few hundred movie-review websites, its IMDb and Wikipedia pages, and any number of other sources that default to familiar, slightly hackneyed writing when talking about this film.

Is it plagiarism? Not in academic terms. But it’s difficult for a computer program to know the difference between writing that’s lazy and writing that’s stolen.

“We don’t exclude common phrases and cliché expressions from the algorithm,” a representative from Turnitin explains to me about the process. “We check student work against our database, and if there are instances where student writing is similar to, or matches against, one of our sources, we will flag this for an instructor to review. Ultimately, human judgement is required to make a determination about plagiarism, and it’s likely that, if a commonly used phrase is flagged, an instructor would make the distinction.”

This is typical of the company’s broader view of its role as a kind of policing service. Turnitin isn’t there to mechanically find fault and punish students for infractions. It aims to be a “conversation starter,” and it emphasizes the need, in the face of student error or lapses of judgement, for “a larger teaching moment around the importance of original writing, proper citation, and academic integrity.”

Turnitin’s own data points out that “the odds of writing the same 16 words in the same order by chance are one in a trillion.” The software is very good at catching cases where words appear in the same order and the match is, statistically, virtually impossible to explain as coincidence. But the main function is more philosophical. Turnitin gets people thinking about what it means to plagiarize, and, the hope is, gives them a better understanding of how to write.
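The idea of catching words that recur in the same order can be illustrated with a short Python sketch. This is not Turnitin’s algorithm, which is proprietary. It is a minimal illustration that assumes a match means a run of consecutive words shared by two texts, and every function name in it is hypothetical. The sample texts are the Amis and Epstein lines quoted earlier.

```python
import re

# Sample sentences taken from the Amis and Epstein passages quoted above.
AMIS = ("at first spastically shooting out in all directions, "
        "then coordinating into a groovy shuffle")
EPSTEIN = ("spastically shooting out in all directions at first, "
           "then coordinating into a groovy shuffle")


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, n):
    """Return the set of every run of n consecutive words."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_runs(submission, source, n=16):
    """Return the n-word runs that appear in both texts (assumed match rule)."""
    return shingles(words(submission), n) & shingles(words(source), n)


def percent_matched(submission, source, n=16):
    """Percentage of the submission's words that fall inside a shared run."""
    tokens = words(submission)
    if len(tokens) < n:
        return 0.0
    source_runs = shingles(words(source), n)
    covered = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in source_runs:
            # Mark every word position in this matching run as covered.
            covered.update(range(i, i + n))
    return 100.0 * len(covered) / len(tokens)
```

With a five-word window, `shared_runs(EPSTEIN, AMIS, n=5)` finds four runs common to both sentences, covering 12 of Epstein’s 14 words. With the 16-word window borrowed from Turnitin’s statistic it finds none, because each sentence is only 14 words long. A fixed window like this misses reworded passages, which is one reason a real service would need more than exact matching.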

The internet makes it possible for Turnitin to crack down on most forms of plagiarism, most of all the kinds of plagiarism that involve copying and pasting. It’s ironic, because the internet and the computer’s copy-paste function created a plagiarism boom in the late 1990s and early 2000s, when computer literacy was low among educators and before Turnitin had taken hold.

An article in The New York Times from 2001 warned that, “in this era of cut and paste,” “a new generation of students is faced with an old temptation made easier than ever,” as several high-profile cases of academic plagiarism at the time “painted in sharpest relief how easy cheating had become.” A contemporary survey cited in the article found that “more than half” of high school students across the United States “admitted either downloading a paper from a Web site or copying a few sentences from a Web site without citation.”

As teachers became more computer savvy, and indeed as schools began making conscious efforts to fight plagiarism, this Wild West copy-and-paste abandon was brought under control. Today it would take a tremendously lucky student, and an exceptionally careless teacher, for an essay downloaded from the internet to pass as the student’s own work. Enforcement, when it comes to plagiarism, is largely a matter of deterrence. In other words, if you know your school has the ability to spot stolen material with flawless accuracy, you are significantly less likely to try; and if you are stupid enough to try anyway, and you get caught and disciplined, you will almost definitely not try a second time. Once proven effective, just the threat of Turnitin does the work.

A study conducted last year on the program’s behalf found that, among students submitting essays using its software, “levels of unoriginal content” and “rates of similarity” had “dropped significantly by their second paper.” Noticing their tendencies to cite improperly or borrow too generously, students tended to “correct their practices” and be more conscious of the importance of proper citation and original work. “This study found that these effects are long-lasting, occur in both secondary and higher education institutions, and appear across the globe regardless of the country in which the students were studying.”

Of course, there will always be students who want to cheat. And students being savvy, there will always be ways to game the system, to thwart the software, to elude capture by the robots there to ferret thieves out. One of the last frontiers for academia is the ghostwritten essay, the essay for hire — what’s known as “contract cheating,” defined as “the practice of students engaging a third party individual or service to complete their written assessments.” Turnitin has developed a new program, called Authorship Investigate, designed to target ghostwriters and those who would hire them in lieu of writing their own work. It will “use a combination of machine learning algorithms and forensic linguistic best practices to detect major differences in students’ writing style between papers.”
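Comparing writing style between papers can be sketched in a few lines of Python. Nothing here reflects how Authorship Investigate actually works. The sketch assumes, purely for illustration, that style can be approximated by how often a writer uses a handful of common function words, and the word list and function names are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, hand-picked list of function words: an assumption for illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "was", "but", "which"]


def style_profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def style_similarity(paper_a, paper_b):
    """Score near 1.0 for similar function-word habits, 0.0 for none shared."""
    return cosine_similarity(style_profile(paper_a), style_profile(paper_b))
```

A low score between two papers by the same student would be, at most, a prompt for a closer human look. Short texts and a twelve-word list are far too crude to settle questions of authorship.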

What’s remarkable about the Wild Oats scandal, in retrospect, is how far along it managed to get before someone realized anything was wrong. Epstein’s editors never noticed he was stealing. The fact-checkers and copy-editors at Little, Brown, Epstein’s publisher, didn’t catch the crime. Once it was actually printed and bound, out in the world, on bookshelves and in shop windows, it was widely read, discussed, celebrated, even effusively reviewed, by many people who’d either never read, or didn’t remember, a successful novel by Martin Amis.

Epstein’s extensive cribbing from Amis went unobserved until Amis himself read Epstein — alarming, when you consider he might never have gotten around to it. If nothing else, this situation demonstrates how easy it was, circa 1980, for anyone, published novelists included, to plagiarize. Epstein almost got away with it. With Turnitin, he would have been caught.

“The psychology of plagiarism is fascinatingly perverse,” Amis wrote of Epstein, when the case broke. “It risks, or invites, a deep shame, and there must be something of the death-wish in it.” That death-wish clearly remains, for writers of the Turnitin era no less than for writers of the 1980s, as evidenced by the revelations and exposés of plagiarism that have lately been in the news. Technology can help us detect theft: it can cross-reference infinite databases and trawl staggering libraries of pre-existing text to nose out the culprits, often as soon as they’ve committed the crime. But it can’t extinguish the impulse to steal, the literary kleptomania that compels writers to take from one another.