Papers with simpler abstracts are cited more, study suggests

Research papers containing abstracts that are shorter and consist of more commonly used words accumulate citations more successfully, according to a recent study published in the Journal of Informetrics.

After analyzing more than 200,000 academic papers published between 1999 and 2008, the authors found that abstracts were slightly less likely to be cited than those that were half as long. Keeping it simple also mattered: abstracts that were heavy on familiar words such as “higher,” “increased” and “time” earned slightly more citations than others. Even adding a five-letter word to an abstract reduced citation counts by 0.02%.

According to Mike Thelwall, an information scientist at the University of Wolverhampton, UK, who was not a co-author on the paper:

Despite the lack of clear cause-and-effect evidence, which is almost inevitable in informetrics studies (including my own), the paper has found suggestive, if not conclusive, evidence of an association between language simplicity and citations.

For the study, the authors downloaded bibliometric data for the 30,000 most-cited papers in each year between 1999 and 2008 (300,000 in total) from the online database Web of Science. After discarding papers with missing abstracts, titles and other details, they analyzed 216,280 academic papers, counting the characters, words and sentences in each abstract.
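As an illustration of the length measures described above, here is a minimal Python sketch that counts the characters, words and sentences in an abstract. The function name abstract_length_stats and the tokenization rules (what counts as a word or a sentence boundary) are assumptions made for this example, not the procedure the authors report.

```python
import re


def abstract_length_stats(abstract: str) -> dict[str, int]:
    """Count characters, words and sentences in one abstract.

    Assumed rules (hypothetical, not from the study):
    - characters: every character after trimming, spaces included
    - words: runs of letters, digits, apostrophes or hyphens
    - sentences: text split after '.', '!' or '?' followed by whitespace
      (naive: abbreviations such as 'e.g.' will be over-split)
    """
    text = abstract.strip()
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    # Split on the whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
    }


if __name__ == "__main__":
    print(abstract_length_stats("We study citations. Shorter abstracts help!"))
```

Counting is the easy part; the choices hidden in the regular expressions (hyphenated terms, abbreviations, formulas) can shift the numbers, which is why they are spelled out in the docstring.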

Next, they turned to the Google Ngram database to work out what they call “word frequency,” a measure of how commonly specific words are used in the English language. They then tested whether these measures correlated with the papers’ citation counts.

The result was a weak but statistically significant correlation suggesting that papers with simpler abstracts garner more citations, said co-author Adrian Letchford, a data scientist at the University of Warwick in Coventry, UK. He told us:

Our analysis suggests that doubling the word frequency of an abstract may increase citations by only 0.7%.
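To see what a figure like 0.7% per doubling implies, one can assume, purely for illustration, that citations scale as a power of word frequency: citations ∝ frequency^β. Doubling the frequency then multiplies citations by 2^β, so a 0.7% gain corresponds to β = log2(1.007) ≈ 0.010. The sketch below does this arithmetic; the power-law form and the helper names are assumptions, not necessarily the model the authors fitted.

```python
import math


def implied_exponent(pct_increase_per_doubling: float) -> float:
    """Exponent beta such that doubling frequency gives the stated % increase.

    Assumes (hypothetically) that citations scale like frequency ** beta.
    """
    return math.log2(1 + pct_increase_per_doubling / 100)


def predicted_change(freq_ratio: float, beta: float) -> float:
    """Percent change in citations when word frequency is multiplied by freq_ratio."""
    return (freq_ratio ** beta - 1) * 100


if __name__ == "__main__":
    beta = implied_exponent(0.7)
    print(f"beta = {beta:.4f}")
    print(f"10x frequency -> {predicted_change(10, beta):.2f}% more citations")
```

Under the same assumption, even a tenfold increase in word frequency would predict only about a 2.3% rise in citations, which underlines how small the reported effect is.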

Although there is a correlation between simpler abstracts and higher citation counts, this does not necessarily imply causation, Letchford noted. He added that his choice of word frequency as a proxy for simplicity came from a psychological experiment known as the “lexical decision task.” Letchford explained:

In the experiment, participants sit in front of a screen which shows them a word. The word is either real, or a jumble of random letters. The participants have to decide as quickly as possible if the word is real or made up. People are able to recognise a real word faster if it is more frequently used. This suggests that frequently used words are perhaps more familiar to our brains, and easier to process and understand.

Of course, the number of times a paper is cited may be influenced in part by journal impact factors, the field of research and the different writing styles of different disciplines. To account for this, the authors compared papers within individual journals across the whole time period, and found the opposite: papers with longer abstracts gathered slightly more citations across the entire study period. But the trend disappeared when they looked at the relationship between abstract length and citations within individual journals on a year-by-year basis.

Thelwall told us that a “major limitation” of the study is that it is restricted to only highly cited papers. He noted that other factors such as author nationalities might also play a role in both language complexity and citation counts.

The authors write:

We propose three possible explanations of why papers with shorter abstracts or more frequently used words may gain more citations. High impact journals might restrict the length of their papers’ abstracts and require writing suitable for a wider audience. For example, abstracts in Science are restricted to 125 words. Similarly, papers reporting greater scientific advances might be written with shorter abstracts and contain less technical language. A third potential explanation is that shorter abstracts with more commonly used words may be easier to read and hence attract more citations.

Last August, Letchford also co-authored another paper, which proposed that academic papers with shorter titles amass higher citation counts. To him, the findings he reports in “The advantage of simple paper abstracts” suggest a need to simplify academic literature. He told us:

I feel that the core ideas surrounding a piece of research could be written to reach a wide audience. Writing the technical details in a clear and succinct way without omitting any details is tricky, but I feel that it is possible to do. Ultimately, it’s very important for scientists to make their work understood.

The depiction of this study is inaccurate in several ways. For example, the second paragraph says “the authors found that abstracts were slightly less likely to be cited than those that were half as long… Even adding a five-letter word to an abstract reduced citation counts by 0.02%.” This implies that these figures are direct measurements of existing abstracts, which they are not.

This is how the authors describe the results the second paragraph refers to, which come from a statistical model that includes various factors and assumptions, such as the number of authors:

“We find that the average abstract contains words that occur 4.5 times per million words in the Google Ngram dataset. According to our model, doubling the median word frequency of an average abstract to 9 times per million words will increase the number of times it is cited by approximately 0.74% [sic]. If the average English word is approximately 5 letters long, then removing a word from an abstract increases the number of citations by 0.02% according to our model.”

The doubling of word frequency and the adding of a five-letter word are not observed changes but theoretical predictions of the model.

The description of which papers were studied is also inaccurate: only articles in the top 1% of cited articles, in journals with more than 10 such articles in a year, were retained for the analysis and for developing the models.

In addition, the Ngram database cannot show how commonly words are used in the English language, only how commonly they are used in the Google Books corpus, which is not restricted to books written in plain language. Many simple, easy-to-understand words are relatively infrequent in it, and many complex words are very frequent.