11 Answers

I have actually considered this quite a bit, being both a linguist who studies these things, and a scholar who publishes papers.

Etymologically speaking, the word data is the plural of datum in Latin. In Latin, data would get plural verb agreement. Now, languages borrow words and do whatever they want with them, so this historical fact about data has no relevance in judging what is "correct" in English. There is significant evidence that data has established itself as a mass noun in English, suggesting that, for most people, "data is" is the most natural way to speak.

However, in a university/scholarly paper, I would recommend using "data are", rather than "data is".

The reason: some stickler professors and pedantic scholars believe that, if datum is an English word for a single piece of data (which it is), then data must logically be plural. The fact that most people do things differently only means, to them, that most people are doing it wrong. Whether you agree with that or not is somewhat irrelevant.

So you have two choices.

(1) If you use "data is", then reasonable people (yes, I am biased) who read your paper will not bat an eye, but stickler professors might judge you on your perceived ignorance or inappropriate level of informality.

(2) If you use "data are", then the stickler professors will not judge you to be ignorant, and the reasonable people will think "that's an acceptable variant" or "this person is a stickler for language" (or if they are me, will think "this person is pandering to the sticklers, a necessary evil"), but nobody will think you are ignorant.

So, choosing (2), "data are" is clearly your safest bet, and is what I always do (and what I find nearly all of my colleagues do).

+1 A good answer, though I would say it depends what you study and the English ability of said professors. Where I studied, a major international university, many of the professors did not have English as a first language and would be more likely to know and accept the colloquial use. Indeed, most of my professors were not red hot in English, as is common in Computing.
–
Orbling Dec 16 '10 at 18:02

@Orbling: I can't say this for certain about every person, but I imagine non-native speakers would accept this use from a native speaker, even if they would not be familiar with it. (At the same time, if they have been reading academic papers for years, they might well be familiar with it anyway.)
–
Kosmonaut Dec 16 '10 at 18:09


True, it certainly depends upon the individual person. When I hear "data are" it grates, no matter how many times I have seen it in research papers.
–
Orbling Dec 16 '10 at 18:38

I'm a scholar of sorts, and write and say "data are" in academic contexts. Usually: sometimes I forget. That said, I would never correct someone who said "data is" nor hold it against them. That usage is fixed now, and I suppose the formally correct usage will die in time.
–
dmckee Dec 16 '10 at 20:05


One way to "soften" the "grating effect" is to add a couple of words between data and the plural verb. For example, instead of "Our data suggest that..." one could write "The data we collected suggest that..." (Normally, I recommend striking superfluous words, rather than adding them, but this might be one exception – assuming the author felt in a quandary about which to use).
–
J.R. Mar 24 '12 at 9:28

This is intended as a clarification of the "correctness" of using data as a mass noun, for those strict-minded sticklers (there's plenty of them) who might be unconvinced by Kosmonaut's "languages borrow words and do whatever they want with them":

1 - "Datum" and "data (plural)" are historically correct, so "data (mass noun)" must be wrong. How can "data" have a mass noun form as well as a singular and plural?
You'd never say "Oh, I spilled rice on the floor. Wait, it's okay, I only
spilled 4 rices". There's a separate noun phrase for the singular and
plural ("grains of rice").

Consider potato. It has a singular form, meaning one distinct root vegetable, a plural form, meaning multiple distinct root vegetables, and a mass form, meaning an amount of foodstuff made from potatoes. Imagine a dinner table, where each diner has a baked potato on their plate (singular), and everyone is sharing a platter of roast potatoes (plural) and a bowl of mashed potato (mass) (hopefully among other things...). If you ask someone to "pass the potato", they'll understand that you mean the bowl of mass mash, not the tray of plural potatoes or the singular potato on their plate.

2 - There can be such a thing as "a datum" in a way which is not true
for "a water". Imagine someone looking at a database full of data and
saying, "There is so much data in this, I can't see where to start".
Surely this is like standing in a migration of birds and saying "There
is so much bird in the sky, I can't see the sun..."? Since data can be
countable, surely "data" can't be primarily a mass noun?

Data is not necessarily countable. Data in a neat Excel sheet might have countable cells, but what about the data that is lost when photo editors talk about "data loss" when increasing the contrast of a digital photo made of binary machine code data? There's no clear way of defining where one datum starts and the next one stops — would a datum in this context be a bit, a byte, or the data defining one pixel? Such a line would be arbitrary, like looking for units of rice in a processed flat rice cracker. It's an amount measured in units of mass — 67kb of data in a jpg, 2 grams of rice in a rice cracker.

Even seemingly trivial cases aren't so trivial. What's one datum in a modern relational database? One value, one row? What about where there are table joins and foreign keys? Is a structural definition a datum? You can create a convention-specific definition, but it's not a universal definition like one bird.

3 - Following that pattern, shouldn't the mass noun of data be datum
(the singular), like how the mass noun of potatoes is potato?

No. It's rare, but not unique, for a mass noun to develop from a plural, in cases where the singular over time becomes less and less universally meaningful. "Physics" used to mean the set of countable, defined, distinct natural sciences, until the field developed such that it became clear that the lines between one physic and another weren't as sharp or universal as previously thought.

You could answer "What's happening at CERN?" with "A lot of physics", but you wouldn't expect the reply "How many?". This is because there's no longer a clear established universal dividing line between one physic and another. Your answer would interpret the question as, "How much?" and would be a measurement of amount: "Enough to occupy 4,000 physicists". In the same way, you could answer "What does this supercomputer store?" with "A lot of data", but the reply "How many?" would incorrectly assume that all data has one clear common countable unit and that there is a clear universal dividing line between one datum and another across all contexts. Even if this data did happen to have a consistent countable convention, replying "7 million data" would be ambiguous unless the asker already knew this convention. A more useful answer would be to interpret it as "How much?" and give an answer in terms of a measurement of amount: "Nearly 220 petabytes".

Use of data as a plural seems pretentious and pedantic, as if to make a show of your knowledge that in Latin, data is a plural form of datum. I have several reasons for being stubborn about data as a mass noun.

Datum is a reference line in a mechanical drawing. More than one of these may be called data, if you must show off your knowledge of Latin.

If you can tell me how many data you have, then I will call them data, but as long as you need quantitative units to tell me the size of your dataset, then I will call it a collective singular. My storage holds up to 1 TB of data. There are not 1 trillion data in there, however.

No data point can stand on its own, but rather it derives meaning and significance from its context. What were the conditions of its measurement? What were the other measurements? Etc. It doesn't make semantic sense to refer to a single datum unless it has that specific meaning, as a reference point or baseline. What we mean by data as a plural is semantically different from what we mean by data as a collective singular.

As addressed in the question linked, it depends if you use the uncountable noun, meaning "a collection of data", or the plural form of datum. If it is the former, then the verb would be singular, otherwise it would be plural.

Now, I would say that in most university papers you would use the uncountable singular form. The exception would be when data describes an ensemble of measurements, or when data is used in a philosophy paper. (According to Wiktionary's definition.)

Just because a noun is uncountable doesn't mean that it has to be singular. Consider clergy and gentry. A collection of data is commonly called a data set in scientific writing. In modern usage, I would definitely classify the noun data as uncountable, independent of whether it is singular or plural.
–
Peter Shor May 22 '11 at 1:50

@Peter: For me, the fact that when data is plural, it is the plural form of datum, which seems to be a countable noun when looking at the definition, makes the plural data countable. Note that I never said that all uncountable words must be singular.
–
Eldroß May 23 '11 at 7:22

I thought you implied that all uncountable words were singular when you said that if data was not the plural of datum, it had to be singular.
–
Peter Shor May 23 '11 at 11:09

Etymologically, data comes from Latin. This is well-known. Unfortunately, in Latin, its plurality was defined by devices that exist in English only in a far lesser capacity: gender and noun case.

In the Latin nominative case, data could be either the neuter plural or the feminine singular of datum. The neuter singular was datum, the masculine singular datus, the feminine plural datae, and the masculine plural dati.

Use of data as a plural in English (the earlier form) comes from a suggestion that we should incorporate the words closest in Latin meaning to how they will be used in English: the neuter singular datum and the neuter plural data.

However, data could also function as the feminine singular in Latin, which I conjecture led to its commonplace use as a mass noun in English.

I enjoy using these words as they were used in Latin: in a survey of male students, I might say "After the dati were collected, each outlying datus was removed." In a survey of female students, I might say "After the datae were collected, each outlying data was removed." In a survey of pineapples, I might say "After the data were collected, each outlying datum was removed."

Most people do not enjoy this. The first two usages are not by any means commonplace (possibly even unattested outside random tangents on the internet), with the third occasionally seen as archaïc but often accepted or even preferred, with data used as plural.

It is more common today, however, to use data as a mass noun; that is, "the data was collected," not "each data was collected." Datum remains typical in the latter case.

The fact is though, more and more "authorities" are using it as a singular.

"The Oxford English Dictionary defines it like this:

In Latin, data is the plural of datum and, historically and in specialized scientific fields, it is also treated as a plural in English, taking a plural verb, as in the data were collected and classified. In modern non-scientific use, however, despite the complaints of traditionalists, it is often not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which cannot normally have a plural and takes a singular verb. Sentences such as data was (as well as data were) collected over a number of years are now widely accepted in standard English."

In contrast to that:

"The official view from the Office for National Statistics takes the traditional approach. The ONS style guide for those writing official statistics says:

The word data is a plural noun so write "data are". Datum is the singular."

It's worth remembering the priceless words from an introduction to the OED: "This book is descriptive, not prescriptive."

Once again, "data" is a great example of a word in transition. A reminder that with questions of spelling and grammar, the concept of what's "right" is a difficult one. All truth is social, and all the more so with language correctness.

Collins has news as n (functioning as singular), not claiming it as a plural noun, though it gives both singular and plural near-synonyms in its definitions of polysemes:

current events; important or interesting recent happenings

information about such events, as in the mass media

Surely the way the noun is (in this case universally) used rather than its form (or its origin, from a plural noun in Middle English newes, meaning new things) decides correct usage. Tidings is treated differently.

However, data is treated as requiring singular concord by some authorities and plural concord by others - as stated in previous answers. (Amusingly, in this case Collins is slightly more prescriptive than the AHDEL!) I believe that normal non-academic usage strongly advocates singular concord, while different universities still hold different opinions. Because there are no ex-cathedra (in the absolute sense) rulings in these areas, a university must give its own preferences in an in-house style guide (as many do) and be prepared to tolerate opposing usages from other equally entitled institutions. Students should make sure that, in submitted work, they follow the style guide of the university that will ultimately pass or fail them.

It would be good to clarify that you mean "the American Heritage Dictionary of the English Language." "AHDEL" does not match that in top Google searches or Wikipedia disambiguation.
–
Jon of All Trades Apr 5 '13 at 16:24

"a bowl of mashed potato"
This is very strange, at least in US usage. In my 67 years I don't think I have ever heard anyone say other than "a bowl of mashed potatoes," and I've lived from North Carolina to Indiana to New York to Connecticut to Michigan to Louisiana and back to North Carolina by way of Tennessee. A mashed potato is possible, but only if one mashes but one potato.
Data works well as a mass/common noun with a singular verb as long as the data is of a uniform nature (concerning only per capita income). If the data is varied (concerning both per capita income and books purchased per unit of time), then it is not a mass/common noun and would do well to take a plural verb.
It would only seem reasonable to vary datum's declension according to sex if it were used as an adjective. In the examples above, it is a noun. In those examples, only the neuter datum/data, "a thing/the things given," would seem appropriate. In any case, one should note that the Latin 2nd declension masculine plural ending "i" was pronounced much as the English "ee," and the Latin 1st declension feminine plural "ae" was pronounced similar to the English "eye," which rather confuses both the point and any listeners who might care to understand why one would choose to say such things.

I would regard "the data is consistent" and "the data are consistent" as having slightly different meanings. The former carries an implication that some particular (possibly large) set of information was completely examined and found to be consistent. The latter carries an implication that multiple independent pieces of information were examined, but does not particularly imply that any particular identifiable set of information was examined completely. Neither implication is absolute, but in some situations I would suggest the singular form as more appropriate, and in others I would suggest the plural.

Longman Dictionary of Contemporary English says: After data, you can use a singular verb or, in formal or technical English, a plural verb. E.g.,
The data is collected by trained interviewers.
These data are summarized in Table 5.
Do not say 'datas' or 'a data'.
And based on Oxford: data is used as a plural noun in English while the singular is datum.