A Wall Street Journal Bulletin

Vol. 26, No. 3: Big Data

Most style guides and dictionaries have come to accept the use of the noun data with either singular or plural verbs, we observed last June as we thereby joined the majority.

Now we encounter big data, defined as a collection of data sets so vast as to require more sophisticated data-processing applications.

As big data is a collection of collections, in effect, singular referents usually are appropriate, as with our recent headline on a column, Why ‘Big Data’ Is a Big Deal. The quotation marks usually aren’t needed as the term gains traction, and it shouldn’t be capitalized in the body of an article, as we have rendered it on occasion.

Articles in a recent Journal section on big data jumped wildly between data is and data are – whether big or little.

Use of the singular or plural data (with appropriate modifiers such as much or many) often becomes an arbitrary decision, but consistency within an article is a big deal for our credibility.

More about less and fewer

The late columnist William Safire was remembered, among other reasons, for his having persuaded the Safeway grocery chain to change its express checkout signs to “10 items or fewer” from “10 items or less.”

But some authorities argue almost convincingly that “10 items or less” is defensible, or even preferred. Under this reasoning, the amount (10 items) is perceived as a metaphorical cutoff line or benchmark that is notionally singular, not plural. So the sign in the grocery store doesn’t necessarily mean “10 items, or fewer items than that,” they argue. It may just as logically mean “10 items, or less than this benchmark number.” So the reasoning goes.

Because linguists are far from unanimous on the subject, however, it’s better to try to avoid “less” with plurals like “items” – a construction that some readers consider illiterate. Fewer, of course, generally applies to plurals or indivisible numbers of things – integers to techies.

Take this headline: “At Dynegy, Boss’s Drive for a ‘Winning’ Culture Means More Cubicles and Less Emails.” Less Emailing would pass muster, but with the plural emails, fewer is still much preferred.

Exceptions and subtleties abound, however. Consider:

Truckers are driving fewer miles, allowing operators to squeeze more years out of vehicles. Because mileage is a continuum and fractional miles are commonly referred to, less miles would be advocated by purists. But S&S contributor David E. Gold sees fewer miles as defensible in such constructions, as well as in fewer dollars going to Washington, for example.

What about quantities of calories? Fewer is generally used, as calories are almost universally represented as discrete units – although, in the physical world, it is difficult to discern a single calorie.

In an effort to have less trouble with the general rules and fewer errors, we provide some Journal passages cited by David Gold:

● Dish lost a net 19,000 subscribers in that quarter – less than the 111,000 subscribers lost in the previous year’s quarter. There are no fractional subscribers, so fewer applies.

● Leo Bunnin…sells less than 100 Impalas a year. Unless he also sells fractional Impalas, he sells fewer of them.

● Showtime and Starz all have less than 30 million domestic subscribers for their flagship channels. A twofer: Make it both have fewer or each has fewer.

● The company plan also must be set up to allow Roth contributions; currently less than half [of these companies] do so. Make it fewer than half of them.

Style rulings and reminders

The new pope, the first one to use the name Francis, isn’t using the Roman numeral I after his name, just as Queen Elizabeth I had no numeral until Queen Elizabeth II took the throne. By contrast, Pope John Paul I did choose to use the Roman numeral during his short, 33-day reign in 1978.

● Under our rules for possessives, Francis doesn’t take the extra s after the apostrophe because the second syllable begins with an s sound and ends with an unaccented s sound. So it is Pope Francis’ reign, as with Kansas’ corn, Massachusetts’ capital and Texas’ cowboys.

● The pope’s vehicle is known as the popemobile – one word.

● The ritual of Mass is capitalized, as are the Seder and other specific rituals.

● Air Canada (now without a corporate designation) is no longer a unit of ACE Aviation.

● Latam Airlines Group SA resulted from the merger of LAN Airlines and TAM Airlines. Based in Chile, it has units operating in much of South America.

● Superrich is one word.

● Newsmaking is one word (like newsmaker).

● Ink is to be generally avoided as a verb.

● Shut-eye is still hyphenated, as in Webster’s New World dictionary.

● Chops, a slang noun for technical skill (especially of a jazz or rock musician), is surfacing too often. Deal-making chops and foreign-policy chops are recent examples of vogue usage.

● In doubting Thomas, the d is lowercase.

● Cyprus has been deleted from the stylebook’s Middle East or Mideast entry, as the island nation, much in the news lately, is a member of the European Union and the euro zone (for now, at least).

● The Libyan Investment Authority can be shortened to LIA on later references, but, as with all uncommon initialisms, it is better avoided in favor of the authority, for example.

● Icann is the abbreviated form for the Internet Corporation for Assigned Names and Numbers. But use it sparingly and close to the full name.

● Anti words, including antiabortion, generally remain unhyphenated. But colleague Bill Power notes an exception. While antimoney might well be unhyphenated on its own, we should use the hyphen with anti-money-laundering, considering that the opposition is to laundering and not necessarily money.

By way of review, students, the adjectival long-standing takes the hyphen. Longtime doesn’t. Pay attention so you can pass the quiz later on.

Heads above the rest

● “Guten Tag Y’All: Nashville Takes Its Music Offshore,” by Dennis Berman and Lisa Vickery, about a German promoter booking country-music tours abroad.