This question is not about italicisation or how to construct plurals. I wonder what the general guidelines are for writing foreign words based on a Latin alphabet in English text. I know that, for languages written in completely different script systems, there exist more or less standard per-language “romanization” procedures (such as writing Japanese in rōmaji). My question is about words from languages with Latin script, where some glyphs are not commonly found in English. Examples of such characters include:

accents of all kinds (é, ô, ñ, etc.)

ligatures (æ, œ, ĳ, Å, ç, ķ, etc.)

characters not found at all: thorn (Þ), eth (ð), German ß

How should these words be written in English text? Should they be copied entirely, normalized in some way (e.g., ß → ss, é → e), or transliterated so that they can be read as they should be spelt?

My personal preference is to borrow them as they are, but I would like to know what style guides recommend and what is common usage in the press.

Added: data from some research is inconsistent:

New Oxford American Dictionary has piñata, but Anschluss (vs. ß) and oeuvre (vs. œ).

The Guardian uses both variants of Anschluß and of œuvre.

I don’t know other languages well enough to know what to look for :)

2nd addition: I’m starting a bounty on this, because I still haven’t found any reference to actual style guides, or research from data in open-access corpuses/corpora (which I don’t know how to do myself).

Just to state the obvious: the best thing when writing English is to use an English word. That's not possible in cases where there's a specific technical or historical meaning - e.g. Anschluß/Anschluss - or for concepts from non-English-speaking cultures - such as the piñata. But it is good practice, when using a foreign word in this way, to consider whether there is such a justification, or whether instead there exists an English word that can express the right concept with less risk of confusing the reader. (The over-use of foreign words seems particularly prevalent in academic writing...)
– psmears, Jan 19 '11 at 15:10

@ShreevatsaR: OK! That's why I restricted myself to a comment and not a full answer ;-). To make a proper study, I should roughly evaluate the OCR reliability by looking "by hand" through Google Books extracts.
– Frédéric Grosshans, Jan 22 '11 at 16:27

8 Answers

The Times (not to be confused with the New York Times) style guide says:

foreign words Write in roman when foreign words and phrases have become essentially a part of the English language (eg, elite, debacle, fête, de rigueur, soirée); likewise, now use roman rather than italic, but retain accents, in a bon mot, a bête noire, the raison d'être. Avoid pretension by using an English phrase wherever one will serve. See accents

and

accents Give French, Spanish, Portuguese, German, Italian, Irish and Ancient Greek words their proper accents and diacritical marks; omit in other languages unless you are sure of them. Accents should be used in headlines and on capital letters. With Anglicised words, no need for accents in foreign words that have taken English nationality (hotel, depot, debacle, elite, regime etc), but keep the accent when it makes a crucial difference to pronunciation or understanding - café, communiqué, détente, émigré, façade, fête, fiancée, mêlée, métier, pâté, protégé, raison d'être; also note vis-à-vis. See foreign words, Spanish

The ideal would be to preserve the word as it appears in its native language, but it is something that English-speakers are very lazy about. Anything that looks orthographically odd (i.e. has glyphs that aren't part of everyday English) stands out and is usually thought of as pretentious, so there is a strong social pressure to normalize. Ligatures in particular tend to unlink into their component letters (think of ß as a ligature in this respect), and missing characters translate in various slightly inconsistent ways.

If a word becomes common currency in English, it gets normalized over time. In particular, the accents fall off: writing "café" is considered a bit affected these days, and "rôle" has pretty much died out, for example.
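The two tendencies described above (ligatures unlinking into their component letters, and accents falling off) can be imitated mechanically. The following is only a rough sketch in Python, not a recommendation from any style guide: the function name `anglicise` and the small `UNLINK` table are hypothetical, and the table covers only ß, æ and œ. Accents are removed by decomposing each letter with Unicode NFD and dropping the combining marks.

```python
import unicodedata

# Hypothetical table (an assumption, not from any style guide): letters that
# Unicode NFD does not decompose, mapped to their usual "unlinked" spelling.
UNLINK = {
    "ß": "ss",
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
}

def anglicise(word: str) -> str:
    """Unlink ligatures, then strip accents, as a crude normalization."""
    # Step 1: replace each ligature-like letter with its component letters.
    unlinked = "".join(UNLINK.get(ch, ch) for ch in word)
    # Step 2: split accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", unlinked)
    # Step 3: drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for w in ["Anschluß", "œuvre", "café", "rôle", "piñata"]:
    print(w, "->", anglicise(w))
```

Note that this flattens "piñata" to "pinata" just as readily as "rôle" to "role", whereas actual usage keeps some marks and drops others, so a mechanical rule like this cannot reproduce what dictionaries and newspapers really do.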

I think the pressure in typed work is one of inconvenience to the typist, rather than social pressure. English typists tend not to know the shortcuts, and even when they do, the shortcuts are often poorly supported. Thus it takes ages to enter a foreign word correctly with non-standard glyphs. People are "lazy" about this, in the sense that they do not like to have to type at 1 wpm. Personally, I always try to show it properly, but that can be a pain in some applications.
– Orbling, Jan 22 '11 at 23:34

The Chicago Manual of Style (16th edition) says that ligatures should be decomposed in Latin and transliterated Greek, as well as in words borrowed into the English lexicon. However, æ should be used for Old English words in an Old English context, and œ for French words in a French context.

There’s a whole chapter on foreign languages, but in general I’d preserve accents and “strange” letters when including words from foreign languages.

But it depends a lot on context as well. What kind of text are you writing, who is your audience? In an academic context the answer is usually pretty straightforward (just see how books and papers in relevant fields do it), but if you’re writing for a wider audience, simplifying may be prudent.

The accents that are more likely to be kept are the ones we're more familiar with. British kids are taught French, so French words are almost never changed. Latin-based alphabets that the country is not familiar with, such as those of the Scandinavian languages, are more likely to be changed, either to French-style accents or to none.

I am no expert in this matter, but as a native speaker of other languages besides English, I would like to contribute the following.

I have the impression that the OP is using the label "accents of all kinds" for things that fall into at least two very different categories: some are true accents, such as the one in "á", and others are not, like the one on "ñ". The "á" in Spanish is still an "a" to all intents and purposes, but an accented one. However, an "ñ" is not an accented "n" but a different letter altogether. They are not exchangeable.

“They are not exchangeable”: are á and a freely exchangeable? They are not either.
– F'x, Jan 22 '11 at 8:35

What I mean is that "á" and "a" are the same letter in the alphabet. "á" is just a modified "a". But "ñ" and "n" are two separate letters in the alphabet; "ñ" is not a modified "n", very much like "q" is not an "o" with an extra protruding twig.
– CesarGon, Jan 22 '11 at 11:48

@CesarGon: In Spanish, n and ñ are distinct letters. In English, which has never treated accented letters as distinct, ñ is just an n with a tilde (~). The two languages have different concepts of what defines a letter, so it's not fair to make absolute statements like that.
– Jon Purdy, Jan 23 '11 at 0:24

@John: I made no absolute statement; if you read my post carefully, you will notice that I explicitly say "in Spanish". I am not trying to make statements about other languages here. Also, I acknowledge that different languages have different concepts of what defines a letter, but since this question is precisely about how foreign words should be incorporated into English, I think that it makes sense to take into account the alphabet of the source language. If only the English rules were to matter, this question would be very easy to solve!
– CesarGon, Jan 23 '11 at 12:16

@CesarGon: Alright, I was a bit harsh. But the point is that an English speaker who has never seen ñ may be confused, but they'll still pronounce it as though it were just n. To them there is no difference. It's necessary to account for an Anglocentric worldview to discuss how foreign words should be adapted, because the words are subject to interpretation by Anglocentrists. And my logic is fine: from a Spanish viewpoint, ñ is a letter that English lacks, and from an English viewpoint, ñ is n with an accent. Neither is more correct. Also, I don't get notifications if you type "John".
– Jon Purdy, Jan 23 '11 at 22:05

If I look at my dictionary (NOAD), I find some evidence to the contrary: Anschluß is turned into Anschluss, and oeuvre is used instead of œuvre. There is some inconsistency, though, because it has “piñata”.
– F'x, Jan 19 '11 at 12:43

I think ligatures probably are the exception actually, and are usually written as two separate letters, but accents are usually kept intact. I have seen ß plenty of times in print; I would say most reasonably educated people know what it means.
– user3444, Jan 19 '11 at 13:46

@ElendilTheTall: I would say most people who have studied German know what it is, plus a scattered number of well-read people. The majority of the populace do not.
– Orbling, Jan 22 '11 at 23:36

@Orbling: not sure where you're from, but most people here in Britain have studied German at least a little, in secondary school, so I guess it varies worldwide.
– user3444, Jan 23 '11 at 11:14

@ElendilTheTall: Not so, I am from London. Schools teach one of French, German or Spanish to all pupils from age 11 up, only one language; the top-stream students usually get to take on another at age 12-13. This is the norm in all state schools that I know in the south. So the odds of a given student learning German are below 50%. Also, in recent years many schools have been offering French and Spanish (or even something more outlandish like Japanese) as the alternative.
– Orbling, Jan 23 '11 at 11:37

I've often seen phrases or words in foreign languages italicized in text to signal to the reader that the word is of foreign origin, and may therefore be later explained to the reader at the author's discretion. The author can include as many foreign characters as they feel is appropriate.

Be very careful about this, however. For example, if the reader has no familiarity with the language in question and the author decides not to explain the text in question, the reader may start to feel excluded or left out of the story.

One extreme example would be the teachers in Charlie Brown cartoons (not text, I know). Usually, the Charlie Brown characters will partially restate the trombone "wah-wo-whas" in their reply, so as to keep the intended audience included in the implied other half of the conversation.

The decision to transliterate or not depends on the original language. Russian, Sanskrit, and Ancient Greek are Indo-European languages that are usually transliterated. However, commentaries on original texts often do not transliterate. Some writers use both original and transliterated texts (Jacob Klein's A Commentary On Plato's Meno).