news and discussions mainly related to Chinese characters and romanization


Yearly Archives: 2010


In December Taiwan will be getting a new city. In fact, it will be the most populous city in the entire country: Xīnběi Shì (新北市).

For those not familiar with the situation, I should perhaps give a bit of background. Taiwan won’t suddenly have more people or buildings. Instead, the area known as Taipei County (which does not include the city of Taipei but which occupies a much greater area than Taipei and has a much greater total population) will be getting a long-overdue official upgrade to a “special municipality,” which means that it will get a lot more money and civil servants per capita from the central government. And as such the area will be dubbed a city, even though in appearance and demographic patterns it isn’t really a city at all but still a county containing several cities (which are to become “districts” despite having hundreds of thousands more inhabitants than some other places labeled “cities”), lots of towns, and plenty of empty countryside.

The Mandarin name will change from Táiběi Xiàn to Xīnběi Shì. (Xīn is the Mandarin word for “new.” Xiàn is “county.” Shì is “city.” And běi is “north.”) The official so-called English name is, tentatively, “Xinbei City.” Hanyu Pinyin! Yea!

Talking about “English” names is often misleading, since many people conflate English and romanization of Mandarin; and the usual pattern of Taiwanese place names not written in Chinese characters tends to be MANDARIN PROPER NAME + ENGLISH CATEGORY (e.g., “Taoyuan County”). So, at least in this post, I’m going to be a bit sloppy about what I’m calling “English.” Forgive me. OK, now back to the subject.

A couple of days ago, however, both major candidates for the powerful position of running the area currently known as Taipei County (Táiběi Xiàn) had a rare bit of agreement: both expressed a preference for using “New Taipei City” instead of “Xinbei City.” Ugh.

And to top things off, a couple dozen pro-Tongyong Pinyin protesters were outside Taipei County Hall the same day to protest against using Xinbei because it contains what they characterize as China’s demon letter X. Actually, that last part of hyperbole isn’t all that much of an exaggeration of their position. The X makes it look like the city is being crossed out, some of the protesters claimed.

This is, of course, stupid. But unfortunately it’s the sort of stupidity that sometimes plays well here, given how this is a country that pandered to the superstitious by removing 4’s from license plate numbers and ID cards and by changing the name of a subway line because if you cherry-picked from its syllables you could come up with a nickname that might remind people of a term for cheating in mah-jongg (májiàng). (Why bother with letting competent engineers do things the way they need to be done when problems can be fixed magically through attempts to eliminate puns!)

The protesters would prefer the Tongyong form, Sinbei. I suspect foreigners here would rapidly change that to the English name “Sin City,” which I must admit would have a certain ring to it and might even be a tourist draw. Still, Tongyong has already done enough damage. Those wanting to promote Taiwan’s identity would be much better off channeling their energy into projects that might actually be useful to their cause.

The reason the government selected “Xinbei City” is that “New Taipei City” would be too similar to “Taipei City,” according to the head of the Taipei County Government’s Department of Civil Affairs. And, yes, they would be too similar. Also, Xinbei is simply the correct form in Hanyu Pinyin, which is Taiwan’s (and Taipei County’s) official romanization system. It would be much better still to omit “city” altogether.

Consider how this might work on signs, keeping in mind that Taipei and Xīnběi Shì are right next to each other. So such similar names as “New Taipei City” and “Taipei City” would run the risk of confusion, unlike, say, the case of New Jersey and Jersey. I wonder if the candidates for mayor of Xinbei are under the impression that they should change the name of the town across from Danshui from Bālǐ to something else because visitors to Taiwan might otherwise think they could drive to the Indonesian island of Bali from northern Taiwan.

They probably said they liked “New Taipei City” better because it sounds “more English” to them. And it is more English than “Xinbei.” But that’s not a good thing.

Once again it may be necessary to point out what ought to be obvious: The reason so-called English place names are needed is not because foreigners need places to have names in the English language. If it were, I suppose we could redub many places with appropriate names in real English: “Ugly Dump Filled With Concrete Buildings” (with numbers appended so the many possibilities could be distinguished from each other), “Nuclear Waste Depository,” “Armpit of Taiwan,” “Beautiful Little Town that Turns Into a Tourist Hell on Weekends,” etc. The possibilities are endless, though perhaps some of the nicer places would need to be given awful names — following the Iceland/Greenland model — lest they be overrun. The problem is that Chinese characters are too damn hard, and people who can’t read them (i.e., most foreign residents and tourists) need to be able to find places on maps, on Web pages, through signs, etc. And they need to be able to communicate through speech with people in Taiwan about places. Having two different names — the Mandarin one and the so-called English one — is just confusing. Having one name in Mandarin written in two systems (Chinese characters and romanization), however, makes sense and works best. (If Taiwan were to switch to using Taiwanese instead of Mandarin, that would be a whole ‘nother kettle of fish.)

But things that make sense and politicians don’t often fit well together.

Consider the signs. What a @#$% mess this could be. Let’s compare a few ramifications of using Xinbei and Taipei vs. using New Taipei City and Taipei City.

definitely no need to add “city” to either name, because there would be no “Taipei County” that might need to be distinguished from the city of Taipei, nor would there be a “Xinbei County” that would need to be distinguished from the city of Xinbei

(By the way, if any Taiwan reporters want to pick up on this blog post, please don’t just follow the usual practice here of simply asking one or two random foreigners if they think the name “New Taipei City” sounds OK, so then you conclude that there’s no problem. Try to get people who’ve actually thought about the situation for more than a few seconds and who could give you an informed opinion. My apologies to those reporters who of course know better.)

Between November 23, 2009, when Singapore first began registering .sg names in Chinese characters, and June 10, 2010, when registrations of Chinese-character .sg domain names opened to all without any additional fee, only 1,024 such names were registered, or just 0.88 percent of all .sg domain names. This apparently includes not just second-level domains (e.g., ??.sg) but also third-level domains (e.g., ??.com.sg).

The percentage will likely rise in the coming months, as the process has only recently opened to everyone on a first-come, first-served basis. But, still, demand for such names in Singapore has so far been underwhelming.

A bit more information:

Registrations were accepted in phases, with registrations for government organizations starting on Nov. 23, 2009. Beginning in January, SGNIC began accepting domain name registrations from trademark holders.

During the third phase, the general public was allowed to register domain names starting on March 25, but applicants were charged a “priority fee” of S$100 (US$72) for each domain name, with domain names sought by several applicants awarded to the highest bidder.

In all three phases, applicants could apply for a domain name made up of Chinese numbers or a name with just one Chinese character for a fee of S$500 [US$360]….

The fourth and final phase began on June 10, with SGNIC accepting domain name applications on a first-come, first-served basis. The S$100 priority fee is no longer required, but applicants are no longer allowed to register domain names using Chinese numbers or names with just one Chinese character….

When IDA announced the introduction of Chinese-language domain names last year, SGNIC said the effort was partly intended to help Singaporean businesses target the Chinese market.

The magnificent Grand dictionnaire Ricci de la langue chinoise, better known as le Grand Ricci, has just been released on DVD, almost a decade after its release in book form and exactly four hundred years after the death of Matteo Ricci.

For a sample of the dictionary’s format and entries, see the 25 pages of entries for shan. Alas, as this example shows, the entries are not word parsed. But at least Hanyu Pinyin is now available for those who prefer it to Wade-Giles.

As long as I’m mentioning Ricci-related work, I might as well use the occasion to note that the Taipei Ricci Institute is putting its collection of books on permanent loan to Taiwan’s National Central Library.

Also, I’d like to note that parts of Matteo Ricci’s original dictionary can now be viewed through the Google Books scan of a publication from earlier this century of his Dicionário Português-Chinês.

In Taiwan, the new movie Date Night has been given the Mandarin title Yuēhuì o mài gà (約會喔麥尬/约会喔麦尬).

Yuēhuì is simply the word for “date.” The interesting part is “o mài gà” (喔麥尬), which is a Mandarinized form of the English “oh my god.” (I wonder if this, being written in Hanzi despite still being basically English, would pass China’s new need for supposed purity.)

Most people here, especially those younger than about 40, would simply write “oh my god” (or, less frequently, “o my god”) in English in the middle of an otherwise Mandarin text. (I’ll spare everyone the chart of Google searches; but it backs this up.) But brevity is standard in movie titles here, and “喔麥尬” is a lot more compact on a movie poster than “oh my god.” This, however, raises the question of why “喔麥尬” instead of the equally concise “OMG”. I don’t know the answer to that. But the path of lettered words in Mandarin is certainly not without twists and turns.

Like most other uses of Hanzified English, the results are not entirely faithful to the original sounds.

Mandarin’s ou would be a closer phonetic fit than o for the English “oh”.
There’s Ōu (區/区), a surname. But most of the time this Chinese character is pronounced qū (being one of those many Chinese characters with multiple pronunciations), so that certainly wouldn’t work well. There’s ǒu, which has a more clearly phonetic Hanzi (嘔/呕), but which has to do with vomit (ǒutù/嘔吐/呕吐). Another possible choice would be ōu (歐/欧); but that is associated mainly with Europe and doesn’t get used much as a phonetic component in non-Europe-related loan words outside the word for ohm: ōumǔ (歐姆/欧姆).

Mài (the Mandarin word for wheat), unlike most other Mandarin morphemes pronounced mai (various tones), gets used phonetically in lots of various loan words, such as Màidāngláo (McDonald’s/麥當勞/麦当劳), Màijiā (Mecca/麥加/麦加), Dānmài (Denmark/丹麥/丹麦), and Kāmàilóng (Cameroon/喀麥隆/喀麦隆). So its use is to be expected, though semantically there’s no link. And mài is certainly a better fit for the English my than it is for the Mc of McDonald’s, the Mec of Mecca, the mark of Denmark, or the me of Cameroon.

For ga there’s not a lot of choice. 咖 is often seen in the phonetic loan gālí (curry). The biggest problem here is that the same 咖 is also used as kā in a different, common phonetic loan: kāfēi (coffee). There’s ?; but, like ?, it’s not exactly a well-known character.

As for Pinyin, I suppose the orthography could get interesting: o mài gà, o màigà, omài gà, or omàigà. But a Pinyin orthography would probably simply encourage people to write this in the original: oh my god.

BTW, you may wish to try the following experiment. The gà in o mài gà is most often seen in writing the word gāngà (尷尬/尴尬), which means awkward/embarrassed. Ask native speakers of Mandarin to write gāngà in Hanzi for you by hand without using a dictionary, a computer, or any other form of assistance. I bet that most people, even those with university degrees, won’t be able to write this common, ordinary word correctly.

And for lagniappe, the character 尬 is also sometimes seen in written Taiwanese as the equivalent of Mandarin’s jiā (加/add). I spotted an example of this just the other day on a cafe sign (in the sense of “buy something and ga something else for a special price”) but didn’t have a camera with me.

With any luck, this will be the last post for some time in my none too exciting but hopefully useful series on technical aspects of creating Pinyin subtitles.

Some people like to have Pinyin subtitles and Hanzi subtitles appear at the same time. Although I think that’s generally a bad idea (too much text to get through quickly that way, people would benefit from becoming accustomed to reading Pinyin texts as Pinyin texts, etc.), I’ll go ahead and offer instructions on how to make Pinyin subtitles appear above Chinese character subtitles.

These directions are for Microsoft Word, though other programs could be used instead.

Using Word, open copies of the two subtitle files you’d like to combine.

To get the alignment between the two files to match when they’re combined, it’s important that each subtitle entry is only one line long. You can check for possible instances of multi-line subtitles with a wildcard search (CTRL+H –> More –> Use wildcards).

Find what (with “Use wildcards” checked): ([!0-9])^13([!0-9^13])

If that search finds any multi-line subtitles, you’ll need to temporarily adjust those lines in both subtitle files, as follows:

Find what (with “Use wildcards” checked): ([!0-9])^13([!0-9^13])

Replace with: \1|\2

Again, be sure to run that search-and-replace in both subtitle files. You’ll replace the “|” with a RETURN later.
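For anyone who would rather script this step than run it in Word, here is a minimal Python sketch of the same idea: collapsing each subtitle entry onto a single line, with “|” marking where the line breaks were. It assumes well-formed SRT-style entries (an index line, a timing line, then the text) separated by blank lines; the function name `join_multiline_entries` is my own invention, not part of any tool.

```python
import re

def join_multiline_entries(srt_text, sep="|"):
    """Collapse the text of each subtitle entry onto one line, joined by sep.

    Assumes each entry is: index line, timing line, one or more text lines,
    with entries separated by blank lines.
    """
    # Normalize Windows line endings so the blank-line split works.
    normalized = srt_text.replace("\r\n", "\n").strip()
    out = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # lines[0] is the index, lines[1] the timing; the rest is subtitle text.
        head, text = lines[:2], lines[2:]
        out.append("\n".join(head + [sep.join(text)]))
    return "\n\n".join(out) + "\n"
```

To restore the original line breaks afterwards, replace each “|” in the subtitle text with a newline.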

Next, in the file with the Chinese characters (not the Pinyin file) strip out everything except for the text of the subtitles, leaving just the Hanzi text. (I wrote about this earlier in How to strip subtitle files down to text. The method is also useful for removing such information if you want to create the text of the screenplay.)

Find what (with “Use wildcards” checked): ^13[0-9:\,\-\> ]{1,}^13

Replace with: ^p

Note: You may need to run the above “replace all” twice for Word to catch everything.

You should have something that looks like this (with paragraph marks shown):

1¶
?! ????¶
¶
????¶
¶
??¶
¶
??¶
¶
????????¶

Now add extra lines, so the lines with Chinese characters will fit into the new document in the correct places.

Find what (with “Use wildcards” checked): ^13^13

Replace with: ^p^p^p^p^p

Delete the very first line — the one with the “1” in it. Then add three blank lines above this.

You should have something that looks like this (with paragraph marks shown):

¶
¶
¶
?! ????¶
¶
¶
¶
¶
????¶
¶
¶
¶
¶
??¶

Select all (CTRL+A). Then convert this to a table:
Table –> Convert –> Text to Table

Now switch to the Pinyin subtitles file.

First, add the extra blank lines into which you will later insert the Chinese characters that correspond with the Pinyin.
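If the Word table route feels fiddly, the end result (Pinyin above Hanzi in each entry) can also be produced with a short script. The Python sketch below is a different technique from the Word method described in this post, not a translation of it. It assumes both files are well-formed SRT with the same number of entries in the same order, and it takes the index and timing lines from the Pinyin file. The names `parse_srt` and `merge_subtitles` are hypothetical helpers of my own.

```python
import re

def parse_srt(srt_text):
    """Return a list of (index, timing, text_lines) tuples.

    Assumes well-formed entries separated by blank lines.
    """
    normalized = srt_text.replace("\r\n", "\n").strip()
    entries = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        entries.append((lines[0], lines[1], lines[2:]))
    return entries

def merge_subtitles(pinyin_srt, hanzi_srt):
    """Build one SRT text with the Pinyin lines above the Hanzi lines."""
    pinyin = parse_srt(pinyin_srt)
    hanzi = parse_srt(hanzi_srt)
    if len(pinyin) != len(hanzi):
        raise ValueError("the two files have different numbers of entries")
    merged = []
    # Entries are paired by position; timing is taken from the Pinyin file.
    for (index, timing, p_lines), (_, _, h_lines) in zip(pinyin, hanzi):
        merged.append("\n".join([index, timing] + p_lines + h_lines))
    return "\n\n".join(merged) + "\n"
```

Because this pairs entries rather than lines, multi-line subtitles do not need to be flattened first.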

Subtitle files are wonderful things. But for those times when you want to just read the text by itself and not bother with the movie (for example, if you want to prepare a script), they can look a little cluttered — what with all of that extra timing information.

1
00:00:49,000 --> 00:00:51,500
Yo! Li ye lai la

2
00:00:52,200 --> 00:00:53,600
Li ye lai la

3
00:01:06,900 --> 00:01:08,400
Xiulian

The directions below for how to remove all of the extra numbers, etc., refer to Microsoft Word, since most people already have that tool.

To strip out everything except for the text of the subtitles, run the following wildcard search (CTRL+H –> More –> Use wildcards).

Find what: ^13[0-9:\,\-\> ]{1,}^13

Replace with: ^p

Replace all.

Note: You may need to run the above “replace all” twice. Also, unless you add an extra return at the top of the document you’ll need to clean up the first entry by hand.

The above search-and-replace will yield

Yo! Li ye lai la

Li ye lai la

Xiulian
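For those without Word, roughly the same stripping can be sketched in a few lines of Python. This is only an approximation of the wildcard search above, under the assumption that index lines are digits only and timing lines follow the usual `hh:mm:ss,mmm --> hh:mm:ss,mmm` shape (it also tolerates an arrow that has been typographically converted to “–>”). The function name `strip_to_text` is my own.

```python
import re

# Timing lines such as "00:00:49,000 --> 00:00:51,500".
# The arrow may have been converted to an en dash plus ">" by some editors.
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3}\s*(?:-->|–>)\s*\d{2}:\d{2}:\d{2},\d{3}"
)

def strip_to_text(srt_text):
    """Drop index and timing lines, keeping only the subtitle text."""
    kept = []
    for line in srt_text.replace("\r\n", "\n").split("\n"):
        stripped = line.strip()
        # Caveat: a subtitle line consisting only of digits is dropped too.
        if stripped.isdigit() or TIMING.match(stripped):
            continue
        kept.append(line)
    text = "\n".join(kept)
    # Collapse runs of blank lines so one blank line separates entries.
    return re.sub(r"\n{3,}", "\n\n", text).strip() + "\n"
```

Unlike the Word search, this handles the first entry without any cleanup by hand.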

If, however, you want to at least temporarily keep the basic timing information (such as to help you identify scene boundaries more quickly), you can do so as follows.