Phonetic Writing Systems

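[Vertical writing demonstration: "It looks something like this." set in three columns ("It looks", "something", "like this."), each read top to bottom, starting with the rightmost column.]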

Since this page is about Japanese characters, you're going to
need a Japanese font, such as MS Mincho, and a compatible browser
to get much out of it beyond the Romaji section. Recent versions of
Mozilla, Chrome, Opera, and IE all display this page properly (at least
for me) but I can't say anything with certainty about other browsers or
earlier versions.

Before getting into the character sets used in Japanese, note
that Japanese may be written horizontally or vertically. Horizontal
writing is borrowed from the West and, as such, is read in rows, each
row read left to right, starting with the topmost row and moving down
(like this text). Vertical writing, the traditional Japanese form, is read
in columns, each column read top to bottom, starting with the
rightmost column and moving left, as shown in the demonstration
to the right. Occasionally, these columns are only one row deep,
which results in text that reads from right to left (siht ekil), but this is
rare outside of decorative uses.
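For anyone who wants to experiment with the reading order, here's a rough Python sketch that lays out a list of strings as vertical columns, with the first column on the right, and returns the result row by row. The function name `vertical_layout` is made up for illustration, and the output is plain ASCII, so it only approximates real vertical typesetting (which uses full-width characters and vertical glyph forms). Feeding it "It looks", "something", and "like this." produces a block that reads "It looks something like this." when read top to bottom, right to left.

```python
def vertical_layout(columns: list[str]) -> list[str]:
    """Render each string as a top-to-bottom column, the first column rightmost."""
    height = max(len(col) for col in columns)
    padded = [col.ljust(height) for col in columns]  # pad short columns with spaces
    rows: list[str] = []
    for r in range(height):
        # Columns are read right to left, so the last column is printed leftmost.
        rows.append(" ".join(col[r] for col in reversed(padded)).rstrip())
    return rows


for row in vertical_layout(["It looks", "something", "like this."]):
    print(row)
```

Reading the printed block column by column, starting at the right, recovers the original sentence.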

In any case, Japanese uses four different character sets.
Here are three of them, in order of what is likely to be increasing
foreignness from the perspective of the average Westerner. The
fourth, kanji, is on its own
page.

ローマ字 (ローマじ) Romaji

This one should be nothing new. It's just the Roman alphabet
(the one English uses). It's rarely used in written Japanese,
though it does show up occasionally. Sometimes it appears for impact,
or because ASCII tends to be less trouble for computers, but the main
use seems to be providing an intermediate step between Western
languages like English and standard Japanese.
This can make it useful for those beginning to learn the language,
though if you're serious about learning Japanese, you're probably
better off avoiding it and jumping straight into kana.

It's also ironic that, despite being the standardized spelling,
"Romaji" is not a correct romanization of ローマ字 under
any system. It should be (spaces optional) "Rouma ji",
"Rooma ji", "Rōma ji", or "Rôma
ji", all of which indicate the long vowel. Numerous place names,
like Tokyo (properly "Toukyou" or the equivalent), and
other words that have assimilated into English, like dojo (properly
"doujou" or the equivalent), suffer from the same
problem. Possibly it's a result of lazy copying dropping the macron
from the ō used for a long o
in the Hepburn romanization.

There are a few things to watch out for when dealing with
romanized Japanese. Just because it looks English doesn't mean
it's pronounced like English (the vowels, at least, are closer to Spanish),
and there are other quirks that vary depending on which of the
several romanization systems you're dealing with. Here are all the
important pronunciation points that I can think of for now:

Syllables

Japanese is based on syllables, though linguists insist that
they're morae, not syllables, because of some obscure difference
between the two terms. Regardless, the point is that each syllable,
or mora if you prefer, is pronounced for (roughly) the same amount
of time when said correctly (at least officially; there are of course
variations in actual usage, such as when someone elongates part
of a word for emphasis).

You can check a kana chart (like those below) to see what
the morae are, but it's usually fairly simple to pick them out if you
know what you're looking for. Each is normally one of the following:

A single vowel (a, i, u, e, or o), like each of the three morae
in あおい (aoi, a - o - i).

A consonant (k, g, s, z, t, d, n, h, p, b, m, y, r,
or w)
followed by a vowel, like those in なまもの
(namamono, na - ma - mo - no).
Depending on the romanization system used, sh, j, ch,
ts, dj, dz, f, or l may also appear as consonant
sounds, but these aren't actually different consonants, just different writings
to more closely approximate what they sound like to Western listeners
when certain consonants and certain vowels occur together. One
example with several of these is ふしまつ, which may come out as
either fushimatsu or husimatu,
depending on how it's romanized.

A consonant followed by ya,
yu, or yo, such as
きょ (kyo) or にゃ (nya).
When written in kana, these use two
kana, with the second one being smaller than usual and treated
as a modifier of the first rather than being its own sound.

A single n. This one is often the trickiest
to pick out, due to how easy it is to confuse with an
n+vowel mora, but it's clear enough
when the n is at the end of a word or followed
by a consonant. Many romanization systems will follow a lone
n with an apostrophe for
clarity when ambiguous (and some even when not). Others will
double it, though I find that just makes things more confusing when a
lone-n mora is followed by an
n+vowel one, which happens fairly often,
as in words like こんな (konna, ko - n - na).
The real problem is that this n isn't quite
the same sound as the n used with
vowels and shouldn't be written with the same English
character (maybe ŋ
would be more appropriate), but after a few hundred
years of convention, it's a bit late to fix that.

A single consonant before another of the same consonant.
For example, ざっし (zasshi, za - s - shi) or
きっぷ (kippu, ki - p - pu). Depending on
romanization, t before ch
or d before j may also count,
since they're the same consonant in Japanese. A common example is
エッチ (ETCHI, E - T - CHI), also written
ecchi or etti.
As discussed in the pronunciation guide below, the doubled
consonant represents what linguists call gemination, basically
an extended consonant sound. Though the extra consonant is its
own mora, the sound isn't repeated. (This is the only situation that
I'm aware of where a shrunken kana represents a mora of its own
instead of assimilating into the previous sound.)
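If it helps to see these rules as a procedure, here's a minimal Python sketch (the name `split_morae` is hypothetical) that splits a string of kana into morae. It assumes the input is kana only. It attaches each small ゃ, ゅ, ょ (or small vowel) to the kana before it, while everything else, including ん and the small っ, counts as a mora of its own. Long vowels need no special handling, since each vowel kana is already one mora.

```python
# Small kana that modify the previous kana instead of forming their own mora.
# The small っ/ッ is deliberately left out: it is a mora of its own.
SMALL_MODIFIERS = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def split_morae(kana: str) -> list[str]:
    """Split a kana string into morae (assumes the input is kana only)."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_MODIFIERS and morae:
            morae[-1] += ch  # e.g. き + ょ -> きょ
        else:
            morae.append(ch)  # ordinary kana, ん, っ, and ー each count once
    return morae


for word in ["あおい", "なまもの", "こんな", "ざっし", "きっぷ", "きょう"]:
    print(word, " - ".join(split_morae(word)))
```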

Vowel Pronunciation

These are typically more like Spanish than English.

a (short a):
Similar to the English 'a' in "father".

aa (long a):
Same sound as a but lasts longer.

ai :
a + i,
very similar to English long 'i' as in "item".

i (short i):
Similar to English
long 'e' as in "beech", also similar to English
short 'i' as in "ribbon". Additionally, when
combined with a voiceless consonant
(k, s, t, h, p) and followed
by another voiceless consonant or (to a lesser extent)
the end of an utterance, it tends to be weakly pronounced,
so, for example, ashita tends
to sound more like ashta.

Exception: The initial i
in the verbs 行く (iku, to go)
and 言う (iu, to say) is often
pronounced as yu. This appears
to be an oddity of the verbs, which are sometimes written as
ゆく (yuku) and
ゆう (yuu)
in kana instead of いく (iku) and
いう (iu).

ii (long i):
Same sound as i but lasts longer.
Closer to an English long 'e' than the shorter version.

u (short u):
Similar to 'u' in English "user"
or 'oo' in English "boot" but not in
"foot". Additionally, when combined with a
voiceless consonant (k, s, t, h, p)
and followed by another voiceless consonant or nothing,
it tends to be very weakly pronounced, so, for example,
desu tends to sound more like
dess (except when the pronunciation
is exaggerated).

Exception: ou
is (usually) pronounced as a long o,
but see below for an exception to the exception.

uu (long u):
Same sound as u but lasts longer.

e (short e):
Similar to English long a as in "cane",
or English short e as in "elf".

ee (long e):
Same sound as e but lasts longer.

ei : e +
i, virtually the same sound as
ee (even native speakers can't always
tell the difference). Classes and textbooks
have told me that there is no difference, but listening closely to
actual pronunciation, particularly in music, has convinced me that
they are not quite identical. However, some romanization systems
will write e + i as
ee in an attempt to help with pronunciation,
even though it departs from the kana writing and doesn't necessarily
help with pronunciation anyway.

o (short o):
Similar to English long 'o' as in "open".

oo or ou
(long o): Same sound as o
but lasts longer. The difference between oo
and ou may reflect either the kana spelling
or the preference of whoever romanized it, and has absolutely
no effect on pronunciation in modern Japanese.

Exceptions: When the
o and u are parts
of different words (as in kono ue),
or one but not the other is part of a prefix or suffix, each is
pronounced separately. Additionally, ou
at the end of an uninflected (dictionary form) verb,
such as omou,
is pronounced as two distinct vowel sounds, an o
plus an u. My understanding is that this has
something to do with these verbs originally being written with
hu instead of u,
though apparently the h was never
actually pronounced regardless. This may be confusing,
but since using kana doesn't make ou
any less ambiguous, you're going to have to deal with it
regardless.

As you have likely noticed, a long vowel in Japanese (and
in most non-English languages, for that matter) has the same sound
as the short vowel but is held for a longer period of time. English
oddities aside, there's a reason long vowels are called that.

Consonant Pronunciation

k, z, t, d, p, m:
Much like their English equivalents

g: Like the hard English 'g' in
"goat", but not the soft 'g' in "gym"

s: Similar to English 's',
but not hissed and never pronounced as 'z'

sh: Similar to English, though
sha may sound more like 'sya',
and so on for the other vowels. This is really the same
consonant as s, but the subtle difference
from English 's' is more conspicuous when combined with
i or y.

j: Sort of a cross between English
'z' and 'j' (at least that's my impression of it), though
ja is sort of a cross between 'zya'
and 'jya', and so on for the other vowels. As indicated by the
kana (see below), this is the voiced counterpart to
sh, so it also has a similarity to that sound.
This is really the same consonant as z,
but the subtle difference from English 'z' is more prominent
when combined with i or
y.

ch: Similar to English,
though cha is sort of a 'tya' sound,
and so on for the other vowels. This is really the same consonant
as t, but the subtle difference from English
't' is more prominent when combined with
i or y.

ts: More or less like in English
"ants", for example. This is really the same
consonant as t, but the subtle difference
from English 't' is more prominent when combined with
u. If it sounds like an s,
you're saying it wrong. Tsunami
is not "sunami".

dj, dz:
Pronounced essentially the same as j
and z, so it's not uncommon for
romanizations to just write j and
z to begin with. When used, the spelling
difference reflects the kana involved, and has little if any
discernible effect on pronunciation in modern Japanese.
Both dj and dz
are really the same consonant as d,
but the subtle difference from English 'd' is especially
prominent when combined with i,
y, or u.
dj is the voiced counterpart to
ch, and dz
is the voiced counterpart to ts.

n: There are two fairly different
Japanese sounds both romanized as n.
One always occurs as part of a character along with a vowel,
and the other always occurs as its own character. Compare
words like こなす (konasu, ko - na - su)
or あかね (akane, a - ka - ne) with ones
like こんど (kondo, ko - n - do) and
あんぱん (anpan, a - n - pa - n).

The Japanese n
that comes with a vowel sound
(as na, ni, nu, ne, no, nya, nyu, nyo),
is essentially the same as English 'n'.

The Japanese n that occurs
as its own character, sometimes called the syllabic
n, comes before a consonant, before a vowel, or at the end of a word.
When romanized, it may be written doubled (in some systems)
or followed by an apostrophe.
Though still similar to an English 'n', it comes more from the
back of the throat, and sounds somewhat different depending
on the surrounding sounds.

When ending an utterance or followed by a vowel,
it's basically a nasalization of the previous vowel.

When followed by sounds where the lips are mostly
closed (m, p, b), it resembles an
English 'm'.

When followed by k or
g, it resembles English 'ng'
as in "song".

These variations are direct and fairly natural results of the
surrounding sounds, and aren't normally worth worrying about, except
that some systems of romanization will write the character as
m or ng in these
situations (for instance, sempai instead of
senpai for せんぱい). When in isolation,
the syllabic n sounds much like grunting
or humming. In some ways, it behaves more like a vowel than
a consonant, though it isn't one linguistically
(one dialect of the made-up language Hymmnos
even treats 'n' as a vowel, presumably based on this character).
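Since the m spelling only applies before m, p, and b, it's easy to do mechanically. Here's a small Python sketch (the function name is made up) that rewrites a romanization the way systems that write sempai do. It relies on the fact that in romanized text an n directly followed by b, m, or p can't belong to an n+vowel mora, so it has to be the syllabic n. It doesn't attempt the rarer ng spelling before k and g.

```python
import re


def syllabic_n_to_m(romaji: str) -> str:
    """Write the syllabic n as m before b, m, or p (senpai -> sempai)."""
    # An n followed directly by b, m, or p has no vowel of its own,
    # so it must be the syllabic n; the lookahead leaves the b/m/p untouched.
    return re.sub(r"n(?=[bmp])", "m", romaji)


for word in ["senpai", "shinbun", "anpan", "konna"]:
    print(word, "->", syllabic_n_to_m(word))
```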

h: Similar to English 'h', but sounds
more like an 'f' in hu/fu
(alternate romanizations of the same character), since it's not quite
the same as English 'h'. However, the particles は (wa)
and へ (e) are sometimes
romanized as ha and he,
respectively, because (presumably for historical reasons)
they're written using those kana.
I don't much like that rendition, since it runs counter to the
pronunciation, and one of the primary purposes of romanization
is to aid pronunciation.

f: Like a cross between English 'h' and
'f' in hu/fu (see h,
above). It's a bit more 'f'-like when followed by a vowel other than
u, which is uncommon and normally occurs
only in borrowed words.

b: Similar to English 'b',
though it may also have a bit of English 'v' to it

y: Always pronounced as a consonant,
as in English "yodel", never as a vowel as in
"baby".

l, r:
This is the really fun one. It's a lot like a cross between
English 'r' and 'l' with a bit of 'd' thrown in for good measure. You
know how you press your tongue to the roof of your mouth behind
your teeth to make an 'l' sound and don't for an 'r', but they aren't
much different otherwise? Try tapping your tongue on the top
of your mouth, maybe a bit further back, for an instant while making
either sound... that's about as well as I can describe how to do it.
I've always thought it sounds more like English 'l' than 'r', but it's most
often romanized as r. How much it sounds
like either 'l' or 'r' also depends both on the surrounding sounds
and on the speaker. I've heard some singers that pronounce it
so much like an English 'l' that I can't tell the difference, while
others make it more 'r'-like.

w: Only two characters in modern
Japanese use this consonant. In the wa
character, it's much like in English. However, in the
wo character, it's barely pronounced,
if at all, making the whole thing sound nearly the same as
just an o. Feel free to ignore the
w sound in wo
entirely when speaking, and you'll be close enough.

Exception: Borrowed words may contain the kana
combination ウォ (u + small
o), which is typically romanized as
wo and should be pronounced with
a distinct w sound (or at least a
u sliding into an o).

Borrowed words may also contain the kana combinations
ウィ (u + small i),
romanized as wi,
and ウェ (u + small e),
romanized as we. As with ウォ,
the w in these combinations is to be
pronounced, otherwise they would have little reason to exist.

Exception: Though now obsolete, there are Japanese
characters for we and wi.
On the rare occasion that these do appear, they are pronounced
as e and i, with little or no
discernible w sound, much like in the standard
wo character. The only example I can think of
offhand is the Touhou Project character Tewi, whose name
is written てゐ in Japanese and is pronounced as
tei, which sounds much like
English "tail" minus the 'l'.

Doubled consonants: In theory, the doubled consonant is held
longer. This works fine for sounds like s
that can be prolonged, but for sounds like k,
the net effect is that the second consonant is
pronounced and the first acts more as a pause, with the preceding
vowel cut off abruptly. This effectively strengthens the consonant
sound. In any case, the consonant is not actually said twice. Linguists
call this gemination, and it occurs in English as well, though typically
only when one word ends with the same sound the next begins with.
An example given on Wikipedia is to compare "night rain"
and "night train" with each other. In the second phrase, the
't' sound is geminated. Though speakers don't normally pronounce
the 't' at the end of "night" and the 't' at the beginning of
"train" as two distinct sounds, the difference between the
two phrases is still clear.

When sung, especially slowly, the proper pronunciation
often doesn't work very well, so this may end up sounding more like
an extended vowel. For example, shikkari
typically is sung more like shi - i - ka - ri than
shi - (pause) - ka - ri. Also note that you'll never
hear shi - k - ka - ri, as there's only a single
k sound.

Miscellany

Katakana is sometimes romanized in all capitals while hiragana
and kanji are usually assigned lowercase. This preserves the
emphasis that katakana usually represents (see the katakana
section for more).

Japanese has a pitch accent rather than a stress accent,
which basically means that, instead of one syllable being pronounced
louder and longer as in English, each mora is said with essentially the
same volume and duration, but with some pronounced at a higher or
lower pitch than others. For instance, ima normally
has the i high and the ma
low in the word 今 (now), but the i low and the
ma high in the word 居間 (living room). However,
which morae
have which pitch may vary by region, and it rarely makes much of a
difference anyway (unlike in Chinese, which I've heard places critical
importance on pitch). At worst, fouling up pitches in Japanese will make
your speech sound somewhat awkward and unnatural, much like putting
the "emPHAsis on the wrong sylLAble" in English, and
shouldn't mangle the meaning beyond all recognition.

The best way to learn to pronounce Japanese is to listen to it.
Audio resources are available in numerous places online and may
also be available in libraries. If you happen to know a native
Japanese speaker or an experienced non-native, even better.

Because different people think differently, there are several
different romanization schemes. Several official ones, even. I cover
those differences and my personal preferences in the section on
hiragana.

片仮名 (かたかな) Katakana

This character set is primarily used to write words borrowed
from other languages. The top two languages borrowed from are
English and Portuguese (not counting Chinese, since borrowed
Chinese words are typically assimilated more completely into
Japanese and written in kanji). However, just because you know
an English word that Japanese borrowed doesn't mean you'll be
able to pick it out. Since the sounds don't match exactly, words
usually have to be adapted to fit the kana available. For example:
ice cream → アイスクリーム (AISU KURIIMU). Try saying it out
loud, keeping in mind the way Romaji is pronounced. And
since there are hardly any redundant sounds in Japanese, homonyms
and near-homonyms from other languages typically end up with
identical kana (like "race" and "lace", both
written レース).

Katakana is additionally used for emphasis, scientific names,
sound effects, and possibly other purposes that I haven't come
across yet or can't think of at the moment, so don't assume that
all words in katakana must automatically be borrowed. It's sort of
like the italics of Japanese.

Here's the standard katakana chart and some extended
characters (actually variations of the standard in most cases),
with my preferred romanization (more on that a bit later). The
kana invented to better accommodate foreign words are relatively
recent and therefore less common, and often not completely
standardized, but I have seen many of them at least occasionally
in actual usage.

Standard chart

ア a  イ i  ウ u  エ e  オ o
カ ka  キ ki  ク ku  ケ ke  コ ko
サ sa  シ shi  ス su  セ se  ソ so
タ ta  チ chi  ツ tsu  テ te  ト to
ナ na  ニ ni  ヌ nu  ネ ne  ノ no
ハ ha  ヒ hi  フ fu  ヘ he  ホ ho
マ ma  ミ mi  ム mu  メ me  モ mo
ヤ ya  ユ yu  ヨ yo
ラ ra  リ ri  ル ru  レ re  ロ ro
ワ wa  ヰ wi  ヱ we  ヲ wo
ン n or n'

Other morae

ガ ga  ギ gi  グ gu  ゲ ge  ゴ go
ザ za  ジ ji  ズ zu  ゼ ze  ゾ zo
ダ da  ヂ dji  ヅ dzu  デ de  ド do
バ ba  ビ bi  ブ bu  ベ be  ボ bo
パ pa  ピ pi  プ pu  ペ pe  ポ po
ー (long vowel mark)
ッ (gemination mark)

2-character morae

キャ kya  キュ kyu  キョ kyo
ギャ gya  ギュ gyu  ギョ gyo
シャ sha  シュ shu  ショ sho
ジャ ja  ジュ ju  ジョ jo
チャ cha  チュ chu  チョ cho
ヂャ dja  ヂュ dju  ヂョ djo
ニャ nya  ニュ nyu  ニョ nyo
ヒャ hya  ヒュ hyu  ヒョ hyo
ビャ bya  ビュ byu  ビョ byo
ピャ pya  ピュ pyu  ピョ pyo
ミャ mya  ミュ myu  ミョ myo
リャ rya  リュ ryu  リョ ryo

Invented morae

ヴァ va  ヴィ vi  ヴ vu  ヴェ ve  ヴォ vo
クァ kwa  クィ kwi  クェ kwe  クォ kwo
グァ gwa  グィ gwi  グェ gwe  グォ gwo
キェ kye  ギェ gye
スィ si  ズィ zi
シェ she  ジェ je  チェ che
ツァ tsa  ツィ tsi  ツェ tse  ツォ tso
ティ ti  ディ di
トゥ or テュ tu  ドゥ or デュ du
ニェ nye
ファ fa  フィ fi  フェ fe  フォ fo
フャ fya  フュ fyu  フョ fyo
ヒェ hye  ビェ bye  ピェ pye  ミェ mye  リェ rye
ウィ wi  ウェ we  ウォ wo

Note that the ァ, ィ, ゥ, ェ, ォ, ャ, ュ, and ョ used in
combinations are written the same way as the full-sized
characters ア, イ, ウ, エ, オ, ヤ, ユ, and ヨ, but smaller.
The gemination character ッ is similarly a smaller version
of the character ツ.

Though the character ヶ appears to be a small ケ (and
is typically input to computers as though it were), it's not actually
a kana at all, but shorthand for the kanji 箇 or 个 and usually
pronounced か (ka), が (ga),
or こ (ko). There's also a ヵ character,
which can be used in its place when the pronunciation
is か (ka), though apparently purists don't
like it.

ヲ is hardly ever used except to write the particle
wo in all-katakana text. Borrowed
words typically use ウォ instead.

ー, called the Katakana-Hiragana Prolonged Sound Mark
in Unicode and 長音符 (chouonpu,
literally "long vowel mark") in Japanese,
is the usual way to indicate a long vowel
in katakana. Thus, キー has a long i sound
and is romanized as KII or
KÎ. When Japanese is written vertically,
the ー character becomes a vertical mark. The ー is not the same
as the dash ―. Be aware that a kana vowel may be used instead,
especially for words that are normally written in hiragana or kanji.
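Because the two marks look so similar, it's easy to type the wrong one. Checking the Unicode names is a quick way to tell them apart; here's a short Python sketch using the standard `unicodedata` module. I'm assuming the dash in the paragraph above is U+2015 HORIZONTAL BAR; other dashes have their own code points and names. The helper name `describe` is just for illustration.

```python
import unicodedata

LONG_VOWEL_MARK = "\u30fc"  # ー
HORIZONTAL_BAR = "\u2015"   # ― (assumed to be the dash shown above)


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (LONG_VOWEL_MARK, HORIZONTAL_BAR):
    print(ch, describe(ch))
print(LONG_VOWEL_MARK == HORIZONTAL_BAR)  # False: they are different characters
```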

ッ, officially called the 促音 (sokuon,
with a literal meaning similar to "urge sound") and often
referred to descriptively as the 小さい「つ」
(chiisai tsu, small 'tsu'), more or less extends
the following consonant sound backward the way ー extends the
preceding vowel sound forward (the technical term for this is
gemination). Many consonants don't extend well, though, so it ends
up being more like a pause much of the time. Additionally, when an
utterance ends with a ッ, there is no consonant to extend.
In these cases, it indicates an abrupt cutoff of the sound before it
(a glottal stop). Finally, anything romanized with a doubled
n will involve ン and not ッ.
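To tie the chart and these marks together, here's a minimal Python sketch of a katakana romanizer that follows the romanization used in the charts above (shi, chi, tsu, fu, ji, dji, dzu). The name `romanize` and its tables are just for illustration. It handles the standard chart, the 2-character morae, ン (with an apostrophe before a vowel or y), ッ (doubling the next consonant, with t before ch as in etchi), ー (repeating the previous vowel), and the ・ word separator. It does not handle the invented morae or the small ァ, ィ, ゥ, ェ, ォ, and it simply drops a ッ at the very end of a word. Note that it writes a long o from ー as oo, so スポーツ comes out supootsu rather than the SUPOUTSU spelling used elsewhere on this page.

```python
# Romanization of each basic katakana, following the chart above.
BASE = {
    "ア": "a", "イ": "i", "ウ": "u", "エ": "e", "オ": "o",
    "カ": "ka", "キ": "ki", "ク": "ku", "ケ": "ke", "コ": "ko",
    "サ": "sa", "シ": "shi", "ス": "su", "セ": "se", "ソ": "so",
    "タ": "ta", "チ": "chi", "ツ": "tsu", "テ": "te", "ト": "to",
    "ナ": "na", "ニ": "ni", "ヌ": "nu", "ネ": "ne", "ノ": "no",
    "ハ": "ha", "ヒ": "hi", "フ": "fu", "ヘ": "he", "ホ": "ho",
    "マ": "ma", "ミ": "mi", "ム": "mu", "メ": "me", "モ": "mo",
    "ヤ": "ya", "ユ": "yu", "ヨ": "yo",
    "ラ": "ra", "リ": "ri", "ル": "ru", "レ": "re", "ロ": "ro",
    "ワ": "wa", "ヰ": "wi", "ヱ": "we", "ヲ": "wo", "ン": "n",
    "ガ": "ga", "ギ": "gi", "グ": "gu", "ゲ": "ge", "ゴ": "go",
    "ザ": "za", "ジ": "ji", "ズ": "zu", "ゼ": "ze", "ゾ": "zo",
    "ダ": "da", "ヂ": "dji", "ヅ": "dzu", "デ": "de", "ド": "do",
    "バ": "ba", "ビ": "bi", "ブ": "bu", "ベ": "be", "ボ": "bo",
    "パ": "pa", "ピ": "pi", "プ": "pu", "ペ": "pe", "ポ": "po",
}
SMALL_Y = {"ャ": "a", "ュ": "u", "ョ": "o"}  # small ya/yu/yo -> vowel they add
VOWELS = set("aiueo")


def romanize(text: str) -> str:
    """Romanize katakana: standard chart, 2-character morae, ッ, ー, ン, ・."""
    out: list[str] = []
    geminate = False  # set by ッ, applied to the next mora
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "ッ":
            geminate = True  # a word-final ッ is simply dropped in this sketch
        elif ch == "ー":
            # Long vowel mark: repeat the vowel of the previous mora.
            if out and out[-1][-1] in VOWELS:
                out.append(out[-1][-1])
        elif ch == "・":
            out.append(" ")  # the middle dot separates words
        elif ch in BASE:
            mora = BASE[ch]
            if nxt in SMALL_Y and len(mora) > 1 and mora.endswith("i"):
                stem = mora[:-1]  # ki -> k, shi -> sh, chi -> ch, ...
                y = "" if stem in ("sh", "j", "ch", "dj") else "y"
                mora = stem + y + SMALL_Y[nxt]
                i += 1  # the small kana has been consumed
            elif ch == "ン" and nxt in BASE and BASE[nxt][0] in VOWELS | {"y"}:
                mora = "n'"  # apostrophe marks n before a vowel or y
            if geminate and mora[0] not in VOWELS:
                # Double the consonant; ch is written tch (as in etchi).
                mora = ("t" if mora.startswith("ch") else mora[0]) + mora
            geminate = False
            out.append(mora)
        else:
            raise ValueError(f"not handled by this sketch: {ch!r}")
        i += 1
    return "".join(out)


for word in ["アイスクリーム", "ヒット", "エッチ", "コンピューター", "ジャンヌ・ダルク"]:
    print(word, "->", romanize(word))
```

The output is lowercase; calling `.upper()` on it gives the all-capitals style often used for katakana.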


As if there weren't enough nonstandard kana already,
written sound effects and similar cases may make up even more.
ア゛ーーッ！ could be a strangled scream, for instance. I have
no idea how you would romanize that.

Converting from other languages

What makes katakana so interesting and useful even if you
don't know a word of Japanese is that, as explained above, it's most
often used to write words that aren't Japanese in origin. Especially
in recent years, more katakana words are borrowed from English
than from any other language, and video games (just to give an
example) frequently give English, or at least pseudo-English, names
to items, skills, and so on. If you know katakana and understand how
words tend to be adapted, you stand a good chance of being able to
figure out the original word. Here are some of the conventions generally
used to convert English words to katakana (much of this applies to
other languages as well).

English short vowels are often unchanged, in the sense that the
romanization has the same letter for it as the original English.

memo → メモ (MEMO)

opera → オペラ (OPERA)

pajamas → パジャマ (PAJAMA)

Other vowel sounds tend to come out as whatever sounds the
closest to the source word. Notably, English long 'i' approximates
to Japanese a + i.

queen → クイーン (KUIIN)

science → サイエンス (SAIENSU)

blade → ブレイド (BUREIDO)

lightning → ライトニング (RAITONINGU)

More often than not, pronunciation is what matters, not spelling.
However, some words treat the spelling as Romaji and go from
there, which usually distorts the pronunciation significantly. Since
the kana-ization rules change, and are not universally agreed on
to begin with, some words have several katakana versions.

aura → オーラ (OURA) (common, based on pronunciation)
or アウラ (AURA) (uncommon, based on spelling)

The 's' in words that are typically used in the plural is often
dropped (as Japanese generally ignores the concept of plural),
but may be kept instead. Whatever works, I guess.

pajamas → パジャマ (PAJAMA)

shoes → シューズ (SHUUZU)

sports → スポーツ (SUPOUTSU)

As you may have noticed, numerous combinations of English
consonants simply aren't possible in Japanese. Most of the time,
the problem of having too many consonants in one place is solved
by adding the fairly weak vowel u as needed.
't' and 'd' usually become ト (to) and
ド (do) in these cases to avoid
tsu and dzu,
while 'n' usually becomes ン (and sometimes 'm' does too).
ヌ (nu) is rarely used except to represent
certain French names, such as Joan of Arc
(Jeanne d'Arc), written as ジャンヌ・ダルク
(JANNU DARUKU).
The same rules apply when a word ends in a consonant or when
a vowel is silent in English. Note that extra vowels are generally not
added where it can be avoided.

mint → ミント (MINTO)

McDonald's → マクドナルド (MAKU DONARUDO)

instant → インスタント (INSUTANTO)

knife → ナイフ (NAIFU)

computer → コンピューター (KONPYUUTAA)

Sample exceptions:

sport → スポーツ (SUPOUTSU),
not スポート (SUPOUTO)

salad → サラダ (SARADA),
not サラド (SARADO). But I think this
one comes from Portuguese "salada", so it's not
a true exception.
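The basic vowel-insertion rule is mechanical enough to sketch in code, as long as you ignore the exceptions. Here's a toy Python version (the function name is made up). It assumes the input is already a rough phonetic spelling with one letter per sound (so "naif" rather than "knife"), merges 'l' into r and 'v' into b as described later in this section, then adds o after a stranded t or d, u after any other stranded consonant, and leaves a stranded n alone as the syllabic n. It knows nothing about exceptions like sport or salad, and it doesn't geminate, so "hit" comes out hito rather than hitto.

```python
VOWELS = set("aeiou")


def add_epenthetic_vowels(sounds: str) -> str:
    """Toy sketch of the katakana vowel-insertion rule for English loanwords.

    `sounds` is assumed to be a rough one-letter-per-sound spelling of the
    English word (e.g. "naif" for "knife"), not its normal English spelling.
    """
    sounds = sounds.lower().replace("l", "r").replace("v", "b")  # l -> r, v -> b
    out: list[str] = []
    for i, c in enumerate(sounds):
        out.append(c)
        nxt = sounds[i + 1] if i + 1 < len(sounds) else ""
        if c in VOWELS or nxt in VOWELS:
            continue  # vowels, and consonants already followed by one, need nothing
        if c == "n":
            continue  # a stranded n becomes the syllabic n (ン)
        out.append("o" if c in ("t", "d") else "u")  # to/do rather than tsu/dzu
    return "".join(out)


for word in ["mint", "instant", "naif", "draiv", "delta", "rist", "hit"]:
    print(word, "->", add_epenthetic_vowels(word))
```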

Sometimes, consonants are doubled (geminated) in
Japanese when these extra vowels are added. I'm not sure
exactly how to tell when this will happen, but it seems common
with ending 't' and 'd' sounds (unless they come after ン) and
when the vowel would be too prominent otherwise (I know, that's
entirely too subjective). There might be a more precise rule,
but I doubt it considering that the whole system seems to work
on a "close enough" basis. In any case, here are a few...

apple → アップル (APPURU)

hit → ヒット (HITTO)

L and R sounds normally both become r.

delta → デルタ (DERUTA)

wrist → リスト (RISUTO)

The exception is that vowel+'r' combinations (in "car",
"oar", etc.) are usually treated as vowel sounds.
'ar', 'er', 'ir', and 'ur' sounds usually become a long
a, and 'or' usually becomes a long
o. (It's incidentally a good idea, even in
English, to adjust pronunciation this way when singing.)

car → カー (KAA)

bluebird → ブルーバード (BURUUBAADO)

cork → コーク (KOUKU)

Using ヴ for 'v' is a comparatively recent concept,
and somewhat uncommon. Many words with a 'v' sound
just use the b characters instead,
especially if they've been around for a while.

video → ビデオ (BIDEO)

drive → ドライブ (DORAIBU)

Japanese has no 'si' sound, so シ is used for both 'shi'
and 'si'. スィ may be used occasionally but is uncommon.

simple → シンプル (SHINPURU)

cinnamon → シナモン (SHINAMON)

fancy → ファンシー (FANSHII)

shield → シールド (SHIIRUDO)

Japanese has no direct equivalent for either pronunciation of
'th'. The soft 'th' as in "thought" and "bath"
generally becomes s, while the hard 'th'
found in "this" and "that" tends to become
z.

thunderbird → サンダーバード (SANDAABAADO)

rhythm → リズム (RIZUMU)

Words may be abbreviated, especially in popular names,
and particularly when video games or other technology are
involved.

American football → アメフト (AMEFUTO)

upload, update → アップ (APPU)

pocket monster → ポケモン (POKEMON)

Reverting to other languages

Since some tweaking goes on, it's understandable that it can
be difficult to decipher a borrowed word, particularly unusual
borrowings such as those often found in fiction. Here are some
common points of confusion.

Added vowels: Since many words need to add vowels
when borrowed, any given short u
(or o after t or
d) may or may not be from the original.
It helps to check against the possibilities and see what
makes the most sense in context.

Ambiguous consonants: Since 'l' and 'r' both become
r, 's' and soft 'th' both become
s, 'z' and hard 'th' both become
z, 'b' and (usually) 'v' become
b, and 'si' and 'shi' both become
shi, it's unclear which consonant is appropriate
in these cases. Again, it helps to check and see what makes sense.
The translators for Lufia 2 apparently didn't do this (though
I enjoyed the game anyway) and came up with monsters like the
"Iron gorem" (should be "Iron golem") and
"Asashin" (should be "Assassin").

Vowel sounds in general: This can get hideous in translations.
Is that long a supposed to be 'ar', 'er', 'ir', 'ur',
just an extended 'a', or none of them? Is this long
o a long 'o', an 'or', or something
else? When the party encounters monsters called オーク
(OOKU), are they oaks or orcs?
What do you do with vowel sounds that people are
likely to mispronounce no matter how you spell them?
(This is why I like to include "rhymes with" and
"sounds like" sidenotes.)

English words that sound the same but have different
meanings, especially when the spellings are also different,
only make things worse. Should ベア (BEA)
be "bear" or "bare"? Context can help,
but sometimes it isn't enough.

Mix and match for more confusion.
Is ロード (ROODO) "load",
"lode", "lord", "road", or
"rode"?

All this gets even worse when something needs to be written
"in English" but, like many character and place names,
isn't necessarily derived from any specific existing word. Here are
just a few that have been argued about:
Is クレス (KURESU), of Tales of
Phantasia, Cless or Cress?
In FF7, is エアリス (EARISU) Aeris or Aerith?
Was FF4's リディア (RIDIA) intended to be
Lydia instead of Rydia? What are you supposed to do with
クルル (KURURU), from FF5? I've seen Cara,
Krile, and the plain romanization Kururu, and none of them work
particularly well.

Since Japanese rarely uses spaces, one chunk of katakana
may actually be two or more words. As just one example, this
seems to be the cause of an error in the Wild Arms 3
manual that reads "forcibility" where it clearly should say
"force ability" (top of page 32 if anyone's curious), even though
it correctly says "force ability" further down the same page.

As mentioned above, borrowed words are often shortened,
and some have their meanings distorted almost beyond recognition.
While some aren't that hard to figure out, like ファミコン
(FAMIKON) being
a fami(ly) com(puter) = video game system, other borrowed words
are counterintuitive from an English point of view. For example,
パンツ (PANTSU) isn't "pants"
like you might expect, it's (usually) actually underpants (though
the British might be able to figure that one out on their own).
ズボン (ZUBON, trousers, from the French
jupon), ジーンズ (JIINZU, jeans),
and トレーニングパンツ (TOREENINGU PANTSU,
sweatpants, from "training pants") are better choices
when talking about pants in Japan. Another confusing example
is that while マンション (MANSHON) looks
like it should mean "mansion", and even comes
from that word, it actually refers to an apartment.

平仮名 (ひらがな) Hiragana

This is the most commonly used phonetic character set in
Japanese writing. Any Japanese word can be written using only
hiragana. Hiragana represent the same sounds as katakana,
but the sounds added to better fit borrowed words don't normally
apply to hiragana, which is not typically used for borrowed words.
It can happen, such as when the word needs special emphasis,
but it's uncommon. So here's the hiragana chart.

Standard chart

あ a  い i  う u  え e  お o
か ka  き ki  く ku  け ke  こ ko
さ sa  し shi  す su  せ se  そ so
た ta  ち chi  つ tsu  て te  と to
な na  に ni  ぬ nu  ね ne  の no
は ha  ひ hi  ふ fu  へ he  ほ ho
ま ma  み mi  む mu  め me  も mo
や ya  ゆ yu  よ yo
ら ra  り ri  る ru  れ re  ろ ro
わ wa  ゐ wi  ゑ we  を wo
ん n or n'

Other morae

が ga  ぎ gi  ぐ gu  げ ge  ご go
ざ za  じ ji  ず zu  ぜ ze  ぞ zo
だ da  ぢ dji  づ dzu  で de  ど do
ば ba  び bi  ぶ bu  べ be  ぼ bo
ぱ pa  ぴ pi  ぷ pu  ぺ pe  ぽ po
っ (gemination mark)

2-character morae

きゃ kya  きゅ kyu  きょ kyo
ぎゃ gya  ぎゅ gyu  ぎょ gyo
しゃ sha  しゅ shu  しょ sho
じゃ ja  じゅ ju  じょ jo
ちゃ cha  ちゅ chu  ちょ cho
ぢゃ dja  ぢゅ dju  ぢょ djo
にゃ nya  にゅ nyu  にょ nyo
ひゃ hya  ひゅ hyu  ひょ hyo
びゃ bya  びゅ byu  びょ byo
ぴゃ pya  ぴゅ pyu  ぴょ pyo
みゃ mya  みゅ myu  みょ myo
りゃ rya  りゅ ryu  りょ ryo

As in katakana, the small characters ゃ, ゅ, ょ, and っ
are written just like the larger equivalents, except for their size.
The small characters ぁ, ぃ, ぅ, ぇ, and ぉ also exist, and
are also written just like the larger equivalents, but are
far less common than their katakana counterparts.

The ー is occasionally used to indicate long vowels in
hiragana, but long vowels are normally indicated, unsurprisingly,
by adding another of the vowel that is to be lengthened. The
exception is that a long o is usually written
by adding う, though some words use お because of the kanji
involved. Also, an e followed by an
i is very nearly the same as a long
e, but not
quite identical.

The characters ゐ (wi) and
ゑ (we) have gone obsolete and
almost never appear in modern Japanese.

Voiced, Unvoiced, and Semi-Voiced

Those funny little marks:

By now you've probably noticed that many of the basic
kana have other kana that look the same except for a few little
marks in the corner. There's a reason for that. The consonants
k, s, t, and h are what
linguists call "unvoiced" or
"voiceless" consonants, which means that they are
pronounced without vibrating the vocal cords. Adding the
mark ゛, called the 濁点 (dakuten,
"voiced mark") or informally the てんてん
(ten ten, "dot dot"),
to kana with these consonants produces the
equivalent "voiced" consonants
g, z, d, and b.
As you may have guessed, voiced consonants are those that
require vibrating the vocal cords to pronounce. Additionally,
kana with the h consonant may also
take the mark ゜, called the 半濁点 (handakuten,
"half-voiced mark") or informally the
まる (maru, "circle"),
to produce p,
a "semivoiced" consonant.
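In digital text these marks also exist as combining characters (U+3099 for the dakuten, U+309A for the handakuten), and Unicode normalization knows which base kana have a precomposed voiced form. Here's a minimal Python sketch using the standard `unicodedata` module; the helper names `add_dakuten` and `add_handakuten` are made up for illustration:

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"     # combining form of ゛
COMBINING_HANDAKUTEN = "\u309a"  # combining form of ゜


def add_mark(kana, mark):
    """Attach a combining mark, then let NFC compose it into one character if possible."""
    return unicodedata.normalize("NFC", kana + mark)


def add_dakuten(kana):
    return add_mark(kana, COMBINING_DAKUTEN)


def add_handakuten(kana):
    return add_mark(kana, COMBINING_HANDAKUTEN)


print([add_dakuten(k) for k in "かさたは"])  # ['が', 'ざ', 'だ', 'ば']
print(add_handakuten("は"))                 # ぱ
print(add_dakuten("ウ"))                    # ヴ
print(len(add_dakuten("あ")))               # 2 (no precomposed form exists)
```

Kana with no precomposed voiced form, such as あ, simply keep the mark as a separate character.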

There are also several uses of the
dakuten that don't quite
fit the normal usage. The katakana ウ (u)
may appear with a dakuten as ヴ to represent a 'vu' sound,
though the b consonant is used for 'v' just
as often. In addition, kana that cannot normally have a
dakuten
may be written with one when indicating abnormal or distorted
noises similar to the base kana. For instance, あ゛ seems
to be fairly popular for rendering strangled shouts, though I'm
not sure how you'd romanize it.

It seems that linguists also use the
handakuten on
k kana to represent an 'ng' sound,
but I've never seen it personally. For example, 'ngu' would be
written く゜.

Sorting

The basics:

The usual ordering is called 五十音順 (gojuu on jun,
"50-sound order") after the kana table (which originally
contained 50 sounds rather than the modern 45), or あいうえお順
(a i u e o jun, "a i u e o order")
after the first row of kana, much as English alphabetical order
is also called ABC order.

Plain hiragana follow the order of the standard kana chart:
あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑを.
This much is fully standardized. ん doesn't exactly fit into the
standard chart, but typically comes after を.
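For plain hiragana, the chart order can be turned directly into a sort key. A rough Python sketch, assuming the input contains only the plain kana from the chart (no voiced, small, or katakana variants); `gojuon_key` is a hypothetical helper name:

```python
# Plain hiragana in chart order, with ん placed after を
GOJUON = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑをん"
RANK = {kana: i for i, kana in enumerate(GOJUON)}


def gojuon_key(term):
    # Map each kana to its chart position; tuples compare element by element,
    # and a tuple that is a prefix of a longer one sorts first
    return tuple(RANK[kana] for kana in term)


words = ["さくら", "あめ", "ねこ", "あさ", "ん", "を"]
print(sorted(words, key=gojuon_key))
# ['あさ', 'あめ', 'さくら', 'ねこ', 'を', 'ん']
```

For plain hiragana alone, this happens to agree with Unicode code point order; the explicit table only starts to matter once variants are folded together.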

The kana は (ha) and
へ (he) are considered the same
for sorting purposes regardless of whether they're used as
particles and pronounced as wa and
e (respectively) or used as parts of
words and pronounced ha and
he.

Except for tiebreaking purposes, all variants of a kana are
treated as the same character. Specifically, a hiragana character
and the equivalent katakana character are considered the same,
unvoiced (は) and voiced (ば) and semivoiced (ぱ) kana are
considered the same, and normal-sized (つ) and reduced-sized
(っ) kana are considered the same. This is somewhat similar to
upper-case and lower-case English letters being considered the
same except for tiebreaking purposes, if more complicated.
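Those equivalences can be sketched as a folding function that reduces any kana to its plain hiragana base. A minimal Python sketch, assuming standard Unicode kana; the small-to-large table is hand-written and covers only the common small kana, and `base_kana` is a hypothetical name:

```python
import unicodedata

# Common small kana and their full-sized counterparts (hiragana only;
# katakana are converted to hiragana first)
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")


def to_hiragana(ch):
    # Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above their hiragana equivalents
    if "\u30a1" <= ch <= "\u30f6":
        return chr(ord(ch) - 0x60)
    return ch


def base_kana(ch):
    """Fold script, voicing, and size: ぱ, バ, and は all become は."""
    hira = to_hiragana(ch)
    # NFD splits a voiced or semivoiced kana into base kana + combining mark
    plain = unicodedata.normalize("NFD", hira)[0]
    return plain.translate(SMALL_TO_LARGE)


print("".join(base_kana(c) for c in "はばぱハバパっツ"))  # ははははははつつ
```

As a side effect, ヴ folds to う, which matches treating it as a voiced ウ.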

The ヴ character invented to handle 'v' sounds in foreign
words is typically handled as a "voiced" ウ, if only
because that's what it looks like. Some instead treat ヴァ as a
variant of バ (ba), etc.,
but while this has the advantage of placing
very similar sounds together, it breaks with the usual method
of handling each individual kana separately.

As in English, [end of term] comes before any character.
In other words, shorter terms come before longer ones that
start out the same, and 'same' in this case means the same
base kana, ignoring any variants. To give concrete
examples, くろ (kuro) comes before
ぐろう (gurou) or
クロウ (KUROU), each of which
comes before クロウチ (KUROUCHI).
This is much like in English sorting, where "an"
comes before "ant", which comes
before "antihero".

Except when the kanji themselves are being sorted (rather than
terms), kanji have no effect on ordering: kanji terms are sorted by
their reading, the way they would appear if written in kana.

Tiebreakers and other tricky stuff:

As noted previously, hiragana and katakana, unvoiced,
voiced, and semivoiced kana, and full-sized and small kana are
all considered equivalent when not directly competing, and the
ー complicates things further. So what happens if two items are
identical except for one of these equivalent characters? This is
where the tiebreaking comes into play. Unfortunately, the system
for doing so appears to be somewhat less than universal.

I'm not sure how kanji vs. kana figures into this... presumably
words written in kana come before those in kanji as part of the
tendency to place basic unmodified hiragana before anything
else.

Large (normal) kana may come either before or after their
shrunken equivalents, as long as the sorting is consistent within
the dictionary/index/whatever. I get the impression that large
before small is considered more correct, but since computerized
character encodings put the small kana before their large
equivalents, machine-sorted lists put small before large, and
indifference takes over. Personally, I think it makes sense to
sort the large kana first, in keeping with the tendency to place
basic unmodified hiragana before anything else.

びよういん (biyouin) before
びょういん (byouin)

きやく (kiyaku) before
きゃく (kyaku)

かつて (katsute) before
かって (katte)
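The machine-sorted behavior is easy to see in Python, whose default string comparison uses raw Unicode code points. In the hiragana block, each common small kana sits immediately before its full-sized counterpart (ゃ is U+3083, や is U+3084). Assuming a program sorts by plain code point (many use proper collation instead), the examples above come out small-first:

```python
large_forms = ["びよういん", "きやく", "かつて"]
small_forms = ["びょういん", "きゃく", "かって"]
for large, small in zip(large_forms, small_forms):
    # sorted() on plain str compares code points, e.g. ゃ (U+3083) < や (U+3084)
    print(sorted([large, small]))
# ['びょういん', 'びよういん']
# ['きゃく', 'きやく']
# ['かって', 'かつて']
```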

Most handle the ー symbol for indicating long vowels as
equivalent to the extended vowel, but others consider it equivalent
to no character and effectively drop it when sorting, like the English
hyphen. Rarely, it will instead be handled as a completely different
character and sorted after ん (syllabic n),
which I consider to be very poor
handling since it puts words that are phonetically identical far apart
in sort order. Even machine sorting usually knows better.
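The first two treatments of ー can be sketched as string transformations applied before comparing terms. A minimal Python sketch; the vowel table and helper names are my own assumptions, and a ー at the start of a term or after ん is simply left alone:

```python
import unicodedata

VOWEL_ROWS = ["あいうえお", "かきくけこ", "さしすせそ", "たちつてと", "なにぬねの",
              "はひふへほ", "まみむめも", "や ゆ よ", "らりるれろ", "わゐ ゑを"]
# Plain hiragana -> the vowel its sound ends in (spaces are padding, skipped)
VOWEL = {kana: "あいうえお"[col]
         for row in VOWEL_ROWS
         for col, kana in enumerate(row) if kana != " "}
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")


def is_katakana(ch):
    return "\u30a1" <= ch <= "\u30f6"


def vowel_of(ch):
    """Hiragana vowel that ends ch's sound, or None (e.g. for ん)."""
    hira = chr(ord(ch) - 0x60) if is_katakana(ch) else ch
    plain = unicodedata.normalize("NFD", hira)[0].translate(SMALL_TO_LARGE)
    return VOWEL.get(plain)


def expand_long_marks(term):
    """Common treatment: each ー becomes the preceding vowel, written in the same script."""
    out = []
    for ch in term:
        vowel = vowel_of(out[-1]) if out else None
        if ch == "ー" and vowel:
            ch = chr(ord(vowel) + 0x60) if is_katakana(out[-1]) else vowel
        out.append(ch)
    return "".join(out)


def drop_long_marks(term):
    """Alternative treatment: ignore ー entirely, like an English hyphen."""
    return term.replace("ー", "")


print(expand_long_marks("ケーキ"), drop_long_marks("ケーキ"))  # ケエキ ケキ
```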

As if all that weren't a big enough mess already, there's
the question of what to do if the rules you're using conflict. For example,
if unvoiced comes before voiced and hiragana comes before
katakana, which comes first,
が (ga, hiragana, but voiced) or
カ (KA, unvoiced, but katakana)?
Again, there don't seem to be any standardized rules here.
Fortunately, this sort of conflict is relatively uncommon,
especially in indices and informal lists that aren't likely
to spell out their rules. Dictionaries will typically
describe what conventions they use.

While I'm no dictionary, I do think it makes sense to define
an ordering system, even if I never need to use the full details of it.
The examples given in the following steps are invented for
convenience and unlikely to correspond to actual words.

Sort first by the base kana, putting shorter terms before longer
terms that begin with the same base kana. Regard each kana as
an individual unit, regardless of whether or not it's part of a compound
sound (きゃ (kya), ヴィ (VI),
etc.). For now, regard all variants as the same
kana, ignoring voicing, size, and character set. For now, also regard
the long vowel marker ー as identical to the preceding vowel sound,
including e and o,
even though those could be romanized as
ei and ou.

かあき ⇒ カーキク ⇒ かーきくけ ⇒ カアキクケコ

ちゃつ ⇒ ちやつて ⇒ ちゃってと ⇒ ちやってとた

はひ ⇒ ばひふ ⇒ はぴぶへ ⇒ ぱひふへほ
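Step 1 on its own could look like this in Python: fold every character to its base kana, turn ー into the preceding vowel, and compare the results by chart position. This is a sketch assuming kana-only terms (anything unrecognized, such as kanji, is just pushed after ん); all names are hypothetical:

```python
import unicodedata

GOJUON = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑをん"
RANK = {kana: i for i, kana in enumerate(GOJUON)}
VOWEL_ROWS = ["あいうえお", "かきくけこ", "さしすせそ", "たちつてと", "なにぬねの",
              "はひふへほ", "まみむめも", "や ゆ よ", "らりるれろ", "わゐ ゑを"]
# Plain hiragana -> the vowel its sound ends in (spaces are padding, skipped)
VOWEL = {kana: "あいうえお"[col]
         for row in VOWEL_ROWS
         for col, kana in enumerate(row) if kana != " "}
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")


def base_kana(ch):
    # Fold katakana to hiragana, strip voicing marks via NFD, enlarge small kana
    if "\u30a1" <= ch <= "\u30f6":
        ch = chr(ord(ch) - 0x60)
    return unicodedata.normalize("NFD", ch)[0].translate(SMALL_TO_LARGE)


def step1_key(term):
    """Base kana only; ー counts as the preceding vowel (e and o included)."""
    bases = []
    for ch in term:
        vowel = VOWEL.get(bases[-1]) if bases else None
        if ch == "ー" and vowel:
            bases.append(vowel)
        else:
            bases.append(base_kana(ch))
    # Unrecognized characters (e.g. kanji) are simply pushed after ん
    return tuple(RANK.get(b, len(GOJUON)) for b in bases)


terms = ["カアキクケコ", "かーきくけ", "カーキク", "かあき"]
print(sorted(terms, key=step1_key))
# ['かあき', 'カーキク', 'かーきくけ', 'カアキクケコ']
```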

If any two (or more) terms are regarded as identical so far but
are not written identically, then within these terms, sort unvoiced
before voiced and voiced before semi-voiced. If more than one
mismatch occurs, all earlier mismatches count as larger differences
than all later ones. Regard ヴ (VU)
as a voiced ウ (U).

さしす ⇒ さしず ⇒ さじす ⇒ ざしす ⇒ ざしず

かきく ⇒ カキグ ⇒ がきく ⇒ ガキグ ⇒ ガギグ

ちゃふ ⇒ ちやぶ ⇒ ちゃぷ ⇒ ぢゃぶ ⇒ ぢやぷ
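For step 2, each term can be turned into a tuple of voicing levels. Python compares tuples left to right, so an earlier mismatch automatically outweighs any later one. A minimal sketch, assuming the terms already tie on step 1 (as the examples above do), with a hypothetical `voicing_levels` helper:

```python
import unicodedata


def voicing_levels(term):
    """Per character: 0 = unvoiced/plain, 1 = voiced (dakuten), 2 = semivoiced (handakuten)."""
    levels = []
    for ch in term:
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed.endswith("\u3099"):    # combining dakuten (ヴ decomposes to ウ + this)
            levels.append(1)
        elif decomposed.endswith("\u309a"):  # combining handakuten
            levels.append(2)
        else:
            levels.append(0)
    return tuple(levels)


# These terms already tie on step 1, so voicing alone decides their order
terms = ["ぢやぷ", "ぢゃぶ", "ちゃぷ", "ちやぶ", "ちゃふ"]
print(sorted(terms, key=voicing_levels))
# ['ちゃふ', 'ちやぶ', 'ちゃぷ', 'ぢゃぶ', 'ぢやぷ']
```

In a full sort, this tuple would sit right after the step 1 key so it only breaks ties.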

If any two (or more) terms that are not written identically are still
regarded as identical, then within these terms, sort normal-sized
kana before small ones. If more than one mismatch occurs, all
earlier mismatches count as larger differences than all later ones.

キヤフオテイ ⇒ キヤフオティ ⇒ キヤフォテイ ⇒ キャフオティ ⇒ キャフォティ

きやつえ ⇒ キヤツェ ⇒ きゃつえ ⇒ キャツェ
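Step 3 works the same way, with a per-character size flag instead of a voicing level. A minimal sketch, assuming the terms already tie on steps 1 and 2; the set of small kana is hand-written and `size_levels` is a hypothetical name:

```python
# Small kana in both scripts
SMALL_KANA = set("ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ")


def size_levels(term):
    """Per character: 0 = normal-sized, 1 = small."""
    return tuple(1 if ch in SMALL_KANA else 0 for ch in term)


# These terms already tie on steps 1 and 2, so size alone decides their order
terms = ["キャツェ", "きゃつえ", "キヤツェ", "きやつえ"]
print(sorted(terms, key=size_levels))
# ['きやつえ', 'キヤツェ', 'きゃつえ', 'キャツェ']
```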

If any two (or more) terms that are not written identically are still
regarded as identical, then within these terms, sort hiragana before
katakana and both before kanji (the long vowel marker counts as
whatever the preceding vowel is). If more than one mismatch
occurs, all earlier mismatches count as larger differences than all
later ones.

あいうえお ⇒ あいうエお ⇒ あいウえオ ⇒ あイうえお ⇒ アイウえお ⇒ アイウエオ

えーのー ⇒ ええのオ ⇒ えーノー ⇒ えエのー ⇒ エエノオ
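For step 4, the only wrinkle is that ー borrows the script of the character before it. A minimal sketch, assuming the terms already tie on steps 1 through 3 and treating anything outside the two kana blocks as kanji; `script_levels` is a hypothetical name:

```python
def script_levels(term):
    """Per character: 0 = hiragana, 1 = katakana, 2 = anything else (assumed kanji).
    The long vowel marker ー takes the level of the character before it."""
    levels = []
    for ch in term:
        if ch == "ー" and levels:
            levels.append(levels[-1])
        elif "\u3041" <= ch <= "\u309f":   # Hiragana block
            levels.append(0)
        elif "\u30a0" <= ch <= "\u30ff":   # Katakana block
            levels.append(1)
        else:
            levels.append(2)
    return tuple(levels)


# These terms already tie on steps 1 through 3, so script alone decides their order
terms = ["エエノオ", "えエのー", "えーノー", "ええのオ", "えーのー"]
print(sorted(terms, key=script_levels))
# ['えーのー', 'ええのオ', 'えーノー', 'えエのー', 'エエノオ']
```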

If any two (or more) terms that are not written identically are still
regarded as identical, then within these terms, sort actual kana
before the long vowel marker. If more than one mismatch occurs,
all earlier mismatches count as larger differences than all later ones.

パアトナア ⇒ パアトナー ⇒ パートナア ⇒ パートナー
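Putting steps 1 through 5 together, each term maps to a tuple of per-step tuples, and each later tuple only matters when all the earlier ones tie. Here's a self-contained Python sketch of the whole scheme, assuming kana-only terms (kanji readings and step 6's random tiebreak are out of scope); every name here is hypothetical:

```python
import unicodedata

GOJUON = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑをん"
RANK = {kana: i for i, kana in enumerate(GOJUON)}
VOWEL_ROWS = ["あいうえお", "かきくけこ", "さしすせそ", "たちつてと", "なにぬねの",
              "はひふへほ", "まみむめも", "や ゆ よ", "らりるれろ", "わゐ ゑを"]
VOWEL = {kana: "あいうえお"[col]
         for row in VOWEL_ROWS
         for col, kana in enumerate(row) if kana != " "}
SMALL_HIRAGANA = "ぁぃぅぇぉっゃゅょゎ"
SMALL_TO_LARGE = str.maketrans(SMALL_HIRAGANA, "あいうえおつやゆよわ")


def to_hiragana(ch):
    # Katakana ァ..ヶ sit exactly 0x60 above the matching hiragana
    return chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch


def sort_key(term):
    bases, voicing, size, script, marker = [], [], [], [], []
    for ch in term:
        vowel = VOWEL.get(bases[-1]) if bases else None
        if ch == "ー" and vowel:
            bases.append(vowel)          # step 1: same as the preceding vowel
            voicing.append(0)
            size.append(0)
            script.append(script[-1])    # step 4: inherits the preceding script
            marker.append(1)             # step 5: sorts after a written-out vowel
            continue
        hira = to_hiragana(ch)
        decomposed = unicodedata.normalize("NFD", hira)
        bases.append(decomposed[0].translate(SMALL_TO_LARGE))
        if decomposed.endswith("\u3099"):
            voicing.append(1)
        elif decomposed.endswith("\u309a"):
            voicing.append(2)
        else:
            voicing.append(0)
        size.append(1 if hira in SMALL_HIRAGANA else 0)
        if "\u3041" <= ch <= "\u309f":
            script.append(0)
        elif "\u30a0" <= ch <= "\u30ff":
            script.append(1)
        else:
            script.append(2)             # assumed kanji
        marker.append(1 if ch == "ー" else 0)
    step1 = tuple(RANK.get(b, len(GOJUON)) for b in bases)
    # Later tuples only matter when all earlier ones tie
    return (step1, tuple(voicing), tuple(size), tuple(script), tuple(marker))


chains = [
    ["かあき", "カーキク", "かーきくけ", "カアキクケコ"],
    ["かきく", "カキグ", "がきく", "ガキグ", "ガギグ"],
    ["キヤフオテイ", "キヤフオティ", "キヤフォテイ", "キャフオティ", "キャフォティ"],
    ["えーのー", "ええのオ", "えーノー", "えエのー", "エエノオ"],
    ["パアトナア", "パアトナー", "パートナア", "パートナー"],
]
for chain in chains:
    # Reverse each chain, then check that the key restores the order given above
    print(sorted(reversed(chain), key=sort_key) == chain)  # True for each chain
```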

If any two (or more) terms that are not written identically are still
regarded as identical, then within these terms, I give up and sort
them at random. This could occur when they have identical kana,
but different kanji. While there are several kanji-sorting schemes,
I'm not familiar enough with any to attempt to use them. Of more
immediate concern to me is that several items in my topic index link
to more than one topic due to multiple usages, but these all have
brief supplemental notes in English that I use as
tiebreakers.

いろは order:

An alternate order exists but is rarely used for sorting.
Actually a poem known as the いろは (Iroha)
after its first three kana, it is remarkable primarily for using each
of the 47 kana in use at the time exactly once. The poem is
traditionally divided into lines as follows, though this results
in breaking up several words:

いろはにほへと
ちりぬるをわか
よたれそつねな
らむうゐのおく
やまけふこえて
あさきゆめみし
ゑひもせす

Though this order is uncommon for sorting, the kana
sometimes appear in this order as labels for an ordered list,
for example.
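A quick Python check of the claim that the poem uses each of those 47 kana exactly once, plus the poem used as an ordering (for example, to label list items). The line split follows the traditional division shown above; the variable names are arbitrary:

```python
IROHA_LINES = ["いろはにほへと", "ちりぬるをわか", "よたれそつねな", "らむうゐのおく",
               "やまけふこえて", "あさきゆめみし", "ゑひもせす"]
# The 47 kana of the standard chart, without ん
GOJUON_WITHOUT_N = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑを"

poem = "".join(IROHA_LINES)
print(len(poem), len(set(poem)))            # 47 47
print(set(poem) == set(GOJUON_WITHOUT_N))   # True

# Using the poem as an ordering, e.g. for list labels
IROHA_RANK = {kana: i for i, kana in enumerate(poem)}
print(list(poem[:4]))                                # ['い', 'ろ', 'は', 'に']
print(sorted("あいうえお", key=IROHA_RANK.__getitem__))  # ['い', 'う', 'お', 'え', 'あ']
```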

Romanization Conventions

There are at least three different major romanization schemes
in use, and that's not counting all the variants from people (like me)
who don't care much what's official. Here's a quick guide to certain
variants that I'm aware of and which ones I normally use.

Occasionally I'll come across something outlandish
that's not listed here... and that's when winging it comes into
play.

None of this matters when a term has an official
romanization. 東京 is "Tokyo" even though it should
be Toukyou, ローマ字 is "romaji"
instead of ROUMA ji, etc.

All others use the renderings given on the kana charts
above. The only exceptions are that I typically romanize the
particles は and へ as wa and
e, respectively, since that's
how they're pronounced, regardless of the kana. Some insist on
using ha and he
due to the kana, and while that arguably has
some merit, it confuses the pronunciation rather than indicating
it.

As I see it, my combination of choices has the advantage
of approximating the English sounds while assigning a different
romanization to every common mora, with the exception of
を/ヲ and ウォ, which doesn't matter much because ウォ is only
used for borrowed words, while を/ヲ is virtually never used for
borrowed words.

By n being ambiguous
at times, I mean cases such as
に, んい, and んに. They all clearly need an
i and an n or two,
but all three are different and even have different pronunciations.
If you make ん always n,
then they're ni, ni,
and nni, which
ignores the difference between に and んい. On the other hand,
if it's always n', you get ni,
n'i, and n'ni,
which, for んに, is redundant and funny-looking, not to mention
that it leaves a lot of words with an apostrophe on the end.
I prefer ni, n'i,
and nni for these reasons. Similarly,
I prefer to romanize にゃ, んや, and んにゃ as
nya, n'ya,
and nnya. This is probably my
biggest gripe with the Microsoft Japanese IME: if I type
"s o n n a", I expect to see そんな, not the そんあ
that it actually gives me. The stupid thing converts
"n n" to ん instantly and automatically without any
regard to context, when I expect it to have the sense to interpret
"n n a" as ん (n) +
な (na). If I wanted
んあ (n'a), I'd type "n ' a".
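The preference above boils down to one rule: ん becomes n' only when the next mora starts with a vowel or y, and plain n everywhere else (including at the end of a word). A minimal Python sketch of that rule; the kana table is a tiny hypothetical subset, just enough for the examples in this paragraph, not a full romanizer:

```python
# Tiny hypothetical subset of the chart, just enough for the examples above
TABLE = {"あ": "a", "い": "i", "そ": "so", "な": "na", "に": "ni",
         "や": "ya", "にゃ": "nya", "ん": "n"}


def romanize(kana):
    # Group ゃ/ゅ/ょ with the kana before them to form 2-character morae
    morae = []
    for ch in kana:
        if ch in "ゃゅょ" and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    roman = [TABLE[m] for m in morae]
    out = []
    for i, mora in enumerate(morae):
        rom = roman[i]
        if mora == "ん":
            nxt = roman[i + 1] if i + 1 < len(roman) else ""
            # n' only where a bare n would run into a vowel or y; otherwise plain n
            rom = "n'" if nxt[:1] in ("a", "i", "u", "e", "o", "y") else "n"
        out.append(rom)
    return "".join(out)


for word in ["に", "んい", "んに", "にゃ", "んや", "んにゃ", "そんな"]:
    print(word, romanize(word))
# に ni / んい n'i / んに nni / にゃ nya / んや n'ya / んにゃ nnya / そんな sonna
```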

It might make more sense to write the
r row with ls,
considering that I've always thought the consonant sounds
more like an l anyway. The
r writing is so prevalent, though, that
it's essentially uncontestable. Kind of like how モーグリ is a lot closer to
"moagly", but "moogle" is too widely
known to bother arguing about.

My preference for OU to represent
O + ー is purely because I hate
seeing OO for words that use it.
This partly stems from seeing some people romanize
o + う as oo,
which goes entirely against the kana. ありがとう
(arigatou) will never be
arigatoo to me.

I also can't agree with writing を (wo)
as just o. It's not necessarily (depending
partially on dialect) the same sound as お (o),
even if it is very close.