This also affects the tab-auto-completion. If I start typing ls N and press tab it is correctly expanded to ls Näyttökuva.png. But if I start typing ls Nä tabbing does nothing.

How can I either:

configure bash/terminal so it understands special characters

type the special characters so bash/terminal understands them?

In Terminal's encoding is set to UTF-8 in the Settings–tab and the Encoding–tab is in its default state, ie. UTF-8, Mac OS Roman, ISO Latin 1, ISO Latin 9, Windows Latin 1, ASCII, NextStep + some Asian codings are enabled.

Even stranger (though probably not essential for the question):

If I type ls N, press tab, delete characters from the end until it reads ls Nä and press tab again, the command expands to ls Nättökuva.png [sic].

If I try deleting the letters second time back to ls Nä and press tab it expands to ls Nätökuva.png. Third run expands to ls Näökuva.png.

For some reason, the 4th run gives ls Nä̈kuva.png (notice the umlauts over umlauts). Tabbing the ls Nä̈ gives ls Nä̈kuva.png every time. Nevertheless, it works:

2 Answers
2

I think bash is tripping over some anomalies in how accented characters are handled. You might want to grab some popcorn, because this is going to get technical for a little bit...

Unicode allows some accented characters to be represented in several different ways: as a "code point" representing the accented character, or as a series of code points representing the unaccented version of the character, followed by the accent(s). For example, "ä" could be represented either precomposed as U+00E4 (UTF-8 0xc3a4, Latin small letter 1 with diaeresis) or decomposed as U+0061 U+0308 (UTF-8 0x61cc88, Latin small letter a + combining diaeresis).

OS X's HFS+ filesystem requires that all filenames be stored in the UTF-8 representation of their fully decomposed form. In an HFS+ filename, "ä" MUST be encoded as 0x61cc88, and "ö" MUST be encoded as 0x6fcc88.

I'm pretty sure what's happening here is that when you type "Näyttökuva.png" at the command line, it's "typing" the characters in precomposed form. When the file is created, the filesystem decomposes the characters for storage. Everything is fine so far. But when you try to use tab-completion starting with "Nä", I think bash is failing to decompose the "ä" before searching for matches, and of course it doesn't find any.

To illustrate the difference, here's an example of what encoding is used when I just type "Näyttökuva.png" at the command line, vs. what's used when I store it as a filename and use tab completion to fill it in:

Now, as for the matter of characters getting lost when deleting and re-tab-completing, I suspect that's closely related. Specifically, I think bash is "deleting" one code point per press of the delete key, but erasing one character from the Terminal window per press. Because one of the deleted characters ("ö" this time) consisted of two code points, but only one character, the Terminal display gets out of sync. Try tab-completing the whole filename, deleting it back to "Näytt", then re-tab-completing: bash seems to think that only the combining diaeresis was deleted, not the entire "ö", so it re-adds the combining diaeresis, but it this time it attaches to the "t":

$ echo Näytẗkuva.png
Näyttökuva.png

Note that when I press return, bash actually has the entire filename there; it's just the Terminal display that was confused.

TL;DR bash has some bugs handling decomposable accented characters.

EDIT: after some mulling, I think the only full solution is to fix bash (/wait for its developers to fix it). There might also be a way to input characters in decomposed form, but I have no idea what that would be. But I did find some partial workarounds:

Drag-and-droping a file from the Finder pastes in its correct form. Since the Finder gets the filename from the filesystem, it's already decomposed, so it just works.

You can actually tab-complete the accented character itself. For example, if you type "Na" and then tab, it'll match "Näyttökuva.png" because the canonical decomposition of "ä" starts with "a". But if you have a file named "Narwal.gif" in the same directory, that won't be very helpful...

I haven't tested this, but if you bind tab to menu-complete instead of complete, it should let you tab through possible matches so you can select the one you want even if you can't type the next letter. (Or you could bind it to a different keystroke, so you can use it only when you need to.)

For fixing the problem with the Terminal display getting out of sync, you could bind something to redraw-current-line -- it won't prevent the problem from happening, but it'll give you a way to resynchronize the display.

Thanks, I enjoyed the popcorn. I think you've nailed the cause of the problem: using $ echo -e "N\xC3\xA4*" | ls (the echo gives Nä*) results Näyttökuva.png. The problem exist also with the other shells in Mac OS; and with e.g. zsh ls N gets auto-completed to ls Na<0308>ytto<0308>kuva.png
–
koiyuMar 19 '11 at 19:12

I also tried the autocompletion and ls Nä* in bash in Xubuntu and it worked properly, so it bugs somewhere between keyboard & OS X & Terminal. I also tested that within Bootcamp partition, but the problem persists (ie. it doesn't happen only with HFS+ files).
–
koiyuMar 19 '11 at 19:24

(Now saw your edit concerning workarounds) At least the first two work. The #2 is interesting: autocompleting Na works, but Nay does not (though it is understandable because there actually is ¨ between the a and y. In Xubuntu ls Na* doesn't work (though Nä* works so it's really not an issue). Concerning wildcards — one other workaround could be replacing ä & ö with a? and o? e.g. ls Na?y*. Of course this increases ambiguity, but it might come handy in some cases.
–
koiyuMar 19 '11 at 19:41

2

The reason it works in Xubuntu may just be that the filesystem uses the same form as the terminal interface. If you do ls N* | xxd in Xubuntu, does it give composed or decomposed characters?
–
Gordon DavissonMar 19 '11 at 19:46

Assuming that Xubuntu stores the filename in composed form, try running the command touch $'Na\xcc\x88ytto\xcc\x88kuva.png' and see what happens -- my guess is it'll create a new file with a very very similar name.
–
Gordon DavissonMar 19 '11 at 19:51

However I combined some information from this old guide, and as suggested and instructed here:

I installed a newer bash in my Snow Leopard. After installing it, bash completion works correctly! (Snow Leopard shipped with 3.2.48(1) and MacPorts installed 4.2.45_1). Remember to make the changes in /etc/shells and running chsh.

Also, because of some other instructions, I have in .inputrc:

set meta-flag on
set input-meta on
set output-meta on
set convert-meta off