The combination of --ignore-case and UTF-8 is very slow, even when no special treatment is required for UTF-8. There's also a huge regression in speed compared to Ubuntu Hardy's grep-2.5.3 with I-don't-know-what patches. Some timing data:

I understand that the combination of UTF-8 and ignore-case is a tricky situation, and if I'm using the tr_TR.UTF-8 locale then sure, I'm willing to pay this price for the correct handling of dotless i's.

Most of the time, however, I'm working with en_US.UTF-8 and grepping variable names in source code and such, usually without any accents.

Grep could do the following:

It could look at the pattern, and check if the following conditions are all true:

- no construct that could match a multibyte character (e.g. no "." in the pattern) and no other special regex syntax

- only ASCII characters

- only characters whose old-fashioned ASCII upper/lowercase counterparts are the same as the locale-aware upper/lowercase counterparts, that is, no "i" or "I" in the pattern if the locale is Turkish.

If all of these are true, it could use whatever algorithm it's using for 8-bit locales, because it will find the same matches. This would provide a 40-60x speedup for a very common use case: finding an English word case-insensitively.
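
To make the three conditions above concrete, here is a rough sketch of the check in Python. It is not grep's code: the function name can_use_byte_matcher, the set of characters treated as "plain" literals, and the way the Turkish locale is detected from the locale name are all my assumptions, chosen to be conservative.

```python
import string

# Assumption: only letters, digits and "_" count as plain literals.
# Anything else (".", "[", "\\", "*", ...) is treated as "weird stuff"
# and disqualifies the pattern from the fast path.
SAFE_LITERALS = frozenset(string.ascii_letters + string.digits + "_")

# Assumption: languages whose i/I case mapping differs from plain ASCII.
# Only Turkish is listed here, as in the description above.
DOTLESS_I_LANGUAGES = ("tr",)


def can_use_byte_matcher(pattern: str, locale_name: str) -> bool:
    """Return True if the pattern passes all three conditions."""
    # Conditions 1 and 2: ASCII only, and no regex metacharacters.
    # SAFE_LITERALS contains only ASCII, so one test covers both.
    if any(ch not in SAFE_LITERALS for ch in pattern):
        return False

    # Condition 3: no "i" or "I" when the locale is Turkish.
    # "tr_TR.UTF-8" -> "tr", "C.UTF-8" -> "c"
    language = locale_name.split("_", 1)[0].split(".", 1)[0].lower()
    if language in DOTLESS_I_LANGUAGES and any(ch in "iI" for ch in pattern):
        return False

    return True


if __name__ == "__main__":
    for pat, loc in [("foo_bar", "en_US.UTF-8"), ("width", "tr_TR.UTF-8"),
                     ("foo.bar", "en_US.UTF-8")]:
        print(pat, loc, can_use_byte_matcher(pat, loc))
```

This only looks at the pattern, as proposed. One caveat I'm aware of: a few non-ASCII characters in the input case-map onto ASCII letters (U+212A KELVIN SIGN lowercases to "k", U+017F LONG S uppercases to "S"), so a real implementation might also want to exclude "k" and "s", or accept missing those matches.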