This works exactly as described in the manual. However, I want to
match words that contain characters beyond a-z, e.g.
prästgården. Matching the regular expression \v(\w+) against
prästgården yields three matches instead of one:

prästgården
^^ ^^^ ^^^^

How to match words containing characters beyond a-z? My locale is set to English and if possible I'd like to keep it that way.

POSIX character classes (e.g. [[:alpha:]]\+ in this case) are supposed to do what you want here, but according to the Vim docs (:help regex) they don't: "These items only work for 8-bit characters." It does happen to work here with Vim 7.3 on OS X 10.8, but not with Vim 7.3 on Linux, so I assume there's something Apple-specific about this Vim that allows it. You'll find that doing it through the Vim Perl binding fails too, even though Perl has very good Unicode support. You might need to switch to an external Perl script, so you can turn on full Unicode support.
– Warren Young, Jan 7 '13 at 2:32

By the way, if you do go with Perl, you want to use \p{Word} instead of a POSIX character class. There are a lot of exception cases in Perl's POSIX character class handling, which you avoid when you use Unicode properties instead.
– Warren Young, Jan 7 '13 at 2:34

3 Answers

Vim (as of version 7.3) has very limited support for non-ASCII characters in patterns. In particular, \w only matches ASCII letters, digits and the underscore, so it splits a word at every accented letter.
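To see the difference outside of Vim, here is a minimal sketch in Python (not Vim script). It assumes that Python's re.ASCII flag is a fair stand-in for Vim's ASCII-only \w; the variable names are only illustrative.

```python
import re

word = "prästgården"

# ASCII-only \w: the accented letters act as separators, which is
# comparable to the behaviour described in the question.
ascii_matches = re.findall(r"\w+", word, flags=re.ASCII)

# Unicode-aware \w (the default for str patterns in Python 3):
# the whole word is a single match.
unicode_matches = re.findall(r"\w+", word)

print(ascii_matches)    # ['pr', 'stg', 'rden']
print(unicode_matches)  # ['prästgården']
```

The first result reproduces the three fragments marked in the question; the second is what a Unicode-aware word class would return.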

There are a few character class patterns that do support Unicode. Of interest to you is \I, which by and large matches letters and only letters, plus _ and @. At least on Debian squeeze (in a UTF-8 locale), there are errors; for example × and ÷ are matched as letters, but all Latin accented letters seem to be recognized correctly. \I can be configured through the isident option, at least for the ASCII part.
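A possible explanation for the × and ÷ oddity, assuming isident is still at the default that Vim's documentation lists for non-Windows systems (@,48-57,_,192-255): the range 192-255 covers a whole block of Latin-1 code points, and not every code point in it is a letter. The Python sketch below only inspects that range; it does not query Vim itself.

```python
# Assumption: isident contains the range 192-255, as in the documented
# default for non-Windows builds of Vim.
ident_range = range(192, 256)

# Code points in that range that Unicode does not classify as letters.
non_letters = [chr(cp) for cp in ident_range if not chr(cp).isalpha()]

print(non_letters)  # ['×', '÷']
```

If that assumption holds, removing 215 and 247 from isident might stop \I from matching these two characters, but I have not verified this.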