You apparently already know what a regex is based on how you've tagged your question. Did you try reading the documentation for the String class? In particular, look for the word 'regex'; there are a few methods, and a bit of thought should tell you how to proceed... :)
– Karl Knechtel, Sep 26 '11 at 8:12

The phrase "special character" is so overused as to be almost completely meaningless. If what you mean is, "I have this list of specific characters I want to remove," then do as Thomas suggests and form your pattern with a regex character class and replaceAll them away. If you have more esoteric requirements, edit the question. :)
– Ray Toal, Sep 26 '11 at 8:18

Those are not special characters... these are: äâêíìéè, since they're not your common 1-byte characters like - + ^ are. Anyway, as Ray stated, either do a replaceAll for them, or walk through the string yourself: append every character you do not want to take out to a second string, and return that string at the end.
– Gonçalo Vieira, Sep 26 '11 at 9:16
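
For what it's worth, the "walk through the string yourself" idea from the comment above could look roughly like this. It is only a sketch: the class name ManualFilter and the helper removeChars are made up for illustration, and it uses a StringBuilder rather than += on a String, which avoids creating a new String on every append.

```java
public class ManualFilter {
    // Hypothetical helper: copies every character of input except those listed in unwanted.
    static String removeChars(String input, String unwanted) {
        StringBuilder kept = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // indexOf returns -1 when c is not one of the unwanted characters
            if (unwanted.indexOf(c) < 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeChars("a-b+c.d^e:f,g", "-+.^:,")); // prints abcdefg
    }
}
```

No regex is involved here, so none of the characters needs escaping.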

3 Answers

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");

Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".

Another note: the - character needs to be the first or last one on the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would mean "all characters in the range : to ,", which Java rejects as an illegal range because : comes after , in character order).
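
To make the two positioning notes concrete, here is a small sketch. The class name ClassPositionDemo and the helpers remove and isValid are invented for this example; only replaceAll, Pattern.compile and PatternSyntaxException come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ClassPositionDemo {
    // Hypothetical helper: removes everything the given character class matches.
    static String remove(String input, String charClass) {
        return input.replaceAll(charClass, "");
    }

    // Hypothetical helper: true if the regex compiles, false if Java rejects it.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "a-b+c,d.e^f";
        // '-' is first and '^' is not first: both are plain literals.
        System.out.println(remove(s, "[-+.^:,]")); // prints abcdef
        // '^' is first: the class is negated, so everything NOT listed is removed.
        System.out.println(remove(s, "[^-+.:,]")); // prints -+,.
        // '-' in the middle: "+-." is the range from '+' to '.', which also covers ',' and '-'.
        System.out.println(remove(s, "[+-.]"));    // prints abcde^f
        // A range whose start comes after its end does not compile at all.
        System.out.println(isValid("[:-,]"));      // prints false
    }
}
```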

So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");

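If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape the backslashes: "[\\p{P}\\p{S}]"). The square brackets matter: without them, \p{P}\p{S} would only match a punctuation character immediately followed by a symbol.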
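
A quick sketch of the punctuation-and-symbols variant, with the two Unicode categories put inside one character class so that a character from either category matches. The class and method names are made up for the example.

```java
public class PunctuationDemo {
    // Hypothetical helper: removes every character in Unicode category P (punctuation) or S (symbol).
    static String stripPunctuationAndSymbols(String input) {
        return input.replaceAll("[\\p{P}\\p{S}]", "");
    }

    public static void main(String[] args) {
        // ',' '!' '.' '(' ')' are punctuation; '+' '=' '$' are symbols.
        System.out.println(stripPunctuationAndSymbols("Hello, World! 1+1=2 (approx.) $5"));
        // prints Hello World 112 approx 5
    }
}
```

Digits and spaces survive, because they belong to neither category.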

A third way could be something like this, if you can exactly define what should be left in your string:

String result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
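
One thing to be aware of with this variant: by default, \w in Java covers only the ASCII range, so accented letters are removed as well. The sketch below shows that, plus one possible way around it using the Pattern.UNICODE_CHARACTER_CLASS flag (available since Java 7). The class and method names are invented for the example, and the non-ASCII letters are written as Unicode escapes so the source file encoding doesn't matter.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // Default behaviour: \w is [a-zA-Z_0-9] only.
    static String asciiOnly(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    // Same pattern, but \w and \s follow the Unicode definitions.
    private static final Pattern UNICODE_NON_WORD =
            Pattern.compile("[^\\w\\s]", Pattern.UNICODE_CHARACTER_CLASS);

    static String unicodeAware(String input) {
        return UNICODE_NON_WORD.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "Gr\u00f6\u00dfe: 10_cm!"; // "Größe: 10_cm!"
        System.out.println(asciiOnly(s));    // prints Gre 10_cm
        System.out.println(unicodeAware(s)); // prints Größe 10_cm
    }
}
```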

Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.

Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is not a letter in any language and not a separator (spaces plus the Unicode line and paragraph separators; note that \p{Z} does not cover control characters such as tab or \n, so those are removed too). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not a separator", which matches every character, since no character is both a letter and a separator.
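
The difference between the two spellings is easy to check. In this sketch the class and method names are made up, and the é is written as the escape \u00e9 (the single precomposed code point).

```java
public class LetterSeparatorDemo {
    // Negated class: removes whatever is neither a letter nor a separator.
    static String keepLettersAndSeparators(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Union of "not a letter" and "not a separator": every character is at least one of these.
    static String brokenVariant(String input) {
        return input.replaceAll("[\\P{L}\\P{Z}]", "");
    }

    public static void main(String[] args) {
        String s = "Caf\u00e9 no. 5, ok?"; // "Café no. 5, ok?"
        System.out.println(keepLettersAndSeparators(s)); // prints "Café no  ok" (two spaces remain)
        System.out.println(brokenVariant(s).isEmpty());  // prints true
    }
}
```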

Additional information on Unicode

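Some Unicode characters seem to cause problems because they can be encoded in different ways: as a single code point (e.g. ä as U+00E4) or as a combination of code points (a followed by the combining diaeresis U+0308). In the combined form the accent is a mark, not a letter, so \p{L} does not match it. Please refer to regular-expressions.info for more information.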
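
One possible way to deal with the combined form is to normalize the input to NFC with java.text.Normalizer before applying the regex. This is only a sketch under that assumption; the class and method names are invented, and it won't help for sequences that have no precomposed equivalent (for those you could add \p{M} to the list of allowed categories instead).

```java
import java.text.Normalizer;

public class NormalizeDemo {
    // Hypothetical helper: composes base letter + combining mark into one code point where possible,
    // then removes everything that is neither a letter nor a separator.
    static String stripNonLetters(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        return composed.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308bc!"; // 'a' followed by U+0308 (combining diaeresis)
        // Without normalization the combining mark is stripped and the umlaut is lost.
        System.out.println(decomposed.replaceAll("[^\\p{L}\\p{Z}]", "")); // prints abc
        // After NFC normalization the letter is the single code point U+00E4 and survives.
        System.out.println(stripNonLetters(decomposed)); // prints äbc
    }
}
```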

+1 for the best general-purpose solution. Since you are listing a couple variations in the absence of details from the OP, you might as well show and explain patterns like [\P{L}]
– Ray Toal, Sep 26 '11 at 8:21

Also note that the - character must be the first or last one in the list or it needs to be escaped.
– kapep, Sep 26 '11 at 8:24

[^\\p{L}\\p{Z}] seems to eliminate German umlauts (ä, ö, ü) as well (at least it does so for me :/), so "The regex matches everything that is not a letter in any language" doesn't seem to be 100% correct.
– Peter, May 1 '13 at 10:19