java.lang
Class Character

The Character class wraps a value of the primitive
type char in an object. An object of type
Character contains a single field whose type is
char.

In addition, this class provides several methods for determining
a character's category (lowercase letter, digit, etc.) and for converting
characters from uppercase to lowercase and vice versa.

Character information is based on the Unicode Standard, version 4.0.

The methods and data of class Character are defined by
the information in the UnicodeData file that is part of the
Unicode Character Database maintained by the Unicode
Consortium. This file specifies various properties including name
and general category for every defined Unicode code point or
character range.

The file and its description are available from the Unicode Consortium at:

The char data type (and therefore the value that a
Character object encapsulates) is based on the
original Unicode specification, which defined characters as
fixed-width 16-bit entities. The Unicode standard has since been
changed to allow for characters whose representation requires more
than 16 bits. The range of legal code points is now
U+0000 to U+10FFFF, known as Unicode scalar value.
(Refer to the
definition of the U+n notation in the Unicode
standard.)

The set of characters from U+0000 to U+FFFF is sometimes
referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater
than U+FFFF are called supplementary characters. The Java
2 platform uses the UTF-16 representation in char
arrays and in the String and StringBuffer
classes. In this representation, supplementary characters are
represented as a pair of char values, the first from
the high-surrogates range (\uD800-\uDBFF), the
second from the low-surrogates range
(\uDC00-\uDFFF).

A char value, therefore, represents Basic
Multilingual Plane (BMP) code points, including the surrogate
code points, or code units of the UTF-16 encoding. An
int value represents all Unicode code points,
including supplementary code points. The lower (least significant)
21 bits of int are used to represent Unicode code
points and the upper (most significant) 11 bits must be zero.
Unless otherwise specified, the behavior with respect to
supplementary characters and surrogate char values is
as follows:

The methods that only accept a char value cannot support
supplementary characters. They treat char values from the
surrogate ranges as undefined characters. For example,
Character.isLetter('\uD840') returns false, even though
this specific value, if followed by any low-surrogate value in a string,
would represent a letter.

The methods that accept an int value support all
Unicode characters, including supplementary characters. For
example, Character.isLetter(0x2F81A) returns
true because the code point value represents a letter
(a CJK ideograph).
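The difference between the two overload families can be checked directly. The following is a minimal sketch using the two literal values quoted above; the class name OverloadDemo and its method names are hypothetical and exist only for illustration.

```java
final class OverloadDemo {
    // The char overload sees only a lone high surrogate, which it treats as undefined.
    static boolean loneSurrogateIsLetter() {
        return Character.isLetter('\uD840');
    }

    // The int overload receives the whole supplementary code point.
    static boolean supplementaryIsLetter() {
        return Character.isLetter(0x2F81A);
    }

    public static void main(String[] args) {
        System.out.println("char overload: " + loneSurrogateIsLetter());
        System.out.println("int overload:  " + supplementaryIsLetter());
    }
}
```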

In the Java SE API documentation, Unicode code point is
used for character values in the range between U+0000 and U+10FFFF,
and Unicode code unit is used for 16-bit
char values that are code units of the UTF-16
encoding. For more information on Unicode terminology, refer to the
Unicode Glossary.

MIN_RADIX

public static final int MIN_RADIX

The minimum radix available for conversion to and from strings.
The constant value of this field is the smallest value permitted
for the radix argument in radix-conversion methods such as the
digit method, the forDigit
method, and the toString method of class
Integer.

MAX_RADIX

public static final int MAX_RADIX

The maximum radix available for conversion to and from strings.
The constant value of this field is the largest value permitted
for the radix argument in radix-conversion methods such as the
digit method, the forDigit
method, and the toString method of class
Integer.

valueOf

Returns a Character instance representing the specified
char value.
If a new Character instance is not required, this method
should generally be used in preference to the constructor
Character(char), as this method is likely to yield
significantly better space and time performance by caching
frequently requested values.

toString

Returns a String object representing the
specified char. The result is a string of length
1 consisting solely of the specified char.

Parameters:

c - the char to be converted

Returns:

the string representation of the specified char

Since:

1.4

isValidCodePoint

public static boolean isValidCodePoint(int codePoint)

Determines whether the specified code point is a valid Unicode
code point value in the range of 0x0000 to
0x10FFFF inclusive. This method is equivalent to
the expression:

codePoint >= 0x0000 && codePoint <= 0x10FFFF

Parameters:

codePoint - the Unicode code point to be tested

Returns:

true if the specified code point value
is a valid code point value;
false otherwise.

Since:

1.5

isSupplementaryCodePoint

public static boolean isSupplementaryCodePoint(int codePoint)

Determines whether the specified character (Unicode code point)
is in the supplementary character range. The method call is
equivalent to the expression:

codePoint >= 0x10000 && codePoint <= 0x10FFFF

Parameters:

codePoint - the character (Unicode code point) to be tested

Returns:

true if the specified character is in the Unicode
supplementary character range; false otherwise.

Since:

1.5
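As an illustration, isValidCodePoint and isSupplementaryCodePoint can be combined to sort an arbitrary int into one of three buckets. This is only a sketch; the class CodePointRangeDemo and the labels it returns are hypothetical.

```java
final class CodePointRangeDemo {
    // Classifies an int as "invalid", "bmp", or "supplementary".
    static String classify(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";
        }
        return Character.isSupplementaryCodePoint(codePoint) ? "supplementary" : "bmp";
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xFFFF, 0x10000, 0x10FFFF, 0x110000, -1};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp) + " -> " + classify(cp));
        }
    }
}
```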

isHighSurrogate

public static boolean isHighSurrogate(char ch)

Determines if the given char value is a
high-surrogate code unit (also known as leading-surrogate
code unit). Such values do not represent characters by
themselves, but are used in the representation of supplementary characters in the
UTF-16 encoding.

This method returns true if and only if

ch >= '\uD800' && ch <= '\uDBFF'

is true.

Parameters:

ch - the char value to be tested.

Returns:

true if the char value
is between '\uD800' and '\uDBFF' inclusive;
false otherwise.

isLowSurrogate

public static boolean isLowSurrogate(char ch)

Determines if the given char value is a
low-surrogate code unit (also known as trailing-surrogate code
unit). Such values do not represent characters by themselves,
but are used in the representation of supplementary characters in the UTF-16 encoding.

This method returns true if and only if

ch >= '\uDC00' && ch <= '\uDFFF'

is true.

Parameters:

ch - the char value to be tested.

Returns:

true if the char value
is between '\uDC00' and '\uDFFF' inclusive;
false otherwise.

isSurrogatePair

Determines whether the specified pair of char
values is a valid surrogate pair. This method is equivalent to
the expression:

isHighSurrogate(high) && isLowSurrogate(low)

Parameters:

high - the high-surrogate code value to be tested

low - the low-surrogate code value to be tested

Returns:

true if the specified high and
low-surrogate code values represent a valid surrogate pair;
false otherwise.

Since:

1.5
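The three surrogate tests can be exercised together. The sketch below assumes U+1D11E as an arbitrary supplementary code point and obtains its two code units from Character.toChars; the class SurrogateDemo and its helper are hypothetical.

```java
final class SurrogateDemo {
    // True only when first is a high surrogate and second is a low surrogate.
    static boolean isWellFormedPair(char first, char second) {
        return Character.isSurrogatePair(first, second);
    }

    public static void main(String[] args) {
        char[] units = Character.toChars(0x1D11E); // two UTF-16 code units
        System.out.println(Character.isHighSurrogate(units[0]));
        System.out.println(Character.isLowSurrogate(units[1]));
        System.out.println(isWellFormedPair(units[0], units[1]));
        // Swapping the order does not form a valid pair.
        System.out.println(isWellFormedPair(units[1], units[0]));
    }
}
```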

charCount

public static int charCount(int codePoint)

Determines the number of char values needed to
represent the specified character (Unicode code point). If the
specified character is equal to or greater than 0x10000, then
the method returns 2. Otherwise, the method returns 1.

This method does not check that the specified character is a
valid Unicode code point. The caller must validate the
character value using isValidCodePoint
if necessary.
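A common use of charCount is to advance an index by whole code points while scanning a string. The following sketch pairs it with String.codePointAt; the class CodePointWalk and the sample text are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    // Collects the code points of s, stepping by 1 or 2 chars as needed.
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E as a surrogate pair, then 'b'
        for (int cp : codePoints("a\uD834\uDD1Eb")) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```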

codePointAt

Returns the code point at the given index of the
CharSequence. If the char value at
the given index in the CharSequence is in the
high-surrogate range, the following index is less than the
length of the CharSequence, and the
char value at the following index is in the
low-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned. Otherwise,
the char value at the given index is returned.

Parameters:

seq - a sequence of char values (Unicode code
units)

index - the index to the char values (Unicode
code units) in seq to be converted

codePointAt

public static int codePointAt(char[] a,
int index)

Returns the code point at the given index of the
char array. If the char value at
the given index in the char array is in the
high-surrogate range, the following index is less than the
length of the char array, and the
char value at the following index is in the
low-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned. Otherwise,
the char value at the given index is returned.

Parameters:

a - the char array

index - the index to the char values (Unicode
code units) in the char array to be converted

codePointAt

public static int codePointAt(char[] a,
int index,
int limit)

Returns the code point at the given index of the
char array, where only array elements with
index less than limit can be used. If
the char value at the given index in the
char array is in the high-surrogate range, the
following index is less than the limit, and the
char value at the following index is in the
low-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned. Otherwise,
the char value at the given index is returned.

Parameters:

a - the char array

index - the index to the char values (Unicode
code units) in the char array to be converted

limit - the index after the last array element that can be used in the
char array
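The effect of the limit argument is easiest to see when it cuts a surrogate pair in half. This is a small sketch; the class LimitDemo and the sample array are assumptions made for illustration.

```java
final class LimitDemo {
    // Thin wrapper so the three-argument overload can be called by name.
    static int read(char[] a, int index, int limit) {
        return Character.codePointAt(a, index, limit);
    }

    public static void main(String[] args) {
        char[] a = {'\uD834', '\uDD1E'}; // surrogate pair for U+1D11E
        // With limit 2 the low surrogate is visible, so the pair is combined.
        System.out.println(Integer.toHexString(read(a, 0, 2)));
        // With limit 1 the low surrogate is out of range, so the lone char is returned.
        System.out.println(Integer.toHexString(read(a, 0, 1)));
    }
}
```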

codePointBefore

Returns the code point preceding the given index of the
CharSequence. If the char value at
(index - 1) in the CharSequence is in
the low-surrogate range, (index - 2) is not
negative, and the char value at (index -
2) in the CharSequence is in the
high-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned. Otherwise,
the char value at (index - 1) is
returned.

codePointBefore

public static int codePointBefore(char[] a,
int index)

Returns the code point preceding the given index of the
char array. If the char value at
(index - 1) in the char array is in
the low-surrogate range, (index - 2) is not
negative, and the char value at (index -
2) in the char array is in the
high-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned. Otherwise,
the char value at (index - 1) is
returned.

codePointBefore

public static int codePointBefore(char[] a,
int index,
int start)

Returns the code point preceding the given index of the
char array, where only array elements with
index greater than or equal to start
can be used. If the char value at (index -
1) in the char array is in the
low-surrogate range, (index - 2) is not less than
start, and the char value at
(index - 2) in the char array is in
the high-surrogate range, then the supplementary code point
corresponding to this surrogate pair is returned. Otherwise,
the char value at (index - 1) is
returned.

Throws:

IndexOutOfBoundsException - if the index
argument is not greater than the start argument or
is greater than the length of the char array, or
if the start argument is negative or not less than
the length of the char array.

Since:

1.5
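codePointBefore supports scanning text from the end toward the start. The sketch below uses the CharSequence overload together with charCount; the class BackwardWalk and the sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BackwardWalk {
    // Collects the code points of s in reverse order.
    static List<Integer> reversedCodePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = Character.codePointBefore(s, i);
            out.add(cp);
            i -= Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reversedCodePoints("a\uD834\uDD1Eb"));
    }
}
```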

toChars

public static int toChars(int codePoint,
char[] dst,
int dstIndex)

Converts the specified character (Unicode code point) to its
UTF-16 representation. If the specified code point is a BMP
(Basic Multilingual Plane or Plane 0) value, the same value is
stored in dst[dstIndex], and 1 is returned. If the
specified code point is a supplementary character, its
surrogate values are stored in dst[dstIndex]
(high-surrogate) and dst[dstIndex+1]
(low-surrogate), and 2 is returned.

Parameters:

codePoint - the character (Unicode code point) to be converted.

dst - an array of char in which the
codePoint's UTF-16 value is stored.

dstIndex - the start index into the dst
array where the converted value is stored.

Returns:

1 if the code point is a BMP code point, 2 if the
code point is a supplementary code point.

Throws:

IndexOutOfBoundsException - if dstIndex
is negative or not less than dst.length, or if
dst at dstIndex doesn't have enough
array element(s) to store the resulting char
value(s). (If dstIndex is equal to
dst.length-1 and the specified
codePoint is a supplementary character, the
high-surrogate value is not stored in
dst[dstIndex].)

Since:

1.5

toChars

public static char[] toChars(int codePoint)

Converts the specified character (Unicode code point) to its
UTF-16 representation stored in a char array. If
the specified code point is a BMP (Basic Multilingual Plane or
Plane 0) value, the resulting char array has
the same value as codePoint. If the specified code
point is a supplementary code point, the resulting
char array has the corresponding surrogate pair.
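Both toChars overloads can be seen in a short encoder that turns a list of code points into a String. This is a sketch only; the class ToCharsDemo, its encode method, and the worst-case buffer sizing are illustrative choices.

```java
final class ToCharsDemo {
    // Encodes the given code points as UTF-16 using the buffer-filling overload.
    static String encode(int... codePoints) {
        char[] buf = new char[codePoints.length * 2]; // worst case: two chars each
        int len = 0;
        for (int cp : codePoints) {
            len += Character.toChars(cp, buf, len); // returns 1 or 2
        }
        return new String(buf, 0, len);
    }

    public static void main(String[] args) {
        System.out.println(encode(0x61, 0x1D11E).length());
        System.out.println(Character.toChars(0x41).length);
        System.out.println(Character.toChars(0x1D11E).length);
    }
}
```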

codePointCount

Returns the number of Unicode code points in the text range of
the specified char sequence. The text range begins at the
specified beginIndex and extends to the
char at index endIndex - 1. Thus the
length (in chars) of the text range is
endIndex-beginIndex. Unpaired surrogates within
the text range count as one code point each.

Throws:

IndexOutOfBoundsException - if the
beginIndex is negative, or endIndex
is larger than the length of the given sequence, or
beginIndex is larger than endIndex.

Since:

1.5

codePointCount

public static int codePointCount(char[] a,
int offset,
int count)

Returns the number of Unicode code points in a subarray of the
char array argument. The offset
argument is the index of the first char of the
subarray and the count argument specifies the
length of the subarray in chars. Unpaired
surrogates within the subarray count as one code point each.
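The following sketch contrasts the length in chars with the count in code points, including the unpaired-surrogate case described above. The class CountDemo and the sample data are assumptions for illustration.

```java
final class CountDemo {
    // Number of code points in the whole string.
    static int count(String s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb";
        System.out.println(s.length() + " chars, " + count(s) + " code points");
        // A lone high surrogate still counts as one code point.
        System.out.println(count("\uD834"));
        // Array overload: the subarray of length 2 starting at offset 1 holds one pair.
        char[] a = s.toCharArray();
        System.out.println(Character.codePointCount(a, 1, 2));
    }
}
```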

offsetByCodePoints

Returns the index within the given char sequence that is offset
from the given index by codePointOffset
code points. Unpaired surrogates within the text range given by
index and codePointOffset count as
one code point each.

Throws:

IndexOutOfBoundsException - if index
is negative or larger than the length of the char sequence,
or if codePointOffset is positive and the
subsequence starting with index has fewer than
codePointOffset code points, or if
codePointOffset is negative and the subsequence
before index has fewer than the absolute value
of codePointOffset code points.

Since:

1.5

offsetByCodePoints

Returns the index within the given char subarray
that is offset from the given index by
codePointOffset code points. The
start and count arguments specify a
subarray of the char array. Unpaired surrogates
within the text range given by index and
codePointOffset count as one code point each.

Throws:

IndexOutOfBoundsException - if start or count is negative,
or if start + count is larger than the length of
the given array,
or if index is less than start or
larger than start + count,
or if codePointOffset is positive and the text range
starting with index and ending with start
+ count - 1 has fewer than codePointOffset code
points,
or if codePointOffset is negative and the text range
starting with start and ending with index
- 1 has fewer than the absolute value of
codePointOffset code points.

Since:

1.5
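offsetByCodePoints translates a position counted in code points into a char index. The sketch below wraps the CharSequence overload and maps the documented IndexOutOfBoundsException to -1; the class OffsetDemo and the -1 convention are hypothetical.

```java
final class OffsetDemo {
    // Returns the char index reached after skipping n code points from the start,
    // or -1 if the offset cannot be satisfied.
    static int indexAfterCodePoints(String s, int n) {
        try {
            return Character.offsetByCodePoints(s, 0, n);
        } catch (IndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb"; // 4 chars, 3 code points
        for (int n = 0; n <= 4; n++) {
            System.out.println(n + " -> " + indexAfterCodePoints(s, n));
        }
        // A negative offset moves backward from the given index.
        System.out.println(Character.offsetByCodePoints(s, 3, -1));
    }
}
```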

isLowerCase

public static boolean isLowerCase(char ch)

Determines if the specified character is a lowercase character.

A character is lowercase if its general category type, provided
by Character.getType(ch), is
LOWERCASE_LETTER.

isTitleCase

A character is a titlecase character if its general
category type, provided by Character.getType(ch),
is TITLECASE_LETTER.

Some characters look like pairs of Latin letters. For example, there
is an uppercase letter that looks like "LJ" and has a corresponding
lowercase letter that looks like "lj". A third form, which looks like "Lj",
is the appropriate form to use when rendering a word in lowercase
with initial capitals, as for a book title.

These are some of the Unicode characters for which this method returns
true:

isTitleCase

Determines if the specified character (Unicode code point) is a titlecase character.

A character is a titlecase character if its general
category type, provided by getType(codePoint),
is TITLECASE_LETTER.

Some characters look like pairs of Latin letters. For example, there
is an uppercase letter that looks like "LJ" and has a corresponding
lowercase letter that looks like "lj". A third form, which looks like "Lj",
is the appropriate form to use when rendering a word in lowercase
with initial capitals, as for a book title.

These are some of the Unicode characters for which this method returns
true:

toLowerCase

Converts the character argument to lowercase using case
mapping information from the UnicodeData file.

Note that
Character.isLowerCase(Character.toLowerCase(ch))
does not always return true for some ranges of
characters, particularly those that are symbols or ideographs.

In general, String.toLowerCase() should be used to map
characters to lowercase. String case mapping methods
have several benefits over Character case mapping methods.
String case mapping methods can perform locale-sensitive
mappings, context-sensitive mappings, and 1:M character mappings, whereas
the Character case mapping methods cannot.

toLowerCase

Converts the character (Unicode code point) argument to
lowercase using case mapping information from the UnicodeData
file.

Note that
Character.isLowerCase(Character.toLowerCase(codePoint))
does not always return true for some ranges of
characters, particularly those that are symbols or ideographs.

In general, String.toLowerCase() should be used to map
characters to lowercase. String case mapping methods
have several benefits over Character case mapping methods.
String case mapping methods can perform locale-sensitive
mappings, context-sensitive mappings, and 1:M character mappings, whereas
the Character case mapping methods cannot.

Parameters:

codePoint - the character (Unicode code point) to be converted.

Returns:

the lowercase equivalent of the character (Unicode code
point), if any; otherwise, the character itself.

toUpperCase

Converts the character argument to uppercase using case mapping
information from the UnicodeData file.

Note that
Character.isUpperCase(Character.toUpperCase(ch))
does not always return true for some ranges of
characters, particularly those that are symbols or ideographs.

In general, String.toUpperCase() should be used to map
characters to uppercase. String case mapping methods
have several benefits over Character case mapping methods.
String case mapping methods can perform locale-sensitive
mappings, context-sensitive mappings, and 1:M character mappings, whereas
the Character case mapping methods cannot.

toUpperCase

Converts the character (Unicode code point) argument to
uppercase using case mapping information from the UnicodeData
file.

Note that
Character.isUpperCase(Character.toUpperCase(codePoint))
does not always return true for some ranges of
characters, particularly those that are symbols or ideographs.

In general, String.toUpperCase() should be used to map
characters to uppercase. String case mapping methods
have several benefits over Character case mapping methods.
String case mapping methods can perform locale-sensitive
mappings, context-sensitive mappings, and 1:M character mappings, whereas
the Character case mapping methods cannot.

Parameters:

codePoint - the character (Unicode code point) to be converted.

Returns:

the uppercase equivalent of the character, if any;
otherwise, the character itself.
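The 1:M limitation can be seen with U+00DF (LATIN SMALL LETTER SHARP S), whose full uppercase form is two letters. The sketch below compares the Character and String mappings; the class CaseMappingDemo is hypothetical and Locale.ROOT is chosen only to keep the result locale-independent.

```java
import java.util.Locale;

final class CaseMappingDemo {
    // One char in, one char out: cannot expand to two letters.
    static char perChar(char c) {
        return Character.toUpperCase(c);
    }

    // String mapping may change the length of the text.
    static String perString(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char sharpS = '\u00DF';
        System.out.println(perChar(sharpS) == sharpS);            // unchanged by the char mapping
        System.out.println(perString(String.valueOf(sharpS)));    // expands to two letters
    }
}
```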

toTitleCase

public static char toTitleCase(char ch)

Converts the character argument to titlecase using case mapping
information from the UnicodeData file. If a character has no
explicit titlecase mapping and is not itself a titlecase char
according to UnicodeData, then the uppercase mapping is
returned as an equivalent titlecase mapping. If the
char argument is already a titlecase
char, the same char value will be
returned.

Note that
Character.isTitleCase(Character.toTitleCase(ch))
does not always return true for some ranges of
characters.

toTitleCase

public static int toTitleCase(int codePoint)

Converts the character (Unicode code point) argument to titlecase using case mapping
information from the UnicodeData file. If a character has no
explicit titlecase mapping and is not itself a titlecase char
according to UnicodeData, then the uppercase mapping is
returned as an equivalent titlecase mapping. If the
character argument is already a titlecase
character, the same character value will be
returned.

Note that
Character.isTitleCase(Character.toTitleCase(codePoint))
does not always return true for some ranges of
characters.

Parameters:

codePoint - the character (Unicode code point) to be converted.

Returns:

the titlecase equivalent of the character, if any;
otherwise, the character itself.
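The fallback to the uppercase mapping explains why isTitleCase(toTitleCase(x)) is not always true. The following sketch uses the "LJ" family (U+01C7, U+01C8, U+01C9) and a plain ASCII letter; the class TitleCaseDemo is hypothetical.

```java
final class TitleCaseDemo {
    // True when the titlecase mapping of c is itself a titlecase letter.
    static boolean mapsToTitlecaseLetter(char c) {
        return Character.isTitleCase(Character.toTitleCase(c));
    }

    public static void main(String[] args) {
        // U+01C9 (small lj) has an explicit titlecase mapping to U+01C8.
        System.out.println(Character.toTitleCase('\u01C9') == '\u01C8');
        System.out.println(mapsToTitlecaseLetter('\u01C9'));
        // 'a' falls back to its uppercase mapping 'A', which is an uppercase letter.
        System.out.println(Character.toTitleCase('a'));
        System.out.println(mapsToTitlecaseLetter('a'));
    }
}
```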

digit

public static int digit(char ch,
int radix)

Returns the numeric value of the character ch in the
specified radix.

If the radix is not in the range MIN_RADIX <=
radix <= MAX_RADIX or if the
value of ch is not a valid digit in the specified
radix, -1 is returned. A character is a valid digit
if at least one of the following is true:

The method isDigit is true of the character
and the Unicode decimal digit value of the character (or its
single-character decomposition) is less than the specified radix.
In this case the decimal digit value is returned.

The character is one of the uppercase Latin letters
'A' through 'Z' and its code is less than
radix + 'A' - 10.
In this case, ch - 'A' + 10
is returned.

The character is one of the lowercase Latin letters
'a' through 'z' and its code is less than
radix + 'a' - 10.
In this case, ch - 'a' + 10
is returned.

digit

public static int digit(int codePoint,
int radix)

Returns the numeric value of the specified character (Unicode
code point) in the specified radix.

If the radix is not in the range MIN_RADIX <=
radix <= MAX_RADIX or if the
character is not a valid digit in the specified
radix, -1 is returned. A character is a valid digit
if at least one of the following is true:

The method isDigit(codePoint) is true of the character
and the Unicode decimal digit value of the character (or its
single-character decomposition) is less than the specified radix.
In this case the decimal digit value is returned.

The character is one of the uppercase Latin letters
'A' through 'Z' and its code is less than
radix + 'A' - 10.
In this case, codePoint - 'A' + 10
is returned.

The character is one of the lowercase Latin letters
'a' through 'z' and its code is less than
radix + 'a' - 10.
In this case, codePoint - 'a' + 10
is returned.

Parameters:

codePoint - the character (Unicode code point) to be converted.

radix - the radix.

Returns:

the numeric value represented by the character in the
specified radix.
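As an illustration of how digit is typically used, the sketch below parses a string of digits in a given radix and reports -1 as soon as digit rejects a character. The class DigitDemo and its parse method are hypothetical; the empty string yields 0 and overflow is deliberately not handled.

```java
final class DigitDemo {
    // Parses s as a non-negative number in the given radix, or returns -1
    // if any character is not a valid digit (or the radix is out of range).
    static int parse(String s, int radix) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            int d = Character.digit(s.charAt(i), radix);
            if (d < 0) {
                return -1;
            }
            value = value * radix + d; // no overflow check in this sketch
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("ff", 16));
        System.out.println(parse("19", 8));  // '9' is not an octal digit
        System.out.println(parse("1", 37));  // radix above MAX_RADIX
    }
}
```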

getNumericValue

Returns the int value that the specified Unicode
character represents. For example, the character
'\u216C' (the Roman numeral fifty) will return
an int with a value of 50.

The letters A-Z in their uppercase ('\u0041' through
'\u005A'), lowercase
('\u0061' through '\u007A'), and
full width variant ('\uFF21' through
'\uFF3A' and '\uFF41' through
'\uFF5A') forms have numeric values from 10
through 35. This is independent of the Unicode specification,
which does not assign numeric values to these char
values.

If the character does not have a numeric value, then -1 is returned.
If the character has a numeric value that cannot be represented as a
nonnegative integer (for example, a fractional value), then -2
is returned.

getNumericValue

Returns the int value that the specified
character (Unicode code point) represents. For example, the character
'\u216C' (the Roman numeral fifty) will return
an int with a value of 50.

The letters A-Z in their uppercase ('\u0041' through
'\u005A'), lowercase
('\u0061' through '\u007A'), and
full width variant ('\uFF21' through
'\uFF3A' and '\uFF41' through
'\uFF5A') forms have numeric values from 10
through 35. This is independent of the Unicode specification,
which does not assign numeric values to these char
values.

If the character does not have a numeric value, then -1 is returned.
If the character has a numeric value that cannot be represented as a
nonnegative integer (for example, a fractional value), then -2
is returned.

Parameters:

codePoint - the character (Unicode code point) to be converted.

Returns:

the numeric value of the character, as a nonnegative int
value; -2 if the character has a numeric value that is not a
nonnegative integer; -1 if the character has no numeric value.
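The three kinds of result (a nonnegative value, -1, and -2) can be observed with a few sample characters. This is a minimal sketch; the class NumericValueDemo, its describe method, and the chosen samples (including U+00BD, the vulgar fraction one half) are illustrative assumptions.

```java
final class NumericValueDemo {
    // Describes the result of getNumericValue in words.
    static String describe(int codePoint) {
        int v = Character.getNumericValue(codePoint);
        if (v == -1) {
            return "no numeric value";
        }
        if (v == -2) {
            return "not a nonnegative integer";
        }
        return Integer.toString(v);
    }

    public static void main(String[] args) {
        int[] samples = {'7', 'a', 'Z', 0x216C, 0x00BD, '$'};
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + ": " + describe(cp));
        }
    }
}
```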

isSpaceChar

public static boolean isSpaceChar(char ch)

Determines if the specified character is a Unicode space character.
A character is considered to be a space character if and only if
it is specified to be a space character by the Unicode standard. This
method returns true if the character's general category type is any of
the following:

isSpaceChar

public static boolean isSpaceChar(int codePoint)

Determines if the specified character (Unicode code point) is a
Unicode space character. A character is considered to be a
space character if and only if it is specified to be a space
character by the Unicode standard. This method returns true if
the character's general category type is any of the following:

isISOControl

public static boolean isISOControl(char ch)

Determines if the specified character is an ISO control
character. A character is considered to be an ISO control
character if its code is in the range '\u0000'
through '\u001F' or in the range
'\u007F' through '\u009F'.

isISOControl

public static boolean isISOControl(int codePoint)

Determines if the referenced character (Unicode code point) is an ISO control
character. A character is considered to be an ISO control
character if its code is in the range '\u0000'
through '\u001F' or in the range
'\u007F' through '\u009F'.
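One possible use of isISOControl is to strip control characters from text before display. The sketch below is illustrative only; the class ControlDemo and its stripControls method are hypothetical.

```java
final class ControlDemo {
    // Returns s with every ISO control character removed.
    static String stripControls(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isISOControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Contains NUL, TAB, DEL, and U+009F between the letters.
        System.out.println(stripControls("a\u0000b\tc\u007Fd\u009Fe"));
    }
}
```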

forDigit

public static char forDigit(int digit,
int radix)

Determines the character representation for a specific digit in
the specified radix. If the value of radix is not a
valid radix, or the value of digit is not a valid
digit in the specified radix, the null character
('\u0000') is returned.

The radix argument is valid if it is greater than or
equal to MIN_RADIX and less than or equal to
MAX_RADIX. The digit argument is valid if
0 <= digit < radix.

If the digit is less than 10, then
'0' + digit is returned. Otherwise, the value
'a' + digit - 10 is returned.

Parameters:

digit - the number to convert to a character.

radix - the radix.

Returns:

the char representation of the specified digit
in the specified radix.
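forDigit is the inverse of digit for valid arguments, which makes it a natural building block for radix conversion. The sketch below formats a non-negative int; the class ForDigitDemo and its toRadixString method are hypothetical, and Integer.toString(int, int) remains the standard way to do this.

```java
final class ForDigitDemo {
    // Formats a non-negative value in the given radix using forDigit.
    static String toRadixString(int value, int radix) {
        if (value < 0 || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("unsupported value or radix");
        }
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(Character.forDigit(value % radix, radix)); // least significant digit first
            value /= radix;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(toRadixString(255, 16));
        System.out.println(toRadixString(5, 2));
        // An invalid digit for the radix yields the null character.
        System.out.println(Character.forDigit(16, 16) == '\u0000');
    }
}
```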

getDirectionality

public static byte getDirectionality(char ch)

Returns the Unicode directionality property for the given
character. Character directionality is used to calculate the
visual ordering of text. The directionality value of undefined
char values is DIRECTIONALITY_UNDEFINED.

getDirectionality

public static byte getDirectionality(int codePoint)

Returns the Unicode directionality property for the given
character (Unicode code point). Character directionality is
used to calculate the visual ordering of text. The
directionality value of an undefined character is DIRECTIONALITY_UNDEFINED.

Parameters:

codePoint - the character (Unicode code point) for which
the directionality property is requested.
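The returned byte is compared against the DIRECTIONALITY_* constants of this class, which are not listed in this excerpt. The sketch below checks for the two strong right-to-left values; the class DirectionDemo, its method, and the sample characters (Hebrew alef U+05D0, Arabic alef U+0627) are illustrative assumptions.

```java
final class DirectionDemo {
    // True for characters with a strong right-to-left directionality.
    static boolean isStrongRightToLeft(int codePoint) {
        byte d = Character.getDirectionality(codePoint);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        System.out.println(isStrongRightToLeft('A'));
        System.out.println(isStrongRightToLeft(0x05D0));
        System.out.println(isStrongRightToLeft(0x0627));
    }
}
```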

isMirrored

public static boolean isMirrored(char ch)

Determines whether the character is mirrored according to the
Unicode specification. Mirrored characters should have their
glyphs horizontally mirrored when displayed in text that is
right-to-left. For example, '\u0028' LEFT
PARENTHESIS is semantically defined to be an opening
parenthesis. This will appear as a "(" in text that is
left-to-right but as a ")" in text that is right-to-left.

Returns:

true if the char is mirrored, false
if the char is not mirrored or is not defined.

Since:

1.4

isMirrored

public static boolean isMirrored(int codePoint)

Determines whether the specified character (Unicode code point)
is mirrored according to the Unicode specification. Mirrored
characters should have their glyphs horizontally mirrored when
displayed in text that is right-to-left. For example,
'\u0028' LEFT PARENTHESIS is semantically
defined to be an opening parenthesis. This will appear
as a "(" in text that is left-to-right but as a ")" in text
that is right-to-left.

Parameters:

codePoint - the character (Unicode code point) to be tested.

Returns:

true if the character is mirrored, false
if the character is not mirrored or is not defined.
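A quick way to see the property in action is to count the mirrored characters in a piece of text. This is a sketch only; the class MirrorDemo and its countMirrored method are hypothetical.

```java
final class MirrorDemo {
    // Counts the code points in s that have the mirrored property.
    static int countMirrored(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isMirrored(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // The parentheses and brackets are mirrored; the letters are not.
        System.out.println(countMirrored("f(x[i])"));
    }
}
```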

compareTo

Returns:

the value 0 if the argument Character
is equal to this Character; a value less than
0 if this Character is numerically less
than the Character argument; and a value greater than
0 if this Character is numerically greater
than the Character argument (unsigned comparison).
Note that this is strictly a numerical comparison; it is not
locale-dependent.

Since:

1.2

reverseBytes

public static char reverseBytes(char ch)

Returns the value obtained by reversing the order of the bytes in the
specified char value.

Returns:

the value obtained by reversing (or, equivalently, swapping)
the bytes in the specified char value.
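The byte swap is its own inverse, which the following sketch demonstrates with an arbitrary sample value. The class ReverseBytesDemo and its helper are hypothetical.

```java
final class ReverseBytesDemo {
    // Swapping twice must restore the original value.
    static boolean roundTrips(char c) {
        return Character.reverseBytes(Character.reverseBytes(c)) == c;
    }

    public static void main(String[] args) {
        char swapped = Character.reverseBytes('\u1234');
        System.out.println(Integer.toHexString(swapped)); // high and low bytes exchanged
        System.out.println(roundTrips('\u1234'));
    }
}
```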
