Working with Strings

Strings are one of the most common types of objects in Java. Earlier chapters of this book present various techniques for working with strings: You've seen how to create string variables, how to concatenate strings, and how to compare strings. But so far, I've only scratched the surface of what you can do with strings. In this chapter, I dive deeper into what Java can do with strings. (Hint: Way more than Cat's Cradle.)

I start with a brief review of what I covered so far about strings, so you don't have to go flipping back through the book to find basic information. Then I look at the String class itself and some of the methods it provides for working with strings. Finally, you examine two almost identical classes named StringBuilder and StringBuffer that offer features not found in the basic String class.

Reviewing Strings

To save you the hassle of flipping back through this book, the following paragraphs summarize what is presented in earlier chapters about strings:

Strings are reference types, not value (primitive) types such as int or boolean.

As a result, a string variable holds a reference to an object created from the String class, not the value of the string itself.

Even though strings aren't primitive types, the Java compiler has some features designed to let you work with strings almost as if they were. For example, Java lets you assign string literals to string variables, like this:

String line1 = "Oh what a beautiful morning!";

Strings can include escape sequences that consist of a backslash followed by another character. The most common escape sequences are \n for a new line and \t for a tab. If you want to include a backslash in a string, you must use the escape sequence \\.

Strings and characters are different. String literals are marked by quotation marks; character literals are marked by apostrophes. Thus "a" is a string literal that happens to be one character long. In contrast, 'a' is a character literal.

You can combine, or concatenate, strings by using the + operator, like this:

String line2 = line1 + "\nOh what a beautiful day!";

You can also use the += operator with strings, like this:

line2 += "\nI've got a beautiful feeling";

When used in a concatenation expression, Java automatically converts primitive types to strings. Thus Java allows the following:

int empCount = 50;
String msg = "Number of employees: " + empCount;

The various primitive wrapper classes (such as Integer and Double) have parse methods, such as Integer.parseInt and Double.parseDouble, that convert string values to numeric types. Here's an example:

String s = "50";
int i = Integer.parseInt(s);

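Don't compare strings using the equality operator (==). For strings, == tests whether two variables refer to the same object, not whether the two strings have the same value. Instead, use the equals method. Here's an example: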

if (lastName.equals("Lowe"))
    System.out.println("This is me!");

The String class also has an equalsIgnoreCase method that compares strings without considering case. Here's an example:

if (lastName.equalsIgnoreCase("lowe"))
    System.out.println("This is me again!");
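To see why == is unreliable, the following sketch creates lastName with new String(...), which always produces a distinct object. As a result, == reports false even though the values match, while equals and equalsIgnoreCase compare the characters. The class name StringCompareDemo is illustrative.

```java
class StringCompareDemo {
    public static void main(String[] args) {
        // new String(...) always creates a distinct object, even though its value is "Lowe".
        String lastName = new String("Lowe");

        // == compares references, and these are two different objects.
        System.out.println(lastName == "Lowe");                 // false
        // equals compares the characters in the two strings.
        System.out.println(lastName.equals("Lowe"));            // true
        // equalsIgnoreCase compares the characters but ignores case.
        System.out.println(lastName.equalsIgnoreCase("lowe"));  // true
    }
}
```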

DESIGN PATTERN

The Immutable pattern

Many applications can benefit from classes that describe immutable objects. An immutable object is an object that, once created, can never be changed. The String class is the most commonly known example of an immutable object. After you create a String object, you can't change it.

As an example, suppose you're designing a game where the playing surface has fixed obstacles, such as trees. You can create the Tree class using the Immutable pattern. The constructor for the Tree class could accept parameters that define the size, type, and location of the tree. But once you create the tree, you can't move it.

Follow these three simple rules when creating an immutable object:

Provide one or more constructors that accept parameters to set the initial state of the object.

Do not allow any methods to modify any instance variables in the object. Set instance variables with constructors, and then leave them alone.

Any method that modifies the object should do so by creating a new object with the modified values. This method then returns the new object as its return value.
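Here's a minimal sketch of the Tree class described above, following the three rules. The field names, the int coordinates, and the withHeight method are illustrative assumptions, not part of any particular game.

```java
// A hypothetical immutable Tree class. Declaring the class final keeps
// subclasses from adding mutable state.
final class Tree {
    // Rule 2: private final fields can't change after construction.
    private final String type;
    private final int height;
    private final int x;
    private final int y;

    // Rule 1: the constructor sets the entire state of the object.
    Tree(String type, int height, int x, int y) {
        this.type = type;
        this.height = height;
        this.x = x;
        this.y = y;
    }

    String getType() { return type; }
    int getHeight() { return height; }
    int getX() { return x; }
    int getY() { return y; }

    // Rule 3: a "modifying" method returns a new Tree and leaves this one alone.
    Tree withHeight(int newHeight) {
        return new Tree(type, newHeight, x, y);
    }

    public static void main(String[] args) {
        Tree oak = new Tree("oak", 10, 3, 4);
        Tree tallerOak = oak.withHeight(15);
        System.out.println(oak.getHeight());        // 10: the original is unchanged
        System.out.println(tallerOak.getHeight());  // 15
    }
}
```

The String class works the same way: methods such as toUpperCase hand you a new String instead of changing the one you called them on.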

Using the String Class

The String class is the class used to create string objects. It has a whole gaggle of methods that are designed to let you find out information about the string that's represented by the String class. Table 1-1 lists the most useful of these methods.

Table 1-1: String Class Methods

char charAt(int): Returns the character at the specified position in the string.

int compareTo(String): Compares this string to another string character by character, using the characters' Unicode values (which matches alphabetical order when both strings use the same case). Returns a negative number if this string comes before the other string, 0 if the strings are the same, and a positive number if this string comes after the other string.

int compareToIgnoreCase(String): Similar to compareTo but ignores case.

boolean contains(CharSequence): Returns true if this string contains the parameter value. The parameter can be a String, StringBuilder, or StringBuffer.

boolean endsWith(String): Returns true if this string ends with the parameter string.

boolean equals(Object): Returns true if the parameter is a String with the same value as this string.

boolean equalsIgnoreCase(String): Similar to equals but ignores case.

int indexOf(char): Returns the index of the first occurrence of the char parameter in this string. Returns −1 if the character isn't in the string.

int indexOf(String): Returns the index of the first occurrence of the String parameter in this string. Returns −1 if the parameter string isn't in this string.

int indexOf(String, int start): Similar to indexOf, but starts the search at the specified position in the string.

int lastIndexOf(char): Returns the index of the last occurrence of the char parameter in this string. Returns −1 if the character isn't in the string.

int lastIndexOf(String): Returns the index of the last occurrence of the String parameter in this string. Returns −1 if the parameter string isn't in this string.

int lastIndexOf(String, int): Similar to lastIndexOf, but searches backward starting at the specified position in the string.

int length(): Returns the length of this string.

String replace(char, char): Returns a new string that's based on the original string, but with every occurrence of the first parameter replaced by the second parameter.

String replaceAll(String regex, String replacement): Returns a new string that's based on the original string, but with every substring that matches the first parameter replaced by the second parameter. The first parameter is treated as a regular expression.

String replaceFirst(String regex, String replacement): Returns a new string that's based on the original string, but with the first substring that matches the first parameter replaced by the second parameter. The first parameter is treated as a regular expression.

String[] split(String): Splits the string into an array of strings, using the parameter as a regular expression that determines where to split the string.

boolean startsWith(String): Returns true if this string starts with the parameter string.

boolean startsWith(String, int): Returns true if this string contains the parameter string at the position indicated by the int parameter.

String substring(int): Extracts a substring from this string, beginning at the position indicated by the int parameter and continuing to the end of the string.

String substring(int, int): Extracts a substring from this string, beginning at the position indicated by the first parameter and ending at the position one character before the value of the second parameter.

char[] toCharArray(): Converts the string to an array of individual characters.

String toLowerCase(): Returns a copy of the string converted to lowercase.

String toString(): Returns the string as a String. (Pretty pointless if you ask me, but every class inherits a toString method from Object, and String overrides it.)

String toUpperCase(): Returns a copy of the string converted to uppercase.

String trim(): Returns a copy of the string but with all leading and trailing white space removed.

static String valueOf(primitiveType): Returns a string representation of any primitive type.
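A few of these methods behave in ways that are easy to misread from a one-line description. This short sketch (the class name StringMethodsDemo is illustrative) shows that only the sign of compareTo's result matters, that lastIndexOf(String, int) searches backward, and that replaceAll treats its first argument as a regular expression, in which a period matches any character.

```java
class StringMethodsDemo {
    public static void main(String[] args) {
        String s = "banana";  // indexes: b=0 a=1 n=2 a=3 n=4 a=5

        // compareTo returns a negative number, zero, or a positive number.
        System.out.println("apple".compareTo("banana") < 0);  // true
        System.out.println("banana".compareTo("banana"));     // 0

        // indexOf searches forward; lastIndexOf searches backward.
        System.out.println(s.indexOf("an"));         // 1
        System.out.println(s.lastIndexOf("an"));     // 3
        // Searching backward from index 2 finds the "an" that starts at 1.
        System.out.println(s.lastIndexOf("an", 2));  // 1

        // In a regular expression, "." matches any character, so every character is replaced.
        System.out.println("a.b.c".replaceAll(".", "-"));  // -----
        // replace(char, char) matches the literal period only.
        System.out.println("a.b.c".replace('.', '-'));     // a-b-c
    }
}
```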

REMEMBER

The most important thing to remember about the String class is that in spite of the fact that it has a bazillion methods, none of those methods lets you alter the string in any way. That's because a String object is immutable, which means it can't be changed.

Although you can't change a string after you create it, you can use methods of the String class to create new strings that are variations of the original string. The following sections describe some of the more interesting things you can do with these methods.

Finding the length of a string

One of the most basic string operations is determining the length of a string. You do that with the length method. For example:

String s = "A wonderful day for a neighbor.";
int len = s.length();

Here len is assigned a value of 31 because the string s consists of 31 characters (counting the spaces and the period).

Getting the length of a string isn't usually very useful by itself. But the length method often plays an important role in other string manipulations, as you see throughout the following sections.

Making simple string modifications

Several of the methods of the String class return modified versions of the original string. For example, toLowerCase converts a string to all-lowercase letters:

String s = "Umpa Lumpa";
s = s.toLowerCase();

Here s is set to the string umpa lumpa. The toUpperCase method works the same, but converts strings to all-uppercase letters.

The trim method removes white space characters (spaces, tabs, newlines, and so on) from the start and end of a string. Here's an example:

String s = " Umpa Lumpa ";
s = s.trim();

Here the spaces before and after Umpa Lumpa are removed. Thus the resulting string is ten characters long.

Bear in mind that because strings are immutable, these methods don't actually change the String object. Instead, they create a new String with the modified value. A common mistake (especially for programmers who are new to Java but experienced with other languages) is to forget to assign the return value from one of these methods. For example, the following statement has no effect on s:

s.trim();

Here the trim method trims the string, but then the program discards the result. The remedy is to assign the result of this expression back to s, like this:

s = s.trim();
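The following sketch puts the mistake and the remedy side by side so you can see the difference in the string's length. The class name TrimDemo is illustrative.

```java
class TrimDemo {
    public static void main(String[] args) {
        String s = " Umpa Lumpa ";

        s.trim();                        // the trimmed copy is discarded; s is unchanged
        System.out.println(s.length());  // 12

        s = s.trim();                    // s now refers to the new, trimmed string
        System.out.println(s.length());  // 10

        s = s.toLowerCase();             // same rule: assign the returned string
        System.out.println(s);           // umpa lumpa
    }
}
```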

Extracting characters from a string

You can use the charAt method to extract a character from a specific position in a string. When you do, keep in mind that the index number for the first character in a string is 0, not 1. Also, you should check the length of the string before extracting a character. If you specify an index value that's beyond the end of the string, the exception StringIndexOutOfBoundsException is thrown. (Fortunately, this is an unchecked exception, so you don't have to enclose the charAt method in a try/catch statement.)

Here's an example of a program that uses the charAt method to count the number of vowels in a string entered by the user:
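Here's a sketch of such a program. Reading the input with java.util.Scanner and moving the counting into a separate method named countVowels are assumptions made to keep the logic easy to test; the class name CountVowels is also illustrative.

```java
import java.util.Scanner;

class CountVowels {
    // Counts a, e, i, o, and u in either case by examining each character with charAt.
    static int countVowels(String s) {
        int vowelCount = 0;
        for (int i = 0; i < s.length(); i++) {  // i stays below length, so charAt can't throw
            char c = s.charAt(i);
            if (c == 'A' || c == 'a' || c == 'E' || c == 'e'
                    || c == 'I' || c == 'i' || c == 'O' || c == 'o'
                    || c == 'U' || c == 'u') {
                vowelCount++;
            }
        }
        return vowelCount;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.print("Enter a string: ");
        String s = sc.nextLine();
        System.out.println("That string contains " + countVowels(s) + " vowels.");
    }
}
```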

Here the for loop checks the length of the string to make sure the index variable i doesn't exceed the string length. Then, each character is extracted and checked with an if statement to see if it is a vowel. The condition expression in this if statement is a little complicated because it must check for five different vowels, in both uppercase and lowercase.

Extracting substrings from a string

The substring method lets you extract a portion of a string. This method has two forms. The first accepts a single integer parameter. It returns the substring that starts at the position indicated by this parameter and extends to the rest of the string. (Remember that string positions start with 0, not 1.) Here's an example:

String s = "Baseball";
String b = s.substring(4); // "ball"

Here b is assigned the string ball.

The second version of the substring method accepts two parameters to indicate the start and end of the substring you want to extract. Note that the substring actually ends at the character that's immediately before the position indicated by the second parameter. So to extract the characters at positions 2 through 5, specify 2 as the start position and 6 as the ending position. For example:

String s = "Baseball";
String b = s.substring(2, 6); // "seba"

Here b is assigned the string seba.

The following program uses substrings to replace all the vowels in a string entered by the user with asterisks:
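Here's a sketch of such a program, following the front/back approach described next. The class name MarkVowels and the separate markVowels method are illustrative; the program reads one line with java.util.Scanner and prints no prompt, to match the sample output.

```java
import java.util.Scanner;

class MarkVowels {
    // Replaces each vowel with an asterisk by rebuilding the string around it.
    static String markVowels(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toLowerCase(s.charAt(i));
            if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
                String front = s.substring(0, i);  // characters before the vowel
                String back = s.substring(i + 1);  // characters after the vowel
                s = front + "*" + back;            // a brand-new String every time
            }
        }
        return s;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String s = sc.nextLine();
        System.out.println(markVowels(s));
    }
}
```

Because each vowel is replaced by exactly one asterisk, the string's length never changes, so the loop index stays valid even though s is reassigned inside the loop.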

This program uses a for loop and the charAt method to extract each character from the string. Then, if the character is a vowel, a string named front is created that consists of all the characters that appear before the vowel. A second string named back is then created with all the characters that appear after the vowel. Finally, the s string is replaced with a new string that's constructed from the front string, an asterisk, and the back string.

Here's some sample console output from this program so you can see how it works:

Where have all the vowels gone?
Wh*r* h*v* *ll th* v*w*ls g*n*?

Splitting up a string

The split method is especially useful for splitting a string into separate strings based on a delimiter character. For example, suppose you have a string with the parts of an address separated by colons, like this:

1500 N. Third Street:Fresno:CA:93722

With the split method, you can easily separate this string into four strings. In the process, the colons are discarded.

TECHNICAL STUFF

Unfortunately, the use of the split method requires that you use an array, and arrays are covered in the next chapter. I'm going to plow ahead with this section anyway on the chance that you already know a few basic things about arrays. (If not, you can always come back to this section after you read the next chapter.)

The split method carves a string into an array of strings separated by the delimiter character passed via a string parameter. Here's a routine that splits an address into separate strings, and then prints out all the strings:
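Here's a sketch of such a routine, wrapped in a class named SplitAddress (an illustrative name) so it runs on its own:

```java
class SplitAddress {
    public static void main(String[] args) {
        String address = "1500 N. Third Street:Fresno:CA:93722";
        String[] parts = address.split(":");  // the colons are discarded
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```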

If you run this code, the following lines are displayed on the console:

1500 N. Third Street
Fresno
CA
93722

The string passed to the split method is actually a special type of string used for pattern recognition, called a regular expression. You discover regular expressions in Book V. For now, here are a few regular expressions that might be useful when you use the split method:

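Regular Expression: Explanation

\t: A tab character

\n: A newline character

\|: A vertical bar

\s: Any white space character

\s+: One or more occurrences of any white space character

When you type one of these expressions as a Java string literal, remember that the backslash is also Java's escape character. So you must double the backslash in \| and \s, writing "\\|" or "\\s+".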

The last regular expression in this table, \s+, is especially useful for breaking a string into separate words. For example, the following program accepts a string from the user, breaks it into separate words, and then displays the words on separate lines:
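Here's a sketch of such a program. The class name ListWords and the splitWords helper are illustrative. Calling trim before split is an added precaution: without it, leading spaces would produce an empty first element in the array.

```java
import java.util.Scanner;

class ListWords {
    // Splits on runs of white space. trim() runs first so that leading
    // spaces don't produce an empty first element.
    static String[] splitWords(String s) {
        return s.trim().split("\\s+");  // the Java literal "\\s+" is the regex \s+
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.print("Enter a string: ");
        String s = sc.nextLine();
        for (String word : splitWords(s)) {
            System.out.println(word);
        }
    }
}
```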

Here's a sample of the console output for a typical execution of this program:

Enter a string: This string has several words
This
string
has
several
words

Notice that some of the words in the string entered by the user are preceded by more than one space character. The \s+ pattern used by the split method treats any run of consecutive white space characters as a single delimiter when splitting the words.

Replacing parts of a string

You can use the replaceFirst or replaceAll method to replace a part of a string that matches a pattern you supply with some other text. For example, suppose a program gets a line of text from the user and then replaces all occurrences of the string cat with dog. Here's some sample console output from such a program:

Enter a string: I love cats. Cats are the best.
I love dogs. Dogs are the best.
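Here's a minimal sketch of a program that produces the output shown above. Because replaceAll is case sensitive, a single call that replaces cat with dog would leave Cats unchanged. To match the sample output, this sketch chains a second replaceAll call for Cat. That second call and the names CatToDog and catToDog are assumptions for illustration.

```java
import java.util.Scanner;

class CatToDog {
    // Replaces "cat" with "dog" and "Cat" with "Dog", preserving capitalization.
    static String catToDog(String s) {
        return s.replaceAll("cat", "dog").replaceAll("Cat", "Dog");
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.print("Enter a string: ");
        String s = sc.nextLine();
        System.out.println(catToDog(s));
    }
}
```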

As with the split method, the first parameter of the replaceAll and replaceFirst methods is a regular expression, so it can describe complex matching patterns. (For more information, see Book V.)

TECHNICAL STUFF

Once again, don't forget that strings are immutable. As a result, the replace methods don't actually modify the String object itself. Instead, they return a new String object with the modified value.

Using the StringBuilder and StringBuffer Classes

The String class is powerful, but it's not very efficient for programs that require heavy-duty string manipulation. Because String objects are immutable, any method of the String class that modifies the string in any way must create a new String object and copy the modified contents of the original string object to the new string. That's not so bad if it happens only occasionally, but it can be inefficient in programs that do it a lot.

Even string concatenation is inherently inefficient. For example, consider these statements:
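Here's a sketch of four statements consistent with the list that follows. The names count and msg come from that list; the class name ConcatDemo and the final println are illustrative additions.

```java
class ConcatDemo {
    public static void main(String[] args) {
        int count = 5;                    // statement 1
        String msg = "There are ";        // statement 2: a String for the literal
        msg += count;                     // statement 3: "5", then "There are 5"
        msg += " apples in the basket.";  // statement 4: the literal, then the final result
        System.out.println(msg);          // There are 5 apples in the basket.
    }
}
```

Modern Java compilers may optimize concatenation behind the scenes, so treat the following count as a conceptual model. Together, these four statements create five String objects: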

“There are”: Created for the literal in the second statement. The msg variable is assigned a reference to this string.

“5”: Created to hold the string representation of count. Java converts count to a string implicitly when the third statement concatenates it with msg.

“There are 5”: Created as a result of the concatenation in the third statement. A reference to this object is assigned to msg.

“apples in the basket.”: Created to hold the literal in the fourth statement.

“There are 5 apples in the basket.”: Created to hold the result of the concatenation in the fourth statement. A reference to this object is assigned to msg.

For programs that do only occasional string concatenation and simple string manipulations, these inefficiencies aren't a big deal. For programs that do extensive string manipulation, however, Java offers two alternatives to the String class: the StringBuilder and StringBuffer classes.

TECHNICAL STUFF

The StringBuilder and StringBuffer classes are mirror images of each other. Both have the same methods and perform the same string manipulations. The only difference is that the StringBuffer class is safe to use when multiple threads work with the same object, because its methods are synchronized. StringBuilder isn't safe in that situation, but it's more efficient than StringBuffer. As a result, you should use the StringBuilder class unless more than one thread needs to access the same object. (Find out how to work with threads in Book V.)

Note

The StringBuilder class was introduced in Java version 1.5. If you're using an older Java compiler, you have to use StringBuffer instead.

Creating a StringBuilder object

You can't assign string literals directly to a StringBuilder object as you can with a String object. However, the StringBuilder class has a constructor that accepts a String as a parameter. So, to create a StringBuilder object, you use a statement such as this:

StringBuilder sb = new StringBuilder("Today is the day!");

Internally, a StringBuilder object maintains an area of memory where it stores a string value. This area of memory is called the buffer. The string held in this buffer doesn't have to use the entire buffer. As a result, a StringBuilder object has both a length and a capacity. The length represents the current length of the string maintained by the StringBuilder, and the capacity represents the size of the buffer itself. Note that the length can't exceed the capacity.

When you create a StringBuilder object, initially the capacity is set to the length of the string plus 16. The StringBuilder class automatically increases its capacity whenever necessary, so you don't have to worry about exceeding the capacity.
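The following sketch (CapacityDemo is an illustrative name) shows the initial capacity rule and the automatic growth. The string "Today is the day!" is 17 characters long, so the initial capacity is 33.

```java
class CapacityDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Today is the day!");
        System.out.println(sb.length());    // 17 characters
        System.out.println(sb.capacity());  // 33: the length plus 16

        // Appending 19 more characters pushes the length to 36, past the old capacity.
        sb.append(" And tomorrow, too!");
        System.out.println(sb.length());                   // 36
        System.out.println(sb.capacity() >= sb.length());  // true: the buffer grew automatically
    }
}
```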

Using StringBuilder methods

Table 1-2 lists the most useful methods of the StringBuilder class. Note that the StringBuffer class uses the same methods. So if you have to use StringBuffer instead of StringBuilder, just change the class name and use the same methods.

Table 1-2: StringBuilder Methods

append(primitiveType): Appends the string representation of the primitive type to the end of the string.

append(Object): Calls the object's toString method and appends the result to the end of the string.

append(CharSequence): Appends the string to the end of the StringBuilder's string value. The parameter can be a String, StringBuilder, or StringBuffer.

char charAt(int): Returns the character at the specified position in the string.

delete(int, int): Deletes characters starting with the first int and ending with the character before the second int.

deleteCharAt(int): Deletes the character at the specified position.

ensureCapacity(int): Ensures that the capacity of the StringBuilder is at least equal to the int value; increases the capacity if necessary.

int capacity(): Returns the capacity of this StringBuilder.

int indexOf(String): Returns the index of the first occurrence of the specified string. If the string doesn't appear, returns −1.

int indexOf(String, int): Returns the index of the first occurrence of the specified string, starting the search at the specified index position. If the string doesn't appear, returns −1.

insert(int, primitiveType): Inserts the string representation of the primitive type at the point specified by the int argument.

insert(int, Object): Calls the toString method of the Object parameter, and then inserts the resulting string at the point specified by the int argument.

insert(int, CharSequence): Inserts the string at the point specified by the int argument. The second parameter can be a String, StringBuilder, or StringBuffer.

int lastIndexOf(String): Returns the index of the last occurrence of the specified string. If the string doesn't appear, returns −1.

int lastIndexOf(String, int): Returns the index of the last occurrence of the specified string, searching backward from the specified index position. If the string doesn't appear, returns −1.

int length(): Returns the length of the string.

replace(int, int, String): Replaces the substring indicated by the first two parameters with the string provided by the third parameter.

reverse(): Reverses the order of characters.

setCharAt(int, char): Sets the character at the specified position to the specified character.

setLength(int): Sets the length of the string. If that length is less than the current length, the string is truncated; if it's greater than the current length, null characters ('\u0000') are added.

String substring(int): Extracts a substring, beginning at the position indicated by the int parameter and continuing to the end of the string.

String substring(int, int): Extracts a substring, beginning at the position indicated by the first parameter and ending at the position one character before the value of the second parameter.

String toString(): Returns the current value as a String.

void trimToSize(): Attempts to reduce the capacity of the StringBuilder to match the length of the string.
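Here's a quick sketch that exercises several of these methods in turn, with each comment showing the builder's contents after the call. The class and method names are illustrative.

```java
class StringBuilderMethodsDemo {
    // Applies several StringBuilder methods in turn.
    static StringBuilder buildGreeting() {
        StringBuilder sb = new StringBuilder("Hello");
        sb.append(", world");        // "Hello, world"
        sb.append('!');              // "Hello, world!"
        sb.insert(0, ">> ");         // ">> Hello, world!"
        sb.delete(0, 3);             // "Hello, world!"  (removes indexes 0 through 2)
        sb.setCharAt(0, 'J');        // "Jello, world!"
        sb.replace(7, 12, "there");  // "Jello, there!"  (replaces indexes 7 through 11)
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = buildGreeting();
        System.out.println(sb);            // Jello, there!
        System.out.println(sb.reverse());  // !ereht ,olleJ
    }
}
```

Methods such as append, insert, delete, replace, and reverse return the StringBuilder itself, so you can also chain calls, as in sb.append("a").append("b").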

A StringBuilder example

To illustrate how the StringBuilder class works, here's a StringBuilder version of the MarkVowel program from earlier in this chapter:
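Here's a sketch of that StringBuilder version, assuming the program reads one line of input with java.util.Scanner. The class name MarkVowelsBuilder and the markVowels method are illustrative.

```java
import java.util.Scanner;

class MarkVowelsBuilder {
    // Replaces each vowel in place with setCharAt; no intermediate strings are created.
    static String markVowels(String input) {
        StringBuilder sb = new StringBuilder(input);
        for (int i = 0; i < sb.length(); i++) {
            char c = Character.toLowerCase(sb.charAt(i));
            if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
                sb.setCharAt(i, '*');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String s = sc.nextLine();
        System.out.println(markVowels(s));
    }
}
```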

This program uses the setCharAt method to directly replace any vowels it finds with asterisks. That's much more efficient than concatenating substrings (which is the way the String version of this program worked).

Using the CharSequence Interface

The Java API includes a useful interface called CharSequence. All three of the classes in this chapter (String, StringBuilder, and StringBuffer) implement this interface. This interface exists primarily to let you use String, StringBuilder, and StringBuffer interchangeably.

Toward that end, several of the methods of the String, StringBuilder, and StringBuffer classes use CharSequence as a parameter type. For those methods, you can pass a String, StringBuilder, or StringBuffer object. Note that a string literal is treated as a String object, so you can use a string literal anywhere a CharSequence is called for.

TECHNICAL STUFF

In case you're interested, the CharSequence interface declares four abstract methods:

char charAt(int): Returns the character at the specified position

int length(): Returns the length of the sequence

CharSequence subSequence(int start, int end): Returns the subsequence indicated by the start and end parameters

String toString(): Returns a String representation of the sequence

If you're inclined to use CharSequence as a parameter type for a method so the method works with a String, StringBuilder, or StringBuffer object, be advised that you can use only the methods CharSequence defines: these four, plus a few default methods, such as chars, that were added in Java 8.
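Here's a small sketch of such a method. The names CharSequenceDemo and countSpaces are illustrative. Because the method uses only length and charAt, it accepts a String, a StringBuilder, or a StringBuffer.

```java
class CharSequenceDemo {
    // Accepts any CharSequence and relies only on length() and charAt().
    static int countSpaces(CharSequence text) {
        int spaces = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                spaces++;
            }
        }
        return spaces;
    }

    public static void main(String[] args) {
        System.out.println(countSpaces("a b c"));                        // 2
        System.out.println(countSpaces(new StringBuilder("one two")));   // 1
        System.out.println(countSpaces(new StringBuffer("no-spaces")));  // 0
    }
}
```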