Thinking in Java, 3rd ed. Revision 4.0

12: The Java I/O System

Creating a good input/output (I/O) system is one of the more difficult tasks for the language designer.

This is evidenced by the number of different approaches. The challenge seems to be in covering all eventualities. Not only are there different sources and sinks of I/O that you want to communicate with (files, the console, network connections, etc.), but you need to talk to them in a wide variety of ways (sequential, random-access, buffered, binary, character, by lines, by words, etc.).

The Java library designers attacked this problem by creating lots of classes. In fact, there are so many classes for Java's I/O system that it can be intimidating at first (ironically, the Java I/O design actually prevents an explosion of classes). There was also a significant change in the I/O library after Java 1.0, when the original byte-oriented library was supplemented with char-oriented, Unicode-based I/O classes. In JDK 1.4, the nio classes (for "new I/O," a name we'll still be using years from now) were added for improved performance and functionality. As a result, there are a fair number of classes to learn before you understand enough of Java's I/O picture that you can use it properly. In addition, it's rather important to understand the evolution history of the I/O library, even if your first reaction is "don't bother me with history, just show me how to use it!" The problem is that without the historical perspective, you will rapidly become confused with some of the classes and when you should and shouldn't use them.

This chapter will give you an introduction to the variety of I/O classes in the standard Java library and how to use them.

The File class

Before getting into the classes that actually read and write data to streams, we'll look at a utility provided with the library to assist you in handling file directory issues.

The File class has a deceiving name; you might think it refers to a file, but it doesn't. It can represent either the name of a particular file or the names of a set of files in a directory. If it's a set of files, you can ask for that set using the list( ) method, which returns an array of String. It makes sense to return an array rather than one of the flexible container classes, because the number of elements is fixed, and if you want a different directory listing, you just create a different File object. In fact, "FilePath" would have been a better name for the class. This section shows an example of the use of this class, including the associated FilenameFilter interface.

A directory lister

Suppose youd like to see a directory listing. The File object can be listed in two ways. If you call list( ) with no arguments, youll get the full list that the File object contains. However, if you want a restricted listfor example, if you want all of the files with an extension of .javathen you use a directory filter, which is a class that tells how to select the File objects for display. Feedback

Heres the code for the example. Note that the result has been effortlessly sorted (alphabetically) using the java.utils.Arrays.sort( ) method and the AlphabeticComparator defined in Chapter 11:

The FilenameFilter interface says that all this type of object does is provide a method called accept( ). The whole reason behind the creation of the DirFilter class is to provide the accept( ) method to the list( ) method so that list( ) can "call back" accept( ) to determine which file names should be included in the list. Thus, this structure is often referred to as a callback. More specifically, this is an example of the Strategy Pattern, because list( ) implements basic functionality, and you provide the Strategy in the form of a FilenameFilter in order to complete the algorithm necessary for list( ) to provide its service. Because list( ) takes a FilenameFilter object as its argument, it means that you can pass an object of any class that implements FilenameFilter to choose (even at run time) how the list( ) method will behave. The purpose of a callback is to provide flexibility in the behavior of code.

DirFilter shows that just because an interface contains only a set of methods, you're not restricted to writing only those methods. (You must at least provide definitions for all the methods in an interface, however.) In this case, the DirFilter constructor is also created.

The accept( ) method must accept a File object representing the directory that a particular file is found in, and a String containing the name of that file. You might choose to use or ignore either of these arguments, but you will probably at least use the file name. Remember that the list( ) method is calling accept( ) for each of the file names in the directory object to see which one should be included; this is indicated by the boolean result returned by accept( ).

To make sure the element you're working with is only the file name and contains no path information, all you have to do is take the String object and create a File object out of it, then call getName( ), which strips away all the path information (in a platform-independent way). Then accept( ) uses a regular expression matcher object to see if the regular expression regex matches the name of the file. Using accept( ), the list( ) method returns an array.
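The callback structure described above can be sketched as follows. This is a minimal illustration, not the book's own listing: the class names DirFilterSketch and DirListSketch are hypothetical, and String.CASE_INSENSITIVE_ORDER stands in for the AlphabeticComparator from Chapter 11, which is not reproduced here.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical regex-based directory filter (an assumption, not the book's DirFilter).
class DirFilterSketch implements FilenameFilter {
  private final Pattern pattern;

  DirFilterSketch(String regex) {
    pattern = Pattern.compile(regex);
  }

  // list( ) calls back here once for every name in the directory.
  public boolean accept(File dir, String name) {
    // Strip any path information, then test the bare file name.
    return pattern.matcher(new File(name).getName()).matches();
  }
}

class DirListSketch {
  // Returns the sorted names in 'path' that match 'regex' (all names if regex is null).
  static String[] list(File path, String regex) {
    String[] names = (regex == null)
        ? path.list()
        : path.list(new DirFilterSketch(regex));
    if (names == null) {
      return new String[0]; // 'path' is not a readable directory
    }
    // Assumption: a case-insensitive sort replaces the book's AlphabeticComparator.
    Arrays.sort(names, String.CASE_INSENSITIVE_ORDER);
    return names;
  }

  public static void main(String[] args) {
    String regex = args.length == 0 ? null : args[0];
    for (String name : list(new File("."), regex)) {
      System.out.println(name);
    }
  }
}
```

Run with no arguments, this lists everything in the current directory; run with an argument such as ".*\.java", it lists only the matching names.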

Anonymous inner classes

This example is ideal for rewriting using an anonymous inner class (described in Chapter 8). As a first cut, a method filter( ) is created that returns a reference to a FilenameFilter:

Note that the argument to filter( ) must be final. This is required by the anonymous inner class so that it can use an object from outside its scope.

This design is an improvement because the FilenameFilter class is now tightly bound to DirList2. However, you can take this approach one step further and define the anonymous inner class as an argument to list( ), in which case it's even smaller:

The argument to main( ) is now final, since the anonymous inner class uses args[0] directly.

This shows you how anonymous inner classes allow the creation of specific, one-off classes to solve problems. One benefit of this approach is that it keeps the code that solves a particular problem isolated together in one spot. On the other hand, it is not always as easy to read, so you must use it judiciously.
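A rough sketch of the first variation, in which a filter( ) method returns an anonymous FilenameFilter, might look like this. The class name DirList2Sketch is hypothetical and the body is an illustration under that assumption rather than the book's listing.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

// Hypothetical sketch; the names are assumptions, not the book's code.
class DirList2Sketch {
  // 'regex' is declared final so that the anonymous inner class can use it.
  static FilenameFilter filter(final String regex) {
    return new FilenameFilter() {
      private final Pattern pattern = Pattern.compile(regex);

      public boolean accept(File dir, String name) {
        // Compare only the bare file name, without any path information.
        return pattern.matcher(new File(name).getName()).matches();
      }
    };
  }

  public static void main(String[] args) {
    String regex = args.length == 0 ? ".*" : args[0];
    String[] names = new File(".").list(filter(regex));
    if (names != null) {
      for (String name : names) {
        System.out.println(name);
      }
    }
  }
}
```

The filter class has no name of its own and exists only where it is needed, which is the isolation benefit described above.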

Checking for and creating directories

The File class is more than just a representation for an existing file or directory. You can also use a File object to create a new directory or an entire directory path if it doesn't exist. You can also look at the characteristics of files (size, last modification date, read/write), see whether a File object represents a file or a directory, and delete a file. This program shows some of the other methods available with the File class (see the HTML documentation from java.sun.com for the full set):

In fileData( ) you can see various file investigation methods used to display information about the file or directory path.

The first method thats exercised by main( ) is renameTo( ), which allows you to rename (or move) a file to an entirely new path represented by the argument, which is another File object. This also works with directories of any length. Feedback

If you experiment with the preceding program, you'll find that you can make a directory path of any complexity, because mkdirs( ) will do all the work for you.
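As a minimal sketch of the File methods mentioned in this section (not the book's program), the following creates a nested path with mkdirs( ), renames a file with renameTo( ), and reports a few properties. The class name, the method makeAndRename( ), and the directory and file names are all hypothetical.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch; class, method, and path names are assumptions.
class MakeDirectoriesSketch {
  // Summarizes a few of the properties a File object can report.
  static String fileData(File f) {
    String kind = f.isDirectory() ? "directory" : f.isFile() ? "file" : "missing";
    return f.getName() + " [" + kind + "]"
        + " length=" + f.length()
        + " canRead=" + f.canRead()
        + " canWrite=" + f.canWrite()
        + " lastModified=" + f.lastModified();
  }

  // Creates base/a/b/c in one call, then creates and renames a file inside it.
  static File makeAndRename(File base) throws IOException {
    File nested = new File(base, "a" + File.separator + "b" + File.separator + "c");
    if (!nested.mkdirs()) {
      throw new IOException("could not create " + nested);
    }
    File oldName = new File(nested, "old.txt");
    File newName = new File(nested, "new.txt");
    if (!oldName.createNewFile() || !oldName.renameTo(newName)) {
      throw new IOException("could not create or rename " + oldName);
    }
    return newName;
  }

  public static void main(String[] args) throws IOException {
    File base = new File(System.getProperty("java.io.tmpdir"),
        "io-sketch-" + System.nanoTime());
    File renamed = makeAndRename(base);
    System.out.println(fileData(renamed));
    System.out.println(fileData(renamed.getParentFile()));
  }
}
```

Note that mkdirs( ) and renameTo( ) report failure through their boolean return value rather than an exception, which is why the sketch checks the results explicitly.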

Input and output

I/O libraries often use the abstraction of a stream, which represents any data source or sink as an object capable of producing or receiving pieces of data. The stream hides the details of what happens to the data inside the actual I/O device.

The Java library classes for I/O are divided by input and output, as you can see by looking at the class hierarchy in the JDK documentation. By inheritance, everything derived from the InputStream or Reader classes has basic methods called read( ) for reading a single byte or array of bytes (characters, in the case of Reader). Likewise, everything derived from the OutputStream or Writer classes has basic methods called write( ) for writing a single byte or array of bytes (characters, in the case of Writer). However, you won't generally use these methods; they exist so that other classes can use them, and these other classes provide a more useful interface. Thus, you'll rarely create your stream object by using a single class, but instead will layer multiple objects together to provide your desired functionality. The fact that you create more than one object to create a single resulting stream is the primary reason that Java's stream library is confusing.

Its helpful to categorize the classes by their functionality. In Java 1.0, the library designers started by deciding that all classes that had anything to do with input would be inherited from InputStream, and all classes that were associated with output would be inherited from OutputStream. Feedback

Types of InputStream

InputStreams job is to represent classes that produce input from different sources. These sources can be:

An array of bytes.

A String object.

A file.

A pipe, which works like a physical
pipe: You put things in at one
end and they come out the other.

A sequence of other streams, so you can collect them together into a single
stream.

Other sources, such as an Internet connection. (This is covered in
Thinking in Enterprise Java.) Feedback

Each of these has an associated subclass of InputStream. In addition, the FilterInputStream is also a type of InputStream, to provide a base class for "decorator" classes that attach attributes or useful interfaces to input streams. This is discussed later.

Table 12-1. Types of InputStream

| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| ByteArrayInputStream | Allows a buffer in memory to be used as an InputStream. | The buffer from which to extract the bytes. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| StringBufferInputStream | Converts a String into an InputStream. | A String. The underlying implementation actually uses a StringBuffer. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| FileInputStream | For reading information from a file. | A String representing the file name, or a File or FileDescriptor object. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| PipedInputStream | Produces the data that's being written to the associated PipedOutputStream. Implements the "piping" concept. | PipedOutputStream | As a source of data in multithreading: Connect it to a FilterInputStream object to provide a useful interface. |
| SequenceInputStream | Converts two or more InputStream objects into a single InputStream. | Two InputStream objects or an Enumeration for a container of InputStream objects. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| FilterInputStream | Abstract class that is an interface for decorators that provide useful functionality to the other InputStream classes. See Table 12-3. | See Table 12-3. | See Table 12-3. |

Types of OutputStream

This category includes the classes that decide where your output will go: an array of bytes (no String, however; presumably, you can create one using the array of bytes), a file, or a pipe.

In addition, the FilterOutputStream provides a base class for "decorator" classes that attach attributes or useful interfaces to output streams. This is discussed later.

Table 12-2. Types of OutputStream

| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| ByteArrayOutputStream | Creates a buffer in memory. All the data that you send to the stream is placed in this buffer. | Optional initial size of the buffer. | To designate the destination of your data: Connect it to a FilterOutputStream object to provide a useful interface. |
| FileOutputStream | For sending information to a file. | A String representing the file name, or a File or FileDescriptor object. | To designate the destination of your data: Connect it to a FilterOutputStream object to provide a useful interface. |
| PipedOutputStream | Any information you write to this automatically ends up as input for the associated PipedInputStream. Implements the "piping" concept. | PipedInputStream | To designate the destination of your data for multithreading: Connect it to a FilterOutputStream object to provide a useful interface. |
| FilterOutputStream | Abstract class that is an interface for decorators that provide useful functionality to the other OutputStream classes. See Table 12-4. | See Table 12-4. | See Table 12-4. |

Adding attributes and useful interfaces

The use of layered objects to dynamically and transparently add responsibilities to individual objects is referred to as the Decorator pattern. (Patterns[61] are the subject of Thinking in Patterns (with Java) at www.BruceEckel.com.) The decorator pattern specifies that all objects that wrap around your initial object have the same interface. This makes the basic use of the decorators transparent: you send the same message to an object whether it has been decorated or not. This is the reason for the existence of the filter classes in the Java I/O library: The abstract filter class is the base class for all the decorators. (A decorator must have the same interface as the object it decorates, but the decorator can also extend the interface, which occurs in several of the filter classes.)

Decorators are often used when simple subclassing results in a large number of classes in order to satisfy every possible combination that is needed, so many classes that it becomes impractical. The Java I/O library requires many different combinations of features, and this is the justification for using the decorator pattern.[62] There is a drawback to the decorator pattern, however. Decorators give you much more flexibility while you're writing a program (since you can easily mix and match attributes), but they add complexity to your code. The reason that the Java I/O library is awkward to use is that you must create many classes (the core I/O type plus all the decorators) in order to get the single I/O object that you want.

The classes that provide the decorator interface to control a particular InputStream or OutputStream are the FilterInputStream and FilterOutputStream, which don't have very intuitive names. FilterInputStream and FilterOutputStream are derived from the base classes of the I/O library, InputStream and OutputStream, which is the key requirement of the decorator (so that it provides the common interface to all the objects that are being decorated).
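To make the layering concrete, here is a small sketch in which a core source is wrapped by two decorators. The class name LayeringSketch and the sample bytes are assumptions for illustration, and try-with-resources assumes Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of decorator layering; names are assumptions.
class LayeringSketch {
  // Wraps a core source in two decorators and reads one int through them.
  static int readFirstInt(byte[] data) throws IOException {
    InputStream source = new ByteArrayInputStream(data);        // core I/O type
    InputStream buffered = new BufferedInputStream(source);     // decorator: adds buffering
    try (DataInputStream in = new DataInputStream(buffered)) {  // decorator: adds readInt( )
      return in.readInt();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {0, 0, 1, 2}; // big-endian encoding of 258
    System.out.println(readFirstInt(data)); // prints 258
  }
}
```

Every layer is itself an InputStream, so each constructor accepts the previous layer; only the outermost object is used for reading.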

Reading from an InputStream with FilterInputStream

The FilterInputStream classes accomplish two significantly different things. DataInputStream allows you to read different types of primitive data as well as String objects. (All the methods start with "read," such as readByte( ), readFloat( ), etc.) This, along with its companion DataOutputStream, allows you to move primitive data from one place to another via a stream. These "places" are determined by the classes in Table 12-1.

The remaining classes modify the way an InputStream behaves internally: whether it's buffered or unbuffered, if it keeps track of the lines it's reading (allowing you to ask for line numbers or set the line number), and whether you can push back a single character. The last two classes look a lot like support for building a compiler (that is, they were probably added to support the construction of the Java compiler), so you probably won't use them in general programming.
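As an illustration of the push-back behavior, the following sketch reads a run of ASCII digits and returns the first non-digit byte to the stream, which is the kind of lookahead a scanner needs. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; PushbackSketch and readNumber( ) are assumptions.
class PushbackSketch {
  // Reads leading ASCII digits; the first non-digit byte is pushed back
  // so that the next read( ) sees it again.
  static int readNumber(PushbackInputStream in) throws IOException {
    int value = 0;
    int b;
    while ((b = in.read()) != -1) {
      if (b < '0' || b > '9') {
        in.unread(b);
        break;
      }
      value = value * 10 + (b - '0');
    }
    return value;
  }

  public static void main(String[] args) throws IOException {
    byte[] text = "42+7".getBytes(StandardCharsets.US_ASCII);
    try (PushbackInputStream in =
             new PushbackInputStream(new ByteArrayInputStream(text))) {
      System.out.println(readNumber(in));     // the digits before '+'
      System.out.println((char) in.read());   // the pushed-back operator
      System.out.println(readNumber(in));     // the digits after '+'
    }
  }
}
```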

Youll need to buffer your input almost every time, regardless of the I/O device youre connecting to, so it would have made more sense for the I/O library to make a special case (or simply a method call) for unbuffered input rather than buffered input. Feedback

Table 12-3. Types of FilterInputStream

| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| DataInputStream | Used in concert with DataOutputStream, so you can read primitives (int, char, long, etc.) from a stream in a portable fashion. | InputStream | Contains a full interface to allow you to read primitive types. |
| BufferedInputStream | Use this to prevent a physical read every time you want more data. You're saying "Use a buffer." | InputStream, with optional buffer size. | This doesn't provide an interface per se, just a requirement that a buffer be used. Attach an interface object. |
| LineNumberInputStream | Keeps track of line numbers in the input stream; you can call getLineNumber( ) and setLineNumber(int). | InputStream | This just adds line numbering, so you'll probably attach an interface object. |
| PushbackInputStream | Has a one-byte push-back buffer so that you can push back the last character read. | InputStream | Generally used in the scanner for a compiler and probably included because the Java compiler needed it. You probably won't use this. |

Writing to an OutputStream with FilterOutputStream

The complement to DataInputStream is DataOutputStream, which formats each of the primitive types and String objects onto a stream in such a way that any DataInputStream, on any machine, can read them. All the methods start with "write," such as writeByte( ), writeFloat( ), etc.

The original intent of PrintStream was to print all of the primitive data types and String objects in a viewable format. This is different from DataOutputStream, whose goal is to put data elements on a stream in a way that DataInputStream can portably reconstruct them.

The two important methods in PrintStream are print( ) and println( ), which are overloaded to print all the various types. The difference between print( ) and println( ) is that the latter adds a newline when it's done.

PrintStream can be problematic because it traps all IOExceptions (you must explicitly test the error status with checkError( ), which returns true if an error has occurred). Also, PrintStream doesn't internationalize properly and doesn't handle line breaks in a platform-independent way (these problems are solved with PrintWriter, described later).
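The trapped-exception behavior can be seen in a small sketch. FailingStream is a hypothetical OutputStream that always throws, standing in for a broken device; the other names are likewise assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical sketch; all names here are assumptions.
class PrintStreamErrorSketch {
  // An OutputStream that always fails, simulating a broken device.
  static class FailingStream extends OutputStream {
    @Override
    public void write(int b) throws IOException {
      throw new IOException("simulated device failure");
    }
  }

  // Prints a line and reports whether PrintStream trapped an error.
  static boolean printAndCheck(OutputStream target, String text) {
    PrintStream out = new PrintStream(target, true); // true = flush on println( )
    out.println(text);        // never throws IOException, even if the write fails
    return out.checkError();  // flushes, then reports any trapped error
  }

  public static void main(String[] args) {
    System.out.println(printAndCheck(new ByteArrayOutputStream(), "hello")); // false
    System.out.println(printAndCheck(new FailingStream(), "hello"));         // true
  }
}
```

Without the call to checkError( ), the failed write in the second case would pass unnoticed.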

BufferedOutputStream is a modifier and tells the stream to use buffering so you don't get a physical write every time you write to the stream. You'll probably always want to use this when doing output.

Table 12-4. Types of FilterOutputStream

| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| DataOutputStream | Used in concert with DataInputStream so you can write primitives (int, char, long, etc.) to a stream in a portable fashion. | OutputStream | Contains a full interface to allow you to write primitive types. |
| PrintStream | Prints the primitive data types and String objects in a viewable format. | OutputStream, with optional boolean indicating that the buffer is flushed with every newline. | Should be the "final" wrapping for your OutputStream object. You'll probably use this a lot. |
| BufferedOutputStream | Use this to prevent a physical write every time you send a piece of data. You're saying "Use a buffer." You can call flush( ) to flush the buffer. | OutputStream, with optional buffer size. | This doesn't provide an interface per se, just a requirement that a buffer is used. Attach an interface object. |

Readers & Writers

Java 1.1 made some significant modifications to the fundamental I/O stream library. When you see the Reader and Writer classes, your first thought (like mine) might be that these were meant to replace the InputStream and OutputStream classes. But that's not the case. Although some aspects of the original streams library are deprecated (if you use them you will receive a warning from the compiler), the InputStream and OutputStream classes still provide valuable functionality in the form of byte-oriented I/O, whereas the Reader and Writer classes provide Unicode-compliant, character-based I/O. In addition:

Java 1.1 added new classes into the InputStream and OutputStream hierarchy, so it's obvious those hierarchies weren't being replaced.

There are times when you must use classes from the byte hierarchy in combination with classes in the character hierarchy. To accomplish this, there are adapter classes: InputStreamReader converts an InputStream to a Reader and OutputStreamWriter converts an OutputStream to a Writer.
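The two adapter classes can be sketched as a round trip: characters are encoded to bytes through an OutputStreamWriter and decoded again through an InputStreamReader. The class name AdapterSketch is hypothetical, and the explicit UTF-8 charset (via java.nio.charset.StandardCharsets, Java 7 or later) is an assumption made so that the result does not depend on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the class and method names are assumptions.
class AdapterSketch {
  // Encodes text to bytes through a Writer, then decodes it through a Reader.
  static String roundTrip(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // OutputStreamWriter adapts a byte-oriented sink to the Writer interface.
    try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
      out.write(text);
    }
    StringBuilder result = new StringBuilder();
    // InputStreamReader adapts a byte-oriented source to the Reader interface.
    try (Reader in = new InputStreamReader(
             new ByteArrayInputStream(bytes.toByteArray()), StandardCharsets.UTF_8)) {
      int c;
      while ((c = in.read()) != -1) {
        result.append((char) c);
      }
    }
    return result.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("plain ASCII and na\u00efve"));
  }
}
```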

The most important reason for the Reader and Writer hierarchies is for internationalization. The old I/O stream hierarchy supports only 8-bit byte streams and doesn't handle the 16-bit Unicode characters well. Since Unicode is used for internationalization (and Java's native char is 16-bit Unicode), the Reader and Writer hierarchies were added to support Unicode in all I/O operations. In addition, the new libraries are designed for faster operations than the old.

As is the practice in this book, I will attempt to provide an overview of the classes, but assume that you will use the JDK documentation to determine all the details, such as the exhaustive list of methods.

Sources and sinks of data

Almost all of the original Java I/O stream classes have corresponding Reader and Writer classes to provide native Unicode manipulation. However, there are some places where the byte-oriented InputStreams and OutputStreams are the correct solution; in particular, the java.util.zip libraries are byte-oriented rather than char-oriented. So the most sensible approach to take is to try to use the Reader and Writer classes whenever you can, and you'll discover the situations when you have to use the byte-oriented libraries, because your code won't compile.

Here is a table that shows the correspondence between the sources and sinks of information (that is, where the data physically comes from or goes to) in the two hierarchies.

| Sources & sinks: Java 1.0 class | Corresponding Java 1.1 class |
|---|---|
| InputStream | Reader (adapter: InputStreamReader) |
| OutputStream | Writer (adapter: OutputStreamWriter) |
| FileInputStream | FileReader |
| FileOutputStream | FileWriter |
| StringBufferInputStream | StringReader |
| (no corresponding class) | StringWriter |
| ByteArrayInputStream | CharArrayReader |
| ByteArrayOutputStream | CharArrayWriter |
| PipedInputStream | PipedReader |
| PipedOutputStream | PipedWriter |

In general, youll find that the interfaces for the two different hierarchies are similar if not identical.

Modifying stream behavior

For InputStreams and OutputStreams, streams were adapted for particular needs using decorator subclasses of FilterInputStream and FilterOutputStream. The Reader and Writer class hierarchies continue the use of this idea, but not exactly.

In the following table, the correspondence is a rougher approximation than in the previous table. The difference is because of the class organization; although BufferedOutputStream is a subclass of FilterOutputStream, BufferedWriter is not a subclass of FilterWriter (which, even though it is abstract, has no subclasses and so appears to have been put in either as a placeholder or simply so you wouldn't wonder where it was). However, the interfaces to the classes are quite a close match.

| Filters: Java 1.0 class | Corresponding Java 1.1 class |
|---|---|
| FilterInputStream | FilterReader |
| FilterOutputStream | FilterWriter (abstract class with no subclasses) |
| BufferedInputStream | BufferedReader (also has readLine( )) |
| BufferedOutputStream | BufferedWriter |
| DataInputStream | Use DataInputStream (except when you need to use readLine( ), when you should use a BufferedReader) |
| PrintStream | PrintWriter |
| LineNumberInputStream (deprecated) | LineNumberReader |
| StreamTokenizer | StreamTokenizer (use constructor that takes a Reader instead) |
| PushbackInputStream | PushbackReader |

Theres one direction thats quite clear: Whenever you want to use readLine( ), you shouldnt do it with a DataInputStream (this is met with a deprecation message at compile time), but instead use a BufferedReader. Other than this, DataInputStream is still a preferred member of the I/O library.

To make the transition to using a PrintWriter easier, it has constructors that take any OutputStream object as well as Writer objects. However, PrintWriter has no more support for formatting than PrintStream does; the interfaces are virtually the same.

The PrintWriter constructor also has an option to perform automatic flushing, which happens after every println( ) if the constructor flag is set.
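The effect of the flag can be sketched by printing one line to an in-memory sink and looking at the sink before any explicit flush( ) or close( ). The class name AutoFlushSketch is hypothetical, and the empty result with the flag off relies on PrintWriter buffering its output when it wraps an OutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

// Hypothetical sketch; the class and method names are assumptions.
class AutoFlushSketch {
  // Prints one line and returns whatever has reached the sink so far.
  static String printLine(boolean autoFlush) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // PrintWriter accepts an OutputStream directly; no adapter is needed.
    PrintWriter out = new PrintWriter(sink, autoFlush);
    out.println("hello");
    // No flush( ) or close( ) here: only the flag decides what the sink holds.
    return sink.toString();
  }

  public static void main(String[] args) {
    System.out.println("autoflush on:  " + printLine(true).length() + " chars in sink");
    System.out.println("autoflush off: " + printLine(false).length() + " chars in sink");
  }
}
```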

Unchanged Classes

Some classes were left unchanged between Java 1.0 and Java 1.1:

Java 1.0 classes without corresponding Java 1.1 classes

DataOutputStream

File

RandomAccessFile

SequenceInputStream

DataOutputStream, in particular, is used without change, so for storing and retrieving data in a transportable format, you use the InputStream and OutputStream hierarchies.

Off by itself: RandomAccessFile

RandomAccessFile is used for files containing records of known size so that you can move from one record to another using seek( ), then read or change the records. The records don't have to be the same size; you just have to be able to determine how big they are and where they are placed in the file.

At first its a little bit hard to believe that RandomAccessFile is not part of the InputStream or OutputStream hierarchy. However, it has no association with those hierarchies other than that it happens to implement the DataInput and DataOutput interfaces (which are also implemented by DataInputStream and DataOutputStream). It doesnt even use any of the functionality of the existing InputStream or OutputStream classes; its a completely separate class, written from scratch, with all of its own (mostly native) methods. The reason for this may be that RandomAccessFile has essentially different behavior than the other I/O types, since you can move forward and backward within a file. In any event, it stands alone, as a direct descendant of Object. Feedback

Essentially, a RandomAccessFile works like a DataInputStream pasted together with a DataOutputStream, along with the methods getFilePointer( ) to find out where you are in the file, seek( ) to move to a new point in the file, and length( ) to determine the size of the file. In addition, the constructors require a second argument (identical to fopen( ) in C) indicating whether you are just randomly reading ("r") or reading and writing ("rw"). There's no support for write-only files, which could suggest that RandomAccessFile might have worked well if it were inherited from DataInputStream.

The seeking methods are available only in RandomAccessFile, which works for files only. BufferedInputStream does allow you to mark( ) a position (whose value is held in a single internal variable) and reset( ) to that position, but this is limited and not very useful.

Most, if not all, of the RandomAccessFile functionality is superseded in JDK 1.4 with the nio memory-mapped files, which will be described later in this chapter.
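A minimal sketch of record-style access with RandomAccessFile follows. It assumes fixed-size records of one double (eight bytes) each; the class name, the method overwriteAndRead( ), and the record layout are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch; names and record layout are assumptions.
class RandomAccessSketch {
  static final int RECORD_SIZE = 8; // one double per record

  // Writes 'values' as fixed-size records, overwrites one record in place,
  // then reopens the file read-only and reads that record back.
  static double overwriteAndRead(File file, double[] values, int index,
                                 double replacement) throws IOException {
    try (RandomAccessFile rf = new RandomAccessFile(file, "rw")) {
      rf.setLength(0); // start from an empty file
      for (double v : values) {
        rf.writeDouble(v);
      }
      rf.seek((long) index * RECORD_SIZE); // jump to the start of the record
      rf.writeDouble(replacement);
    }
    try (RandomAccessFile rf = new RandomAccessFile(file, "r")) {
      rf.seek((long) index * RECORD_SIZE);
      return rf.readDouble();
    }
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("records", ".dat");
    file.deleteOnExit();
    System.out.println(
        overwriteAndRead(file, new double[] {1.0, 2.0, 3.0}, 1, 9.5)); // prints 9.5
  }
}
```

Because the record size is known, the position of any record is a simple multiplication, which is what makes seek( ) useful.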

Typical uses of I/O streams

Although you can combine the I/O stream classes in many different ways, you'll probably just use a few combinations. The following example can be used as a basic reference; it shows the creation and use of typical I/O configurations. Note that each configuration begins with a commented number and title that corresponds to the heading for the appropriate explanation that follows in the text.

Here are the descriptions for the numbered sections of the program:

Input streams

Parts 1 through 4 demonstrate the creation and use of input streams. Part 4 also shows the simple use of an output stream.

1. Buffered input file

To open a file for character input, you use a FileReader with a String or a File object as the file name. For speed, you'll want that file to be buffered, so you give the resulting reference to the constructor for a BufferedReader. Since BufferedReader also provides the readLine( ) method, this is your final object and the interface you read from. When you reach the end of the file, readLine( ) returns null, so that is used to break out of the while loop.

The String s2 is used to accumulate the entire contents of the file (including newlines that must be added since readLine( ) strips them off). s2 is then used in the later portions of this program. Finally, close( ) is called to close the file. Technically, close( ) will be called when finalize( ) runs, and this is supposed to happen (whether or not garbage collection occurs) as the program exits. However, this has been inconsistently implemented, so the only safe approach is to explicitly call close( ) for files.

Section 1b shows how you can wrap System.in for reading console input. System.in is an InputStream, and BufferedReader needs a Reader argument, so InputStreamReader is brought in to perform the adaptation.
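The two configurations described in this part can be sketched as follows. This is not the book's listing: the class and method names are hypothetical, and try-with-resources (Java 7 or later) is used as one way of guaranteeing the explicit close( ) that the text recommends.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical sketch; names are assumptions, not the book's code.
class BufferedInputFileSketch {
  // Reads a whole text file line by line, restoring the newlines
  // that readLine( ) strips off.
  static String readAll(File file) throws IOException {
    StringBuilder contents = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new FileReader(file))) {
      String line;
      while ((line = in.readLine()) != null) { // null signals end of file
        contents.append(line).append('\n');
      }
    } // the reader is closed here, even if an exception is thrown
    return contents.toString();
  }

  // Wraps any InputStream (such as System.in) so it can be read by lines.
  static BufferedReader lineReader(InputStream in) {
    return new BufferedReader(new InputStreamReader(in));
  }

  public static void main(String[] args) throws IOException {
    BufferedReader stdin = lineReader(System.in);
    System.out.print("Enter a file name: ");
    String name = stdin.readLine();
    if (name != null) {
      System.out.print(readAll(new File(name)));
    }
  }
}
```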

2. Input from memory

This section takes the String s2 that now contains the entire contents of the file and uses it to create a StringReader. Then read( ) is used to read each character one at a time and send it out to the console. Note that read( ) returns the next character as an int and thus it must be cast to a char to print properly.
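A short sketch of reading from memory, with the cast made explicit, might look like this. The class and method names are hypothetical, and the characters are collected into a StringBuilder rather than printed so that the result is easy to inspect.

```java
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch; names are assumptions.
class MemoryInputSketch {
  // Copies 'text' one character at a time through a StringReader.
  static String copy(String text) throws IOException {
    StringBuilder out = new StringBuilder();
    try (StringReader in = new StringReader(text)) {
      int c;
      while ((c = in.read()) != -1) { // -1 signals the end of the input
        // Without the cast, the numeric character code would be appended.
        out.append((char) c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.print(copy("read from memory\n"));
  }
}
```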

3. Formatted memory input

To read formatted data, you use a DataInputStream, which is a byte-oriented I/O class (rather than char-oriented). Thus you must use all InputStream classes rather than Reader classes. Of course, you can read anything (such as a file) as bytes using InputStream classes, but here a String is used. To convert the String to an array of bytes, which is what is appropriate for a ByteArrayInputStream, String has a getBytes( ) method to do the job. At that point, you have an appropriate InputStream to hand to DataInputStream. Feedback

If you read the characters from a DataInputStream one byte at a time using readByte( ), any byte value is a legitimate result, so the return value cannot be used to detect the end of input. Instead, you can use the available( ) method to find out how many more bytes are available. Here's an example that shows how to read a file one byte at a time:

Note that available( ) works differently depending on what sort of medium you're reading from; it's literally "the number of bytes that can be read without blocking." With a file, this means the whole file, but with a different kind of stream this might not be true, so use it thoughtfully.
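The available( ) technique can be sketched as shown here. To stay self-contained, this version reads from an in-memory byte array rather than a file, and it assumes the bytes are ASCII so that each one can be cast directly to a char; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are assumptions, and ASCII input is assumed.
class AvailableSketch {
  // Reads every byte with readByte( ), using available( ) to detect the end.
  static String readBytes(byte[] data) throws IOException {
    StringBuilder out = new StringBuilder();
    try (DataInputStream in =
             new DataInputStream(new ByteArrayInputStream(data))) {
      // For an in-memory array, available( ) is the count of unread bytes.
      while (in.available() != 0) {
        out.append((char) in.readByte());
      }
    }
    return out.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "one byte at a time".getBytes(StandardCharsets.US_ASCII);
    System.out.println(readBytes(data));
  }
}
```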

You could also detect the end of input in cases like these by catching an exception. However, the use of exceptions for control flow is considered a misuse of that feature.

4. File output

This example also shows how to write data to a file. First, a FileWriter is created to connect to the file. You'll virtually always want to buffer the output by wrapping it in a BufferedWriter (try removing this wrapping to see the impact on the performance; buffering tends to dramatically increase performance of I/O operations). Then for the formatting it's turned into a PrintWriter. The data file created this way is readable as an ordinary text file.

As the lines are written to the file, line numbers are added. Note that LineNumberInputStream is not used, because it's a silly class and you don't need it. As shown here, it's trivial to keep track of your own line numbers.

When the input stream is exhausted, readLine( ) returns null. You'll see an explicit close( ) for out1, because if you don't call close( ) for all your output files, you might discover that the buffers don't get flushed, so they're incomplete.
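The output configuration described in this part can be sketched as follows. The class name, the method writeNumbered( ), and the use of try-with-resources (Java 7 or later) in place of an explicit close( ) call are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;

// Hypothetical sketch; names are assumptions, not the book's code.
class FileOutputSketch {
  // Copies 'text' to 'file', prefixing each line with its own line number.
  static void writeNumbered(String text, File file) throws IOException {
    try (BufferedReader in = new BufferedReader(new StringReader(text));
         PrintWriter out = new PrintWriter(
             new BufferedWriter(new FileWriter(file)))) {
      int lineCount = 1; // a plain counter; no line-numbering class is needed
      String line;
      while ((line = in.readLine()) != null) {
        out.println(lineCount++ + ": " + line);
      }
    } // closing 'out' flushes the buffer, so the file is complete
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("numbered", ".txt");
    file.deleteOnExit();
    writeNumbered("first line\nsecond line\n", file);
    System.out.println("wrote " + file.length() + " bytes to " + file);
  }
}
```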

Output streams

The two primary kinds of output streams are separated by the way they write data; one writes it for human consumption, and the other writes it to be reacquired by a DataInputStream. The RandomAccessFile stands alone, although its data format is compatible with the DataInputStream and DataOutputStream.

5. Storing and recovering data

A PrintWriter formats data so that it's readable by a human. However, to output data for recovery by another stream, you use a DataOutputStream to write the data and a DataInputStream to recover the data. Of course, these streams could be anything, but here a file is used, buffered for both reading and writing. DataOutputStream and DataInputStream are byte-oriented and thus require the InputStreams and OutputStreams.

If you use a DataOutputStream to write the data, then Java guarantees that you can accurately recover the data using a DataInputStreamregardless of what different platforms write and read the data. This is incredibly valuable, as anyone knows who has spent time worrying about platform-specific data issues. That problem vanishes if you have Java on both platforms.[63]Feedback

When using a DataOutputStream, the only reliable way to write a String so that it can be recovered by a DataInputStream is to use UTF-8 encoding, accomplished in section 5 of the example using writeUTF( ) and readUTF( ). UTF-8 is a variation on Unicode, which stores all characters in two bytes. If youre working with ASCII or mostly ASCII characters (which occupy only seven bits), this is a tremendous waste of space and/or bandwidth, so UTF-8 encodes ASCII characters in a single byte, and non-ASCII characters in two or three bytes. In addition, the length of the string is stored in the first two bytes. However, writeUTF( ) and readUTF( ) use a special variation of UTF-8 for Java (which is completely described in the JDK documentation for those methods) , so if you read a string written with writeUTF( ) using a non-Java program, you must write special code in order to read the string properly. Feedback

With writeUTF( ) and readUTF( ), you can intermingle Strings and other types of data using a DataOutputStream with the knowledge that the Strings will be properly stored as Unicode, and will be easily recoverable with a DataInputStream. Feedback

The writeDouble( ) stores the double number to the stream and the complementary readDouble( ) recovers it (there are similar methods for reading and writing the other types). But for any of the reading methods to work correctly, you must know the exact placement of the data item in the stream, since it would be equally possible to read the stored double as a simple sequence of bytes, or as a char, etc. So you must either have a fixed format for the data in the file, or extra information must be stored in the file that you parse to determine where the data is located. Note that object serialization (described later in this chapter) may be an easier way to store and retrieve complex data structures. Feedback
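The round trip can be sketched as follows. This is an illustration rather than the book's listing: the class DataRoundTrip is a hypothetical name, and an in-memory byte array stands in for the file so that the example is self-contained. The point to notice is that the values are read back in exactly the order in which they were written.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DataRoundTrip {
  // Writes a double and a String, then recovers both in the same order.
  public static String roundTrip(double d, String s)
  throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(
      new BufferedOutputStream(bytes));
    out.writeDouble(d);
    out.writeUTF(s); // Two-byte length followed by modified UTF-8
    out.close();
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(
        new ByteArrayInputStream(bytes.toByteArray())));
    try {
      double d2 = in.readDouble(); // Must match the order of the writes
      String s2 = in.readUTF();
      return d2 + " " + s2;
    } finally {
      in.close();
    }
  }
}
```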

6. Reading and writing random access files

As previously noted, the RandomAccessFile is almost totally isolated from the rest of the I/O hierarchy, save for the fact that it implements the DataInput and DataOutput interfaces. So you cannot combine it with any of the aspects of the InputStream and OutputStream subclasses. Even though it might make sense to treat a ByteArrayInputStream as a random-access element, you can use RandomAccessFile only to open a file. You must assume a RandomAccessFile is properly buffered since you cannot add that.

The one option you have is in the second constructor argument: you can open a RandomAccessFile to read ("r") or read and write ("rw").

Using a RandomAccessFile is like using a combined DataInputStream and DataOutputStream (because it implements the equivalent interfaces). In addition, you can see that seek( ) is used to move about in the file and change one of the values.
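As a rough sketch of this pattern (not the original listing; the class and method names are hypothetical), the following writes a run of doubles, uses seek( ) to overwrite one of them, and then reopens the file read-only to recover all the values. It relies on the fact that a double occupies eight bytes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessSketch {
  // Writes count doubles, then overwrites the one at the given index.
  public static double[] writeAndPatch(
    File file, int count, int index, double value)
  throws IOException {
    RandomAccessFile rf = new RandomAccessFile(file, "rw");
    try {
      for(int i = 0; i < count; i++)
        rf.writeDouble(i * 1.5);
      rf.seek(index * 8L); // Each double occupies 8 bytes
      rf.writeDouble(value);
    } finally {
      rf.close();
    }
    double[] result = new double[count];
    rf = new RandomAccessFile(file, "r"); // Read-only this time
    try {
      for(int i = 0; i < count; i++)
        result[i] = rf.readDouble();
    } finally {
      rf.close();
    }
    return result;
  }
}
```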

With the advent of new I/O in JDK 1.4, you may want to consider using memory-mapped files instead of RandomAccessFile.

Piped streams

The PipedInputStream, PipedOutputStream, PipedReader and PipedWriter have been mentioned only briefly in this chapter. This is not to suggest that they aren't useful, but their value is not apparent until you begin to understand multithreading, since the piped streams are used to communicate between threads. This is covered along with an example in Chapter 13.

File reading & writing utilities

A very common programming task is to read a file into memory, modify it, and then write it out again. One of the problems with the Java I/O library is that it requires you to write quite a bit of code in order to perform these common operations; there are no basic helper functions to do them for you. What's worse, the decorators make it rather hard to remember how to open files. Thus, it makes sense to add helper classes to your library that will easily perform these basic tasks for you. Here's one that contains static methods to read and write text files as a single string. In addition, you can create a TextFile class that holds the lines of the file in an ArrayList (so you have all the ArrayList functionality available while manipulating the file contents):

All methods simply pass IOExceptions out to the caller. read( ) appends each line to a StringBuffer (for efficiency) followed by a newline, because that is stripped out during reading. Then it returns a String containing the whole file. write( ) opens and writes the text to the file. Both methods remember to close( ) the file when they are done.

The constructor uses the read( ) method to turn the file into a String, then uses String.split( ) to divide the result into lines along newline boundaries (if you use this class a lot, you may want to rewrite this constructor to improve efficiency). Alas, there is no corresponding "join" method, so the non-static write( ) method must write the lines out by hand.

In main( ), a basic test is performed to ensure that the methods work. Although this is a small amount of code, using it can save a lot of time and make your life easier, as you'll see in some of the examples later in this chapter.
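A helper along these lines might look like the following sketch. It is a reconstruction from the description above, not the original listing; the class name TextFileSketch is hypothetical, and generics are used, which postdate the JDK 1.4 that this chapter targets.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;

public class TextFileSketch extends ArrayList<String> {
  // Reads a whole file into one String, restoring the newlines
  // that readLine() strips.
  public static String read(String fileName) throws IOException {
    StringBuffer sb = new StringBuffer();
    BufferedReader in =
      new BufferedReader(new FileReader(fileName));
    try {
      String s;
      while((s = in.readLine()) != null)
        sb.append(s).append("\n");
    } finally {
      in.close();
    }
    return sb.toString();
  }
  // Writes a single String to a file.
  public static void write(String fileName, String text)
  throws IOException {
    PrintWriter out = new PrintWriter(
      new BufferedWriter(new FileWriter(fileName)));
    try {
      out.print(text);
    } finally {
      out.close();
    }
  }
  // Holds the lines of the file as elements of this list.
  public TextFileSketch(String fileName) throws IOException {
    super(Arrays.asList(read(fileName).split("\n")));
  }
  // No "join" is assumed here, so write the lines out by hand.
  public void write(String fileName) throws IOException {
    PrintWriter out = new PrintWriter(
      new BufferedWriter(new FileWriter(fileName)));
    try {
      for(String line : this)
        out.println(line);
    } finally {
      out.close();
    }
  }
}
```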

Standard I/O

The term standard I/O refers to the Unix concept (which is reproduced in some form in Windows and many other operating systems) of a single stream of information that is used by a program. All of the program's input can come from standard input, all its output can go to standard output, and all of its error messages can be sent to standard error. The value of standard I/O is that programs can easily be chained together, and one program's standard output can become the standard input for another program. This is a powerful tool.

Reading from standard input

Following the standard I/O model, Java has System.in, System.out, and System.err. Throughout this book, you've seen how to write to standard output using System.out, which is already prewrapped as a PrintStream object. System.err is likewise a PrintStream, but System.in is a raw InputStream with no wrapping. This means that although you can use System.out and System.err right away, System.in must be wrapped before you can read from it.

Typically, you'll want to read input a line at a time using readLine( ), so you'll want to wrap System.in in a BufferedReader. To do this, you must convert System.in to a Reader using InputStreamReader. Here's an example that simply echoes each line that you type in:

The reason for the exception specification is that readLine( ) can throw an IOException. Note that System.in should usually be buffered, as with most streams.
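A minimal sketch of such an echo loop follows. It is not the original listing: the class EchoSketch and its echo( ) method are hypothetical, and the streams are passed in as arguments so that the loop can be exercised without a console. Stopping at an empty line is also an assumption of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class EchoSketch {
  // Echoes lines from in to out until an empty line or end of input.
  public static void echo(InputStream in, PrintStream out)
  throws IOException {
    // InputStream -> InputStreamReader -> BufferedReader
    BufferedReader reader =
      new BufferedReader(new InputStreamReader(in));
    String s;
    while((s = reader.readLine()) != null && s.length() != 0)
      out.println(s);
  }
}
```

Calling EchoSketch.echo(System.in, System.out) echoes what you type at the console.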

Changing System.out to a PrintWriter

System.out is a PrintStream, which is an OutputStream. PrintWriter has a constructor that takes an OutputStream as an argument. Thus, if you want, you can convert System.out into a PrintWriter using that constructor:

Its important to use the two-argument version of the PrintWriter constructor and to set the second argument to true in order to enable automatic flushing; otherwise, you may not see the output. Feedback

Redirecting standard I/O

The Java System class allows you to redirect the standard input, output, and error I/O streams using simple static method calls: setIn(InputStream), setOut(PrintStream), and setErr(PrintStream).

Redirecting output is especially useful if you suddenly start creating a large amount of output on your screen, and it's scrolling past faster than you can read it.[64] Redirecting input is valuable for a command-line program in which you want to test a particular user-input sequence repeatedly. Here's a simple example that shows the use of these methods:

This program attaches standard input to a file and redirects standard output and standard error to another file.

I/O redirection manipulates streams of bytes, not streams of characters, thus InputStreams and OutputStreams are used rather than Readers and Writers.
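As an illustration of output redirection only (the original listing also redirected input and error), here is a sketch with hypothetical names that sends System.out to a file while a task runs and then restores the console stream:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

public class RedirectSketch {
  // Runs task with System.out redirected into file, then
  // restores the previous stream.
  public static void runWithOutputTo(File file, Runnable task)
  throws IOException {
    PrintStream console = System.out; // Remember the original
    PrintStream out = new PrintStream(
      new BufferedOutputStream(new FileOutputStream(file)));
    System.setOut(out);
    try {
      task.run();
    } finally {
      System.setOut(console);
      out.close(); // Flushes buffered bytes to the file
    }
  }
}
```

Saving the original stream before calling setOut( ) is what makes it possible to put things back afterward.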

New I/O

The Java new I/O library, introduced in JDK 1.4 in the java.nio.* packages, has one goal: speed. In fact, the old I/O packages have been reimplemented using nio in order to take advantage of this speed increase, so you will benefit even if you dont explicitly write code with nio. The speed increase occurs in both file I/O, which is explored here,[65] and in network I/O, which is covered in Thinking in Enterprise Java. Feedback

The speed comes from using structures that are closer to the operating systems way of performing I/O: channels and buffers. You could think of it as a coal mine; the channel is the mine containing the seam of coal (the data), and the buffer is the cart that you send into the mine. The cart comes back full of coal, and you get the coal from the cart. That is, you dont interact directly with the channel; you interact with the buffer and send the buffer into the channel. The channel either pulls data from the buffer, or puts data into the buffer. Feedback

The only kind of buffer that communicates directly with a channel is a ByteBufferthat is, a buffer that holds raw bytes. If you look at the JDK documentation for java.nio.ByteBuffer, youll see that its fairly basic: You create one by telling it how much storage to allocate, and there are a selection of methods to put and get data, in either raw byte form or as primitive data types. But theres no way to put or get an object, or even a String. Its fairly low-level, precisely because this makes a more efficient mapping with most operating systems. Feedback

Three of the classes in the old I/O have been modified so that they produce a FileChannel: FileInputStream, FileOutputStream, and, for both reading and writing, RandomAccessFile. Notice that these are the byte manipulation streams, in keeping with the low-level nature of nio. The Reader and Writer character-mode classes do not produce channels, but the class java.nio.channels.Channels has utility methods to produce Readers and Writers from channels. Feedback

Heres a simple example that exercises all three types of stream to produce channels that are writeable, read/writeable, and readable:

For any of the stream classes shown here, getChannel( ) will produce a FileChannel. A channel is fairly basic: You can hand it a ByteBuffer for reading or writing, and you can lock regions of the file for exclusive access (this will be described later). Feedback

One way to put bytes into a ByteBuffer is to stuff them in directly using one of the put methods, to put one or more bytes, or values of primitive types. However, as seen here, you can also wrap an existing byte array in a ByteBuffer using the wrap( ) method. When you do this, the underlying array is not copied, but instead is used as the storage for the generated ByteBuffer. We say that the ByteBuffer is backed by the array. Feedback

The data.txt file is reopened using a RandomAccessFile. Notice that you can move the FileChannel around in the file; here, it is moved to the end so that additional writes will be appended. Feedback

For read-only access, you must explicitly allocate a ByteBuffer using the static allocate( ) method. The goal of nio is to rapidly move large amounts of data, so the size of the ByteBuffer should be significantin fact, the 1K used here is probably quite a bit smaller than youd normally want to use (youll have to experiment with your working application to find the best size). Feedback
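The three steps just described (write through a wrapped array, append through a RandomAccessFile, read into an allocated buffer) can be sketched as follows. This is a reconstruction under assumptions, not the original GetChannel.java: the class and method names are hypothetical, the file is supplied by the caller, and ASCII is named explicitly so that the byte-to-char cast is safe.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class GetChannelSketch {
  private static final int BSIZE = 1024;
  public static String writeAppendRead(File file)
  throws IOException {
    // Write a file, wrapping an existing byte array:
    FileChannel fc = new FileOutputStream(file).getChannel();
    fc.write(ByteBuffer.wrap(
      "Some text ".getBytes(StandardCharsets.US_ASCII)));
    fc.close();
    // Add to the end of the file:
    fc = new RandomAccessFile(file, "rw").getChannel();
    fc.position(fc.size()); // Move to the end
    fc.write(ByteBuffer.wrap(
      "Some more".getBytes(StandardCharsets.US_ASCII)));
    fc.close();
    // Read the file into an explicitly allocated buffer:
    fc = new FileInputStream(file).getChannel();
    ByteBuffer buff = ByteBuffer.allocate(BSIZE);
    fc.read(buff);
    fc.close();
    buff.flip(); // Prepare the buffer for extraction
    StringBuilder sb = new StringBuilder();
    while(buff.hasRemaining())
      sb.append((char)buff.get()); // One byte at a time
    return sb.toString();
  }
}
```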

Its also possible to go for even more speed by using allocateDirect( ) instead of allocate( ) to produce a direct buffer that may have an even higher coupling with the operating system. However, the overhead in such an allocation is greater, and the actual implementation varies from one operating system to another, so again, you must experiment with your working application to discover whether direct buffers will buy you any advantage in speed. Feedback

Once you call read( ) to tell the FileChannel to store bytes into the ByteBuffer, you must call flip( ) on the buffer to tell it to get ready to have its bytes extracted (yes, this seems a bit crude, but remember that its very low-level and is done for maximum speed). And if we were to use the buffer for further read( ) operations, wed also have to call clear( ) to prepare it for each read( ). You can see this in a simple file copying program: Feedback

You can see that one FileChannel is opened for reading, and one for writing. A ByteBuffer is allocated, and when FileChannel.read( ) returns -1 (a holdover, no doubt, from Unix and C), it means that youve reached the end of the input. After each read( ), which puts data into the buffer, flip( ) prepares the buffer so that its information can be extracted by the write( ). After the write( ), the information is still in the buffer, and clear( ) resets all the internal pointers so that its ready to accept data during another read( ). Feedback
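The read( ), flip( ), write( ), clear( ) cycle looks roughly like this. The sketch uses hypothetical names and takes File arguments rather than command-line arguments; the inner loop around write( ) is a defensive addition, since write( ) reports how many bytes it actually consumed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelCopySketch {
  private static final int BSIZE = 1024;
  public static void copy(File source, File dest)
  throws IOException {
    FileChannel in = new FileInputStream(source).getChannel();
    FileChannel out = new FileOutputStream(dest).getChannel();
    try {
      ByteBuffer buffer = ByteBuffer.allocate(BSIZE);
      while(in.read(buffer) != -1) { // -1 means end of input
        buffer.flip();  // Prepare for writing
        while(buffer.hasRemaining())
          out.write(buffer);
        buffer.clear(); // Prepare for the next read
      }
    } finally {
      in.close();
      out.close();
    }
  }
}
```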

The preceding program is not the ideal way to handle this kind of operation, however. Special methods transferTo( ) and transferFrom( ) allow you to connect one channel directly to another:

You won't do this kind of thing very often, but it's good to know about.

Converting data

If you look back at GetChannel.java, you'll notice that, to print the information in the file, we are pulling the data out one byte at a time and casting each byte to a char. This seems a bit primitive; if you look at the java.nio.CharBuffer class, you'll see that it has a toString( ) method that says: "Returns a string containing the characters in this buffer." Since a ByteBuffer can be viewed as a CharBuffer with the asCharBuffer( ) method, why not use that? As you can see from the first line in the expect( ) statement below, this doesn't work out:

The buffer contains plain bytes, and to turn these into characters we must either encode them as we put them in (so that they will be meaningful when they come out) or decode them as they come out of the buffer. This can be accomplished using the java.nio.charset.Charset class, which provides tools for encoding into many different types of character sets:

So, returning to BufferToText.java, if you rewind( ) the buffer (to go back to the beginning of the data) and then use the platform's default character set to decode( ) the data, the resulting CharBuffer will print to the console just fine. To discover the default character set, use System.getProperty("file.encoding"), which produces the string that names the character set. Passing this to Charset.forName( ) produces the Charset object that can be used to decode the string.

Another alternative is to encode( ) using a character set that will result in something printable when the file is read, as you see in the third part of BufferToText.java. Here, UTF-16BE is used to write the text into the file, and when it is read, all you have to do is convert it to a CharBuffer, and it produces the expected text.

Finally, you see what happens if you write to the ByteBuffer through a CharBuffer (you'll learn more about this later). Note that 24 bytes are allocated for the ByteBuffer. Since each char requires two bytes, this is enough for 12 chars, but "Some text" only has 9. The remaining zero bytes still appear in the representation of the CharBuffer produced by its toString( ), as you can see in the output.
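The three approaches can be sketched without any file I/O. This is an illustration, not BufferToText.java itself; the class and method names are hypothetical, and the character set is passed in by name rather than taken from the file.encoding property.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class BufferToTextSketch {
  // Decodes raw bytes using an explicitly named character set.
  public static String decode(byte[] bytes, String charsetName) {
    return Charset.forName(charsetName)
      .decode(ByteBuffer.wrap(bytes)).toString();
  }
  // Encodes as UTF-16BE on the way in, so that the CharBuffer
  // view reads the text directly.
  public static String viaCharView(String text) {
    ByteBuffer buff = ByteBuffer.wrap(
      text.getBytes(Charset.forName("UTF-16BE")));
    return buff.asCharBuffer().toString();
  }
  // Writes through a CharBuffer view into a ByteBuffer that may be
  // larger than needed; unused chars come back as '\0'.
  public static String viaOversizedBuffer(
    String text, int byteCapacity) {
    ByteBuffer buff = ByteBuffer.allocate(byteCapacity);
    buff.asCharBuffer().put(text);
    return buff.asCharBuffer().toString();
  }
}
```

With a 24-byte buffer, viaOversizedBuffer("Some text", 24) yields a 12-char string: the nine characters of the text followed by three zero chars.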

Fetching primitives

Although a ByteBuffer only holds bytes, it contains methods to produce each of the different types of primitive values from the bytes it contains. This example shows the insertion and extraction of various values using these methods:

After a ByteBuffer is allocated, its values are checked to see whether buffer allocation automatically zeroes the contents, and it does. All 1,024 values are checked (up to the limit( ) of the buffer), and all are zero.

The easiest way to insert primitive values into a ByteBuffer is to get the appropriate "view" on that buffer using asCharBuffer( ), asShortBuffer( ), etc., and then to use that view's put( ) method. You can see this is the process used for each of the primitive data types. The only one of these that is a little odd is the put( ) for the ShortBuffer, which requires a cast (note that the cast truncates and changes the resulting value). All the other view buffers do not require casting in their put( ) methods.
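Here is a sketch of the view-then-put technique for three of the primitive types. The names are hypothetical and each method allocates a fresh buffer, so the value is always stored at, and read back from, position zero.

```java
import java.nio.ByteBuffer;

public class GetDataSketch {
  private static final int BSIZE = 1024;
  // The cast to short is required and may truncate the value.
  public static short storeShort(int value) {
    ByteBuffer bb = ByteBuffer.allocate(BSIZE);
    bb.asShortBuffer().put((short)value);
    return bb.getShort(); // Reads from position 0 of the ByteBuffer
  }
  public static int storeInt(int value) {
    ByteBuffer bb = ByteBuffer.allocate(BSIZE);
    bb.asIntBuffer().put(value);
    return bb.getInt();
  }
  public static double storeDouble(double value) {
    ByteBuffer bb = ByteBuffer.allocate(BSIZE);
    bb.asDoubleBuffer().put(value);
    return bb.getDouble();
  }
}
```

For instance, storeShort(471142) returns 12390, because only the low 16 bits of the int survive the cast.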

View buffers

A view buffer allows you to look at an underlying ByteBuffer through the window of a particular primitive type. The ByteBuffer is still the actual storage thats backing the view, so any changes you make to the view are reflected in modifications to the data in the ByteBuffer. As seen in the previous example, this allows you to conveniently insert primitive types into a ByteBuffer. A view also allows you to read primitive values from a ByteBuffer, either one at a time (as ByteBuffer allows) or in batches (into arrays). Heres an example that manipulates ints in a ByteBuffer via an IntBuffer: Feedback

The overloaded put( ) method is first used to store an array of int. The following get( ) and put( ) method calls directly access an int location in the underlying ByteBuffer. Note that these absolute location accesses are available for primitive types by talking directly to a ByteBuffer, as well. Feedback
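A sketch of these operations, with hypothetical names: the array is stored with the bulk put( ), one element is replaced with an absolute put( ), and the same element is then read directly from the backing ByteBuffer at its byte offset (four bytes per int).

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

public class IntBufferSketch {
  // Stores values through an IntBuffer view, replaces one element,
  // and returns the resulting contents.
  public static int[] storeAndPatch(
    int[] values, int index, int replacement) {
    ByteBuffer bb = ByteBuffer.allocate(values.length * 4);
    IntBuffer ib = bb.asIntBuffer();
    ib.put(values);             // Bulk, relative put
    ib.put(index, replacement); // Absolute put; position unchanged
    // The change is visible through the backing ByteBuffer:
    if(bb.getInt(index * 4) != replacement)
      throw new IllegalStateException("View is not backed by bb");
    ib.flip(); // limit = number of ints stored, position = 0
    int[] result = new int[ib.remaining()];
    ib.get(result);             // Bulk read into an array
    return result;
  }
}
```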

Once the underlying ByteBuffer is filled with ints or some other primitive type via a view buffer, then that ByteBuffer can be written directly to a channel. You can just as easily read from a channel and use a view buffer to convert everything to a particular type of primitive. Here's an example that interprets the same sequence of bytes as short, int, float, long, and double by producing different view buffers on the same ByteBuffer:

The ByteBuffer is produced by "wrapping" an eight-byte array, which is then displayed via view buffers of all the different primitive types. You can see in the following diagram the way the data appears differently when read from the different types of buffers:

This corresponds to the output from the program.

Endians

Different machines may use different byte-ordering approaches to store data. "Big endian" places the most significant byte in the lowest memory address, and "little endian" places the most significant byte in the highest memory address. When storing a quantity that is greater than one byte, like int, float, etc., you may need to consider the byte ordering. A ByteBuffer stores data in big endian form by default, and data sent over a network conventionally uses big endian order (network byte order). You can change the endian-ness of a ByteBuffer using order( ) with an argument of ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN.

Consider a ByteBuffer containing the following two bytes:

If you read the data as a short (ByteBuffer.asShortBuffer( )), you will get the number 97 (00000000 01100001), but if you change to little endian, you will get the number 24832 (01100001 00000000).
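The arithmetic can be checked with a few lines of code. In this sketch (hypothetical names), the two bytes 00000000 and 01100001 are wrapped and read as a short under each byte order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianShortSketch {
  // Interprets two bytes as a short using the given byte order.
  public static short read(byte[] twoBytes, ByteOrder order) {
    return ByteBuffer.wrap(twoBytes)
      .order(order)      // Set the order before creating the view
      .asShortBuffer()
      .get();
  }
  public static void main(String[] args) {
    byte[] data = { 0, 97 }; // 00000000 01100001
    System.out.println(read(data, ByteOrder.BIG_ENDIAN));    // 97
    System.out.println(read(data, ByteOrder.LITTLE_ENDIAN)); // 24832
  }
}
```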

Heres an example that shows how byte ordering is changed in characters depending on the endian setting:

The ByteBuffer is given enough space to hold all the bytes in charArray as an external buffer so that that array( ) method can be called to display the underlying bytes. The array( ) method is optional, and you can only call it on a buffer that is backed by an array; otherwise, youll get an UnsupportedOperationException. Feedback

charArray is inserted into the ByteBuffer via a CharBuffer view. When the underlying bytes are displayed, you can see that the default ordering is the same as the subsequent big endian order, whereas the little endian order swaps the bytes. Feedback

Data manipulation with buffers

The diagram here illustrates the relationships between the nio classes, so that you can see how to move and convert data. For example, if you wish to write a byte array to a file, then you wrap the byte array using the ByteBuffer.wrap( ) method, open a channel on the FileOutputStream using the getChannel( ) method, and then write data into FileChannel from this ByteBuffer.

Note that ByteBuffer is the only way to move data in and out of channels, and that you can only create a standalone primitive-typed buffer, or get one from a ByteBuffer using an "as" method. That is, you cannot convert a primitive-typed buffer to a ByteBuffer. However, since you are able to move primitive data into and out of a ByteBuffer via a view buffer, this is not really a restriction.

Buffer details

A Buffer consists of data and four indexes to access and manipulate this data efficiently: mark, position, limit and capacity. There are methods to set and reset these indexes and to query their value.

capacity( ): Returns the buffer's capacity.
clear( ): Clears the buffer, sets the position to zero, and sets limit to capacity. You call this method to overwrite an existing buffer.
flip( ): Sets limit to position and position to zero. This method is used to prepare the buffer for a read after data has been written into it.
limit( ): Returns the value of limit.
limit(int lim): Sets the value of limit.
mark( ): Sets mark at position.
position( ): Returns the value of position.
position(int pos): Sets the value of position.
remaining( ): Returns (limit - position).
hasRemaining( ): Returns true if there are any elements between position and limit.

Methods that insert and extract data from the buffer update these indexes to reflect the changes.
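To watch the indexes move, here's a small sketch (hypothetical names) that records position, limit, capacity, and remaining( ) for an eight-element CharBuffer after each of several operations:

```java
import java.nio.Buffer;
import java.nio.CharBuffer;

public class BufferIndexSketch {
  // Formats the indexes of any kind of Buffer.
  public static String state(Buffer b) {
    return "pos=" + b.position() + " lim=" + b.limit() +
      " cap=" + b.capacity() + " rem=" + b.remaining();
  }
  public static String[] trace() {
    CharBuffer cb = CharBuffer.allocate(8);
    String fresh = state(cb);      // pos=0 lim=8 cap=8 rem=8
    cb.put("abc");                 // Relative put advances position
    String afterPut = state(cb);   // pos=3 lim=8 cap=8 rem=5
    cb.flip();                     // limit = position, position = 0
    String afterFlip = state(cb);  // pos=0 lim=3 cap=8 rem=3
    cb.clear();                    // position = 0, limit = capacity
    String afterClear = state(cb); // pos=0 lim=8 cap=8 rem=8
    return new String[] { fresh, afterPut, afterFlip, afterClear };
  }
}
```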

This example uses a very simple algorithm (swapping adjacent characters) to scramble and unscramble characters in a CharBuffer:

Although you could produce a CharBuffer directly by calling wrap( ) with a char array, an underlying ByteBuffer is allocated instead, and a CharBuffer is produced as a view on the ByteBuffer. This emphasizes the fact that the goal is always to manipulate a ByteBuffer, since that is what interacts with a channel.

Here's what the buffer looks like after the put( ):

The position points to the first element in the buffer, and the capacity and limit point to the last element.

In symmetricScramble( ), the while loop iterates until position is equivalent to limit. The position of the buffer changes when a relative get( ) or put( ) method is called on it. You can also call absolute get( ) and put( ) methods that include an index argument, which is the location where the get( ) or put( ) takes place. These methods do not modify the value of the buffer's position.

When control enters the while loop, the value of mark is set using a mark( ) call. The state of the buffer is then:

The two relative get( ) calls save the value of the first two characters in variables c1 and c2. After these two calls, the buffer looks like this:

To perform the swap, we need to write c2 at position = 0 and c1 at position = 1. We can either use the absolute put method to achieve this, or set the value of position to mark, which is what reset( ) does:

The two put( ) methods write c2 and then c1:

During the next iteration of the loop, mark is set to the current value of position:

The process continues until the entire buffer is traversed. At the end of the while loop, position is at the end of the buffer. If you print the buffer, only the characters between the position and limit are printed. Thus, if you want to show the entire contents of the buffer you must set position to the start of the buffer using rewind( ). Here is the state of buffer after the rewind( ) call (the value of mark becomes undefined):

When the function symmetricScramble( ) is called again, the CharBuffer undergoes the same process and is restored to its original state.
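The whole process can be sketched as follows. This is a reconstruction from the description above, not the original listing; the wrapper method scramble( ) is hypothetical, and the sketch assumes text with an even number of characters (an odd length would leave the second get( ) with nothing to read).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

public class ScrambleSketch {
  // Swaps each adjacent pair of chars using mark() and reset().
  private static void symmetricScramble(CharBuffer buffer) {
    while(buffer.hasRemaining()) {
      buffer.mark();           // Remember the start of the pair
      char c1 = buffer.get();  // Relative gets advance position
      char c2 = buffer.get();
      buffer.reset();          // Back to the mark
      buffer.put(c2).put(c1);  // Write the pair in swapped order
    }
  }
  // Assumes text.length() is even.
  public static String scramble(String text) {
    char[] data = text.toCharArray();
    ByteBuffer bb = ByteBuffer.allocate(data.length * 2);
    CharBuffer cb = bb.asCharBuffer(); // View on the ByteBuffer
    cb.put(data);
    cb.rewind();
    symmetricScramble(cb);
    cb.rewind(); // toString() covers position..limit only
    return cb.toString();
  }
}
```

Scrambling "UsingBuffers" gives "sUniBgfuefsr", and scrambling that result restores the original text.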

Memory-mapped files

Memory-mapped files allow you to create and modify files that are too big to bring into memory. With a memory-mapped file, you can pretend that the entire file is in memory and that you can access it by simply treating it as a very large array. This approach greatly simplifies the code you write in order to modify the file. Here's a small example:

To do both writing and reading, we start with a RandomAccessFile, get a channel for that file, and then call map( ) to produce a MappedByteBuffer, which is a particular kind of direct buffer. Note that you must specify the starting point and the length of the region that you want to map in the file; this means that you have the option to map smaller regions of a large file.

MappedByteBuffer is inherited from ByteBuffer, so it has all of ByteBuffer's methods. Only the very simple uses of put( ) and get( ) are shown here, but you can also use things like asCharBuffer( ), etc.

The file created with the preceding program is 128 MB long, which is probably larger than the space your OS will allow. The file appears to be accessible all at once because only portions of it are brought into memory, and other parts are swapped out. This way a very large file (up to 2 GB) can easily be modified. Note that the file-mapping facilities of the underlying operating system are used to maximize performance.
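A scaled-down sketch of the same idea follows. The names are hypothetical, the caller chooses the length (a few kilobytes is enough to try it out, rather than 128 MB), and the sketch assumes the length is at least 12 so that the sampled range stays inside the mapping.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedSketch {
  // Maps the first length bytes of file, fills them with 'x', and
  // returns six chars sampled from the middle. Assumes length >= 12.
  public static String fillAndSample(File file, int length)
  throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    try {
      // Starting point 0, region size length:
      MappedByteBuffer out = raf.getChannel().map(
        FileChannel.MapMode.READ_WRITE, 0, length);
      for(int i = 0; i < length; i++)
        out.put((byte)'x');           // Relative put
      StringBuilder sb = new StringBuilder();
      for(int i = length / 2; i < length / 2 + 6; i++)
        sb.append((char)out.get(i));  // Absolute get
      return sb.toString();
    } finally {
      raf.close();
    }
  }
}
```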

Performance

Although the performance of old stream I/O has been improved by implementing it with nio, mapped file access tends to be dramatically faster. This program does a simple performance comparison:

As seen in earlier examples in this book, runTest( ) is the Template Method that provides the testing framework for various implementations of test( ) defined in anonymous inner subclasses. Each of these subclasses performs one kind of test, so the test( ) methods also give you a prototype for performing the various I/O activities.

Although a mapped write would seem to use a FileOutputStream, all output in file mapping must use a RandomAccessFile, just as read/write does in the preceding code.

Note that the test( ) methods include the time for initialization of the various I/O objects, so even though the setup for mapped files can be expensive, the overall gain compared to stream I/O is significant.

File locking

File locking, introduced in JDK 1.4, allows you to synchronize access to a file as a shared resource. However, the two threads that contend for the same file may be in different JVMs, or one may be a Java thread and the other some native thread in the operating system. The file locks are visible to other operating system processes because Java file locking maps directly to the native operating system locking facility.

You get a FileLock on the entire file by calling either tryLock( ) or lock( ) on a FileChannel. (SocketChannel, DatagramChannel, and ServerSocketChannel do not need locking since they are inherently single-process entities; you don't generally share a network socket between two processes.) tryLock( ) is non-blocking. It tries to grab the lock, but if it cannot (when some other process already holds the same lock and it is not shared), it simply returns null from the method call. lock( ) blocks until the lock is acquired, or the thread that invoked lock( ) is interrupted, or the channel on which the lock( ) method is called is closed. A lock is released using FileLock.release( ).

It is also possible to lock a part of the file by using

tryLock(long position, long size, boolean shared)

or

lock(long position, long size, boolean shared)

which locks the region from position to position + size. The third argument specifies whether this lock is shared.

Although the zero-argument locking methods adapt to changes in the size of a file, locks with a fixed size do not change if the file size changes. If a lock is acquired for a region from position to position+size and the file increases beyond position+size, then the section beyond position+size is not locked. The zero-argument locking methods lock the entire file, even if it grows.

Support for exclusive or shared locks must be provided by the underlying operating system. If the operating system does not support shared locks and a request is made for one, an exclusive lock is used instead. The type of lock (shared or exclusive) can be queried using FileLock.isShared( ).
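Both forms of locking can be sketched briefly. The names here are hypothetical, and the sketch assumes a local file system on which the operating system supports locking. The first method locks the whole file with tryLock( ); the second locks only a region with lock( ) and reports the region that the FileLock covers.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class FileLockSketch {
  // Returns true if an exclusive lock on the whole file was
  // acquired (it is released again before returning).
  public static boolean lockAndRelease(File file)
  throws IOException {
    FileOutputStream fos = new FileOutputStream(file);
    try {
      FileLock fl = fos.getChannel().tryLock(); // Non-blocking
      if(fl == null)
        return false; // Some other process holds the lock
      try {
        return fl.isValid() && !fl.isShared();
      } finally {
        fl.release();
      }
    } finally {
      fos.close();
    }
  }
  // Locks the region [position, position + size) exclusively and
  // returns the lock's own position and size.
  public static long[] lockRegion(
    File file, long position, long size) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    try {
      FileLock fl = raf.getChannel().lock(position, size, false);
      try {
        return new long[] { fl.position(), fl.size() };
      } finally {
        fl.release();
      }
    } finally {
      raf.close();
    }
  }
}
```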

Locking portions of a mapped file

As mentioned earlier, file mapping is typically used for very large files. One thing that you may need to do with such a large file is to lock portions of it so that other processes may modify unlocked parts of the file. This is something that happens, for example, with a database, so that it can be available to many users at once.

Here's an example that has two threads, each of which locks a distinct portion of a file:

The LockAndModify thread class sets up the buffer region and creates a slice( ) to be modified, and in run( ), the lock is acquired on the file channel (you can't acquire a lock on the buffer, only on the channel). The call to lock( ) is very similar to acquiring a threading lock on an object; you now have a "critical section" with exclusive access to that portion of the file.

A lock is automatically released when the JVM exits, or when the channel on which it was acquired is closed, but you can also explicitly call release( ) on the FileLock object, as shown here.

Compression

The Java I/O library contains classes to support reading and writing streams in a compressed format. These are wrapped around existing I/O classes to provide compression functionality.

These classes are not derived from the Reader and Writer classes, but instead are part of the InputStream and OutputStream hierarchies. This is because the compression library works with bytes, not characters. However, you might sometimes be forced to mix the two types of streams. (Remember that you can use InputStreamReader and OutputStreamWriter to provide easy conversion between one type and another.)

Compression class: Function
CheckedInputStream: getChecksum( ) produces checksum for any InputStream (not just decompression).
CheckedOutputStream: getChecksum( ) produces checksum for any OutputStream (not just compression).
DeflaterOutputStream: Base class for compression classes.
ZipOutputStream: A DeflaterOutputStream that compresses data into the Zip file format.
GZIPOutputStream: A DeflaterOutputStream that compresses data into the GZIP file format.
InflaterInputStream: Base class for decompression classes.
ZipInputStream: An InflaterInputStream that decompresses data that has been stored in the Zip file format.
GZIPInputStream: An InflaterInputStream that decompresses data that has been stored in the GZIP file format.

Although there are many compression algorithms, Zip and GZIP are possibly the most commonly used. Thus you can easily manipulate your compressed data with the many tools available for reading and writing these formats.

Simple compression with GZIP

The GZIP interface is simple and thus is probably more appropriate when you have a single stream of data that you want to compress (rather than a container of dissimilar pieces of data). Here's an example that compresses a single file:

The use of the compression classes is straightforward; you simply wrap your output stream in a GZIPOutputStream or ZipOutputStream, and your input stream in a GZIPInputStream or ZipInputStream. All else is ordinary I/O reading and writing. This is an example of mixing the char-oriented streams with the byte-oriented streams; in uses the Reader classes, whereas GZIPOutputStream's constructor can accept only an OutputStream object, not a Writer object. When the file is opened, the GZIPInputStream is converted to a Reader.
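The wrapping can be sketched as a round trip. This is an illustration with hypothetical names, not the original listing: byte arrays stand in for the files, and UTF-8 is named explicitly when converting between the char-oriented and byte-oriented streams.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZIPSketch {
  // Compresses text into GZIP format.
  public static byte[] compress(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // GZIPOutputStream accepts only an OutputStream, so the Writer
    // is layered on top of it.
    Writer out = new OutputStreamWriter(
      new GZIPOutputStream(bytes), "UTF-8");
    out.write(text);
    out.close(); // Also finishes the GZIP stream
    return bytes.toByteArray();
  }
  // Decompresses GZIP data, converting the stream to a Reader.
  public static String decompress(byte[] data) throws IOException {
    BufferedReader in = new BufferedReader(
      new InputStreamReader(
        new GZIPInputStream(new ByteArrayInputStream(data)),
        "UTF-8"));
    try {
      StringBuilder sb = new StringBuilder();
      int c;
      while((c = in.read()) != -1)
        sb.append((char)c);
      return sb.toString();
    } finally {
      in.close();
    }
  }
}
```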

Multifile storage with Zip

The library that supports the Zip format is much more extensive. With it you can easily store multiple files, and theres even a separate class to make the process of reading a Zip file easy. The library uses the standard Zip format so that it works seamlessly with all the tools currently downloadable on the Internet. The following example has the same form as the previous example, but it handles as many command-line arguments as you want. In addition, it shows the use of the Checksum classes to calculate and verify the checksum for the file. There are two Checksum types: Adler32 (which is faster) and CRC32 (which is slower but slightly more accurate). Feedback

For each file to add to the archive, you must call putNextEntry( ) and pass it a ZipEntry object. The ZipEntry object contains an extensive interface that allows you to get and set all the data available on that particular entry in your Zip file: name, compressed and uncompressed sizes, date, CRC checksum, extra field data, comment, compression method, and whether it's a directory entry. However, even though the Zip format has a way to set a password, this is not supported in Java's Zip library. And although CheckedInputStream and CheckedOutputStream support both Adler32 and CRC32 checksums, the ZipEntry class supports only an interface for CRC. This is a restriction of the underlying Zip format, but it might limit you from using the faster Adler32.

To extract files, ZipInputStream has a getNextEntry( ) method that returns the next ZipEntry if there is one. As a more succinct alternative, you can read the file using a ZipFile object, which has a method entries( ) to return an Enumeration to the ZipEntries.

In order to read the checksum, you must somehow have access to the associated Checksum object. Here, a reference to the CheckedOutputStream and CheckedInputStream objects is retained, but you could also just hold onto a reference to the Checksum object.

A baffling method in Zip streams is setComment( ). As shown in ZipCompress.java, you can set a comment when you're writing a file, but there's no way to recover the comment in the ZipInputStream. Comments appear to be supported fully on an entry-by-entry basis only via ZipEntry.
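As a rough sketch of the putNextEntry( ) and getNextEntry( ) calls described above (not a reproduction of ZipCompress.java), the following stores several named pieces of text in one archive and reads them back. The class name ZipSketch and the entry names are hypothetical, a byte array stands in for the archive file, and Java 7 or later is assumed. The sketch holds onto the Checksum object itself, which is one of the two options mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Adler32;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Checksum;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical class name, used only for this illustration.
class ZipSketch {
  // Stores each name/content pair as one Zip entry and returns the archive.
  // The Checksum is updated with every byte that reaches the underlying sink.
  static byte[] zip(Map<String, String> files, Checksum sum)
      throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (ZipOutputStream out =
        new ZipOutputStream(new CheckedOutputStream(sink, sum))) {
      for (Map.Entry<String, String> file : files.entrySet()) {
        out.putNextEntry(new ZipEntry(file.getKey()));
        out.write(file.getValue().getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
      }
    }
    return sink.toByteArray();
  }

  // Walks the archive with getNextEntry( ), collecting each entry's content.
  static Map<String, String> unzip(byte[] archive) throws IOException {
    Map<String, String> files = new LinkedHashMap<>();
    try (ZipInputStream in =
        new ZipInputStream(new ByteArrayInputStream(archive))) {
      ZipEntry entry;
      byte[] buffer = new byte[1024];
      while ((entry = in.getNextEntry()) != null) {
        ByteArrayOutputStream content = new ByteArrayOutputStream();
        int n;
        // read( ) returns -1 at the end of the current entry
        while ((n = in.read(buffer)) != -1)
          content.write(buffer, 0, n);
        files.put(entry.getName(),
            new String(content.toByteArray(), StandardCharsets.UTF_8));
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> files = new LinkedHashMap<>();
    files.put("a.txt", "first file");
    files.put("b.txt", "second file");
    Checksum sum = new Adler32();
    byte[] archive = zip(files, sum);
    System.out.println("Checksum: " + sum.getValue());
    System.out.println(unzip(archive));
  }
}
```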

Of course, you are not limited to files when using the GZIP or Zip libraries; you can compress anything, including data to be sent through a network connection.

Java ARchives (JARs)

The Zip format is also used in the JAR (Java ARchive) file format, which is a way to collect a group of files into a single compressed file, just like Zip. However, like everything else in Java, JAR files are cross-platform, so you don't need to worry about platform issues. You can also include audio and image files as well as class files.

JAR files are particularly helpful when you deal with the Internet. Before JAR files, your Web browser would have to make repeated requests of a Web server in order to download all of the files that make up an applet. In addition, each of these files was uncompressed. By combining all of the files for a particular applet into a single JAR file, only one server request is necessary and the transfer is faster because of compression. And each entry in a JAR file can be digitally signed for security (see Chapter 14 for an example of signing).

A JAR file consists of a single file containing a collection of zipped files along with a manifest that describes them. (You can create your own manifest file; otherwise, the jar program will do it for you.) You can find out more about JAR manifests in the JDK documentation.

The jar utility that comes with Sun's JDK automatically compresses the files of your choice. You invoke it on the command line:

jar [options] destination [manifest] inputfile(s)

The options are simply a collection of letters (no hyphen or any other indicator is necessary). Unix/Linux users will note the similarity to the tar options. These are:

c

Creates a new or empty archive.

t

Lists the table of contents.

x

Extracts all files.

x file

Extracts the named file.

f

Says: Im going to give you the name of the file. If you dont use this, jar assumes that its input will come from standard input, or, if it is creating a file, its output will go to standard output.

m

Says that the first argument will be the name of the user-created manifest file.

v

Generates verbose output describing what jar is doing.

0

Only store the files; doesnt compress the files (use to create a JAR file that you can put in your classpath).

M

Dont automatically create a manifest file.

If a subdirectory is included in the files to be put into the JAR file, that subdirectory is automatically added, including all of its subdirectories, etc. Path information is also preserved.

Here are some typical ways to invoke jar:

jar cf myJarFile.jar *.class

This creates a JAR file called myJarFile.jar that contains all of the class files in the current directory, along with an automatically generated manifest file.

jar cmf myJarFile.jar myManifestFile.mf *.class

Like the previous example, but adding a user-created manifest file called myManifestFile.mf.

Adds the verbose flag to give more detailed information about the files in myJarFile.jar. Feedback

jar cvf myApp.jar audio classes image

Assuming audio, classes, and image are subdirectories, this combines all of the subdirectories into the file myApp.jar. The verbose flag is also included to give extra feedback while the jar program is working.

If you create a JAR file using the 0 (zero) option, that file can be placed in your CLASSPATH:

The jar tool isnt as useful as a zip utility. For example, you cant add or update files to an existing JAR file; you can create JAR files only from scratch. Also, you cant move files into a JAR file, erasing them as they are moved. However, a JAR file created on one platform will be transparently readable by the jar tool on any other platform (a problem that sometimes plagues zip utilities). Feedback

As you will see in Chapter 14, JAR files are also used to package JavaBeans.

Object serialization

Javas object serialization allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be fully restored to regenerate the original object. This is even true across a network, which means that the serialization mechanism automatically compensates for differences in operating systems. That is, you can create an object on a Windows machine, serialize it, and send it across the network to a Unix machine, where it will be correctly reconstructed. You dont have to worry about the data representations on the different machines, the byte ordering, or any other details. Feedback

By itself, object serialization is interesting because it allows you to implement lightweight persistence. Remember that persistence means that an object's lifetime is not determined by whether a program is executing; the object lives in between invocations of the program. By taking a serializable object and writing it to disk, then restoring that object when the program is reinvoked, you're able to produce the effect of persistence. The reason it's called "lightweight" is that you can't simply define an object using some kind of "persistent" keyword and let the system take care of the details (although this might happen in the future). Instead, you must explicitly serialize and deserialize the objects in your program. If you need a more serious persistence mechanism, consider Java Data Objects (JDO) or a tool like Hibernate (http://hibernate.sourceforge.net). For details, see Thinking in Enterprise Java, downloadable from www.BruceEckel.com.

Object serialization was added to the language to support two major features. Java's Remote Method Invocation (RMI) allows objects that live on other machines to behave as if they live on your machine. When sending messages to remote objects, object serialization is necessary to transport the arguments and return values. RMI is discussed in Thinking in Enterprise Java.

Object serialization is also necessary for JavaBeans, described in Chapter 14. When a Bean is used, its state information is generally configured at design time. This state information must be stored and later recovered when the program is started; object serialization performs this task.

Serializing an object is quite simple as long as the object implements the Serializable interface (this is a tagging interface and has no methods). When serialization was added to the language, many standard library classes were changed to make them serializable, including all of the wrappers for the primitive types, all of the container classes, and many others. Even Class objects can be serialized.

To serialize an object, you create some sort of OutputStream object and then wrap it inside an ObjectOutputStream object. At this point you need only call writeObject( ), and your object is serialized and sent to the OutputStream. To reverse the process, you wrap an InputStream inside an ObjectInputStream and call readObject( ). What comes back is, as usual, a reference to an upcast Object, so you must downcast to set things straight.
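The two steps just described might look like the following minimal sketch. The names SerializeSketch and Point are hypothetical and exist only for this illustration; a byte array is used as the underlying stream, and Java 7 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class names, used only for this illustration.
class SerializeSketch {
  // Serializable is a tagging interface, so there are no methods to write.
  static class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
  }

  // Wrap an OutputStream in an ObjectOutputStream and call writeObject( ).
  static byte[] toBytes(Object obj) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
      out.writeObject(obj);
    }
    return sink.toByteArray();
  }

  // Wrap an InputStream in an ObjectInputStream and call readObject( ).
  static Object fromBytes(byte[] bytes)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // readObject( ) returns Object, so a downcast is needed.
    Point p = (Point) fromBytes(toBytes(new Point(3, 4)));
    System.out.println(p.x + ", " + p.y);
  }
}
```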

A particularly clever aspect of object serialization is that it not only saves an image of your object, but it also follows all the references contained in your object and saves those objects, and follows all the references in each of those objects, etc. This is sometimes referred to as the web of objects that a single object can be connected to, and it includes arrays of references to objects as well as member objects. If you had to maintain your own object serialization scheme, maintaining the code to follow all these links would be a bit mind-boggling. However, Java object serialization seems to pull it off flawlessly, no doubt using an optimized algorithm that traverses the web of objects. The following example tests the serialization mechanism by making a worm of linked objects, each of which has a link to the next segment in the worm as well as an array of references to objects of a different class, Data:

To make things interesting, the array of Data objects inside Worm is initialized with random numbers. (This way you don't suspect the compiler of keeping some kind of meta-information.) Each Worm segment is labeled with a char that's automatically generated in the process of recursively generating the linked list of Worms. When you create a Worm, you tell the constructor how long you want it to be. To make the next reference, it calls the Worm constructor with a length of one less, etc. The final next reference is left as null, indicating the end of the Worm.

The point of all this was to make something reasonably complex that couldn't easily be serialized. The act of serializing, however, is quite simple. Once the ObjectOutputStream is created from some other stream, writeObject( ) serializes the object. Notice the call to writeObject( ) for a String, as well. You can also write all the primitive data types using the same methods as DataOutputStream (they share the same interface).

There are two separate code sections that look similar. The first writes and reads a file and the second, for variety, writes and reads a byte array. You can read and write an object using serialization to any InputStream or OutputStream including, as you can see in Thinking in Enterprise Java, a network. The output from one run was:

You can see that the deserialized object really does contain all of the links that were in the original object.

Note that no constructor, not even the default constructor, is called in the process of deserializing a Serializable object. The entire object is restored by recovering data from the InputStream.

Object serialization is byte-oriented, and thus uses the InputStream and OutputStream hierarchies.

Finding the class

You might wonder whats necessary for an object to be recovered from its serialized state. For example, suppose you serialize an object and send it as a file or through a network to another machine. Could a program on the other machine reconstruct the object using only the contents of the file? Feedback

The best way to answer this question is (as usual) by performing an experiment. The following file goes in the subdirectory for this chapter:

Even opening the file and reading in the object mystery requires the Class object for Alien; the JVM cannot find Alien.class (unless it happens to be in the Classpath, which it shouldn't be in this example). You'll get a ClassNotFoundException. (Once again, all evidence of alien life vanishes before proof of its existence can be verified!) The JVM must be able to find the associated .class file.

Controlling serialization

As you can see, the default serialization mechanism is trivial to use. But what if you have special needs? Perhaps you have special security issues and you don't want to serialize portions of your object, or perhaps it just doesn't make sense for one subobject to be serialized if that part needs to be created anew when the object is recovered.

You can control the process of serialization by implementing the Externalizable interface instead of the Serializable interface. The Externalizable interface extends the Serializable interface and adds two methods, writeExternal( ) and readExternal( ), that are automatically called for your object during serialization and deserialization so that you can perform your special operations.

The following example shows simple implementations of the Externalizable interface methods. Note that Blip1 and Blip2 are nearly identical except for a subtle difference (see if you can discover it by looking at the code):

The reason that the Blip2 object is not recovered is that trying to do so causes an exception. Can you see the difference between Blip1 and Blip2? The constructor for Blip1 is public, while the constructor for Blip2 is not, and that causes the exception upon recovery. Try making Blip2's constructor public and removing the //! comments to see the correct results.

When b1 is recovered, the Blip1 default constructor is called. This is different from recovering a Serializable object, in which the object is constructed entirely from its stored bits, with no constructor calls. With an Externalizable object, all the normal default construction behavior occurs (including the initializations at the point of field definition), and then readExternal( ) is called. You need to be aware of this (in particular, the fact that all the default construction always takes place) to produce the correct behavior in your Externalizable objects.

Heres an example that shows what you must do to fully store and retrieve an Externalizable object: Feedback

The fields s and i are initialized only in the second constructor, but not in the default constructor. This means that if you don't initialize s and i in readExternal( ), s will be null and i will be zero (since the storage for the object gets wiped to zero in the first step of object creation). If you comment out the two lines of code following the phrases "You must do this" and run the program, you'll see that when the object is recovered, s is null and i is zero.

If you are inheriting from an Externalizable object, you'll typically call the base-class versions of writeExternal( ) and readExternal( ) to provide proper storage and retrieval of the base-class components.

So to make things work correctly you must not only write the important data from the object during the writeExternal( ) method (there is no default behavior that writes any of the member objects for an Externalizable object), but you must also recover that data in the readExternal( ) method. This can be a bit confusing at first because the default construction behavior for an Externalizable object can make it seem like some kind of storage and retrieval takes place automatically. It does not.
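The requirements above (a public default constructor, plus explicit writing and reading of every field you care about) can be sketched as follows. The names ExternalizableSketch and Blip are hypothetical stand-ins rather than the chapter's own listings, and Java 7 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class names, used only for this illustration.
class ExternalizableSketch {
  public static class Blip implements Externalizable {
    private String s;  // not initialized by the default constructor
    private int i;
    // Must be public: it is called before readExternal( ) during recovery.
    public Blip() { }
    public Blip(String s, int i) { this.s = s; this.i = i; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
      // Nothing is stored automatically; each field is written by hand.
      out.writeObject(s);
      out.writeInt(i);
    }

    @Override
    public void readExternal(ObjectInput in)
        throws IOException, ClassNotFoundException {
      // Fields must be read back in the same order they were written.
      s = (String) in.readObject();
      i = in.readInt();
    }

    @Override
    public String toString() { return s + " " + i; }
  }

  // Serializes to a byte array and immediately deserializes the result.
  static Object roundTrip(Object obj)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(sink.toByteArray()))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(roundTrip(new Blip("A String", 47)));
  }
}
```

If the two statements in readExternal( ) are removed, the recovered object keeps the values left by the default constructor, which is the behavior the text warns about.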

The transient keyword

When youre controlling serialization, there might be a particular subobject that you dont want Javas serialization mechanism to automatically save and restore. This is commonly the case if that subobject represents sensitive information that you dont want to serialize, such as a password. Even if that information is private in the object, once it has been serialized, its possible for someone to access it by reading a file or intercepting a network transmission. Feedback

One way to prevent sensitive parts of your object from being serialized is to implement your class as Externalizable, as shown previously. Then nothing is automatically serialized, and you can explicitly serialize only the necessary parts inside writeExternal( ).

If youre working with a Serializable object, however, all serialization happens automatically. To control this, you can turn off serialization on a field-by-field basis using the transient keyword, which says Dont bother saving or restoring thisIll take care of it. Feedback

For example, consider a Login object that keeps information about a particular login session. Suppose that, once you verify the login, you want to store the data, but without the password. The easiest way to do this is by implementing Serializable and marking the password field as transient. Here's what it looks like:

You can see that the date and username fields are ordinary (not transient), and thus are automatically serialized. However, the password is transient, so it is not stored to disk; also, the serialization mechanism makes no attempt to recover it. The output is:

When the object is recovered, the password field is null. Note that toString( ) checks for a null value of password. In current versions of Java, the overloaded + operator converts a null reference to the text "null" rather than throwing a NullPointerException, so the check mainly serves to produce more readable output.

You can also see that the date field is stored to and recovered from disk and not generated anew.

Since Externalizable objects do not store any of their fields by default, the transient keyword is for use with Serializable objects only.
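A minimal sketch of the transient behavior described in this section follows. It is an illustration with hypothetical names (TransientSketch and a pared-down Login), not the chapter's own listing, and it assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class names, used only for this illustration.
class TransientSketch {
  static class Login implements Serializable {
    private static final long serialVersionUID = 1L;
    String username;            // ordinary field: serialized automatically
    transient String password;  // transient field: skipped by serialization
    Login(String username, String password) {
      this.username = username;
      this.password = password;
    }
  }

  // Serializes to a byte array and immediately deserializes the result.
  static Login roundTrip(Login login)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
      out.writeObject(login);
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(sink.toByteArray()))) {
      return (Login) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Login restored = roundTrip(new Login("Hulk", "myLittlePony"));
    System.out.println("username: " + restored.username);
    // The password was never written, so the field comes back as null.
    System.out.println("password is null: " + (restored.password == null));
  }
}
```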

An alternative to Externalizable

If youre not keen on implementing the Externalizable interface, theres another approach. You can implement the Serializable interface and add (notice I say add and not override or implement) methods called writeObject( ) and readObject( ) that will automatically be called when the object is serialized and deserialized, respectively. That is, if you provide these two methods, they will be used instead of the default serialization. Feedback

From a design standpoint, things get really weird here. First of all, you might think that because these methods are not part of a base class or the Serializable interface, they ought to be defined in their own interface(s). But notice that they are defined as private, which means they are to be called only by other members of this class. However, you don't actually call them from other members of this class, but instead the writeObject( ) and readObject( ) methods of the ObjectOutputStream and ObjectInputStream objects call your object's writeObject( ) and readObject( ) methods. (Notice my tremendous restraint in not launching into a long diatribe about using the same method names here. In a word: confusing.) You might wonder how the ObjectOutputStream and ObjectInputStream objects have access to private methods of your class. We can only assume that this is part of the serialization magic.

In any event, anything defined in an interface is automatically public so if writeObject( ) and readObject( ) must be private, then they can't be part of an interface. Since you must follow the signatures exactly, the effect is the same as if you're implementing an interface.

It would appear that when you call ObjectOutputStream.writeObject( ), the Serializable object that you pass to it is interrogated (using reflection, no doubt) to see if it implements its own writeObject( ). If so, the normal serialization process is skipped and the object's own writeObject( ) is called. The same sort of situation exists for readObject( ).

Theres one other twist. Inside your writeObject( ), you can choose to perform the default writeObject( ) action by calling defaultWriteObject( ). Likewise, inside readObject( ) you can call defaultReadObject( ). Here is a simple example that demonstrates how you can control the storage and retrieval of a Serializable object:

In this example, one String field is ordinary and the other is transient, to prove that the non-transient field is saved by the defaultWriteObject( ) method and the transient field is saved and restored explicitly. The fields are initialized inside the constructor rather than at the point of definition to prove that they are not being initialized by some automatic mechanism during deserialization.

If you are going to use the default mechanism to write the non-transient parts of your object, you must call defaultWriteObject( ) as the first operation in writeObject( ), and defaultReadObject( ) as the first operation in readObject( ). These are strange method calls. It would appear, for example, that you are calling defaultWriteObject( ) for an ObjectOutputStream and passing it no arguments, and yet it somehow turns around and knows the reference to your object and how to write all the non-transient parts. Spooky.

The storage and retrieval of the transient objects uses more familiar code. And yet, think about what happens here. In main( ), a SerialCtl object is created, and then it's serialized to an ObjectOutputStream. (Notice in this case that a buffer is used instead of a file; it's all the same to the ObjectOutputStream.) The serialization occurs in the line:

o.writeObject(sc);

The writeObject( ) method must be examining sc to see if it has its own writeObject( ) method. (Not by checking the interface, since there isn't one, or the class type, but by actually hunting for the method using reflection.) If it does, it uses that. A similar approach holds true for readObject( ). Perhaps this was the only practical way that they could solve the problem, but it's certainly strange.
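The pattern discussed in this section can be sketched as follows: private writeObject( ) and readObject( ) methods that first invoke the default mechanism and then handle a transient field by hand. The names SerialCtlSketch and Ctl are hypothetical and this is not the chapter's SerialCtl listing; Java 7 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class names, used only for this illustration.
class SerialCtlSketch {
  static class Ctl implements Serializable {
    private static final long serialVersionUID = 1L;
    String a;            // saved by defaultWriteObject( )
    transient String b;  // saved and restored explicitly below
    Ctl(String a, String b) { this.a = a; this.b = b; }

    // Added (not overridden); the signature must match exactly.
    private void writeObject(ObjectOutputStream stream) throws IOException {
      stream.defaultWriteObject();  // first: the non-transient parts
      stream.writeObject(b);        // then: the transient field, by hand
    }

    private void readObject(ObjectInputStream stream)
        throws IOException, ClassNotFoundException {
      stream.defaultReadObject();       // first: the non-transient parts
      b = (String) stream.readObject(); // then: the transient field
    }
  }

  // Serializes to a buffer and immediately deserializes the result.
  static Ctl roundTrip(Ctl sc) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream o = new ObjectOutputStream(buf)) {
      o.writeObject(sc);  // finds and calls Ctl's private writeObject( )
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(buf.toByteArray()))) {
      return (Ctl) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Ctl restored = roundTrip(new Ctl("Test1", "Test2"));
    System.out.println(restored.a + ", " + restored.b);
  }
}
```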

Versioning

Its possible that you might want to change the version of a serializable class (objects of the original class might be stored in a database, for example). This is supported, but youll probably do it only in special cases, and it requires an extra depth of understanding that we will not attempt to achieve here. The JDK documents downloadable from java.sun.com cover this topic quite thoroughly. Feedback

You will also notice in the JDK documentation many comments that begin with:

Warning: Serialized objects of this class will not be compatible with future Swing releases. The current serialization support is appropriate for short term storage or RMI between applications ...

This is because the versioning mechanism is too simple to work reliably in all situations, especially with JavaBeans. They're working on a correction for the design, and that's what the warning is about.

Using persistence

Its quite appealing to use serialization technology to store some of the state of your program so that you can easily restore the program to the current state later. But before you can do this, some questions must be answered. What happens if you serialize two objects that both have a reference to a third object? When you restore those two objects from their serialized state, do you get only one occurrence of the third object? What if you serialize your two objects to separate files and deserialize them in different parts of your code? Feedback

One thing thats interesting here is that its possible to use object serialization to and from a byte array as a way of doing a deep copy of any object thats Serializable. (A deep copy means that youre duplicating the entire web of objects, rather than just the basic object and its references.) Object copying is covered in depth in Appendix A. Feedback

Animal objects contain fields of type House. In main( ), a List of these Animals is created and it is serialized twice to one stream and then again to a separate stream. When these are deserialized and printed, you see the following results for one run (the objects will be in different memory locations each run):

Of course you expect that the deserialized objects have different addresses from their originals. But notice that in animals1 and animals2, the same addresses appear, including the references to the House object that both share. On the other hand, when animals3 is recovered, the system has no way of knowing that the objects in this other stream are aliases of the objects in the first stream, so it makes a completely different web of objects.

As long as youre serializing everything to a single stream, youll be able to recover the same web of objects that you wrote, with no accidental duplication of objects. Of course, you can change the state of your objects in between the time you write the first and the last, but thats your responsibility; the objects will be written in whatever state they are in (and with whatever connections they have to other objects) at the time you serialize them. Feedback

The safest thing to do if you want to save the state of a system is to serialize as an atomic operation. If you serialize some things, do some other work, and serialize some more, etc., then you will not be storing the system safely. Instead, put all the objects that comprise the state of your system in a single container and simply write that container out in one operation. Then you can restore it with a single method call as well.

The following example is an imaginary computer-aided design (CAD) system that demonstrates the approach. In addition, it throws in the issue of static fields; if you look at the JDK documentation you'll see that Class is Serializable, so it should be easy to store the static fields by simply serializing the Class object. That seems like a sensible approach, anyway.

The Shape class implements Serializable, so anything that is inherited from Shape is automatically Serializable as well. Each Shape contains data, and each derived Shape class contains a static field that determines the color of all of those types of Shapes. (Placing a static field in the base class would result in only one field, since static fields are not duplicated in derived classes.) Methods in the base class can be overridden to set the color for the various types (static methods are not dynamically bound, so these are normal methods). The randomFactory( ) method creates a different Shape each time you call it, using random values for the Shape data.

Circle and Square are straightforward extensions of Shape; the only difference is that Circle initializes color at the point of definition and Square initializes it in the constructor. We'll leave the discussion of Line for later.

In main( ), one ArrayList is used to hold the Class objects and the other to hold the shapes. If you don't provide a command-line argument, the shapeTypes ArrayList is created and the Class objects are added, and then the shapes ArrayList is created and Shape objects are added. Next, all the static color values are set to GREEN, and everything is serialized to the file CADState.out.

If you provide a command-line argument (presumably CADState.out), that file is opened and used to restore the state of the program. In both situations, the resulting ArrayList of Shapes is printed. The results from one run are:

You can see that the values of xPos, yPos, and dim were all stored and recovered successfully, but there's something wrong with the retrieval of the static information. It's all "3" going in, but it doesn't come out that way. Circles have a value of 1 (RED, which is the definition), and Squares have a value of 0 (remember, they are initialized in the constructor). It's as if the statics didn't get serialized at all! That's right: even though class Class is Serializable, it doesn't do what you expect. So if you want to serialize statics, you must do it yourself.

This is what the serializeStaticState( ) and deserializeStaticState( ) static methods in Line are for. You can see that they are explicitly called as part of the storage and retrieval process. (Note that the order of writing to the serialize file and reading back from it must be maintained.) Thus to make CADState.java run correctly, you must:

1. Add a serializeStaticState( ) and deserializeStaticState( ) to the shapes.

2. Remove the ArrayList shapeTypes and all code related to it.

3. Add calls to the new serialize and deserialize static methods in the shapes.

Another issue you might have to think about is security, since serialization also saves private data. If you have a security issue, those fields should be marked as transient. But then you have to design a secure way to store that information so that when you do a restore you can reset those private variables.

Preferences

JDK 1.4 introduced the Preferences API, which is much closer to persistence than object serialization because it automatically stores and retrieves your information. However, its use is restricted to small and limited data sets: you can only hold primitives and Strings, and the length of each stored String can't be longer than 8K (not tiny, but you don't want to build anything serious with it, either). As the name suggests, the Preferences API is designed to store and retrieve user preferences and program-configuration settings.

Preferences are key-value sets (like Maps) stored in a hierarchy of nodes. Although the node hierarchy can be used to create complicated structures, it's typical to create a single node named after your class and store the information there. Here's a simple example:

Here, userNodeForPackage( ) is used, but you could also choose systemNodeForPackage( ); the choice is somewhat arbitrary, but the idea is that "user" is for individual user preferences, and "system" is for general installation configuration. Since main( ) is static, PreferencesDemo.class is used to identify the node, but inside a non-static method, you'll usually use getClass( ). You don't need to use the current class as the node identifier, but that's the usual practice.

Once you create the node, it's available for either loading or reading data. This example loads the node with various types of items and then gets the keys( ). These come back as a String[], which you might not expect if you're used to keys( ) in the collections library. Here, they're converted to a List that is used to produce an Iterator for printing the keys and values. Notice the second argument to get( ). This is the default value that is produced if there isn't any entry for that key value. While iterating through a set of keys, you always know there's an entry, so using null as the default is safe, but normally you'll be fetching a named key, as in:

prefs.getInt("Companions", 0);

In the normal case, youll want to provide a reasonable default value. In fact, a typical idiom is seen in the lines:

This way, the first time you run the program, the UsageCount will be zero, but on subsequent invocations it will be nonzero.
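The usage-count idiom described above might be sketched like this. The class name PreferencesSketch and the helper bumpUsageCount( ) are hypothetical; only the "UsageCount" key comes from the text, and this is not the PreferencesDemo.java listing itself.

```java
import java.util.prefs.Preferences;

// Hypothetical class name, used only for this illustration.
class PreferencesSketch {
  // Reads the stored count, falling back to the default of 0 when the key
  // has never been written, then increments it and stores it back.
  static int bumpUsageCount(Preferences prefs) {
    int usageCount = prefs.getInt("UsageCount", 0);
    usageCount++;
    prefs.putInt("UsageCount", usageCount);
    return usageCount;
  }

  public static void main(String[] args) {
    // A static method has no getClass( ), so the class literal names the node.
    Preferences prefs =
        Preferences.userNodeForPackage(PreferencesSketch.class);
    System.out.println("UsageCount: " + bumpUsageCount(prefs));
  }
}
```

Because the default is supplied at the point of the read, there is no need for a separate "first run" check.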

When you run PreferencesDemo.java you'll see that the UsageCount does indeed increment every time you run the program, but where is the data stored? There's no local file that appears after the program is run the first time. The Preferences API uses appropriate system resources to accomplish its task, and these will vary depending on the OS. In Windows, the registry is used (since it's already a hierarchy of nodes with key-value pairs). But the whole point is that the information is magically stored for you so that you don't have to worry about how it works from one system to another.

Theres more to the Preferences API than shown here. Consult the JDK documentation, which is fairly understandable, for further details. Feedback

Regular expressions

To finish this chapter, well look at regular expressions, which were added in JDK 1.4 but have been integral to standard Unix utilities like sed and awk, and languages like Python and Perl (some would argue that they are predominant reason for Perls success). Technically, these are string manipulation tools (previously delegated to the String, StringBuffer, and StringTokenizer classes in Java), but they are typically used in conjunction with I/O, so its not too far-fetched to include them here.[66]Feedback

Regular expressions are powerful and flexible text-processing tools. They allow you to specify, programmatically, complex patterns of text that can be discovered in an input string. Once you discover these patterns, you can then react to them any way you want. Although the syntax of regular expressions can be intimidating at first, they provide a compact and dynamic language that can be employed to solve all sorts of string processing, matching and selection, editing, and verification problems in a completely general way.

Creating regular expressions

You can begin learning regular expressions with a useful subset of the possible constructs. A complete list of constructs for building regular expressions can be found in the javadocs for the Pattern class for package java.util.regex.

Characters

B          The specific character B
\xhh       Character with hex value 0xhh
\uhhhh     The Unicode character with hex representation 0xhhhh
\t         Tab
\n         Newline
\r         Carriage return
\f         Form feed
\e         Escape

The power of regular expressions begins to appear when defining character classes. Here are some typical ways to create character classes, and some predefined classes:

If you have any experience with regular expressions in other languages, you'll immediately notice a difference in the way backslashes are handled. In other languages, \\ means "I want to insert a plain old (literal) backslash in the regular expression. Don't give it any special meaning." In Java, \\ means "I'm inserting a regular expression backslash, so the following character has special meaning." For example, if you want to indicate one or more word characters, your regular expression string will be \\w+. If you want to insert a literal backslash, you say \\\\. However, things like newlines and tabs just use a single backslash: \n\t.
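A short sketch may make the backslash rules concrete. The class and method names here (BackslashSketch, isWord( ), and so on) are hypothetical and exist only for illustration; the only library call used is Pattern.matches( ).

```java
import java.util.regex.Pattern;

class BackslashSketch {
  // In Java source, "\\w+" is the three-character regex \w+
  static boolean isWord(String s) {
    return Pattern.matches("\\w+", s);
  }

  // "\\\\" in source is the regex \\ which matches one literal backslash
  static boolean isSingleBackslash(String s) {
    return Pattern.matches("\\\\", s);
  }

  // "\t\n" places a real tab and newline in the pattern; no doubling needed
  static boolean isTabThenNewline(String s) {
    return Pattern.matches("\t\n", s);
  }

  public static void main(String[] args) {
    System.out.println(isWord("Java2"));
    System.out.println(isSingleBackslash("\\"));
    System.out.println(isTabThenNewline("\t\n"));
  }
}
```

The doubling comes from two layers of escaping: the Java compiler consumes one level of backslashes when it reads the string literal, and the regular expression engine consumes the next.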

Whats shown here is only a sampling; youll want to have the java.util.regex.Pattern JDK documentation page bookmarked or on your Start menu so you can easily access all the possible regular expression patterns. Feedback

Logical Operators

XY         X followed by Y
X|Y        X or Y
(X)        A capturing group. You can refer to the ith captured group later in the expression with \i

Boundary Matchers

^          Beginning of a line
$          End of a line
\b         Word boundary
\B         Non-word boundary
\G         End of the previous match

As an example, each of the following represents a valid regular expression, and all will successfully match the character sequence "Rudolph":

Rudolph
[rR]udolph
[rR][aeiou][a-z]ol.*
R.*
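One way to check this claim is to run each expression against the input. The following is a small sketch; RudolphSketch and allMatch( ) are hypothetical names, and Pattern.matches( ) is used so that each expression must match the whole input.

```java
import java.util.regex.Pattern;

class RudolphSketch {
  // The four expressions listed above, written as Java string literals.
  static final String[] PATTERNS = {
    "Rudolph", "[rR]udolph", "[rR][aeiou][a-z]ol.*", "R.*"
  };

  // True only if every expression matches the entire input.
  static boolean allMatch(String input) {
    for (String regex : PATTERNS)
      if (!Pattern.matches(regex, input))
        return false;
    return true;
  }

  public static void main(String[] args) {
    System.out.println(allMatch("Rudolph"));
  }
}
```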

Quantifiers

A quantifier describes the way that a pattern absorbs input text:

Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression finds as many possible matches for the pattern as possible. A typical cause of problems is to assume that your pattern will only match the first possible group of characters, when it's actually greedy and will keep going.

Reluctant: Specified with a question mark, this quantifier matches the minimum necessary number of characters to satisfy the pattern. Also called lazy, minimal matching, non-greedy, or ungreedy.

Possessive: At the time of writing, this type is only available in Java (not in other languages), and it is more advanced, so you probably won't use it right away. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, and thus prevent backtracking. They can be used to prevent a regular expression from running away and also to make it execute more efficiently.

Greedy     Reluctant   Possessive   Matches

X?         X??         X?+          X, one or none
X*         X*?         X*+          X, zero or more
X+         X+?         X++          X, one or more
X{n}       X{n}?       X{n}+        X, exactly n times
X{n,}      X{n,}?      X{n,}+       X, at least n times
X{n,m}     X{n,m}?     X{n,m}+      X, at least n but not more than m times
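The difference between the three quantifier types shows up when the same input is matched with .*, .*?, and .*+ in front of a literal. This is a minimal sketch; QuantifierSketch, firstMatch( ), and the input string are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
  // Returns the first substring found by regex, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String input = "xfooxxxxxxfoo";
    // Greedy: .* takes everything, then backs off just enough for "foo"
    System.out.println(firstMatch(".*foo", input));
    // Reluctant: .*? takes as little as possible before "foo"
    System.out.println(firstMatch(".*?foo", input));
    // Possessive: .*+ takes everything and never gives any back,
    // so "foo" can never match and no match is found
    System.out.println(firstMatch(".*+foo", input));
  }
}
```

The greedy form matches the whole input, the reluctant form stops at the first "foo", and the possessive form finds nothing because it refuses to backtrack.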

You should be very aware that the expression X will often need to be surrounded in parentheses for it to work the way you desire. For example:

abc+

might seem like it would match the sequence "abc" one or more times, and if you apply it to the input string "abcabcabc", you will in fact get three matches. However, the expression actually says "match ab followed by one or more occurrences of c." To match the entire string "abc" one or more times, you must say:

(abc)+

You can easily be fooled when using regular expressions; it's a new language, on top of Java.
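The effect of the parentheses can be observed by counting matches. In this sketch, GroupingSketch and its two helper methods are hypothetical names; the inputs are the ones discussed above plus "abccc", added here to show what abc+ really means.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingSketch {
  // Counts how many separate matches find( ) locates in the input.
  static int countMatches(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    int count = 0;
    while (m.find())
      count++;
    return count;
  }

  // True if the regex matches the entire input.
  static boolean matchesWhole(String regex, String input) {
    return Pattern.matches(regex, input);
  }

  public static void main(String[] args) {
    System.out.println(countMatches("abc+", "abcabcabc"));   // three short matches
    System.out.println(countMatches("(abc)+", "abcabcabc")); // one long match
    System.out.println(matchesWhole("abc+", "abccc"));       // ab, then several c's
    System.out.println(matchesWhole("(abc)+", "abccc"));
  }
}
```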

CharSequence

JDK 1.4 defines a new interface called CharSequence, which establishes a definition of a character sequence abstracted from the String or StringBuffer classes:

The String, StringBuffer, and CharBuffer classes have been modified to implement this new CharSequence interface. Many regular expression operations take CharSequence arguments.
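Because Pattern.matcher( ) accepts a CharSequence, one method can serve all three implementing classes. The following sketch assumes a hypothetical helper, countWords( ), in a hypothetical class CharSequenceSketch.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceSketch {
  // Works for any CharSequence: String, StringBuffer, CharBuffer, ...
  static int countWords(CharSequence text) {
    Matcher m = Pattern.compile("\\w+").matcher(text);
    int count = 0;
    while (m.find())
      count++;
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countWords("one two three"));
    System.out.println(countWords(new StringBuffer("one two")));
    System.out.println(countWords(CharBuffer.wrap("a b c d")));
  }
}
```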

Pattern and Matcher

As a first example, the following class can be used to test regular expressions against an input string. The first argument is the input string to match against, followed by one or more regular expressions to be applied to the input. Under Unix/Linux, the regular expressions must be quoted on the command line.

This program can be useful in testing regular expressions as you construct them to see that they produce your intended matching behavior.

Regular expressions are implemented in Java through the Pattern and Matcher classes in the package java.util.regex. A Pattern object represents a compiled version of a regular expression. The static compile( ) method compiles a regular expression string into a Pattern object. As seen in the preceding example, you can use the matcher( ) method and the input string to produce a Matcher object from the compiled Pattern object. Pattern also has a

static boolean matches(String regex, CharSequence input)

for quickly discerning if regex matches the whole of input, and a split( ) method that produces an array of String that has been broken around matches of the regex.
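The relationship between a compiled Pattern, the Matcher it produces, and the static Pattern.matches( ) shortcut can be sketched as follows. PatternBasicsSketch and its method names are hypothetical; the point is that a Pattern is compiled once and then reused for any number of inputs.

```java
import java.util.regex.Pattern;

class PatternBasicsSketch {
  // Compiled once, reused for every call below.
  private static final Pattern DIGITS = Pattern.compile("\\d+");

  // True only if the entire input consists of digits.
  static boolean allDigits(String s) {
    return DIGITS.matcher(s).matches();
  }

  // True if digits appear anywhere in the input.
  static boolean containsDigits(String s) {
    return DIGITS.matcher(s).find();
  }

  // The static shortcut compiles the regex on every call.
  static boolean allDigitsShortcut(String s) {
    return Pattern.matches("\\d+", s);
  }

  public static void main(String[] args) {
    System.out.println(allDigits("12345"));
    System.out.println(allDigits("abc123"));
    System.out.println(containsDigits("abc123"));
    System.out.println(allDigitsShortcut("abc123"));
  }
}
```

The shortcut is convenient for a one-off test; for repeated use, holding on to the compiled Pattern avoids recompiling the expression each time.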

A Matcher object is generated by calling Pattern.matcher( ) with the input string as an argument. The Matcher object is then used to access the results, using methods to evaluate the success or failure of different types of matches:

boolean matches( )
boolean lookingAt( )
boolean find( )
boolean find(int start)

The pattern \\w+ indicates one or more word characters, so it will simply split up the input into words. find( ) is like an iterator, moving forward through the input string. However, the second version of find( ) can be given an integer argument that tells it the character position for the beginning of the searchthis version resets the search position to the value of the argument, as you can see from the output. Feedback

Groups

Groups are regular expressions set off by parentheses that can be called up later with their group number. Group zero indicates the whole expression match, group one is the first parenthesized group, etc. Thus in

A(B(C))D

there are three groups: Group 0 is ABCD, group 1 is BC, and group 2 is C.
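The group numbering for A(B(C))D can be confirmed directly. In this sketch, GroupNumberSketch and groups( ) are hypothetical names; group( ) and groupCount( ) are the Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberSketch {
  // Returns groups 0..groupCount( ) for a whole-input match, or null.
  static String[] groups(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (!m.matches())
      return null;
    // groupCount( ) excludes group 0, so add one slot for it.
    String[] result = new String[m.groupCount() + 1];
    for (int i = 0; i <= m.groupCount(); i++)
      result[i] = m.group(i);
    return result;
  }

  public static void main(String[] args) {
    String[] g = groups("A(B(C))D", "ABCD");
    for (int i = 0; i < g.length; i++)
      System.out.println("group " + i + ": " + g[i]);
  }
}
```

Groups are numbered by the position of their opening parenthesis, counting from the left.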

The Matcher object has methods to give you information about groups:

public int groupCount( ) returns the number of groups in this matcher's pattern. Group zero is not included in this count.

public String group( ) returns group zero (the entire match) from the previous match operation (find( ), for example).

public String group(int i) returns the given group number during the previous match operation. If the match was successful, but the group specified failed to match any part of the input string, then null is returned.

public int start(int group) returns the start index of the group found in the previous match operation.

public int end(int group) returns the index of the last character, plus one, of the group found in the previous match operation.

The poem is the first part of Lewis Carroll's "Jabberwocky," from Through the Looking Glass. You can see that the regular expression pattern has a number of parenthesized groups, consisting of one or more non-whitespace characters (\S+) followed by one or more whitespace characters (\s+). The goal is to capture the last three words on each line; the end of a line is delimited by $. However, the normal behavior is to match $ with the end of the entire input sequence, so we must explicitly tell the regular expression to pay attention to newlines within the input. This is accomplished with the (?m) pattern flag at the beginning of the sequence (pattern flags will be shown shortly).
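The effect of (?m) on $ can be seen with a two-line input. This sketch does not reproduce the pattern from the example above; it uses a simplified pattern with three groups, and LastWordsSketch, lastWords( ), and the sample text are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastWordsSketch {
  // Collects group 3 (the final word) of every match of regex in text.
  static List<String> lastWords(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    List<String> out = new ArrayList<String>();
    while (m.find())
      out.add(m.group(3));
    return out;
  }

  public static void main(String[] args) {
    String text = "one two three four\nfive six seven eight";
    // With (?m), $ matches at the end of each line.
    System.out.println(lastWords("(?m)(\\S+)\\s+(\\S+)\\s+(\\S+)$", text));
    // Without it, $ matches only at the end of the whole input.
    System.out.println(lastWords("(\\S+)\\s+(\\S+)\\s+(\\S+)$", text));
  }
}
```

With the flag, a match is found at the end of each line; without it, only the end of the input qualifies, so only the last line produces a match.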

start( ) and end( )

Following a successful matching operation, start( ) returns the start index of the previous match, and end( ) returns the index of the last character matched, plus one. Invoking either start( ) or end( ) following an unsuccessful matching operation (or prior to a matching operation being attempted) produces an IllegalStateException. The following program also demonstrates matches( ) and lookingAt( ):

Notice that find( ) will locate the regular expression anywhere in the input, but lookingAt( ) and matches( ) only succeed if the regular expression starts matching at the very beginning of the input. While matches( ) only succeeds if the entire input matches the regular expression, lookingAt( )[67] succeeds if only the first part of the input matches.
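The distinctions among find( ), lookingAt( ), and matches( ), along with the behavior of start( ) and end( ), can be sketched as follows. StartEndSketch and its method names are hypothetical; the input string is chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndSketch {
  // Returns {start, end} of the first match found anywhere, or null.
  static int[] firstSpan(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? new int[] { m.start(), m.end() } : null;
  }

  // True if the regex matches a prefix of the input.
  static boolean atBeginning(String regex, String input) {
    return Pattern.compile(regex).matcher(input).lookingAt();
  }

  // True if the regex matches the entire input.
  static boolean wholeInput(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // Calling start( ) before any match attempt throws IllegalStateException.
  static boolean startBeforeMatchThrows() {
    Matcher m = Pattern.compile("x").matcher("x");
    try {
      m.start();
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    String input = "Java now has regular expressions";
    int[] span = firstSpan("now", input);
    System.out.println("find: " + span[0] + "-" + span[1]);
    System.out.println("lookingAt Java: " + atBeginning("Java", input));
    System.out.println("matches Java: " + wholeInput("Java", input));
    System.out.println("throws: " + startBeforeMatchThrows());
  }
}
```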

Pattern flags

An alternative compile( ) method accepts flags that affect the behavior of regular expression matching:

Pattern.CANON_EQ

Two characters will be considered to match if, and only if, their full canonical decompositions match. The expression "a\u030A", for example, will match the string "\u00E5" when this flag is specified. By default, matching does not take canonical equivalence into account.

Pattern.CASE_INSENSITIVE (?i)

By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched. This flag allows your pattern to match without regard to case (upper or lower). Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.

Pattern.COMMENTS (?x)

In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression.

Pattern.DOTALL (?s)

In dotall mode, the expression . matches any character, including a line terminator. By default, the . expression does not match line terminators.

Pattern.MULTILINE (?m)

In multiline mode, the expressions ^ and $ match the beginning and ending of a line, respectively. ^ also matches the beginning of the input string, and $ also matches the end of the input string. By default, these expressions only match at the beginning and the end of the entire input string.

Pattern.UNICODE_CASE (?u)

When this flag is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched.

Pattern.UNIX_LINES (?d)

In this mode, only the \n line terminator is recognized in the behavior of ., ^, and $.

Particularly useful among these flags are Pattern.CASE_INSENSITIVE, Pattern.MULTILINE, and Pattern.COMMENTS (which is helpful for clarity and/or documentation). Note that the behavior of most of the flags can also be obtained by inserting the parenthesized characters, shown in the table alongside the flag names, into your regular expression preceding the place where you want the mode to take effect.

You can combine the effect of these and other flags through an "OR" (|) operation:

This creates a pattern that will match lines starting with "java," "Java," "JAVA," etc., and attempt a match for each line within a multiline set (matches starting at the beginning of the character sequence and following each line terminator within the character sequence). Note that the group( ) method only produces the matched portion.
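A pattern of this kind, combining CASE_INSENSITIVE and MULTILINE, might look like the following sketch. FlagsSketch, lineStarts( ), and the sample text are hypothetical; the second call in main( ) uses the embedded form (?im) on the assumption that it behaves the same as the two flags combined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsSketch {
  // Collects every match of the compiled pattern in text.
  static List<String> lineStarts(Pattern p, String text) {
    Matcher m = p.matcher(text);
    List<String> out = new ArrayList<String>();
    while (m.find())
      out.add(m.group()); // only the matched portion, not the whole line
    return out;
  }

  static final String TEXT =
    "java has regex\nJava has regex\n" +
    "JAVA has pretty good regular expressions\n" +
    "Regular expressions are in Java";

  public static void main(String[] args) {
    // Flags combined with the bitwise OR operator
    Pattern combined = Pattern.compile("^java",
      Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    System.out.println(lineStarts(combined, TEXT));
    // The same modes requested inside the expression itself
    Pattern embedded = Pattern.compile("(?im)^java");
    System.out.println(lineStarts(embedded, TEXT));
  }
}
```

The last line of the sample text contains "Java" but does not start with it, so it contributes no match.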

split( )

Splitting divides an input string into an array of String objects, delimited by the regular expression.

The second form of split( ) limits the number of splits that occur.

Notice that regular expressions are so valuable that some operations have also been added to the String class, including split( ) (shown here), matches( ), replaceFirst( ), and replaceAll( ). These behave like their Pattern and Matcher counterparts.
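Both forms of split( ), and the String convenience method, can be sketched as follows. SplitSketch and its method names are hypothetical, as is the input string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSketch {
  static final String INPUT = "This!!unusual use!!of exclamation!!points";

  // Splits around every occurrence of the delimiter regex.
  static String[] splitAll(String regex, String input) {
    return Pattern.compile(regex).split(input);
  }

  // Produces at most limit pieces; the last piece keeps the remainder.
  static String[] splitLimited(String regex, String input, int limit) {
    return Pattern.compile(regex).split(input, limit);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.asList(splitAll("!!", INPUT)));
    System.out.println(Arrays.asList(splitLimited("!!", INPUT, 3)));
    // The String method gives the same result as the Pattern version.
    System.out.println(Arrays.asList(INPUT.split("!!")));
  }
}
```

With a limit of 3, the third element contains everything after the second delimiter, including the delimiter that was not used.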

Replace operations

Regular expressions become especially useful when you begin replacing text. Here are the available methods:

replaceFirst(String replacement) replaces the first matching part of the input string with replacement.

replaceAll(String replacement) replaces every matching part of the input string with replacement.

appendReplacement(StringBuffer sbuf, String replacement) performs step-by-step replacements into sbuf, rather than replacing only the first one or all of them, as in replaceFirst( ) and replaceAll( ), respectively. This is a very important method, because it allows you to call methods and perform other processing in order to produce replacement (replaceFirst( ) and replaceAll( ) are only able to put in fixed strings). With this method, you can programmatically pick apart the groups and create powerful replacements.

appendTail(StringBuffer sbuf) is invoked after one or more invocations of the appendReplacement( ) method in order to copy the remainder of the input string.

Heres an example that shows the use of all the replace operations. In addition, the block of commented text at the beginning is extracted and processed with regular expressions for use as input in the rest of the example:

The file is opened and read using the TextFile.read( ) method introduced earlier in this chapter. mInput is created to match all the text (notice the grouping parentheses) between /*! and !*/. Then, two or more spaces are reduced to a single space, and any space at the beginning of each line is removed (in order to do this on all lines and not just the beginning of the input, multiline mode must be enabled). These two replacements are performed with the equivalent (but more convenient, in this case) replaceAll( ) that's part of String. Note that since each replacement is only used once in the program, there's no extra cost to doing it this way rather than precompiling it as a Pattern.

replaceFirst( ) only performs the first replacement that it finds. In addition, the replacement strings in replaceFirst( ) and replaceAll( ) are just literals, so if you want to perform some processing on each replacement they don't help. In that case, you need to use appendReplacement( ), which allows you to write any amount of code in the process of performing the replacement. In the preceding example, a group( ) is selected and processed (in this situation, setting the vowel found by the regular expression to upper case) as the resulting sbuf is being built. Normally, you would step through and perform all the replacements and then call appendTail( ), but if you wanted to simulate replaceFirst( ) (or "replace n"), you would just do the replacement one time and then call appendTail( ) to put the rest into sbuf.

appendReplacement( ) also allows you to refer to captured groups directly in the replacement string by saying $g where g is the group number. However, this is for simpler processing and wouldn't give you the desired results in the preceding program.
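The appendReplacement( ) and appendTail( ) loop, and the $g notation, can be sketched in a few lines. This is not the listing discussed above; ReplaceSketch and its three methods are hypothetical names, and the inputs are chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceSketch {
  // Computes each replacement in code: every vowel becomes upper case.
  static String upperCaseVowels(String s) {
    Matcher m = Pattern.compile("[aeiou]").matcher(s);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
      m.appendReplacement(sbuf, m.group().toUpperCase());
    m.appendTail(sbuf); // copy whatever follows the last match
    return sbuf.toString();
  }

  // Simulates replaceFirst( ): one replacement, then the tail.
  static String upperCaseFirstVowel(String s) {
    Matcher m = Pattern.compile("[aeiou]").matcher(s);
    StringBuffer sbuf = new StringBuffer();
    if (m.find())
      m.appendReplacement(sbuf, m.group().toUpperCase());
    m.appendTail(sbuf);
    return sbuf.toString();
  }

  // $1 and $2 in the replacement refer to the captured groups.
  static String swapNames(String s) {
    return s.replaceAll("(\\w+) (\\w+)", "$2, $1");
  }

  public static void main(String[] args) {
    System.out.println(upperCaseVowels("regular expressions"));
    System.out.println(upperCaseFirstVowel("regular"));
    System.out.println(swapNames("John Smith"));
  }
}
```

Because $ and \ have special meaning in a replacement string, a computed replacement that might contain them would need escaping; the upper-cased vowels used here never do.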

reset( )

An existing Matcher object can be applied to a new character sequence using the reset( ) methods:

reset( ) without any arguments sets the Matcher to the beginning of the current sequence.
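Both forms of reset( ) can be sketched as follows. ResetSketch and its methods are hypothetical names, and the pattern and inputs are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetSketch {
  // Collects every remaining match from the matcher's current position.
  static List<String> findAll(Matcher m) {
    List<String> out = new ArrayList<String>();
    while (m.find())
      out.add(m.group());
    return out;
  }

  // Reuses one Matcher for a second input via reset(CharSequence).
  static List<String> afterReset(String regex, String first, String second) {
    Matcher m = Pattern.compile(regex).matcher(first);
    findAll(m);      // exhaust the first sequence
    m.reset(second); // same Matcher, new input
    return findAll(m);
  }

  // reset( ) with no arguments rewinds to the start of the same input.
  static boolean rescanGivesSameResult(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    List<String> firstPass = findAll(m);
    m.reset();
    return firstPass.equals(findAll(m));
  }

  public static void main(String[] args) {
    System.out.println(afterReset("[frb][aiu][gx]",
      "fix the rug with bags", "fix the rig with rags"));
    System.out.println(rescanGivesSameResult("[frb][aiu][gx]",
      "fix the rug with bags"));
  }
}
```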

Regular expressions and Java I/O

Most of the examples so far have shown regular expressions applied to static strings. The following example shows one way to apply regular expressions to search for matches in a file. Inspired by Unix's grep, JGrep.java takes two arguments: a filename and the regular expression that you want to match. The output shows each line where a match occurs and the match position(s) within the line.

The file is opened as a TextFile object (these were introduced earlier in this chapter). Since a TextFile contains the lines of the file in an ArrayList, from that array a ListIterator is produced. The result is an iterator that will allow you to move through the lines of the file (forward and backward).

Each input line is used to produce a Matcher, and the result is scanned with find( ). Note that the ListIterator.nextIndex( ) keeps track of the line numbers.

The test arguments open the JGrep.java file to read as input, and search for words starting with [Ssct].
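The approach described above can be sketched without the TextFile utility. This is not the JGrep.java listing: GrepSketch, grep( ), and readLines( ) are hypothetical names, the file is read with a plain BufferedReader, and the code assumes a Java version with generics and try-with-resources, both of which postdate JDK 1.4.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GrepSketch {
  // Reports "lineNumber: match: position" for each match on each line.
  static List<String> grep(List<String> lines, String regex) {
    Matcher m = Pattern.compile(regex).matcher("");
    List<String> hits = new ArrayList<String>();
    ListIterator<String> it = lines.listIterator();
    while (it.hasNext()) {
      m.reset(it.next()); // reuse one Matcher for every line
      while (m.find())
        // After next( ), nextIndex( ) is the 1-based line number.
        hits.add(it.nextIndex() + ": " + m.group() + ": " + m.start());
    }
    return hits;
  }

  // Reads a whole text file into a list of lines.
  static List<String> readLines(String filename) throws IOException {
    List<String> lines = new ArrayList<String>();
    try (BufferedReader in = new BufferedReader(new FileReader(filename))) {
      String s;
      while ((s = in.readLine()) != null)
        lines.add(s);
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.out.println("Usage: java GrepSketch file regex");
      return;
    }
    for (String hit : grep(readLines(args[0]), args[1]))
      System.out.println(hit);
  }
}
```

Keeping the matching logic in a method that takes a list of lines makes it possible to exercise it without touching the file system.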

Is StringTokenizer needed?

The new capabilities provided with regular expressions might prompt you to wonder whether the original StringTokenizer class is still necessary. Before JDK 1.4, the way to split a string into parts was to "tokenize" it with StringTokenizer. But now it's much easier and more succinct to do the same thing with regular expressions:

With regular expressions, you can also split a string into parts using more complex patterns, something that's much more difficult with StringTokenizer. It seems safe to say that regular expressions replace any tokenizing classes in earlier versions of Java.
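The comparison can be sketched as follows. SplitVersusTokenizer and its two methods are hypothetical names; the second call in main( ) splits on a comma or semicolon followed by optional whitespace, which is the kind of pattern that is awkward to express with StringTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class SplitVersusTokenizer {
  // The older approach: step through tokens one at a time.
  static List<String> withTokenizer(String input) {
    List<String> out = new ArrayList<String>();
    StringTokenizer st = new StringTokenizer(input); // splits on whitespace
    while (st.hasMoreTokens())
      out.add(st.nextToken());
    return out;
  }

  // The regular expression approach: one call.
  static List<String> withSplit(String input, String regex) {
    return Arrays.asList(input.split(regex));
  }

  public static void main(String[] args) {
    String input = "But I'm not dead yet! I feel happy!";
    System.out.println(withTokenizer(input));
    System.out.println(withSplit(input, " "));
    System.out.println(withSplit("a, b;c", "[,;]\\s*"));
  }
}
```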

You can learn much more about regular expressions in Mastering Regular Expressions, 2nd Edition, by Jeffrey E. F. Friedl (O'Reilly, 2002).

Summary

The Java I/O stream library does satisfy the basic requirements: you can perform reading and writing with the console, a file, a block of memory, or even across the Internet. With inheritance, you can create new types of input and output objects. And you can even add a simple extensibility to the kinds of objects a stream will accept by redefining the toString( ) method that's automatically called when you pass an object to a method that's expecting a String (Java's limited automatic type conversion).

There are questions left unanswered by the documentation and design of the I/O stream library. For example, it would have been nice if you could say that you want an exception thrown if you try to overwrite a file when opening it for output; some programming systems allow you to specify that you want to open an output file, but only if it doesn't already exist. In Java, it appears that you are supposed to use a File object to determine whether a file exists, because if you open it as a FileOutputStream or FileWriter, it will always get overwritten.

The I/O stream library brings up mixed feelings; it does much of the job and it's portable. But if you don't already understand the decorator pattern, the design is not intuitive, so there's extra overhead in learning and teaching it. It's also incomplete; for example, I shouldn't have to write utilities like TextFile, and there's no support for the kind of output formatting that virtually every other language's I/O package supports.

However, once you do understand the decorator pattern and begin using the library in situations that require its flexibility, you can begin to benefit from this design, at which point its cost in extra lines of code may not bother you as much.

If you do not find what you're looking for in this chapter (which has only been an introduction and is not meant to be comprehensive), you can find in-depth coverage in Java I/O, by Elliotte Rusty Harold (O'Reilly, 1999).

Exercises

Solutions to selected exercises can be found in the electronic document The Thinking in Java Annotated Solution Guide, available for a small fee from www.BruceEckel.com.

1. Open a text file so that you can read the file one line at a time. Read each line as a String and place that String object into a LinkedList. Print all of the lines in the LinkedList in reverse order.

2. Modify Exercise 1 so that the name of the file you read is provided as a command-line argument.

3. Modify Exercise 2 to also open a text file so you can write text into it. Write the lines in the LinkedList, along with line numbers (do not attempt to use the LineNumber classes), out to the file.

4. Modify Exercise 2 to force all the lines in the LinkedList to uppercase and send the results to System.out.

5. Modify Exercise 2 to take additional command-line arguments of words to find in the file. Print all lines in which any of the words match.

6. Modify DirList.java so that the FilenameFilter actually opens each file and accepts the file based on whether any of the trailing arguments on the command line exist in that file.

7. Modify DirList.java to produce all the file names in the current directory and subdirectories that satisfy the given regular expression. Hint: use recursion to traverse the subdirectories.

8. Create a class called SortedDirList with a constructor that takes file path information and builds a sorted directory list from the files at that path. Create two overloaded list( ) methods that will either produce the whole list or a subset of the list based on an argument. Add a size( ) method that takes a file name and produces the size of that file.

9. Modify WordCount.java so that it produces an alphabetic sort instead, using the tool from Chapter 11.

10. Modify WordCount.java so that it uses a class containing a String and a count value to store each different word, and a Set of these objects to maintain the list of words.

11. Modify IOStreamDemo.java so that it uses LineNumberReader to keep track of the line count. Note that it's much easier to just keep track programmatically.

12. Starting with section 4 of IOStreamDemo.java, write a program that compares the performance of writing to a file when using buffered and unbuffered I/O.

13. Modify section 5 of IOStreamDemo.java to eliminate the spaces in the line produced by the first call to in5.readUTF( ).

14. In Blips.java, copy the file and rename it to BlipCheck.java and rename the class Blip2 to BlipCheck (making it public and removing the public scope from the class Blips in the process). Remove the //! marks in the file and execute the program including the offending lines. Next, comment out the default constructor for BlipCheck. Run it and explain why it works. Note that after compiling, you must execute the program with java Blips because the main( ) method is still in class Blips.

15. In Blip3.java, comment out the two lines after the phrases "You must do this:" and run the program. Explain the result and why it differs from when the two lines are in the program.

16. (Intermediate) In Chapter 8, locate the GreenhouseController.java example, which consists of four files. GreenhouseController contains a hard-coded set of events. Change the program so that it reads the events and their relative times from a text file. (Challenging: use a design patterns factory method to build the events; see Thinking in Patterns (with Java) at www.BruceEckel.com.)

17. Create and test a utility method to print the contents of a CharBuffer up to the point where the characters are no longer printable.

18. Experiment with changing the ByteBuffer.allocate( ) statements in the examples in this chapter to ByteBuffer.allocateDirect( ). Demonstrate performance differences, but also notice whether the startup time of the programs noticeably changes.

19. For the phrase "Java now has regular expressions" evaluate whether the following expressions will find a match:

20. Modify JGrep.java to accept a directory name or a file name as argument (if a directory is provided, search should include all files in the directory). Hint: you can generate a list of filenames with:

[62] Its not clear that this was a good design decision, especially compared to the simplicity of I/O libraries in other languages. But its the justification for the decision.

[63] XML is another way to solve the problem of moving data across different computing platforms, and does not depend on having Java on all platforms. JDK 1.4 contains XML tools in javax.xml.* libraries. These are covered in Thinking in Enterprise Java, at www.MindView.net.

[64] Chapter 13 shows an even more convenient solution for this: a GUI program with a scrolling text area.

[66] A chapter dedicated to strings will have to wait until the 4th edition. Mike Shea contributed to this section.

[67] I have no idea how they came up with this method name, or what it's supposed to refer to. But it's reassuring to know that whoever comes up with nonintuitive method names is still employed at Sun. And that their apparent policy of not reviewing code designs is still in place. Sorry for the sarcasm, but this kind of thing gets tiresome after a few years.