Working with Streams in Java

A Java program uses a stream to either read data items from a source or to write data items to a destination. This article by Java expert Jeff Friesen discusses the concept of streams, and shows how to work with some of the more commonly used stream classes.

This article is excerpted from Java 2 By Example, Second Edition (Que, 2001), by Jeff Friesen.

This chapter is from the book

A Java program uses a stream to either read data items from a source or to
write data items to a destination. Think of a stream as a conduit by
which a sequence of bytes flows from a source to specific program code or from
specific program code to a destination. That conduit can be likened to a wire on
which an electrical current flows, or to a river of water on which boats and
barrels float. Stream sources include files, memory buffers, network sockets,
threads, and other streams. Stream destinations include the same entities as
stream sources, and other entities (such as printers). When a stream of data
items flows from a source, that stream is referred to as an input stream.
Similarly, when a stream of data items flows to a destination, that stream is
referred to as an output stream. Input and output streams are illustrated in
Figure 1.

Figure 1 Data items flow from a source to specific program code over an
input stream, and flow from specific program code to a destination over an
output stream.

Java divides streams into input and output categories. Java also divides
streams into byte-oriented and character-oriented categories. The basic unit of
a byte-oriented stream is a byte and the basic unit of a character-oriented
stream is a Unicode character.

All byte-oriented input streams are created from objects whose classes derive
from the abstract InputStream class, and all character-oriented input
streams are created from objects whose classes derive from the abstract
Reader class. Those classes share several methods in common, including
a close() method and a no-argument read() method. Similarly,
all byte-oriented output streams are created from objects whose classes derive
from the abstract OutputStream class, and all character-oriented output
streams are created from objects whose classes derive from the abstract
Writer class. As with the InputStream and Reader
classes, OutputStream and Writer share methods in common (such
as close() and flush()). Each class is located in the
java.io package.
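A minimal sketch of that shared interface follows. The class and method names are hypothetical, and try-with-resources requires Java 7 or later. Each method accepts only the abstract type, so it works with any byte-oriented or character-oriented input stream, and relies on the no-argument read() returning -1 at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

class StreamBasics {
    // Works with any byte-oriented input stream: read() returns the next
    // byte as an int in the range 0-255, or -1 at end of stream.
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1)
            count++;
        return count;
    }

    // Works with any character-oriented input stream: read() returns the
    // next char as an int in the range 0-65535, or -1 at end of stream.
    static int countChars(Reader r) throws IOException {
        int count = 0;
        while (r.read() != -1)
            count++;
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Both resources are closed automatically through their close() methods.
        try (InputStream in = new ByteArrayInputStream(new byte[] { 1, 2, 3 });
             Reader r = new StringReader("Java")) {
            System.out.println(countBytes(in)); // 3
            System.out.println(countChars(r));  // 4
        }
    }
}
```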

NOTE

InputStream's and Reader's read()
methods are designed to block (wait) for input if data is not available when
either of those methods is called. InputStream declares an
available() method that can be called to return an estimate of the number of
bytes that can be read without blocking. Reader has no such method; the closest
it offers is ready(), which returns a Boolean value indicating whether the next
read() call is guaranteed not to block.
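The following sketch (hypothetical names) contrasts the two methods. For an in-memory ByteArrayInputStream, available() happens to report exactly the number of unread bytes; for streams such as file or socket streams it is only an estimate, and Reader's ready() answers only yes or no.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

class AvailableDemo {
    // Reads n bytes, then asks how many more can be read without blocking.
    static int availableAfterReading(byte[] data, int n) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        for (int i = 0; i < n; i++)
            in.read();
        return in.available(); // for this in-memory stream: exactly the unread bytes
    }

    public static void main(String[] args) throws IOException {
        System.out.println(availableAfterReading(new byte[5], 2)); // 3
        Reader r = new StringReader("abc");
        // ready() says only whether the next read() is guaranteed not to block.
        System.out.println(r.ready()); // true
    }
}
```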

File Stream Classes

If you need to work with files in either a sequential-access or a
random-access manner, you can use the RandomAccessFile class. However,
the intent of the RandomAccessFile class is for its objects to
manipulate record-oriented flat-file databases. If you are interested in reading
an image's bytes, reading the contents of a text file, writing some
configuration information to a file, and so forth, you would not use
RandomAccessFile. Instead, you would work with various file stream
classes: FileInputStream, FileReader,
FileOutputStream, and FileWriter. (Those classes are located
in the java.io package.)

TIP

Use the FileInputStream and FileOutputStream classes to
read/write binary data from/to image files, sound files, video files,
configuration files and so on. Also, those classes can be used to read/write
ASCII-based text files. To read/write modern Unicode-based text files, use
FileReader and FileWriter.
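As a sketch of the text-file case, the class below writes a string with FileWriter and reads it back with FileReader. It assumes Java 11 or later, which added FileWriter and FileReader constructors that take an explicit Charset; the older constructors described in this chapter use the platform's default character encoding. The class and method names are hypothetical.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class TextFileDemo {
    // Writes text to the file at path, encoding characters as UTF-8 (Java 11+ constructor).
    static void writeText(String path, String text) throws IOException {
        try (FileWriter fw = new FileWriter(path, StandardCharsets.UTF_8)) {
            fw.write(text);
        }
    }

    // Reads the whole file back, decoding UTF-8, one character at a time.
    static String readText(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileReader fr = new FileReader(path, StandardCharsets.UTF_8)) {
            int ch;
            while ((ch = fr.read()) != -1)
                sb.append((char) ch);
        }
        return sb.toString();
    }
}
```

Because both directions name the same encoding, non-ASCII characters survive the round trip regardless of the platform default.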

The file stream classes include constructors for creating input and output
byte-oriented or character-oriented streams that are connected to files opened
or created by those constructors. If an input stream constructor cannot find a
file to open for input, it will throw a FileNotFoundException object.
Similarly, if an output stream constructor cannot create a file (because of bad
path information, or for some other reason), it will throw an
IOException object. (FileOutputStream's constructors actually throw
FileNotFoundException, which is a subclass of IOException.)

Because of the various exceptions thrown by their constructors and methods,
the file stream classes might seem difficult to use. However, if you follow a
pattern similar to the usage pattern that the Copy source code in
Listing 1 demonstrates, you should not have trouble.

As its name suggests, Copy is an application that copies data from
one file to another. Copy copies bytes from a file identified by a
source path to a file identified by a destination path. For example, to copy all
bytes contained in Copy.java to Copy.bak, issue the following
command line: java Copy Copy.java Copy.bak.

Notice the pattern that Copy's source code uses when working
with files. First, because Copy is designed to copy byte-oriented
streams instead of character-oriented streams, Copy declares a pair of
FileInputStream and FileOutputStream reference variables, and
initializes those variables to null. Within a Try statement,
Copy attempts to create FileInputStream and
FileOutputStream objects. The FileInputStream constructor
throws a FileNotFoundException object if it cannot locate the source
file, and the FileOutputStream constructor throws a
FileNotFoundException object (FileNotFoundException is a subclass of
IOException) if it cannot create the destination file, for example because of
bad path information. Assuming both constructors succeed, a While loop statement
repeatedly calls FileInputStream's read() method to read the next
byte, and FileOutputStream's write() method to write that
byte. The read() method continues to read bytes until end-of-file is
encountered. At that time, read() returns -1, and the loop ends.
Regardless of whether or not an exception is thrown, the Finally clause
executes last. By using If decision statements, it checks that
FileInputStream and FileOutputStream objects were created. If
one or both of those objects were created, the object's close()
method is called to close the underlying file. Because close() is declared to
throw IOException (a checked exception signaling an I/O error during closing),
each close() call must be handled; placing each call within its own Try
statement also ensures that a failure to close one file does not prevent the
other file from being closed. If you follow a pattern similar to what you have
just read, you should not experience trouble when working with the file stream
classes.
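The pattern just described can be sketched as follows. This is a reconstruction from the description, not the original Listing 1; the copy() helper method and its boolean return value are assumptions added so the logic can be reused and tested.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

class Copy {
    // Copies srcPath to dstPath one byte at a time; returns true on success.
    static boolean copy(String srcPath, String dstPath) {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            fis = new FileInputStream(srcPath);    // FileNotFoundException if missing
            fos = new FileOutputStream(dstPath);   // FileNotFoundException if it cannot be created
            int byteValue;
            while ((byteValue = fis.read()) != -1) // -1 signals end-of-file
                fos.write(byteValue);
            return true;
        } catch (FileNotFoundException e) {        // subclass first, then IOException
            System.err.println("Cannot open file: " + e.getMessage());
            return false;
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
            return false;
        } finally {
            // Each close() sits in its own try so that a failure closing one
            // stream does not prevent the other from being closed.
            if (fis != null)
                try { fis.close(); } catch (IOException e) { /* nothing more to do */ }
            if (fos != null)
                try { fos.close(); } catch (IOException e) { /* nothing more to do */ }
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Copy srcpath dstpath");
            return;
        }
        copy(args[0], args[1]);
    }
}
```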

TIP

The FileOutputStream and FileWriter constructors typically
erase (truncate) an existing file's contents when opening it. However, it is
possible to append bytes or characters to an existing file by calling the
FileOutputStream(String name, boolean append) and FileWriter(String fileName,
boolean append) constructors, respectively, with true as the value of the
append argument.
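A minimal sketch of the difference, assuming Java 7 or later for try-with-resources (the AppendDemo class and its method names are hypothetical):

```java
import java.io.FileWriter;
import java.io.IOException;

class AppendDemo {
    // append defaults to false: any existing contents are discarded.
    static void overwriteText(String path, String text) throws IOException {
        try (FileWriter fw = new FileWriter(path)) {
            fw.write(text);
        }
    }

    // append == true: new characters go after the existing contents.
    static void appendText(String path, String text) throws IOException {
        try (FileWriter fw = new FileWriter(path, true)) {
            fw.write(text);
        }
    }
}
```

Calling overwriteText() and then appendText() leaves both strings in the file; a second overwriteText() call discards them.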

Buffered Stream Classes

Failing to buffer I/O operations is a common cause of poor I/O
performance. That is not surprising when you consider that disk drives
efficiently read and write large aggregates of bytes but are not very efficient
when it comes to reading and writing small byte aggregates, and that each
unbuffered read() or write() call on a file stream typically results in a
separate request to the operating system. Because most of Java's stream classes
do not buffer their read and write operations, stream objects are prone to poor
I/O performance.

I/O performance can be radically improved by grouping individual bytes (or
characters) into aggregates before performing a write operation or reading a
large group of bytes (or characters) and returning those bytes (or characters)
on an individual basis from a buffer. That is the goal behind Java's
BufferedInputStream, BufferedReader,
BufferedOutputStream, and BufferedWriter classes. (Those
classes are located in the java.io package.)

BufferedInputStream and BufferedReader objects represent
buffered input streams that are chained to other input streams so that bytes (or
characters) can flow from those other streams into buffered input streams. The
following code fragment demonstrates that input stream chaining.
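A minimal sketch of that chaining, wrapped in a hypothetical countBytes() method whose path parameter stands in for whatever file is being read:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

class BufferedReadDemo {
    // Counts the bytes in a file, reading them one at a time through a buffer.
    static long countBytes(String path) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        BufferedInputStream bis = new BufferedInputStream(fis); // chain bis to fis
        try {
            long count = 0;
            // Most bis.read() calls are served from bis's internal buffer; only
            // when it is empty does bis refill it with one fis.read(byte[], int, int) call.
            while (bis.read() != -1)
                count++;
            return count;
        } finally {
            bis.close(); // also closes fis
        }
    }
}
```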

The code fragment creates a FileInputStream object and chains a
BufferedInputStream object to it by passing the FileInputStream object's
reference to the BufferedInputStream constructor. The resulting
BufferedInputStream object's reference is assigned to bis.
When bis.read() is called, that read() method checks an
internal buffer (associated with the BufferedInputStream object
assigned to bis) for at least one byte that can be returned. If a byte
exists in that buffer, bis.read() immediately returns it. Otherwise,
bis.read() internally calls fis.read(byte [] buffer, int offset,
int length) to read a large chunk of bytes into the bis
object's internal buffer. As long as bis.read() does not have to call that
three-argument read() method, performance is fast. When it must make that call,
performance slows down somewhat, because the call must access the disk drive.
However, reading a large chunk of bytes with one three-argument fis.read() call
is faster than performing many individual no-argument fis.read() calls.
Therefore, on average, a bis.read() method call is considerably faster than a
fis.read() call.

NOTE

To be fair, many platforms buffer data that is to be read from or written to
a file. Therefore, the file stream classes do have some sort of buffering at
their disposal. However, not all devices that support Java will buffer data at a
platform level. Therefore, it is not a good idea to rely on such support.
Instead, you should get into the habit of writing code that relies on the
buffered stream classes.

BufferedOutputStream and BufferedWriter objects represent
buffered output streams that are chained to other output streams so that bytes
(or characters) can flow from buffered output streams to those other streams.
The following code fragment demonstrates that output stream chaining.
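A comparable sketch for output, again with a hypothetical method name and path parameter:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class BufferedWriteDemo {
    // Writes count copies of 'A' through a buffer chained to a file stream.
    static void writeAs(String path, int count) throws IOException {
        FileOutputStream fos = new FileOutputStream(path);
        BufferedOutputStream bos = new BufferedOutputStream(fos); // chain bos to fos
        try {
            for (int i = 0; i < count; i++)
                bos.write('A'); // lands in bos's buffer; fos is written only when it fills
        } finally {
            bos.close(); // flushes any buffered bytes to fos, then closes fos
        }
    }
}
```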

The code fragment creates a FileOutputStream object and chains a
BufferedOutputStream object to it by passing the FileOutputStream object's
reference to the BufferedOutputStream constructor. The resulting
BufferedOutputStream object's reference is assigned to bos.
When bos.write('A') executes, that method call appends
'A' to the contents of an internal buffer (associated with
the BufferedOutputStream object assigned to bos). After that
buffer fills, bos.write() calls fos.write() to write the
entire buffer to the disk. Bytes still sitting in the buffer are written only
when bos.flush() or bos.close() is called; otherwise, they never reach the file.
Because fewer (but larger) writes are made to a disk, performance improves.

The Copy application in Listing 1 was not as efficient as it could
have been. By adding support for buffering, Copy can become faster.
Listing 2 introduces a BufferedCopy application that uses the
BufferedInputStream and BufferedOutputStream classes to
support buffering.

There is one interesting item to note about BufferedCopy's
source code: bis.close() and bos.close() appear instead of
fis.close() and fos.close(). All of the stream classes thus
far presented contain close() methods. When you chain a buffered stream
to a file stream, you might not know which close() method to call. The
answer, as demonstrated by BufferedCopy, is to call the
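close() method on the stream that chains itself to another stream. In
BufferedCopy, those methods are bis.close() and
bos.close(). Closing a buffered stream also closes the stream it is chained
to, and a buffered output stream flushes its buffer before closing, so no
separate fis.close() or fos.close() call is needed and no buffered bytes are
lost.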
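BufferedCopy can be sketched by changing only the stream-creation lines and the close() calls of the Copy pattern. This is a reconstruction under assumptions: the copy() helper, its boolean result, and the explicit flush() call are not taken from the original Listing 2.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class BufferedCopy {
    // Copies srcPath to dstPath through buffered streams; returns true on success.
    static boolean copy(String srcPath, String dstPath) {
        BufferedInputStream bis = null;
        BufferedOutputStream bos = null;
        try {
            bis = new BufferedInputStream(new FileInputStream(srcPath));
            bos = new BufferedOutputStream(new FileOutputStream(dstPath));
            int byteValue;
            while ((byteValue = bis.read()) != -1)
                bos.write(byteValue);
            bos.flush(); // surface any write error before reporting success
            return true;
        } catch (IOException e) { // also catches FileNotFoundException
            System.err.println("I/O error: " + e.getMessage());
            return false;
        } finally {
            // Closing each buffered stream also closes the file stream it chains to.
            if (bis != null)
                try { bis.close(); } catch (IOException e) { /* nothing more to do */ }
            if (bos != null)
                try { bos.close(); } catch (IOException e) { /* nothing more to do */ }
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java BufferedCopy srcpath dstpath");
            return;
        }
        copy(args[0], args[1]);
    }
}
```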

NOTE

The BufferedInputStream and BufferedReader classes support
the capabilities of marking a particular point in a stream and coming back to
that point to reread a sequence of bytes (or characters). Those capabilities
are provided by the mark() and reset() methods. Use
mark() to "remember" a point in the input stream and
reset() to cause all bytes (or characters) that have been read since the most
recent mark operation to be reread, before new bytes are read from the stream to
which the buffered input stream is chained. The argument passed to mark() limits
how many bytes (or characters) can be read before the mark may become invalid.

Because the mark() and reset() methods are declared in
InputStream and Reader, you might think every class supports
those methods. However, that is not the case. Although
BufferedInputStream and BufferedReader support mark()
and reset(), many other input streams do not. Before calling those
methods, find out if an input stream supports mark() and
reset(), by calling the markSupported() method. If an input
stream supports the mark() and reset() methods,
markSupported() returns true.
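The sketch below (hypothetical names) checks markSupported() before calling mark() and reset() on a Reader, and shows that a BufferedInputStream supports marking while a PushbackInputStream does not:

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.StringReader;

class MarkResetDemo {
    // Reads up to n characters, rewinds to the mark, then reads everything,
    // so the first n characters appear twice in the result.
    static String readTwice(Reader r, int n) throws IOException {
        if (!r.markSupported())
            throw new IOException("mark/reset not supported by " + r.getClass().getName());
        StringBuilder sb = new StringBuilder();
        r.mark(n); // remember this point; stays valid for at least n characters
        for (int i = 0; i < n; i++) {
            int c = r.read();
            if (c == -1)
                break;
            sb.append((char) c);
        }
        r.reset(); // return to the marked point
        int ch;
        while ((ch = r.read()) != -1)
            sb.append((char) ch);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTwice(new BufferedReader(new StringReader("Java")), 2)); // JaJava
        byte[] none = new byte[0];
        System.out.println(new BufferedInputStream(new ByteArrayInputStream(none)).markSupported()); // true
        System.out.println(new PushbackInputStream(new ByteArrayInputStream(none)).markSupported()); // false
    }
}
```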

Data Stream Classes

A problem with the FileInputStream and FileOutputStream
classes is that they only work at the byte level. What do you do when you need
to read integers, write floating-point values, or read or write some other
non-byte value from/to a file? The answer is to use Java's
DataInputStream and DataOutputStream classes (located in the
java.io package).

As with the buffered stream classes, the data stream classes are designed so
that their objects can be chained to other streams. However, you can only chain
data stream objects to byte-oriented streams. For example, you can chain a data
input stream to a FileInputStream object and call the data input
stream's methods to read integer, floating-point, and other data items, but
you cannot directly chain a data input stream object to a FileReader
object.

For a glimpse of using DataOutputStream and DataInputStream
to write and read non-byte-oriented data items to and from underlying
FileOutputStream and FileInputStream objects, examine the
DOSDISDemo source code in Listing 3.
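As a hypothetical stand-in for that listing, the sketch below writes an int, a double, and a UTF string through a DataOutputStream and reads them back in the same order through a DataInputStream. The values 256, Math.PI, and "Java" are assumptions chosen to match the output shown later in this section, the file name data.dat is hypothetical, and try-with-resources requires Java 7 or later.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class DataStreamSketch {
    // Writes three values, then reads them back and joins them with newlines.
    static String writeThenRead(String path) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(path))) {
            dos.writeInt(256);        // 4 bytes, big-endian
            dos.writeDouble(Math.PI); // 8 bytes
            dos.writeUTF("Java");     // 2-byte length, then the encoded characters
        }
        try (DataInputStream dis = new DataInputStream(new FileInputStream(path))) {
            int i = dis.readInt();    // read back in exactly the order written
            double d = dis.readDouble();
            String s = dis.readUTF();
            return i + "\n" + d + "\n" + s;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenRead("data.dat"));
    }
}
```

Data must be read back in the same order and with the same types it was written; here the file holds 4 + 8 + (2 + 4) = 18 bytes.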

DOSDISDemo introduces the UTF concept, by way of its writeUTF()
and readUTF() method calls. UTF stands for Unicode Transformation Format, and
UTF encodings are used for efficiently storing and retrieving text characters.
writeUTF() first writes a two-byte count of the encoded bytes and then the
characters themselves, encoded according to the format used by Java, which is a
slight variant of UTF-8 (often called modified UTF-8):

All characters whose Unicode values range from \u0001 to \u007f are
represented by a single byte, with the most significant bit set to 0.

The null character Unicode value (\u0000) and all characters whose
Unicode values range from \u0080 to \u07ff are represented by two bytes, with
the most significant three bits of the most significant byte being 1, 1, and 0
(in a left-to-right order), and the most significant two bits of the least
significant byte being 1 and 0 (in a left-to-right order).

All characters whose Unicode values range from \u0800 to \uffff are
represented by three bytes, with the most significant four bits of the most
significant byte being 1, 1, 1 and 0 (in a left-to-right order) and the most
significant two bits of each of the remaining two bytes being 1 and 0 (in a
left-to-right order).
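To see those rules in action, the following sketch (hypothetical class name) captures writeUTF()'s output in memory with a ByteArrayOutputStream and strips the two-byte length prefix that writeUTF() writes before the encoded characters:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Returns the bytes writeUTF() produces for s, minus the 2-byte length prefix.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeUTF(s);
        dos.flush();
        byte[] all = baos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        // 'A' (1 byte), the null character (2 bytes), e-acute (2 bytes), euro sign (3 bytes)
        String[] samples = { "A", "\0", "\u00e9", "\u20ac" };
        for (String s : samples) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(s))
                hex.append(String.format("%02X ", b & 0xFF));
            System.out.printf("U+%04X -> %s%n", (int) s.charAt(0), hex.toString().trim());
        }
    }
}
```

Running main() prints 41 for A, C0 80 for the null character, C3 A9 for \u00e9, and E2 82 AC for \u20ac, matching the one-, two-, and three-byte forms above.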

When run, DOSDISDemo produces the following output:

256
3.141592653589793
Java

NOTE

Objects created from the byte-oriented buffered stream classes
(BufferedInputStream and BufferedOutputStream) and from the data stream classes
are known as filter streams, because those classes derive from
FilterInputStream or FilterOutputStream. The name reflects their role: they
filter the bytes that flow into a buffered or data input stream, or that flow
out of a buffered or data output stream. (BufferedReader and BufferedWriter do
the same kind of job for characters, although they derive directly from Reader
and Writer.) In addition to buffered and data stream classes, Java's standard
class library includes other classes, such as PushbackInputStream and
PrintStream, that are used to perform filtering operations.
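Writing a filter of your own amounts to subclassing FilterOutputStream (the superclass of BufferedOutputStream and DataOutputStream) or FilterInputStream. The hypothetical UpperCaseOutputStream below overrides only write(int); FilterOutputStream's array-based write() methods pass each byte through that method, so every byte is filtered.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical filter: converts ASCII lowercase letters to uppercase on the way out.
class UpperCaseOutputStream extends FilterOutputStream {
    UpperCaseOutputStream(OutputStream out) {
        super(out); // chain to the underlying stream (kept in the protected field out)
    }

    @Override
    public void write(int b) throws IOException {
        if (b >= 'a' && b <= 'z')
            b -= 'a' - 'A';
        out.write(b);
    }

    public static void main(String[] args) throws IOException {
        UpperCaseOutputStream up = new UpperCaseOutputStream(System.out);
        up.write("filter streams\n".getBytes(StandardCharsets.US_ASCII));
        up.flush(); // flush rather than close, so System.out stays open
    }
}
```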

Piped Stream Classes

Threads are often required to communicate. A technique that is often used by
threads wishing to communicate involves piped streams.

The idea behind piped streams is to connect a piped output stream to a piped
input stream. Then, one thread writes data to the piped output stream and
another thread reads that data by way of the piped input stream. The piped
stream classes synchronize the two threads themselves, but a pipe holds only a
limited amount of data (1,024 bytes or characters by default). When that buffer
is full, a writing thread that tries to write more output blocks (waits) until
the reading thread removes some data; if the reading thread has terminated, the
write typically fails with an IOException instead. To keep the writing thread
from stalling, the reading thread must be responsive. To support
piped streams, Java supplies the PipedInputStream,
PipedReader, PipedOutputStream, and PipedWriter
classes in its standard class library. (Those classes are located in the
java.io package.)

CAUTION

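Deadlock might occur if a single thread uses a piped output stream connected
to a piped input stream and performs both the writing and the reading operations
on those streams: once the pipe's buffer fills, the write waits for a read that
the same thread can never perform.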

Creating a piped input stream connected to a piped output stream is not
difficult, as the following code fragment attests:
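A minimal sketch of such a fragment, extended with a second thread so that writing and reading happen on different threads as the CAUTION above requires. The class and method names are hypothetical, and the lambda requires Java 8 or later.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

class PipeDemo {
    // Sends message from a writing thread to the calling (reading) thread through a pipe.
    static String transfer(String message) throws IOException, InterruptedException {
        PipedWriter pw = new PipedWriter();   // piped output stream
        PipedReader pr = new PipedReader(pw); // piped input stream bound to pw

        Thread writer = new Thread(() -> {
            try {
                pw.write(message); // blocks whenever the pipe's buffer is full
                pw.close();        // lets the reader see end of stream (-1)
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        StringBuilder received = new StringBuilder();
        int ch;
        while ((ch = pr.read()) != -1) // this thread does all the reading
            received.append((char) ch);
        pr.close();
        writer.join();
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer("Hello from another thread"));
    }
}
```

Messages longer than the pipe's buffer still arrive intact, because the writing thread simply waits while the reading thread drains the buffer.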

The code fragment first creates a piped output stream (as represented by the
PipedWriter object) and then creates a piped input stream (as
represented by a PipedReader object) that binds itself to the piped
output stream. When that's done, a writing thread can call
pw.write() to output data to the piped output stream, whereas a reading
thread can call pr.read() to read that output over its piped input
stream.

Zip Stream Classes

Did you know that Java makes it easy to read and write Zip files? Zip support
manifests itself in the standard class library by way of the
ZipInputStream and ZipOutputStream filter stream classes, and
other classes that (along with ZipInputStream and
ZipOutputStream) are part of the java.util.zip package. By
using those classes, it is possible to create a command-line version of the
popular WinZip utility.

To give you a taste for working with Zip stream classes, Listing 5 presents
the source code of a ZipReader application. That application uses
ZipInputStream to retrieve all entries in a Zip file and prints each entry's
name.
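A ZipReader along those lines might look like the following sketch. It is a reconstruction, not the original Listing 5; the entryNames() helper is a hypothetical addition, and try-with-resources and the diamond operator require Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipReader {
    // Returns the name of every entry in the Zip (or Jar) file at path, in file order.
    static List<String> entryNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            ZipEntry entry;
            // getNextEntry() skips past the current entry's data and returns null at the end.
            while ((entry = zis.getNextEntry()) != null)
                names.add(entry.getName());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipReader zipfile");
            return;
        }
        for (String name : entryNames(args[0]))
            System.out.println(name);
    }
}
```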

To run ZipReader, you need access to either a Zip file or a Jar file
(which is basically a Zip file with a .jar extension). For example,
assuming the SDK's tools.jar file is placed in the same directory
as ZipReader.class, issue java ZipReader tools.jar to obtain a
list of all packages and classes contained in that Jar file. (JDK 9 and later
no longer ship tools.jar; with a newer JDK, substitute any other Jar file.)