Java Programming Tutorial

Basic Input & Output (I/O)

Programming simple I/O operations is easy, involving only a few classes and methods; you can pick it up by studying a few examples. Programming efficient, portable I/O is extremely difficult, especially if you have to deal with different character sets. This explains why there are so many I/O packages (nine in JDK 1.7)!

JDK 1.7 enhances support for file I/O via the so-called NIO.2 (the second round of "New I/O") in the new package java.nio.file and its auxiliary packages. It also introduces a new try-with-resources syntax, which closes resources automatically and so simplifies the coding of the close() method.

File and Directory

Class java.io.File (Pre-JDK 7)

The class java.io.File can represent either a file or a directory. [JDK 1.7 introduces a more versatile java.nio.file.Path, which overcomes many limitations of java.io.File.]

A path string is used to locate a file or a directory. Unfortunately, path strings are system dependent, e.g., "c:\myproject\java\Hello.java" in Windows or "/myproject/java/Hello.java" in Unix/Mac.

Windows uses back-slash '\' as the directory separator, while Unixes/Mac use forward-slash '/'.

Windows uses semi-colon ';' as the path separator to separate a list of paths, while Unixes/Mac use colon ':'.

Windows uses "\r\n" as the line delimiter for text files, while Unixes (including macOS) use "\n"; the classic Mac OS (before Mac OS X) used "\r".

The "c:\" (Windows) or "/" (Unix/Mac) is called the root. Windows supports multiple roots, each mapped to a drive (e.g., "c:\", "d:\"). Unixes/Mac have a single root ("/").

A path could be absolute (beginning from the root) or relative (which is relative to a reference directory). Special notations "." and ".." denote the current directory and the parent directory, respectively.

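The java.io.File class maintains these system-dependent properties, so that you can write portable programs:

Directory Separator: in static fields File.separator (as String) and File.separatorChar. As mentioned, Windows uses back-slash '\', while Unixes/Mac use forward-slash '/'.

Path Separator: in static fields File.pathSeparator (as String) and File.pathSeparatorChar. As mentioned, Windows uses semi-colon ';' to separate a list of paths, while Unixes/Mac use colon ':'.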
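A minimal sketch that prints the separators of whatever platform it runs on, and builds a path with File.separator instead of hard-coding '\' or '/'. The class name ShowSeparators is ours, and the printed values depend on the operating system.

```java
import java.io.File;

public class ShowSeparators {
    // Returns a one-line description of this platform's separators.
    public static String describe() {
        return "separator=" + File.separator
                + " pathSeparator=" + File.pathSeparator;
    }

    // Builds a relative path without hard-coding '\' or '/'.
    public static String samplePath() {
        return "myproject" + File.separator + "java" + File.separator + "Hello.java";
    }

    public static void main(String[] args) {
        System.out.println(describe());    // e.g. "separator=/ pathSeparator=:" on Unix
        System.out.println(samplePath());  // e.g. "myproject/java/Hello.java" on Unix
        System.out.println(File.separatorChar == File.separator.charAt(0)); // true
    }
}
```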

You can construct a File instance with a path string or URI, as follows. Take note that the physical file/directory may or may not exist. A file URI takes the form of file://..., e.g., file:///d:/docs/programming/java/test.html.

public File(String pathString)
public File(String parent, String child)
public File(File parent, String child)
// Constructs a File instance based on the given path string.
public File(URI uri)
// Constructs a File instance by converting from the given file-URI "file://...."

For example,

File file = new File("in.txt"); // A file relative to the current working directory
File absFile = new File("d:\\myproject\\java\\Hello.java"); // A file with absolute path (Windows)
File dir = new File("c:\\temp"); // A directory
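A short sketch of the other constructors: parent/child pairs, and a round trip through a "file:" URI. The class name FileConstruct and the file names are illustrative only; none of the files needs to exist.

```java
import java.io.File;
import java.net.URI;

public class FileConstruct {
    // Converts a path to a "file:" URI and back, returning the resulting File.
    public static File viaUri(String path) {
        URI uri = new File(path).getAbsoluteFile().toURI();  // File(URI) requires an absolute URI
        return new File(uri);
    }

    public static void main(String[] args) {
        File rel = new File("in.txt");                      // relative to the current working directory
        File child = new File("docs", "in.txt");            // parent string + child string
        File child2 = new File(new File("docs"), "in.txt"); // parent File + child string
        System.out.println(rel.getAbsolutePath());
        System.out.println(child.getPath() + " exists? " + child.exists()); // need not exist
        System.out.println(child.equals(child2));           // true: same abstract pathname
        System.out.println(viaUri("in.txt").toURI());       // e.g. file:/home/user/in.txt
    }
}
```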

For applications that you intend to distribute as JAR files, you should use the URL class to reference resources, as it can reference disk files as well as files inside a JAR, for example,

java.net.URL url = this.getClass().getResource("icon.png");

Verifying Properties of a File/Directory

public boolean exists() // Tests if this file/directory exists.
public long length() // Returns the length of this file.
public boolean isDirectory() // Tests if this instance is a directory.
public boolean isFile() // Tests if this instance is a file.
public boolean canRead() // Tests if this file is readable.
public boolean canWrite() // Tests if this file is writable.
public boolean delete() // Deletes this file/directory.
public void deleteOnExit() // Deletes this file/directory when the program terminates.
public boolean renameTo(File dest) // Renames this file.
public boolean mkdir() // Makes (Creates) this directory.
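A minimal sketch exercising these methods on a scratch directory created under a base directory you pass in. The names props-demo and data.bin are hypothetical, and the 5 bytes are written with a FileOutputStream, which is covered later.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class FileProps {
    // Creates a directory holding a 5-byte file, reports some properties, then cleans up.
    public static String check(File baseDir) throws IOException {
        File dir = new File(baseDir, "props-demo");
        boolean made = dir.mkdir();                  // false if it already exists
        File file = new File(dir, "data.bin");
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(new byte[] {1, 2, 3, 4, 5});
        }
        String report = "mkdir=" + made
                + " isDirectory=" + dir.isDirectory()
                + " isFile=" + file.isFile()
                + " exists=" + file.exists()
                + " length=" + file.length();
        file.delete();   // a directory must be empty before it can be deleted
        dir.delete();
        return report;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(check(new File(System.getProperty("java.io.tmpdir"))));
    }
}
```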

List Directory

For a directory, you can use the following methods to list its contents:

public String[] list() // List the contents of this directory in a String-array
public File[] listFiles() // List the contents of this directory in a File-array

Example: The following program recursively lists the contents of a given directory (similar to Unix's "ls -R" command).
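A minimal sketch of such a recursive listing, assuming the start directory is given as the first command-line argument (defaulting to "."). The class name is ours. It does not guard against symbolic-link loops, which could make the recursion run forever.

```java
import java.io.File;

public class ListDirectoryRecursive {
    // Prints every entry under dir, indenting one level per sub-directory; returns the count.
    public static int listRecursive(File dir, String indent) {
        File[] entries = dir.listFiles();   // null if dir is not a directory or an I/O error occurs
        if (entries == null) return 0;
        int count = 0;
        for (File f : entries) {
            System.out.println(indent + f.getName() + (f.isDirectory() ? File.separator : ""));
            count++;
            if (f.isDirectory()) {
                count += listRecursive(f, indent + "  ");
            }
        }
        return count;
    }

    public static void main(String[] args) {
        File start = new File(args.length > 0 ? args[0] : ".");
        System.out.println("Total entries: " + listRecursive(start, ""));
    }
}
```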

The overloaded versions list(FilenameFilter), listFiles(FilenameFilter) and listFiles(FileFilter) call back the filter's accept() method for each file/sub-directory produced. You can program your filtering criteria in accept(); those files/sub-directories for which accept() returns false will be excluded.

Example: The following program lists only files that meet a certain filtering criteria.
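A minimal sketch using a FilenameFilter whose accept() keeps only names with a given extension (".java" in main is just an example criterion). The class and method names are ours; an anonymous class is used so the code also compiles on JDK 7.

```java
import java.io.File;
import java.io.FilenameFilter;

public class ListWithFilter {
    // Returns the names of the entries in dir that end with the given extension.
    public static String[] listByExtension(File dir, final String ext) {
        FilenameFilter filter = new FilenameFilter() {
            @Override
            public boolean accept(File parent, String name) {
                return name.endsWith(ext);   // false means the entry is excluded
            }
        };
        String[] names = dir.list(filter);  // null if dir is not a directory
        return names == null ? new String[0] : names;
    }

    public static void main(String[] args) {
        for (String name : listByExtension(new File("."), ".java")) {
            System.out.println(name);
        }
    }
}
```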

Class java.nio.file.Path (JDK 7)

Stream I/O in Standard I/O (java.io Package)

Programs read inputs from data sources (e.g., keyboard, file, network, memory buffer, or another program) and write outputs to data sinks (e.g., display console, file, network, memory buffer, or another program). In Java standard I/O, inputs and outputs are handled by the so-called streams. A stream is a sequential and contiguous one-way flow of data (just like water or oil flows through the pipe). It is important to mention that Java does not differentiate between the various types of data sources or sinks (e.g., file or network) in stream I/O. They are all treated as a sequential flow of data. Input and output streams can be established from/to any data source/sink, such as files, network, keyboard/console or another program. The Java program receives data from a source by opening an input stream, and sends data to a sink by opening an output stream.
All Java I/O streams are one-way (except the RandomAccessFile, which will be discussed later). If your program needs to perform both input and output, you have to open two streams - an input stream and an output stream.

Stream I/O operations involve three steps:

Open an input/output stream associated with a physical device (e.g., file, network, console/keyboard), by constructing an appropriate I/O stream instance.

Read from the opened input stream until "end-of-stream" is encountered, or write to the opened output stream (and optionally flush the buffered output).

Close the input/output stream.
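The three steps can be sketched with an in-memory source. Here a ByteArrayInputStream stands in for a file or network connection (an illustrative choice), and the class name ThreeSteps is ours.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ThreeSteps {
    // Sums the unsigned values of all bytes in source.
    public static int sumBytes(byte[] source) throws IOException {
        InputStream in = new ByteArrayInputStream(source);  // 1. open (a memory buffer as the source)
        int sum = 0;
        try {
            int b;
            while ((b = in.read()) != -1) {                 // 2. read byte-by-byte until end-of-stream
                sum += b;                                   // b is in the range 0 to 255
            }
        } finally {
            in.close();                                     // 3. close
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumBytes(new byte[] {1, 2, 3, (byte) 200}));  // prints 206
    }
}
```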

Java's I/O operations are more complicated than C/C++'s, in order to support internationalization (i18n). Java internally stores characters (char type) in the 16-bit UCS-2 character set (strictly speaking, as UTF-16 code units since JDK 1.5). But the external data source/sink could store characters in other character sets (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16, and many others), in fixed length of 8-bit or 16-bit, or in variable length of 1 to 4 bytes. [Read "Character Sets and Encoding Schemes"]. As a consequence, Java needs to differentiate between byte-based I/O for processing raw bytes or binary data, and character-based I/O for processing texts made up of characters.

Byte-Based I/O & Byte Streams

Byte streams are used to read/write raw bytes serially from/to an external device. All the byte streams are derived from the abstract superclasses InputStream and OutputStream, as illustrated in the class diagram.

Reading from an InputStream

The abstract superclass InputStream declares an abstract method read() to read one data-byte from the input source:

public abstract int read() throws IOException

The read() method:

returns the input byte read as an int in the range of 0 to 255, or

returns -1 if "end of stream" condition is detected, or

throws an IOException if it encounters an I/O error.

The read() method returns an int instead of a byte, because it uses -1 to indicate end-of-stream.

The read() method blocks until a byte is available, an I/O error occurs, or the "end-of-stream" is detected. The term "block" means that the method (and the program) will be suspended. The program will resume only when the method returns.

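Two variations of read() are implemented in InputStream for reading a block of bytes into a byte-array. They return the number of bytes actually read (which may be fewer than requested), or -1 if "end-of-stream" is encountered.

public int read(byte[] bytes, int offset, int length) throws IOException
// Reads up to "length" bytes and stores them in the array "bytes", starting at index "offset".
public int read(byte[] bytes) throws IOException
// Same as read(bytes, 0, bytes.length)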
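A minimal sketch of the usual block-reading loop: keep calling read(byte[]) and use the returned count, since a call may fill only part of the array. The class name BlockRead is ours, and bufSize must be positive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockRead {
    // Reads the whole stream in chunks of up to bufSize bytes (bufSize > 0); returns the total count.
    public static int countBytes(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int total = 0;
        int n;
        // read(byte[]) may return fewer bytes than requested; -1 signals end-of-stream
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10000]);
        System.out.println(countBytes(in, 4096));  // prints 10000
        in.close();
    }
}
```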

Writing to an OutputStream

Similar to the input counterpart, the abstract superclass OutputStream declares an abstract method write() to write a data-byte to the output sink. write() takes an int: the least-significant byte of the int argument is written out, and the upper 3 bytes are discarded. It throws an IOException if an I/O error occurs (e.g., the output stream has been closed).

public abstract void write(int unsignedByte) throws IOException

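Similar to read(), two variations of the write() method to write a block of bytes from a byte-array are implemented:

public void write(byte[] bytes, int offset, int length) throws IOException
// Writes "length" bytes from the array "bytes", starting at index "offset".
public void write(byte[] bytes) throws IOException
// Same as write(bytes, 0, bytes.length)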
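A small sketch showing both the single-byte write() (only the low byte of the int survives) and the block variants. A ByteArrayOutputStream serves as the sink so the written bytes can be inspected; the class name is ours.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteDemo {
    public static byte[] demo() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = sink;
        out.write(0x1234);          // only the low byte 0x34 is written
        byte[] block = {10, 20, 30, 40};
        out.write(block);           // writes all 4 bytes
        out.write(block, 1, 2);     // writes block[1] and block[2], i.e. 20 and 30
        out.close();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (byte b : demo()) {
            System.out.print((b & 0xFF) + " ");  // prints 52 10 20 30 40 20 30
        }
        System.out.println();
    }
}
```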

Opening & Closing I/O Streams

You open an I/O stream by constructing an instance of the stream. Both InputStream and OutputStream provide a close() method to close the stream, which performs the necessary clean-up operations to free up the system resources.

public void close() throws IOException // close this Stream

It is good practice to explicitly close the I/O stream, by running close() in the finally clause of try-catch-finally, to free up the system resources immediately when the stream is no longer needed. This could prevent serious resource leaks. Unfortunately, the close() method also throws an IOException, and needs to be enclosed in a nested try-catch statement, which makes the code somewhat ugly.
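A minimal sketch of this pre-JDK 7 pattern, applied to an unbuffered byte-by-byte file copy and timed with System.nanoTime(). The class name CopyPreJdk7 is ours, the file names come from the command line, and printing (then ignoring) a failure in close() is just one possible policy.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyPreJdk7 {
    // Copies src to dest one byte at a time (unbuffered), pre-JDK 7 style.
    public static void copy(String src, String dest) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream(src);
            out = new FileOutputStream(dest);
            int byteRead;
            while ((byteRead = in.read()) != -1) {
                out.write(byteRead);
            }
        } finally {
            // close() may itself throw IOException, hence the nested try-catch blocks.
            // Each stream is closed separately so that a failure on one still closes the other.
            try {
                if (in != null) in.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
            try {
                if (out != null) out.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CopyPreJdk7 <source> <destination>
        long start = System.nanoTime();
        copy(args[0], args[1]);
        System.out.println("Elapsed: " + (System.nanoTime() - start) / 1000000.0 + " msec");
    }
}
```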

Flushing the OutputStream

In addition, the OutputStream provides a flush() method to flush the remaining bytes from the output buffer.

public void flush() throws IOException // Flush the output
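A small sketch that makes flushing visible: bytes written to a BufferedOutputStream sit in its buffer until flush() pushes them to the underlying sink. The class name and the 1024-byte buffer size are arbitrary choices for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
    // Returns {bytes visible in the sink before flush, bytes visible after flush}.
    public static int[] demo() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 1024);
        out.write(new byte[] {1, 2, 3});
        int before = sink.size();   // 0: the bytes are still held in the buffer
        out.flush();
        int after = sink.size();    // 3: pushed through to the sink
        out.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = demo();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```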

Implementations of abstract InputStream/OutputStream

InputStream and OutputStream are abstract classes that cannot be instantiated. You need to choose an appropriate concrete subclass to establish a connection to a physical device. For example, you can instantiate a FileInputStream or FileOutputStream to establish a stream to a physical disk file.

Layered (or Chained) I/O Streams

The I/O streams are often layered or chained with other I/O streams, for purposes such as buffering, filtering, or data-format conversion (between raw bytes and primitive types). For example, we can layer a BufferedInputStream to a FileInputStream for buffered input, and stack a DataInputStream in front for formatted data input (using primitives such as int, double), as illustrated in the following diagrams.

File I/O Byte-Streams - FileInputStream & FileOutputStream

FileInputStream and FileOutputStream are concrete implementations of the abstract classes InputStream and OutputStream, to support I/O from disk files.

The read()/write() methods in InputStream/OutputStream are designed to read/write a single byte of data on each call. This is grossly inefficient, as each call is handled by the underlying operating system (which may trigger a disk access or other expensive operations). Buffering, which reads/writes a block of bytes from the external device into/from a memory buffer in a single I/O operation, is commonly applied to speed up the I/O.

FileInputStream/FileOutputStream are not buffered. They are often chained to a BufferedInputStream or BufferedOutputStream, which provides the buffering. To chain the streams together, simply pass an instance of one stream into the constructor of another stream. For example, the following code chains a FileInputStream to a BufferedInputStream, and finally, a DataInputStream:

FileInputStream fileIn = new FileInputStream("in.dat");
BufferedInputStream bufferIn = new BufferedInputStream(fileIn);
DataInputStream dataIn = new DataInputStream(bufferIn);
// or
DataInputStream in = new DataInputStream(
new BufferedInputStream(
new FileInputStream("in.dat")));

This example copies a file by reading a byte from the input file and writing it to the output file. It uses FileInputStream and FileOutputStream directly without buffering. Notice that most of the I/O methods "throw" IOException, which must be caught or declared to be thrown.
The method close() is programmed inside the finally clause. It is guaranteed to run after the try or catch block. However, method close() also throws an IOException, and therefore must be enclosed inside a nested try-catch block, which makes the code a little ugly.

I used System.nanoTime(), which was introduced in JDK 1.5, for a more accurate measure of the elapsed time, instead of the legacy not-so-precise System.currentTimeMillis(). The output shows that it took about 4 seconds to copy a 400KB file.

As mentioned, JDK 1.7 introduces a new try-with-resources syntax, which automatically closes all the resources declared in the try header after the try or catch block completes. For example, the above example can be rewritten in a much neater manner.
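A minimal sketch of the byte-by-byte copy rewritten with try-with-resources. The class name is ours; the source and destination come from the command line.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyTryWithResources {
    // Byte-by-byte copy; both streams are closed automatically, in reverse order of opening.
    public static void copy(String src, String dest) throws IOException {
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dest)) {
            int byteRead;
            while ((byteRead = in.read()) != -1) {
                out.write(byteRead);
            }
        }   // no finally needed: in and out are closed here, even if an exception is thrown
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CopyTryWithResources <source> <destination>
        copy(args[0], args[1]);
    }
}
```

Any exception from close() is attached to the primary exception as a "suppressed" exception rather than silently lost.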

This example again uses FileInputStream and FileOutputStream directly. However, instead of reading/writing one byte at a time, it reads/writes a 4KB block. This program took only 3 milliseconds, a more than 1000-fold speed-up compared with the previous example.
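A minimal sketch of the programmer-managed block copy, with the buffer size as a parameter (4096 by default in main) so that different sizes can be tried. The class name and the optional third argument are our own conventions; timings will vary by machine.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyWithBlock {
    // Copies src to dest in blocks of bufSize bytes; returns the number of bytes copied.
    public static long copy(String src, String dest, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dest)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);   // write only the n bytes actually read
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CopyWithBlock <source> <destination> [bufferSize]
        int bufSize = args.length > 2 ? Integer.parseInt(args[2]) : 4096;
        long start = System.nanoTime();
        long n = copy(args[0], args[1], bufSize);
        System.out.println(n + " bytes copied in " + (System.nanoTime() - start) / 1000000.0 + " msec");
    }
}
```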

A larger buffer size, up to a certain limit, generally improves I/O performance. However, there is a trade-off between speed and memory usage. For file copying, a large buffer is certainly recommended. But for reading just a few bytes from a file, a large buffer simply wastes memory.

I rewrote the program using JDK 1.7 and tried various buffer sizes on a much bigger file of 26MB.

In this example, I chain the FileInputStream with a BufferedInputStream and the FileOutputStream with a BufferedOutputStream, and read/write byte-by-byte. The buffer size is left to the default of the buffered streams (8192 bytes in the OpenJDK implementation). The program took 62 milliseconds, about 60 times faster than example 1, but slower than the programmer-managed buffer.
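A minimal sketch of this buffered byte-by-byte copy. The class name is ours; each read()/write() call is now served from an in-memory buffer rather than going to the operating system.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyBuffered {
    // Byte-by-byte copy through buffered streams using their default buffer size.
    public static void copy(String src, String dest) throws IOException {
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(src));
             BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(dest))) {
            int byteRead;
            while ((byteRead = in.read()) != -1) {
                out.write(byteRead);
            }
        }   // closing out flushes its remaining buffered bytes first
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CopyBuffered <source> <destination>
        copy(args[0], args[1]);
    }
}
```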

Formatted Data-Streams: DataInputStream & DataOutputStream

The DataInputStream and DataOutputStream can be stacked on top of any InputStream and OutputStream to parse the raw bytes so as to perform I/O operations in the desired data format, such as int and double.

To use DataInputStream for formatted input, you can chain up the input streams as follows:

DataInputStream in = new DataInputStream(
new BufferedInputStream(
new FileInputStream("in.dat")));

The data stored on disk are in exactly the same form as in the Java program internally (e.g., writeChar() writes a char as two bytes, as in UCS-2). The byte order is big-endian (most significant byte first, i.e., at the lowest address).
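A small sketch that writes an int, a double and a char with DataOutputStream and reads them back with DataInputStream. To keep it self-contained, a byte array stands in for the disk file (our substitution); the bytes show the big-endian layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DataStreamDemo {
    public static byte[] write() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(1);        // 4 bytes, big-endian: 00 00 00 01
            out.writeDouble(3.5);   // 8 bytes (IEEE 754)
            out.writeChar('A');     // 2 bytes: 00 41
        }
        return bytes.toByteArray();
    }

    // Values must be read back in the same order and with the same types as written.
    public static String readBack(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            double d = in.readDouble();
            char c = in.readChar();
            return i + " " + d + " " + c;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write();
        System.out.println(data.length + " bytes");  // prints 14 bytes
        System.out.println(readBack(data));          // prints 1 3.5 A
    }
}
```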

Network I/O

[In Java Networking]

Character-Based I/O & Character Streams

Java internally stores characters (char type) in the 16-bit UCS-2 character set. But the external data source/sink could store characters in other character sets (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16, and many others), in fixed length of 8-bit or 16-bit, or in variable length of 1 to 4 bytes. [Read "Character Sets and Encoding Schemes"]. Hence, Java has to differentiate between byte-based I/O for processing 8-bit raw bytes, and character-based I/O for processing texts. The character streams need to translate between the character set used by external I/O devices and Java's internal UCS-2 format. For example, the character '您' is stored as "60 A8" in UCS-2 (Java internal), "E6 82 A8" in UTF-8, "C4 FA" in GBK/GB2312, and "B1 7A" in BIG5. If this character is to be written to a file that uses UTF-8, the character stream needs to translate "60 A8" to "E6 82 A8". The reverse takes place in a read operation.
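A small sketch reproducing the '您' example with String.getBytes() and the String(byte[], Charset) constructor, using the StandardCharsets constants from JDK 7. GBK and BIG5 are left out because they are not guaranteed to be available on every JVM. The class name EncodeChar is ours.

```java
import java.nio.charset.StandardCharsets;

public class EncodeChar {
    // Formats bytes as upper-case hex pairs, e.g. "E6 82 A8".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u60A8";  // the character 您
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));  // prints 60 A8
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));                                   // prints E6 82 A8
        // The reverse translation (decoding) happens on input:
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s));  // prints true
    }
}
```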

The byte/character streams refer to the unit of operation within the Java program, which does not necessarily correspond to the amount of data transferred from/to the external I/O devices. This is because some charsets use a fixed length of 8 bits (e.g., US-ASCII, ISO-8859-1) or 16 bits (e.g., UCS-2), while others use a variable length of 1 to 4 bytes (e.g., UTF-8, UTF-16, UTF-16BE, UTF-16LE, GBK, BIG5). When a character stream is used to read an 8-bit ASCII file, each 8-bit byte read from the file is placed into a 16-bit char in the Java program.

Abstract superclass Reader and Writer

Other than the unit of operation and charset conversion (which is extremely complex), character-based I/O is almost identical to byte-based I/O. Instead of InputStream and OutputStream, we use Reader and Writer for character-based I/O.

The abstract superclass Reader operates on char. Its method read() reads one character from the input source. read() returns the character as an int between 0 and 65535 (a char in Java can be treated as an unsigned 16-bit integer); or -1 if end-of-stream is detected; or throws an IOException if an I/O error occurs. There are also two variations of read() to read a block of characters into a char-array; the three-argument read(char[] cbuf, int off, int len) is the abstract method that concrete subclasses must implement.

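The abstract superclass Writer declares a method write(int), which writes a character to the output sink. The lower 2 bytes of the int argument are written out, while the upper 2 bytes are discarded. As with Reader, the method that concrete subclasses must implement is the block version, write(char[] cbuf, int off, int len).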
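A small sketch of character-by-character I/O through the Reader/Writer API. StringReader and StringWriter (our choice of concrete subclasses) keep it independent of files and charsets; the second method shows the upper 16 bits of write(int) being dropped. The class name is ours.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class CharCopy {
    // Copies text one char at a time from a Reader to a Writer.
    public static String copy(String text) throws IOException {
        try (Reader in = new StringReader(text); Writer out = new StringWriter()) {
            int c;
            while ((c = in.read()) != -1) {   // c is 0..65535, or -1 at end-of-stream
                out.write(c);                 // the lower 16 bits are written
            }
            return out.toString();
        }
    }

    public static String lowBits() throws IOException {
        try (Writer out = new StringWriter()) {
            out.write(0x10041);   // upper 16 bits discarded: writes 0x0041, i.e. 'A'
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copy("Hello"));  // prints Hello
        System.out.println(lowBits());      // prints A
    }
}
```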

File I/O Character-Streams - FileReader & FileWriter

FileReader and FileWriter are concrete implementations of the abstract superclasses Reader and Writer, to support I/O from disk files. FileReader/FileWriter assume that the default character encoding (charset) is used for the disk file. The default charset is kept in the JVM's system property "file.encoding". You can get the default charset via the static method java.nio.charset.Charset.defaultCharset() or System.getProperty("file.encoding"). It is probably safe to use FileReader/FileWriter for ASCII texts, provided that the default charset is compatible with ASCII (such as US-ASCII, ISO-8859-x, UTF-8, and many others, but NOT UTF-16, UTF-16BE, UTF-16LE and many others). Use of FileReader/FileWriter is NOT recommended, as you have no control over the file encoding charset (JDK 11 later added FileReader/FileWriter constructors that take a Charset argument).

Buffered I/O Character-Streams - BufferedReader & BufferedWriter

BufferedReader and BufferedWriter can be stacked on top of FileReader/FileWriter or other character streams to perform buffered I/O, instead of character-by-character. BufferedReader provides a new method readLine(), which reads a line and returns a String (without the line delimiter). Lines could be delimited by "\n" (Unix), "\r\n" (Windows), or "\r" (Mac).
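A minimal sketch: readLines() shows readLine() accepting all three line terminators, and roundTrip() stacks BufferedWriter/BufferedReader on FileWriter/FileReader (so the default charset applies; the text is plain ASCII). The class name and the file name lines.txt are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BufferedLines {
    // readLine() accepts "\n", "\r\n" or "\r" as the line terminator.
    public static List<String> readLines(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {   // null (not -1) at end-of-stream
            lines.add(line);
        }
        return lines;
    }

    public static List<String> roundTrip(String fileName) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(fileName))) {
            out.write("first line");
            out.newLine();                          // platform-specific line separator
            out.write("second line");
            out.newLine();
        }
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            return readLines(in);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLines(new BufferedReader(new StringReader("a\nb\r\nc\rd"))));  // [a, b, c, d]
        System.out.println(roundTrip("lines.txt"));  // [first line, second line]
    }
}
```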

Character Set (or Charset) - Package java.nio.charset (JDK 1.4)

JDK 1.4 provides a new package java.nio.charset as part of NIO (New I/O) to support character translation between the Unicode (UCS-2) used internally in Java programs and external devices, which could be encoded in any other format (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16, UTF-16BE, UTF-16LE, etc.).

The main class java.nio.charset.Charset provides static methods for testing whether a particular charset is supported, locating charset instances by name, and listing all the available charsets and the default charset.

public static SortedMap<String,Charset> availableCharsets() // lists all the available charsets
public static Charset defaultCharset() // Returns the default charset
public static Charset forName(String charsetName) // Returns a Charset instance for the given charset name (in String)
public static boolean isSupported(String charsetName) // Tests if this charset name is supported
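A short sketch calling these static methods. supportsAll() is our own helper; the six names it is tested with are the charsets every Java platform is required to support, while the output of main depends on the JVM and OS.

```java
import java.nio.charset.Charset;

public class CharsetInfo {
    // Returns true only if every named charset is supported by this JVM.
    public static boolean supportsAll(String... names) {
        for (String name : names) {
            if (!Charset.isSupported(name)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("UTF-8 supported? " + Charset.isSupported("UTF-8"));
        Charset latin1 = Charset.forName("ISO-8859-1");
        System.out.println(latin1.name() + " aliases: " + latin1.aliases());
        System.out.println(Charset.availableCharsets().size() + " charsets available");
    }
}
```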

The default charset for file encoding is kept in the system property "file.encoding". To change the JVM's default charset for file encoding, you can use the command-line VM option "-Dfile.encoding". For example, the following command runs the program with a default charset of UTF-8.

> java -Dfile.encoding=UTF-8 TestCharset

Most importantly, the Charset class provides methods to encode/decode characters between the UCS-2 used in Java programs and the specific charset used in the external devices (such as UTF-8).

public final ByteBuffer encode(String s)
public final ByteBuffer encode(CharBuffer cb)
// Encodes Unicode UCS-2 characters in the CharBuffer/String
// into a "byte sequence" using this charset, and returns a ByteBuffer.
public final CharBuffer decode(ByteBuffer bb)
// Decode the byte sequence encoded using this charset in the ByteBuffer
// to Unicode UCS-2, and return a CharBuffer.

The encode()/decode() methods operate on ByteBuffer and CharBuffer, also introduced in JDK 1.4, which will be explained in the New I/O section.

Example: The following example encodes some Unicode texts in various encoding schemes, and displays the hex codes of the encoded byte sequences.
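A minimal sketch of such a program using Charset.encode(). The sample text "Hi,您好" and the list of charsets are our choices. Charset.encode() replaces characters the charset cannot represent with its replacement byte ('?' for US-ASCII), and Java's "UTF-16" encoder prepends a big-endian byte-order mark FE FF.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EncodeHex {
    // Encodes text with the named charset and returns the encoded bytes as hex pairs.
    public static String encodeToHex(String text, String charsetName) {
        ByteBuffer bb = Charset.forName(charsetName).encode(text);
        StringBuilder sb = new StringBuilder();
        while (bb.hasRemaining()) {   // only the bytes between position and limit are valid
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", bb.get() & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hi,\u60A8\u597D";   // "Hi,您好"
        String[] names = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"};
        for (String name : names) {
            System.out.printf("%-10s: %s%n", name, encodeToHex(text, name));
        }
    }
}
```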

Text File I/O - InputStreamReader and OutputStreamWriter

As mentioned, Java internally stores characters (char type) in the 16-bit UCS-2 character set. But the external data source/sink could store characters in other character sets (e.g., US-ASCII, ISO-8859-x, UTF-8, UTF-16, and many others), in fixed length of 8-bit or 16-bit, or in variable length of 1 to 4 bytes. The FileReader/FileWriter introduced earlier use the default charset for decoding/encoding, resulting in non-portable programs.

To choose the charset, you need to use InputStreamReader and OutputStreamWriter. InputStreamReader and OutputStreamWriter are considered to be byte-to-character "bridge" streams.

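You can choose the character set in the InputStreamReader's constructor:

public InputStreamReader(InputStream in) // Uses the default charset
public InputStreamReader(InputStream in, String charsetName) throws UnsupportedEncodingException
public InputStreamReader(InputStream in, Charset cs)

OutputStreamWriter provides the corresponding constructors, taking an OutputStream instead.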

As the InputStreamReader/OutputStreamWriter often needs to read/write in multiple bytes, it is best to wrap it with a BufferedReader/BufferedWriter.

Example: The following program writes Unicode texts to a disk file using various charsets for file encoding. It then reads the file byte-by-byte (via a byte-based input stream) to check the encoded characters in the various charsets. Finally, it reads the file using the character-based reader.
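A minimal sketch of such a program. The text 您好, the list of charsets and the output file names are our choices; writeText() returns the raw bytes read back through a plain FileInputStream, and readText() decodes them again with an InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class TextFileCharsets {
    // Writes text using charsetName, then returns the file's raw bytes (read byte-by-byte).
    public static byte[] writeText(String fileName, String text, String charsetName) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(fileName), charsetName))) {
            out.write(text);
        }
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (FileInputStream in = new FileInputStream(fileName)) {
            int b;
            while ((b = in.read()) != -1) raw.write(b);
        }
        return raw.toByteArray();
    }

    // Reads the file back as characters, decoding with charsetName.
    public static String readText(String fileName, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charsetName))) {
            int c;
            while ((c = in.read()) != -1) sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u60A8\u597D";   // 您好
        String[] charsets = {"UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"};
        for (String cs : charsets) {
            String fileName = "out_" + cs + ".txt";   // hypothetical output file names
            StringBuilder hex = new StringBuilder();
            for (byte b : writeText(fileName, text, cs)) hex.append(String.format("%02X ", b & 0xFF));
            System.out.println(cs + ": " + hex + "-> " + readText(fileName, cs));
        }
    }
}
```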

As seen from the output, the characters 您好 are encoded differently in different charsets. Nonetheless, the InputStreamReader is able to translate the characters into the same UCS-2 used in the Java program.

java.io.PrintStream & java.io.PrintWriter

The byte-based java.io.PrintStream supports convenient printing methods such as print() and println() for printing primitives and text strings. Primitives are converted to their string representation for printing. The printf() and format() methods were introduced in JDK 1.5 for formatting output with format specifiers; printf() and format() behave identically.

A PrintStream never throws an IOException. Instead, it sets an internal flag which can be checked via the checkError() method. A PrintStream can also be created to flush the output automatically. That is, the flush() method is automatically invoked after a byte array is written, one of the println() methods is invoked, or after a newline ('\n') is written.

The standard output and error streams (System.out and System.err) are PrintStream instances.

All characters printed by a PrintStream are converted into bytes using the default character encoding. The PrintWriter class should be used in situations that require writing characters rather than bytes.

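The character-stream PrintWriter is similar to PrintStream, except that it writes characters instead of bytes. The PrintWriter also supports all the convenient printing methods print(), println(), printf() and format(). It never throws an IOException and can optionally be created to support automatic flushing; unlike PrintStream, automatic flushing in a PrintWriter happens only when println(), printf() or format() is invoked, not whenever a newline character is output.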
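A small sketch contrasting the two classes: the same two-character string 您好 becomes 6 bytes when printed through a PrintStream that encodes with UTF-8, but stays 2 chars in a PrintWriter writing to a StringWriter. The class and method names are ours.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;

public class PrintDemo {
    // PrintStream: characters are encoded into bytes (here explicitly with UTF-8).
    public static int printStreamByteCount(String s) throws UnsupportedEncodingException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(bytes, true, "UTF-8");   // autoflush enabled
        ps.print(s);
        ps.close();
        return bytes.size();
    }

    // PrintWriter: characters stay characters.
    public static int printWriterCharCount(String s) {
        StringWriter chars = new StringWriter();
        PrintWriter pw = new PrintWriter(chars);
        pw.print(s);
        boolean failed = pw.checkError();   // no IOException is thrown; check the error flag instead
        pw.close();
        return failed ? -1 : chars.toString().length();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u60A8\u597D";   // 您好
        System.out.println("PrintStream bytes: " + printStreamByteCount(s));  // prints 6
        System.out.println("PrintWriter chars: " + printWriterCharCount(s));  // prints 2
    }
}
```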


Object Serialization and Object Streams

Data streams (DataInputStream and DataOutputStream) allow you to read and write primitive data (such as int, double) and String, rather than individual bytes. Object streams (ObjectInputStream and ObjectOutputStream) go one step further to allow you to read and write entire objects (such as Date, ArrayList or any custom objects).

Object serialization is the process of representing a "particular state of an object" in a serialized bit-stream, so that the bit-stream can be written out to an external device (such as a disk file or network). The bit-stream can later be re-constructed to recover the state of that object. Object serialization is necessary to save the state of an object into a disk file for persistence, or to send the object across the network for applications such as Web Services, Distributed-object applications, and Remote Method Invocation (RMI).

In Java, an object that is to be serialized must implement the java.io.Serializable or java.io.Externalizable interface. Serializable is an empty interface (or tagging interface) with nothing declared. Its purpose is simply to declare that a particular object is serializable.

ObjectInputStream & ObjectOutputStream

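The ObjectInputStream and ObjectOutputStream can be used to serialize an object into a bit-stream and transfer it to/from an I/O stream, via these methods:

public final Object readObject() throws IOException, ClassNotFoundException
public final void writeObject(Object obj) throws IOException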

ObjectInputStream and ObjectOutputStream must be stacked on top of a concrete implementation of InputStream or OutputStream, such as FileInputStream or FileOutputStream.

For example, the following code segment writes objects to a disk file. The ".ser" is the convention for serialized object file type.

ObjectOutputStream out =
new ObjectOutputStream(
new BufferedOutputStream(
new FileOutputStream("object.ser")));
out.writeObject("The current Date and Time is "); // write a String object
out.writeObject(new Date()); // write a Date object
out.flush();
out.close();

To read and re-construct the object back in a program, use the method readObject(), which returns a java.lang.Object. Downcast the Object back to its original type.
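A minimal sketch of a write-then-read round trip. For self-containment it serializes into a byte array instead of "object.ser" (our substitution), and it also writes the object count with writeInt(), one of the DataOutput methods. The class name is ours.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Date;

public class ObjectStreamDemo {
    // Serializes the given objects, then deserializes and returns fresh copies.
    public static Object[] roundTrip(Object... objects) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(objects.length);          // a primitive, via the DataOutput methods
            for (Object o : objects) out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Object[] result = new Object[in.readInt()];
            for (int i = 0; i < result.length; i++) result[i] = in.readObject();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] back = roundTrip("The current Date and Time is ", new Date());
        String msg = (String) back[0];   // downcast each Object to its original type
        Date date = (Date) back[1];
        System.out.println(msg + date);
    }
}
```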

The ObjectInputStream and ObjectOutputStream implement the DataInput and DataOutput interfaces respectively. You can use methods such as readInt(), readDouble(), writeInt(), writeDouble() for reading and writing primitive types.

transient & static

static fields are not serialized, as they belong to the class instead of the particular instance being serialized.

To prevent certain fields from being serialized, mark them using the keyword transient. This could cut down the amount of data traffic.

The writeObject() method writes out the class of the object, the class signature, and values of non-static and non-transient fields.
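A small sketch showing the effect of transient: after a round trip, the transient field comes back as null while the ordinary field survives. The Account class, its fields and the sample values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    // Hypothetical class used only for this illustration.
    public static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        public static String bank = "Demo Bank";   // static: belongs to the class, never written out
        public String owner;                        // written out
        public transient String password;           // skipped: null after deserialization

        public Account(String owner, String password) {
            this.owner = owner;
            this.password = password;
        }
    }

    public static Account roundTrip(Account a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("alice", "secret"));
        System.out.println(copy.owner + " / " + copy.password);   // prints alice / null
    }
}
```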

java.io.Serializable & Externalizable Interfaces

When you create a class that might be serialized, the class must implement java.io.Serializable interface. The Serializable interface doesn't declare any methods. Empty interfaces such as Serializable are known as tagging interfaces. They identify implementing classes as having certain properties, without requiring those classes to actually implement any methods.

Most of the core Java classes implement Serializable, such as all the wrapper classes, collection classes, and GUI classes. In fact, the only core Java classes that do not implement Serializable are ones that should not be serialized. Arrays of primitives or serializable objects are themselves serializable.

Warning Message "The serialization class does not declare a static final serialVersionUID field of type long" (Advanced)

This warning message is triggered because your class (e.g., one that extends javax.swing.JFrame) implements the java.io.Serializable interface, directly or through a superclass. This interface enables the object to be written out to an output stream serially (via method writeObject()) and read back into the program (via method readObject()). The serialization runtime uses a number (called serialVersionUID) to ensure that the object read into the program (during deserialization) is compatible with the class definition, and not belonging to another version. It throws an InvalidClassException otherwise.

You have these options:

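Simply ignore this warning message. If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class. Be aware that this computed value is sensitive to class details that may vary between compilers, which can cause an unexpected InvalidClassException during deserialization.

Declare the field explicitly, e.g., private static final long serialVersionUID = 1L; and change its value only when you make an incompatible change to the class.

Suppress the warning with the annotation @SuppressWarnings("serial") on the class.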
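A small sketch of declaring serialVersionUID explicitly and checking, via ObjectStreamClass.lookup(), the value the serialization runtime will use. The Point class and the uidOf() helper are ours; lookup() returns null for classes that are not serializable.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {
    // Hypothetical class with an explicitly declared version number.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;   // stays 1 until you change it yourself
        public int x, y;
    }

    // Returns the serialVersionUID used by serialization, or -1 if cls is not serializable.
    public static long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        return desc == null ? -1 : desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Point.class));    // prints 1
        System.out.println(uidOf(Object.class));   // prints -1
    }
}
```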

java.io.Externalizable Interface

The Serializable has a sub-interface called Externalizable, which you could use if you want to customize the way a class is serialized. Since Externalizable extends Serializable, it is also a Serializable and you could invoke readObject() and writeObject().

ObjectOutput and ObjectInput are interfaces that are implemented by ObjectOutputStream and ObjectInputStream, which define the writeObject() and readObject() methods, respectively. When an instance of Externalizable is passed to an ObjectOutputStream, the default serialization procedure is bypassed; instead, the stream calls the instance's writeExternal() method. Similarly, when an ObjectInputStream reads an Externalizable instance, it creates the instance with the class's public no-arg constructor and then calls readExternal() to restore its state.

Externalizable is useful if you want complete control over how your objects shall be serialized/deserialized. For example, you could encrypt sensitive data before the object is serialized.
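A minimal sketch of an Externalizable class that writes its two fields in a format of its own choosing (no encryption is attempted here). The Person class and its fields are hypothetical; note the public no-arg constructor, which deserialization requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableDemo {
    // Hypothetical class: it alone decides what is written and in what order.
    public static class Person implements Externalizable {
        private String name;
        private int age;

        public Person() { }   // required: used to create the instance before readExternal()

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(age);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            name = in.readUTF();   // read back in exactly the order written
            age = in.readInt();
        }

        @Override
        public String toString() {
            return name + " (" + age + ")";
        }
    }

    public static Person roundTrip(Person p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);              // invokes p.writeExternal()
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Person) in.readObject(); // invokes readExternal() on a new instance
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Person("Ann", 30)));   // prints Ann (30)
    }
}
```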

Random Access Files

All the I/O streams covered so far are one-way streams. That is, they are either read-only input streams or write-only output streams. Furthermore, they are all sequential-access (or serial) streams, meant for reading and writing data sequentially. Nonetheless, it is sometimes necessary to read a file record directly, as well as to modify existing records or insert new records. The class RandomAccessFile provides support for non-sequential, direct (or random) access to a disk file. RandomAccessFile is a two-way stream, supporting both input and output operations in the same stream.

RandomAccessFile can be treated as a huge byte array. You can use a file pointer (of type long), similar to array index, to access individual byte or group of bytes in primitive types (such as int and double). The file pointer is located at 0 when the file is opened. It advances automatically for every read and write operation by the number of bytes processed.

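In constructing a RandomAccessFile, you can use the mode flags "r" or "rw" to indicate whether the file is opened for "read-only" or "read-write" access, e.g.,

RandomAccessFile f1 = new RandomAccessFile("filename", "r");
RandomAccessFile f2 = new RandomAccessFile("filename", "rw");

The following methods manipulate the file pointer: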

public void seek(long pos) throws IOException;
// Positions the file pointer for subsequent read/write operation.
public int skipBytes(int numBytes) throws IOException;
// Moves the file pointer forward by the specified number of bytes.
public long getFilePointer() throws IOException;
// Gets the position of the current file pointer, in bytes, from the beginning of the file.
public long length() throws IOException;
// Returns the length of this file.

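RandomAccessFile does not inherit from InputStream or OutputStream. However, it implements the DataInput and DataOutput interfaces (similar to DataInputStream and DataOutputStream). Therefore, you can use various methods to read/write primitive types to the file, e.g.,

public int readInt() throws IOException;
public double readDouble() throws IOException;
public void writeInt(int i) throws IOException;
public void writeDouble(double d) throws IOException;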

Example: Read and write records from a RandomAccessFile. (A student file consists of student record of name (String) and id (int)).
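A minimal sketch under our own assumed record layout: each record is fixed-length, with the name stored as exactly 20 chars (padded with blanks or truncated, 2 bytes each) followed by the int id, so record i starts at byte i × 44 and can be reached directly with seek(). The class StudentFile, its constants and the file name student.dat are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class StudentFile {
    // Assumed layout (ours): name as exactly NAME_LEN chars (2 bytes each), then an int id.
    public static final int NAME_LEN = 20;
    public static final int RECORD_SIZE = NAME_LEN * 2 + 4;   // 44 bytes per record

    public static void writeRecord(RandomAccessFile f, int index, String name, int id) throws IOException {
        f.seek((long) index * RECORD_SIZE);   // jump straight to record "index"
        for (int i = 0; i < NAME_LEN; i++) {
            f.writeChar(i < name.length() ? name.charAt(i) : ' ');   // pad or truncate the name
        }
        f.writeInt(id);
    }

    public static String readRecord(RandomAccessFile f, int index) throws IOException {
        f.seek((long) index * RECORD_SIZE);
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < NAME_LEN; i++) {
            name.append(f.readChar());
        }
        int id = f.readInt();
        return name.toString().trim() + ":" + id;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile("student.dat", "rw")) {
            writeRecord(f, 0, "Peter", 1001);
            writeRecord(f, 1, "Paul", 1002);
            writeRecord(f, 2, "Mary", 1003);
            System.out.println(readRecord(f, 1));   // direct access: prints Paul:1002
            writeRecord(f, 1, "Paula", 1002);       // modify a record in place
            System.out.println(readRecord(f, 1));   // prints Paula:1002
            System.out.println(f.length() / RECORD_SIZE + " records in the file");
        }
    }
}
```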


Compressed I/O Streams

The classes ZipInputStream and ZipOutputStream (in package java.util.zip) support reading and writing of compressed data in ZIP format. The classes GZIPInputStream and GZIPOutputStream (in package java.util.zip) support reading and writing of compressed data in GZIP format.
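A minimal sketch of a GZIP round trip held entirely in memory (byte arrays stand in for files, our choice). The compressing and decompressing streams are chained like any other filter stream; the class name GzipDemo is ours.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }   // close() finishes the compressed stream and writes the GZIP trailer
        return bytes.toByteArray();
    }

    public static byte[] decompress(byte[] gzData) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzData))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bytes.write(buf, 0, n);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10000];   // highly repetitive data compresses well
        byte[] packed = compress(original);
        System.out.println(original.length + " -> " + packed.length + " bytes");
        System.out.println(decompress(packed).length);   // prints 10000
    }
}
```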

Formatted-Text Input via java.util.Scanner

JDK 1.5 introduces the java.util.Scanner class, which greatly simplifies formatted text input from input sources (e.g., files, keyboard, network). Scanner, as the name implies, is a simple text scanner which can parse the input text into primitive types and strings using regular expressions. It first breaks the text input into tokens using a delimiter pattern, which is by default white space (blank, tab and newline). The tokens may then be converted into primitive values of different types using the various nextXxx() methods (nextInt(), nextByte(), nextShort(), nextLong(), nextFloat(), nextDouble(), nextBoolean(), next() for String, and nextLine() for an input line). You can also use the hasNextXxx() methods to check for the availability of a desired input.

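The commonly-used constructors are as follows. You can construct a Scanner to parse a byte-based InputStream (e.g., System.in), a disk file, or a given String.

public Scanner(InputStream source) // Bytes are decoded using the default charset
public Scanner(File source) throws FileNotFoundException
public Scanner(String source)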

nextXxx() and hasNextXxx()

The Scanner class implements the Iterator<String> interface. You can use hasNext() coupled with next() to iterate through all the String tokens. You can also directly iterate through the primitive types via methods hasNextXxx() and nextXxx(). Xxx includes all primitive types (byte, short, int, long, float, double and boolean), BigInteger, and BigDecimal. char is not included, but can be retrieved from a String via charAt().
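A small sketch combining hasNext()/next() with hasNextInt()/nextInt(): it adds up the int tokens and skips all others. Scanning a String keeps it self-contained; the class name and sample input are ours.

```java
import java.util.Scanner;

public class ScannerSum {
    // Sums the int tokens in the input and discards every other token.
    public static int sumInts(String input) {
        Scanner in = new Scanner(input);
        int sum = 0;
        while (in.hasNext()) {
            if (in.hasNextInt()) {
                sum += in.nextInt();
            } else {
                in.next();   // not an int: consume it as a String and ignore it
            }
        }
        in.close();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("10 apples 20 pears 12"));   // prints 42
    }
}
```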

Delimiter

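Instead of the default white space as the delimiter, you can set the delimiter to a chosen regular expression via these methods:

public Scanner useDelimiter(String pattern)
public Scanner useDelimiter(Pattern pattern)
public Pattern delimiter() // Returns the delimiter Pattern currently in use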

The regular expression \s*apple\s* matches zero or more white spaces (\s*) followed by "apple" followed by zero or more white spaces (\s*). An additional backslash (\) is needed to use a backslash (\) in Java String's literal. Read "Regular Expression" for more details.
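A small sketch using the delimiter described above; note that the Java literal needs the doubled backslashes. The class name and the input string are ours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDelimiter {
    // Parses ints separated by the word "apple" and any surrounding white space.
    public static List<Integer> parse(String input) {
        List<Integer> values = new ArrayList<>();
        Scanner in = new Scanner(input).useDelimiter("\\s*apple\\s*");   // regex \s*apple\s*
        while (in.hasNextInt()) {
            values.add(in.nextInt());
        }
        in.close();
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 apple 2 apple 3"));   // prints [1, 2, 3]
    }
}
```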

Regex Pattern Matching

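You can use methods such as findInLine(String pattern), which searches up to the next line separator, and findWithinHorizon(String pattern, int horizon), which searches at most horizon characters ahead (0 means no limit), to find the next occurrence of a regular-expression pattern while ignoring the delimiters. After a successful search, match() returns a MatchResult describing the match.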
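The sketch below uses findInLine() with capturing groups and then reads the groups back through match(). FindDemo and findGroups() are illustrative names:

```java
import java.util.Arrays;
import java.util.Scanner;
import java.util.regex.MatchResult;

class FindDemo {
    // Returns the capturing groups of the first match on the current line,
    // or an empty array if the pattern does not occur before the line separator.
    static String[] findGroups(String input, String regex) {
        try (Scanner in = new Scanner(input)) {
            if (in.findInLine(regex) == null) {
                return new String[0];          // no match: match() would throw here
            }
            MatchResult result = in.match();   // details of the last successful match
            String[] groups = new String[result.groupCount()];
            for (int i = 1; i <= result.groupCount(); i++) {
                groups[i - 1] = result.group(i);
            }
            return groups;
        }
    }

    public static void main(String[] args) {
        String[] g = findGroups("1 apple 2 apple red apple", "(\\d+) apple (\\d+) apple (\\w+)");
        System.out.println(Arrays.toString(g));   // [1, 2, red]
    }
}
```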

Charset

By default, Scanner decodes the input bytes using the platform's default charset. To read a text file encoded in a particular charset, provide the charset name to the constructor, e.g., Scanner(File source, String charsetName) or Scanner(InputStream source, String charsetName).
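As a sketch, assuming a UTF-16 encoded file (created in a temporary location purely for the demo; CharsetScannerDemo and firstLine() are hypothetical names), the charset name is passed to the Scanner(File, String) constructor:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class CharsetScannerDemo {
    // Reads the first line of a text file, decoded with the given charset.
    static String firstLine(Path file, String charsetName) throws IOException {
        // Scanner(File, String) decodes with the named charset instead of the platform default.
        try (Scanner in = new Scanner(file.toFile(), charsetName)) {
            return in.nextLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("scanner", ".txt");
        // Encode the sample text as UTF-16 (getBytes() prepends a byte-order mark).
        Files.write(tmp, "caf\u00e9 42\n".getBytes(StandardCharsets.UTF_16));
        System.out.println(firstLine(tmp, "UTF-16"));
        Files.delete(tmp);
    }
}
```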

Formatted-Text Output via java.util.Formatter Class & format() method

A Formatter is an interpreter for printf-style format strings. It supports layout justification and alignment, common formats for numeric, string, and date/time data, and locale-specific output, via the format specifiers.
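A minimal sketch of a Formatter writing into a StringBuilder sink, showing left and right justification (%-6s, %5s) and a fixed number of decimal places (%5.2f). FormatterDemo is an illustrative name, and Locale.US is chosen so the decimal separator is always '.':

```java
import java.util.Formatter;
import java.util.Locale;

class FormatterDemo {
    // Builds a small two-column table progressively in a StringBuilder.
    static String table() {
        StringBuilder sb = new StringBuilder();
        // The Formatter appends its output to sb (the output sink).
        try (Formatter fmt = new Formatter(sb, Locale.US)) {
            fmt.format("%-6s|%5s%n", "Item", "Price");   // left-justified, right-justified
            fmt.format("%-6s|%5.2f%n", "Tea", 1.5);      // width 5, 2 decimal places
            fmt.format("%-6s|%5.2f%n", "Cake", 12.25);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(table());
    }
}
```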

String.format()

A Formatter with a StringBuilder as the output sink lets you build up a formatted string progressively. To produce a simple formatted String in one step, you can use the static method String.format(), which is handy in a toString() method that must return a String.
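For instance, a hypothetical Circle class might build its toString() result with String.format() (Locale.US is an assumption to keep '.' as the decimal separator):

```java
import java.util.Locale;

class Circle {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public String toString() {
        // One call builds the whole String; no explicit Formatter object is needed.
        return String.format(Locale.US, "Circle[radius=%.2f]", radius);
    }

    public static void main(String[] args) {
        System.out.println(new Circle(1.5));   // Circle[radius=1.50]
    }
}
```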

File I/O in JDK 1.7

Interface java.nio.file.Path

A path string could be used to locate a file, a directory or a symbolic link. A symbolic link (or symlink) is a special file that references another file. A path string is system dependent, e.g., "c:\myproject\java\Hello.java" in Windows or "/myproject/java/Hello.java" in Unix. Windows uses back-slash '\' as the directory separator, while Unixes use forward-slash '/'. Windows uses semi-colon ';' as the path separator, while Unixes use colon ':'. The "c:\" (Windows) or "/" (Unix) is called the root. Windows supports multiple roots, each mapping to a drive (e.g., "c:\", "d:\"); Unix has a single root ("/"). A path could be absolute (beginning from the root) or relative (which is relative to the current working directory). Special notations "." and ".." denote the current directory and the parent directory, respectively.

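A java.nio.file.Path instance specifies the location of a file, a directory, or a symbolic link. Path is intended to replace java.io.File (of the standard I/O), which is less versatile and has several shortcomings; for example, many of its methods signal failure only by returning false, without explaining why.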

Helper class java.nio.file.Paths

To create a Path, use the static method get() of the helper class java.nio.file.Paths. The helper class Paths contains exclusively static methods for creating Path objects. Paths.get() returns a Path object by converting a given path string or URI.

public static Path get(String first, String... more)
// This method accepts variable number of arguments (varargs).
// It converts a path string, or a sequence of strings that when joined form a path string, to a Path object.
// The location of the Path may or may not exist.
public static Path get(URI uri)
// Converts the given URI to a Path object.
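The sketch below exercises both forms of Paths.get(). PathsDemo and roundTrip() are illustrative names; roundTrip() obtains the URI from an existing Path with toUri(), so the example does not hard-code a platform-specific "file:" URI:

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathsDemo {
    // Converts a Path to a "file:" URI and back again via Paths.get(URI).
    static Path roundTrip(Path p) {
        URI uri = p.toAbsolutePath().toUri();
        return Paths.get(uri);
    }

    public static void main(String[] args) {
        // Varargs form: the strings are joined with the platform's directory separator.
        Path p = Paths.get("myproject", "java", "Hello.java");
        System.out.println(p);                  // myproject/java/Hello.java on Unix
        System.out.println(p.getFileName());    // Hello.java
        System.out.println(p.isAbsolute());     // false: relative to the current working directory
        System.out.println(p.toAbsolutePath()); // resolved against the current working directory
        // normalize() removes redundant "." and ".." elements.
        System.out.println(Paths.get("a", ".", "..", "b").normalize());   // b
        System.out.println(roundTrip(p).toUri());
    }
}
```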

Helper Class java.nio.file.Files

Properties of a File/Directory

You can use the static boolean methods Files.exists(Path) and Files.notExists(Path) to verify whether a given Path exists or does not exist (as a file, directory or symlink). The status of a Path could be verified as existing, verified as not existing, or unknown (e.g., the program does not have access to the file). If the status is unknown, both exists() and notExists() return false.

You could also use the static boolean methods Files.isDirectory(Path), Files.isRegularFile(Path) and Files.isSymbolicLink(Path) to verify whether a Path locates a directory, a regular file, or a symlink.

Many of these methods take an optional second argument of LinkOption, which is applicable to symlinks only. For example, LinkOption.NOFOLLOW_LINKS specifies not to follow the symlink, so the link itself, rather than its target, is examined.
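A minimal sketch that classifies a Path with these methods while passing LinkOption.NOFOLLOW_LINKS, so a symlink is reported as such instead of as its target. FileProperties and describe() are hypothetical names, and the demo works in a temporary directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

class FileProperties {
    // Classifies a Path without following a symlink at its last element.
    static String describe(Path p) {
        if (Files.notExists(p, LinkOption.NOFOLLOW_LINKS)) return "missing";
        // Neither exists() nor notExists(): status unknown, e.g., no access permission.
        if (!Files.exists(p, LinkOption.NOFOLLOW_LINKS)) return "unknown";
        if (Files.isSymbolicLink(p)) return "symlink";
        if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) return "directory";
        if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) return "regular file";
        return "other";   // e.g., a device file or a named pipe
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("props");
        Path file = Files.createFile(dir.resolve("a.txt"));
        System.out.println(describe(dir));                  // directory
        System.out.println(describe(file));                 // regular file
        System.out.println(describe(dir.resolve("nope")));  // missing
        Files.delete(file);
        Files.delete(dir);
    }
}
```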

Copying/Moving a File/Directory

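You can use the static methods Files.copy(Path, Path, CopyOption...) or Files.move(Path, Path, CopyOption...) to copy or move a file or directory. The methods return the target Path. Copying a directory creates an empty directory at the target; the entries in the directory are not copied.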

The methods accept optional CopyOption arguments. For example: StandardCopyOption.REPLACE_EXISTING replaces the target if it exists; StandardCopyOption.COPY_ATTRIBUTES copies the file attributes such as the last-modified time; LinkOption.NOFOLLOW_LINKS specifies not to follow symlinks.
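As a sketch (CopyMoveDemo and copyThenMove() are illustrative names), the following copies a file with two StandardCopyOption values and then moves the copy; note that move() returns the target Path:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyMoveDemo {
    // Copies src to 'copy', then renames (moves) 'copy' to 'target'.
    static Path copyThenMove(Path src, Path copy, Path target) throws IOException {
        // REPLACE_EXISTING: overwrite 'copy' if present; COPY_ATTRIBUTES: keep e.g. last-modified time.
        Files.copy(src, copy, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
        // Without REPLACE_EXISTING, move() throws FileAlreadyExistsException if target exists.
        return Files.move(copy, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copymove");
        Path src = Files.write(dir.resolve("src.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Path result = copyThenMove(src, dir.resolve("copy.txt"), dir.resolve("moved.txt"));
        System.out.println(result.getFileName());                  // moved.txt
        System.out.println(Files.exists(dir.resolve("copy.txt"))); // false: it was moved away
        Files.delete(result);
        Files.delete(src);
        Files.delete(dir);
    }
}
```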

Reading/Writing Small Files

For small files, you can use the static methods Files.readAllBytes(Path) (byte-based) and Files.readAllLines(Path, Charset) (character-based) to read the entire file into memory. You can use Files.write(Path, byte[], OpenOption...) (byte-based) or Files.write(Path, Iterable<? extends CharSequence>, Charset, OpenOption...) (character-based, one line per element) to write to a file.

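The optional OpenOption arguments (constants of the enum StandardOpenOption) include: WRITE, APPEND, TRUNCATE_EXISTING (truncates the file to zero bytes), CREATE_NEW (creates a new file and throws an exception if the file already exists), CREATE (opens the file if it exists or creates a new file if it does not), among others. If no option is given, Files.write() behaves as if CREATE, TRUNCATE_EXISTING and WRITE were specified.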
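A minimal sketch combining the character-based and byte-based forms, with StandardOpenOption.APPEND on the second write. SmallFileDemo and writeAppendRead() are hypothetical names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

class SmallFileDemo {
    static List<String> writeAppendRead(Path file) throws IOException {
        // Character-based: each element becomes one line, encoded in UTF-8.
        // No OpenOption given, so the file is created or truncated first.
        Files.write(file, Arrays.asList("line 1", "line 2"), StandardCharsets.UTF_8);
        // Byte-based: APPEND adds to the end instead of truncating.
        Files.write(file, "line 3\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        // Read the entire file back as a List of lines.
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("small", ".txt");
        System.out.println(writeAppendRead(tmp));     // [line 1, line 2, line 3]
        byte[] raw = Files.readAllBytes(tmp);         // the same content as raw bytes
        System.out.println(raw.length + " bytes");    // depends on the platform line separator
        Files.delete(tmp);
    }
}
```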

Buffered Character-based I/O for Text Files

For reading, use the Files.newBufferedReader(Path, Charset) method to open a text file; it returns a BufferedReader. Use BufferedReader.readLine() to read a line, read() to read a char, or read(char[] cbuf, int off, int len) to read into a char array.

For writing, use the Files.newBufferedWriter(Path, Charset, OpenOption...) method to open an output text file; it returns a BufferedWriter. Use BufferedWriter.write(int c) to write a character, or write(char[] cbuf, int off, int len) or write(String s, int off, int len) to write characters.
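A sketch of a round trip through Files.newBufferedWriter() and Files.newBufferedReader(), both closed by try-with-resources. BufferedTextDemo and writeThenCountChars() are illustrative names:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedTextDemo {
    // Writes the given lines, reads them back, and returns the number of characters read.
    static int writeThenCountChars(Path file, String... lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line, 0, line.length());   // write(String s, int off, int len)
                out.newLine();                       // platform-dependent line separator
            }
        }
        int count = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {   // readLine() strips the line terminator
                count += line.length();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffered", ".txt");
        System.out.println(writeThenCountChars(tmp, "Hello", "world"));   // 10
        Files.delete(tmp);
    }
}
```

newLine() writes the platform's line separator and readLine() strips whatever terminator it finds, so the count covers only the characters of the lines themselves.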

Byte-Based Stream I/O

Use Files.newInputStream(Path, OpenOption...) to allocate an InputStream for reading raw bytes; and Files.newOutputStream(Path, OpenOption...) to allocate an OutputStream for writing. The InputStream and OutputStream returned are not buffered.

Example: Similar to the previous program, which reads/writes the entire file, this program reads/writes via buffered I/O.
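A minimal sketch of such a program, assuming a simple byte-by-byte copy (BufferedByteCopy and copy() are hypothetical names), wraps the unbuffered streams from Files.newInputStream() and Files.newOutputStream() in BufferedInputStream and BufferedOutputStream:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BufferedByteCopy {
    // Copies src to dst one byte at a time and returns the number of bytes copied.
    // The buffered wrappers turn the many single-byte read()/write() calls into a
    // few bulk operations on the underlying (unbuffered) file streams.
    static long copy(Path src, Path dst) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(dst))) {
            int b;
            while ((b = in.read()) != -1) {   // -1 signals end of stream
                out.write(b);
                ++count;
            }
        }   // closing 'out' flushes its buffer to the file
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".bin");
        Path dst = Files.createTempFile("dst", ".bin");
        Files.write(src, new byte[] {1, 2, 3, 4, 5});
        System.out.println(copy(src, dst));   // 5
        System.out.println(Arrays.equals(Files.readAllBytes(src), Files.readAllBytes(dst)));   // true
        Files.delete(src);
        Files.delete(dst);
    }
}
```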

Creating a New File/Directory/Symlink

Besides using the Files.write() method with an OpenOption of CREATE or CREATE_NEW, you can also use the Files.createFile() method to create an empty file; it throws FileAlreadyExistsException if the file already exists. You can use the default file attributes or optionally specify the initial attributes of the file.

public static Path createFile(Path path, FileAttribute<?>... attrs)

The FileAttribute includes: [TODO]

DOS: read-only, hidden, system and archive attributes.

Unixes: Nine file permissions: read, write, and execute permissions for the file owner, members of the same group, and "everyone else", e.g., "rwxr-x---".
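As a sketch for Unix-like systems, the following creates a file whose initial permissions are "rwxr-x---", built with PosixFilePermissions.fromString() and asFileAttribute(). CreateWithPermissions and create() are illustrative names; the code falls back to default attributes where the file system does not support the "posix" attribute view (e.g., Windows). The permissions actually applied may be further restricted by the process umask.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class CreateWithPermissions {
    // Creates an empty file; on POSIX file systems, requests permissions rwxr-x---.
    // Throws FileAlreadyExistsException if the file already exists.
    static Path create(Path file) throws IOException {
        if (!FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
            return Files.createFile(file);   // e.g., Windows: default attributes only
        }
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxr-x---");
        FileAttribute<Set<PosixFilePermission>> attr = PosixFilePermissions.asFileAttribute(perms);
        return Files.createFile(file, attr);   // the umask may remove some of these bits
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("perm");
        Path file = create(dir.resolve("run.sh"));
        if (FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
            System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(file)));
        }
        Files.delete(file);
        Files.delete(dir);
    }
}
```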

Random Access File

The interface SeekableByteChannel supports random access; an instance can be obtained from Files.newByteChannel(Path, OpenOption...). Its methods include:

public long position()
// Returns this channel's position.
public SeekableByteChannel position(long newPosition)
// Sets this channel's position.
public int read(ByteBuffer dest)
// Reads a sequence of bytes from this channel into the given ByteBuffer.
public int write(ByteBuffer source)
// Writes a sequence of bytes to this channel from the given ByteBuffer.
public long size()
// Returns the current size of entity to which this channel is connected.
public SeekableByteChannel truncate(long size)
// Truncates the entity, to which this channel is connected, to the given size.
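A minimal sketch of random access through Files.newByteChannel(): it jumps to an offset, overwrites three bytes, truncates the file and reads it back. RandomAccessDemo and patchAndRead() are hypothetical names, and the file content is made up:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class RandomAccessDemo {
    static String patchAndRead(Path file) throws IOException {
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (SeekableByteChannel ch = Files.newByteChannel(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.position(3);                                          // move to byte offset 3
            ch.write(ByteBuffer.wrap("abc".getBytes(StandardCharsets.US_ASCII)));   // overwrite bytes 3..5
            ch.truncate(8);                                          // discard everything after byte 7
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());   // size() is now 8
            ch.position(0);                                          // rewind to the start
            while (buf.hasRemaining()) {
                if (ch.read(buf) == -1) {                            // -1: end of file reached
                    break;
                }
            }
            return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("random", ".dat");
        System.out.println(patchAndRead(tmp));   // 012abc67
        Files.delete(tmp);
    }
}
```

Starting from "0123456789", writing "abc" at position 3 gives "012abc6789", and truncate(8) leaves "012abc67".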

List all root directories
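The root directories can be listed with FileSystems.getDefault().getRootDirectories(). A minimal sketch (ListRoots is an illustrative name):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ListRoots {
    // One root ("/") on Unix/Mac; one per drive (e.g., C:\, D:\) on Windows.
    static List<Path> roots() {
        List<Path> result = new ArrayList<>();
        for (Path root : FileSystems.getDefault().getRootDirectories()) {
            result.add(root);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Path root : roots()) {
            System.out.println(root);
        }
    }
}
```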

Listing a directory

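You can list the contents of a directory by using the Files.newDirectoryStream(Path) method. The returned DirectoryStream object implements Iterable, so you can iterate through the entries with a for-each loop. A DirectoryStream holds an open resource and should be closed after use, e.g., with try-with-resources.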
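A sketch of listing a directory, plus the glob-pattern variant newDirectoryStream(Path, String). ListDirectory, list() and listGlob() are hypothetical names; the names are sorted because the iteration order of a DirectoryStream is not specified:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListDirectory {
    // Names of all entries directly inside dir (not recursive), sorted.
    static List<String> list(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        // A DirectoryStream holds an open directory handle: close it via try-with-resources.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Same, but only entries whose names match a glob pattern such as "*.java".
    static List<String> listGlob(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("listing");
        Files.createFile(dir.resolve("Hello.java"));
        Files.createFile(dir.resolve("notes.txt"));
        System.out.println(list(dir));                 // [Hello.java, notes.txt]
        System.out.println(listGlob(dir, "*.java"));   // [Hello.java]
        for (String name : list(dir)) {
            Files.delete(dir.resolve(name));
        }
        Files.delete(dir);
    }
}
```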

The interface DirectoryStream.Filter<T> declares one abstract boolean method accept(), which is called back for each entry. Entries for which accept() returns false are discarded.

public boolean accept(T entry) throws IOException

Example: The following program uses an instance of an anonymous class implementing DirectoryStream.Filter<Path> to filter the DirectoryStream. The call-back method accept() returns true for regular files and false for the rest, which are discarded. Take note that this filtering criterion cannot be expressed as a glob pattern.
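A minimal sketch of such a program (RegularFilesOnly and regularFiles() are illustrative names):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RegularFilesOnly {
    // Names of the regular files directly inside dir, sorted.
    static List<String> regularFiles(Path dir) throws IOException {
        // Anonymous class implementing DirectoryStream.Filter<Path>:
        // accept() is called back for each entry; returning false discards it.
        DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
            @Override
            public boolean accept(Path entry) throws IOException {
                return Files.isRegularFile(entry);
            }
        };
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, filter)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("filter");
        Path file = Files.createFile(dir.resolve("a.txt"));
        Path sub = Files.createDirectory(dir.resolve("sub"));
        System.out.println(regularFiles(dir));   // [a.txt]: the sub-directory is discarded
        Files.delete(file);
        Files.delete(sub);
        Files.delete(dir);
    }
}
```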

Walking the File Tree - Files.walkFileTree()

You can use the static method Files.walkFileTree() to recursively walk through all the files from a starting directory.

First of all, you need to create an object that implements the interface FileVisitor<? super Path>, which declares these abstract methods:

public FileVisitResult preVisitDirectory(T dir, BasicFileAttributes attrs) throws IOException
// Invoked for a directory before entries in the directory are visited.
// If this method returns CONTINUE, then entries in the directory are visited.
// If this method returns SKIP_SUBTREE or SKIP_SIBLINGS then entries in the directory
// (and any descendants) will not be visited.
public FileVisitResult postVisitDirectory(T dir, IOException ex) throws IOException
// Invoked for a directory after entries in the directory, and all of their descendants,
// have been visited. This method is also invoked when iteration of the directory
// completes prematurely (by a visitFile method returning SKIP_SIBLINGS,
// or an I/O error when iterating over the directory).
public FileVisitResult visitFile(T file, BasicFileAttributes attrs) throws IOException
// Invoked for a file/symlink in a directory.
public FileVisitResult visitFileFailed(T file, IOException ex) throws IOException
// Invoked for a file that could not be visited. This method is invoked
// if the file's attributes could not be read, the file is a directory
// that could not be opened, and other reasons.

These methods return an enum FileVisitResult, which takes one of the values CONTINUE, TERMINATE, SKIP_SUBTREE or SKIP_SIBLINGS.

Instead of implementing FileVisitor interface, you could also extend from superclass SimpleFileVisitor, and override the selected methods.
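For example, a recursive delete fits SimpleFileVisitor well: files are deleted in visitFile(), and each directory is deleted in postVisitDirectory() once its entries are gone. This is a sketch; DeleteTree and deleteRecursively() are hypothetical names, and the deletion is irreversible, so the demo only touches a temporary directory:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class DeleteTree {
    // Deletes start and everything beneath it. Symlinks are deleted, not followed.
    static void deleteRecursively(Path start) throws IOException {
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException ex) throws IOException {
                if (ex != null) {
                    throw ex;           // iterating the directory failed: give up
                }
                Files.delete(dir);      // all entries are gone by now
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("tree");
        Files.createDirectories(root.resolve("a").resolve("b"));
        Files.createFile(root.resolve("a").resolve("b").resolve("c.txt"));
        deleteRecursively(root);
        System.out.println(Files.exists(root));   // false
    }
}
```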

There are two versions of walkFileTree(). The first version takes a starting directory and a FileVisitor, and traverses all the levels without following symlinks.

The second version takes two additional arguments: options, which specifies whether to follow symlinks (e.g., EnumSet.noneOf(FileVisitOption.class) or EnumSet.of(FileVisitOption.FOLLOW_LINKS)); and maxDepth, which specifies the maximum number of directory levels to visit (use Integer.MAX_VALUE for all levels).
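A sketch of the four-argument version, not following symlinks and limiting the depth. ShallowWalk and walk() are illustrative names. With a maxDepth of 1, directories directly under the start are handed to visitFile() rather than being entered, so they appear in the result alongside the files:

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

class ShallowWalk {
    // Paths (relative to start) passed to visitFile() within maxDepth levels, sorted.
    static List<String> walk(final Path start, int maxDepth) throws IOException {
        final List<String> visited = new ArrayList<>();
        Files.walkFileTree(start, EnumSet.noneOf(FileVisitOption.class), maxDepth,
            new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    visited.add(start.relativize(file).toString());
                    return FileVisitResult.CONTINUE;
                }
            });
        Collections.sort(visited);
        return visited;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("walk");
        Path a = Files.createFile(root.resolve("a.txt"));
        Path sub = Files.createDirectory(root.resolve("sub"));
        Path b = Files.createFile(sub.resolve("b.txt"));
        System.out.println(walk(root, 1));                   // [a.txt, sub]
        System.out.println(walk(root, Integer.MAX_VALUE));   // [a.txt, sub/b.txt] on Unix
        Files.delete(b);
        Files.delete(sub);
        Files.delete(a);
        Files.delete(root);
    }
}
```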