This tutorial develops a small command-line program to list information
about files and directories, essentially a much-simplified version of the POSIX ls or Windows dir
commands. We'll start with the simplest possible version and progress to more
complex functionality. Along the way we'll digress to cover topics you'll need
to know about to understand Boost.Filesystem.

Source code for each of the tutorial programs is available, and you
are encouraged to compile, test, and experiment with it. To conserve space, we won't
always show boilerplate code here, but the provided source is complete and
ready to build.

If the tut1 command outputs "Usage: tut1 path", all
is well. A set of tutorial programs has been copied (by setup) to
boost-root/libs/filesystem/example/test
and then built. You are encouraged to modify and experiment with them as the
tutorial progresses. Just invoke the bld script again to rebuild.

If something didn't work right, here are troubleshooting suggestions:

The bjam program executable isn't being found.
Check your PATH environment variable if it should have been found;
otherwise see Boost Getting Started.

The Boost.Filesystem file_size
function returns a uintmax_t
containing the size of the file named by the argument. The declaration looks
like this:

uintmax_t file_size(const path& p);

For now, all you need to know is that class path has constructors that take
const char * and many other useful types. (If you can't wait to
find out more, skip ahead to the class path section of
the tutorial.)
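
As a minimal sketch of calling file_size, the helper below writes a small file and asks for its size. It uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption made so the example builds without Boost; the Boost call is expected to look the same under the boost::filesystem namespace). The name write_and_measure is hypothetical and not part of tut1.cpp.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: writes `content` to `p`, then returns the size
// that file_size reports for the file.
std::uintmax_t write_and_measure(const fs::path& p, const std::string& content) {
    {
        std::ofstream out(p.string(), std::ios::binary);
        out << content;
    }  // the stream is closed here, so the data is on disk before we ask
    return fs::file_size(p);  // throws fs::filesystem_error on failure
}
```

Because path is constructible from const char *, a call such as fs::file_size("tut1.cpp") also compiles; the string literal is converted to a path implicitly.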

Please take a minute to try out tut1 on your system, using a
file that is known to exist, such as tut1.cpp. Here is what the
results look like on two different operating systems:

Boost.Filesystem includes status query functions such as exists,
is_directory, and is_regular_file. These return
bool values, and will return true if the condition
described by their name is met. Otherwise they return false,
including when any element
of the path argument can't be found.
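
The status queries can be combined roughly as follows. This is a sketch only, not the text of tut2.cpp: it uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption), and the function name describe is hypothetical.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: classifies `p` using the status query functions.
// These overloads can still throw, for example when access is denied.
std::string describe(const fs::path& p) {
    if (!fs::exists(p)) return "does not exist";
    if (fs::is_regular_file(p)) return "regular file";
    if (fs::is_directory(p)) return "directory";
    return "exists, but is neither a regular file nor a directory";
}
```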

tut2.cpp uses several of the status query functions to cope with non-existent
files and with different kinds of files:

Although tut2 works OK in these tests, the output is less than satisfactory
for a directory. We'd typically like to see a list of the directory's contents. In tut3.cpp
we will see how to iterate over directories.

An exception is thrown; the exact form of the response depends on
Windows system options.

On the Linux system, the test was being run from an account that did not have
permission to access /home/jane/foo. On the Windows system,
e: was a Compact Disc reader/writer that was not ready. End users
shouldn't have to interpret cryptic exception reports, so as we move on to tut3.cpp
we will increase the robustness of the code, too.

Boost.Filesystem's
directory_iterator class is just what we need here. It follows the
general pattern of the standard library's istream_iterator. Constructed from
a path, it iterates over the contents of the directory. A default constructed directory_iterator
acts as the end iterator.

The value type of directory_iterator is directory_entry. A
directory_entry object contains a path and file_status
information. A directory_entry object can be used directly, but can
also be passed to path arguments in function calls.
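
The iterator pattern described above can be sketched like this. The code uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption), and list_directory is a hypothetical name chosen for illustration.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: collects the path of every entry in directory `dir`.
std::vector<fs::path> list_directory(const fs::path& dir) {
    std::vector<fs::path> result;
    // `end` is default constructed and therefore acts as the end iterator.
    for (fs::directory_iterator it(dir), end; it != end; ++it) {
        const fs::directory_entry& entry = *it;  // the iterator's value type
        result.push_back(entry.path());
    }
    return result;
}
```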

The other need is increased robustness in the face of the many kinds of
errors that can affect file system operations. We could do that at the level of
each call to a Boost.Filesystem function (see Error
reporting), but it is easier to supply an overall try/catch block.
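
A single try/catch block around all of the filesystem calls might look like the sketch below. It is not the text of tut3.cpp: it uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption), and the function name report and its message wording are hypothetical.

```cpp
#include <filesystem>
#include <sstream>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: builds a short report about `p`. Any filesystem
// error raised along the way is turned into a readable message.
std::string report(const fs::path& p) {
    std::ostringstream out;
    try {
        if (!fs::exists(p)) {
            out << p.string() << " does not exist\n";
        } else if (fs::is_directory(p)) {
            out << p.string() << " is a directory containing:\n";
            for (const fs::directory_entry& entry : fs::directory_iterator(p))
                out << "  " << entry.path().string() << '\n';
        } else {
            out << p.string() << " size is " << fs::file_size(p) << '\n';
        }
    } catch (const fs::filesystem_error& ex) {
        // One handler covers every call in the try block.
        out << "error: " << ex.what() << '\n';
    }
    return out.str();
}
```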

Give tut3 a try, passing it a path to a directory as a command line argument.
Here is a run on a checkout of the Boost Subversion trunk, followed by a repeat
of the test cases that caused exceptions on Linux and Windows:

The listing would be much easier to read if only the filename were
displayed, rather than the full path.

The Linux listing isn't sorted. That's because the ordering of
directory iteration is unspecified. Ordering depends on the underlying
operating system API and file system specifics. So we need to sort the
results ourselves.
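
One way to address both points, sketched under assumptions: collect only the filename of each entry into a vector, then sort it. The code uses C++17 std::filesystem as a stand-in for Boost.Filesystem, and sorted_filenames is a hypothetical name.

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: returns the filenames in `dir` in sorted order.
std::vector<fs::path> sorted_filenames(const fs::path& dir) {
    std::vector<fs::path> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir))
        names.push_back(entry.path().filename());  // drop the leading directories
    std::sort(names.begin(), names.end());  // uses path's operator<
    return names;
}
```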

That completes the main portion of this tutorial. If you haven't already
worked through the Class path sections of this tutorial, dig into them now.
The Error reporting section may also be of
interest, although it can be skipped unless you are deeply concerned about
error handling issues.

Note that the exact appearance of the smiling face will depend on the font,
font size, and other settings for your command line window. The above tests were
run with out-of-the-box Ubuntu 9.10 and Windows 7, US Edition. If you don't get
the above results, take a look at the boost-root/libs/filesystem/example/test
directory with your system's GUI file browser, such as Linux Nautilus, Mac OS X
Finder, or Windows Explorer. These tend to be more comfortable with
international character sets than command line interpreters.

Class path takes care of whatever character type or encoding
conversions are required by the particular operating system. Thus as
tut5 demonstrates, it's no problem to pass a wide character string to a
Boost.Filesystem operational function even if the underlying operating system
uses narrow characters, and vice versa. And the same applies to user-supplied
functions that take const path& arguments.
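
A small sketch of the last point: a user-supplied function taking const path& accepts narrow and wide strings alike. The code uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption), leaf_name is a hypothetical function, and only plain ASCII names are used so the result does not depend on the system's encoding settings.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical user function: returns the last element of `p` as a
// narrow string. Callers may pass narrow or wide character strings;
// class path performs whatever conversion is needed.
std::string leaf_name(const fs::path& p) {
    return p.filename().string();
}
```

For example, leaf_name("dir/narrow.txt") and leaf_name(L"dir/wide.txt") both compile without any explicit conversion at the call site.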

Class path also provides path syntax that is portable across operating systems,
element iterators, and observer, composition, decomposition, and query
functions to manipulate the elements of a path. The next section of this
tutorial deals with path syntax.

Class path deals with two different pathname
formats - generic format and native format. For POSIX-like
file systems, these formats are the same. But for users of Windows and
other non-POSIX file systems, the distinction is important. Even
programmers writing for POSIX-like systems need to understand the distinction if
they want their code to be portable to non-POSIX systems.

The generic format is the familiar /my_directory/my_file.txt format used by POSIX-like
operating systems such as the Unix variants, Linux, and Mac OS X. Windows also
recognizes the generic format, and it is the basis for the familiar Internet URL
format. The directory
separator character is always one or more slash characters.

The native format is the format as defined by the particular
operating system. For Windows, either the slash or the backslash can be used as
the directory separator character, so /my_directory\my_file.txt
would work fine. Of course, if you write that in a C++ string literal, it
becomes "/my_directory\\my_file.txt".

If a drive specifier or a backslash appears
in a pathname on a Windows system, it is always treated as the native format.

Class path has observer functions that allow you to
obtain the string representation of a path object in either the native format
or the generic format. See the next section
for how that plays out.

The distinction between generic format and native format is important when
communicating with native C-style APIs and with users. Both tend to expect
paths in the native format and may be confused by the generic format. The generic
format is great, however, for writing portable programs that work regardless
of operating system.

The next section covers class path observers, composition,
decomposition, query, and iteration over the elements of a path.

The path_info.cpp program is handy for learning how class path
iterators,
observers, composition, decomposition, and query functions work on your system.
If it hasn't already been built on your system, please build it now. Run
the examples below on your system, and try some different path arguments as we
go along.

path_info produces several dozen output lines every time it's
invoked. We will only show the output lines we are interested in at each step.

First we'll look at iteration over the elements of a path, and then use
iteration to illustrate the difference between generic and native format paths.

Ubuntu Linux:

$ ./path_info /foo/bar/baa.txt
...
elements:
/
foo
bar
baa.txt

Microsoft Windows:

>path_info /foo/bar/baa.txt
...
elements:
/
foo
bar
baa.txt

Thus on both POSIX and Windows based systems the path "/foo/bar/baa.txt"
is seen as having four elements.
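
Element iteration of this kind can be sketched as follows. This is not the text of path_info.cpp: it uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption), and the function name elements is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: returns each element of `p` as a generic-format string.
std::vector<std::string> elements(const fs::path& p) {
    std::vector<std::string> result;
    for (fs::path::const_iterator it = p.begin(); it != p.end(); ++it)
        result.push_back(it->generic_string());  // each element is itself a path
    return result;
}
```

For "/foo/bar/baa.txt" the root directory "/" counts as the first element, which is why the path has four elements rather than three.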

Native format observers should be used when interacting with the
operating system or with users; that's what they expect.

Generic format observers should be used when the results need to be
portable and uniform regardless of the operating system.
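
The two families of observers can be sketched with one representative each. The code uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption); the two wrapper names are hypothetical and exist only to label the intended use.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical wrapper: generic format, always slash separated, for
// output that must look the same on every operating system.
std::string for_portable_output(const fs::path& p) {
    return p.generic_string();
}

// Hypothetical wrapper: native format, for operating system calls and
// for messages shown to users.
std::string for_os_or_user(const fs::path& p) {
    return p.string();
}
```

On POSIX-like systems the two are expected to return the same text; on Windows the native form may contain backslashes where the generic form has slashes.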

path objects always hold pathnames in the native
format, but otherwise leave them unchanged from their source. The
preferred() function will convert to the
preferred form, if the native format has several forms. Thus on Windows, it will
convert slashes to backslashes.

These are pretty self-evident, but do note the difference in the
result of is_absolute() between Linux and Windows. Because there is
no root name (i.e. drive specifier or network name), a lone slash (or backslash)
is a relative path on Windows.
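
The decomposition and query functions can be tried out with a sketch like the one below. It uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption); the PathParts struct and the decompose function are hypothetical names, not part of path_info.cpp.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical record of selected decomposition and query results.
struct PathParts {
    std::string root_path;
    std::string parent_path;
    std::string filename;
    std::string stem;
    std::string extension;
    bool has_root_name;       // drive specifier or network name present?
    bool has_root_directory;  // leading directory separator present?
    bool is_absolute;         // platform dependent, see the note below
};

// Hypothetical helper: decomposes `p`, reporting strings in generic format.
PathParts decompose(const fs::path& p) {
    PathParts parts;
    parts.root_path = p.root_path().generic_string();
    parts.parent_path = p.parent_path().generic_string();
    parts.filename = p.filename().generic_string();
    parts.stem = p.stem().generic_string();
    parts.extension = p.extension().generic_string();
    parts.has_root_name = p.has_root_name();
    parts.has_root_directory = p.has_root_directory();
    parts.is_absolute = p.is_absolute();
    return parts;
}
```

For "/foo/bar/baa.txt" every field except is_absolute should be the same on Linux and Windows; is_absolute differs because Windows also requires a root name.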

On to composition!

Class path uses / and /= operators to
append elements. That's a reminder
that these operations append the operating system's preferred directory
separator if needed. The preferred
directory separator is a slash on POSIX-like systems, and a backslash on
Windows-like systems.
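
Composition with /= can be sketched as follows. This is not the text of path_info.cpp: it uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption), and the function name compose is hypothetical. Only relative elements are appended here, because the handling of an absolute right-hand operand may differ between implementations.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: appends each element to an initially empty path.
fs::path compose(const std::vector<std::string>& parts) {
    fs::path p;  // starts out empty
    for (const std::string& part : parts)
        p /= part;  // inserts the preferred separator only where needed
    return p;
}
```

The binary form works the same way: fs::path("foo") / "bar" yields a two-element path without modifying its operands.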

path_info.cpp
composes a path by appending each of the command line elements to an initially
empty path:

The only significant difference between the two is how they report errors.

The first signature will throw exceptions to report errors. A
filesystem_error exception will be thrown on an operational error.
filesystem_error is derived from std::runtime_error. It has a member
function to obtain the error_code reported by the source of the error.
It also has member functions to obtain the path or paths that caused
the error.
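
A sketch of inspecting a caught filesystem_error is shown below. It uses C++17 std::filesystem as a stand-in for Boost.Filesystem (an assumption; with Boost the error code type would come from Boost.System rather than the standard library). The function name size_error is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: returns an empty string if file_size succeeds,
// otherwise a description built from the exception's members.
std::string size_error(const fs::path& p) {
    try {
        (void)fs::file_size(p);  // throwing signature
        return "";
    } catch (const fs::filesystem_error& ex) {
        const std::error_code ec = ex.code();  // code reported by the error source
        // path1() is the first path involved in the failed operation.
        return ec.message() + " [" + ex.path1().string() + "]";
    }
}
```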

Motivation for the second signature: Throwing exceptions on errors was the entire error reporting story for the earliest versions of
Boost.Filesystem, and indeed throwing exceptions on errors works very well for
many applications. But user reports trickled in that some code became so
littered with try and catch blocks as to be unreadable and unmaintainable. In
some applications I/O errors aren't exceptional, and that's the use case for
the second signature.

Functions with a system::error_code& argument set that
argument to report operational error status, and so do not throw exceptions when I/O
related errors occur. For a full explanation, see
Error reporting in the reference
documentation.
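
The non-throwing style can be sketched as follows. The code uses C++17 std::filesystem with std::error_code as a stand-in for Boost.Filesystem with system::error_code (an assumption), and try_file_size is a hypothetical name.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: stores the size of `p` in `size` and returns true,
// or returns false and leaves `size` untouched if the query failed.
bool try_file_size(const fs::path& p, std::uintmax_t& size) {
    std::error_code ec;
    const std::uintmax_t result = fs::file_size(p, ec);  // reports failure via ec
    if (ec) return false;  // a set error code means the call failed
    size = result;
    return true;
}
```

A caller that expects missing files as a normal case can test the return value in an ordinary if statement, with no try or catch block in sight.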