NOTE: These notes are by Allan Gottlieb, and are
reproduced here, with superficial modifications, with his permission.
"I" in this text generally refers to Prof. Gottlieb, except
in regards to administrative matters.

================ Start Lecture #14
(Apr. 2)
================

Since last lecture was about disks, I'll move on to files now, and then
come back to other I/O devices.

Chapter 6: File Systems

Requirements

Size: Store very large amounts of data.

Persistence: Data survives the creating process.

Access: Multiple processes can access the data concurrently.

Solution: Store data in files that together form a file system.

6.1: Files

6.1.1: File Naming

Very important. A major function of the file system.

Does each file have a unique name?
Answer: Often no. We will discuss this below when we study
links.

Extensions, e.g. the ``html'' in ``class-notes.html''.

Conventions just for humans: letter.teq (my convention).

Conventions giving default behavior for some programs.

The emacs editor thinks .html files should be edited in html mode, but
it can edit them in any mode and can edit any file in html mode.

Netscape thinks .html means an html file, but
<html> ... </html> works as well

Gzip thinks .gz means a compressed file but accepts a
--suffix flag

Default behavior for the operating system, window manager, or
desktop environment.

Click on .xls file in windows and excel is started.

Click on .xls file in nautilus under linux and gnumeric is
started.

Required extensions for programs

The gnu C compiler (and probably others) requires C
programs be named *.c and assembler programs be named *.s

Required extensions by operating systems

MS-DOS treats .com files specially

Windows 95 requires (as far as I can tell) shortcuts to
end in .lnk.

Case sensitive?
Unix: yes. Windows: no.
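
The extension conventions above amount to a table lookup keyed on the
text after the last dot. Here is a minimal sketch in Python. The table
DEFAULT_HANDLER and the function default_handler are hypothetical names
made up for illustration; they are not taken from any real desktop
environment. The case_sensitive flag models the Unix/Windows difference
just mentioned.

```python
import os.path

# Hypothetical table of default handlers keyed by extension. The entries
# echo the examples in the notes; no real desktop is being queried.
DEFAULT_HANDLER = {
    ".html": "browser",
    ".xls": "spreadsheet",
    ".gz": "gzip",
}

def default_handler(filename, case_sensitive=True):
    """Return the default handler for filename, or None if there is none."""
    _, ext = os.path.splitext(filename)   # ("class-notes", ".html")
    if not case_sensitive:
        ext = ext.lower()                 # treat .XLS and .xls alike
    return DEFAULT_HANDLER.get(ext)

print(default_handler("class-notes.html"))
print(default_handler("BUDGET.XLS"))
print(default_handler("BUDGET.XLS", case_sensitive=False))
print(default_handler("letter.teq"))
```

With the case-sensitive lookup BUDGET.XLS has no handler; with the
case-insensitive lookup it maps to the spreadsheet. A human-only
convention such as .teq is simply absent from the table.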

6.1.2: File structure

A file is a

Byte stream

Unix, dos, windows (I think).

Maximum flexibility.

Minimum structure.

(fixed size) Record stream: Out of date

80-character records for card images.

133-character records for line printer files. Column 1 was
for control (e.g., new page); the remaining 132 characters were printed.

There can be
several different magic numbers for different types of
executables.
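
To illustrate what a magic number is, here is a rough sketch that
guesses a file's type from its first few bytes and ignores its name.
The function name sniff and the small MAGIC table are my own choices
for illustration. The byte values shown (1f 8b for gzip, 7f 'E' 'L' 'F'
for ELF, and #! for scripts) are the well-known ones, but a real tool
such as file(1) knows many more.

```python
import gzip
import os
import tempfile

# A few well-known magic numbers (the leading bytes of the file contents).
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"\x7fELF", "ELF executable"),
    (b"#!", "script"),
]

def sniff(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return "unknown"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")   # misleading extension on purpose
    with gzip.open(path, "wb") as f:
        f.write(b"compressed despite the name")
    print(sniff(path))
```

The file is named notes.txt, yet the sketch reports gzip, because the
decision is made from the contents and not from the extension.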

Strongly typed files:

The type of the file determines what you can do with the
file.

This makes the easy and (hopefully) common case easier and, more
importantly, safer.

It tends to make the unusual case harder. For example, you have a
program that turns out data (.dat) files, and you want to use it to
turn out a java file. But the type of the output is data, and it cannot
be easily converted to type java.

6.1.4: File access

There are basically two possibilities, sequential access and random
access (a.k.a. direct access).
Previously, files were declared to be sequential or random.
Modern systems do not do this.
Instead all files are random and optimizations are applied when the
system dynamically determines that a file is (probably) being accessed
sequentially.

With sequential access the bytes (or records)
are accessed in order (i.e., n-1, n, n+1, ...).
Sequential access is the most common and
gives the highest performance.
For some devices (e.g. tapes) access ``must'' be sequential.

With random access, the bytes are accessed in any
order. Thus each access must specify which bytes are desired.
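
The difference between the two access patterns can be sketched as
follows. The 8-byte record size and the helper names (make_file,
read_sequential, read_random) are assumptions made up for this example.
The point is only that the sequential reader never names a position,
while the random reader must say which bytes it wants (here via seek).

```python
import os
import tempfile

RECORD = 8  # hypothetical fixed record size in bytes

def make_file(path, count):
    """Write count records of the form b'rec00000', b'rec00001', ..."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(b"rec%05d" % i)       # exactly RECORD bytes each

def read_sequential(path):
    """Read every record in order; no position is ever specified."""
    records = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD)
            if not chunk:
                break
            records.append(chunk)
    return records

def read_random(path, n):
    """Read record n directly; the access names the bytes it wants."""
    with open(path, "rb") as f:
        f.seek(n * RECORD)
        return f.read(RECORD)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "records.dat")
    make_file(path, 5)
    print(read_sequential(path))
    print(read_random(path, 3))
```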

6.1.5: File attributes

A laundry list of properties that can be specified for a file.
For example:

hidden

do not dump

owner

key length (for keyed files)

6.1.6: File operations

Create:
Essential if a system is to add files. Need not be a separate system
call (can be merged with open).

Delete:
Essential if a system is to delete files.

Open:
Not essential. An optimization in which the translation from file name to
disk locations is performed only once per file rather than once per access.

Close:
Not essential. Free resources.

Read:
Essential. Must specify filename, file location, number of bytes,
and a buffer into which the data is to be placed.
Several of these parameters can be set by other
system calls and in many OS's they are.

Write:
Essential if updates are to be supported. See read for parameters.

Seek:
Not essential (could be in read/write). Specify the
offset of the next (read or write) access to this file.

Get attributes:
Essential if attributes are to be used.

Set attributes:
Essential if attributes are to be user settable.

Rename:
Tanenbaum has strange words. Copy and delete is not acceptable for
big files. Moreover copy-delete is not atomic. Indeed link-delete is
not atomic so even if link (discussed below)
is provided, renaming a file adds functionality.
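
As a rough illustration, here is how the operations above look when
issued through Python's os module, which on Unix-like systems is a thin
wrapper over the corresponding system calls. The file names and the
function name demo are hypothetical. Note that create is merged with
open (the O_CREAT flag) and that rename is a single call rather than a
copy followed by a delete.

```python
import os
import tempfile

def demo(directory):
    old = os.path.join(directory, "draft.txt")
    new = os.path.join(directory, "final.txt")

    # Create merged with open: O_CREAT asks open to create the file.
    fd = os.open(old, os.O_CREAT | os.O_RDWR, 0o644)
    os.write(fd, b"hello, file system")   # write at the current offset
    os.lseek(fd, 7, os.SEEK_SET)          # seek: set offset of next access
    data = os.read(fd, 4)                 # read 4 bytes from offset 7
    size = os.fstat(fd).st_size           # get attributes
    os.close(fd)                          # close: free the descriptor

    os.rename(old, new)                   # rename: one call, no copying
    renamed = os.path.exists(new) and not os.path.exists(old)

    os.unlink(new)                        # delete
    return data, size, renamed

with tempfile.TemporaryDirectory() as d:
    print(demo(d))
```

Read here takes a descriptor, a byte count, and an implicit file
location; the file name and the location were set earlier by open and
seek, which is the sense in which several read parameters "can be set
by other system calls".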

If the system has only one directory but allows the character / in
a file name, then one could fake a tree by having a file named
/allan/gottlieb/courses/arch/class-notes.html
rather than a directory allan, a subdirectory gottlieb, ..., a file
class-notes.html.
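
A small sketch of this faking trick: the single directory is modeled as
a Python dictionary from full names to contents, and "listing a
directory" becomes a scan of every name for a common prefix. The names
FLAT and list_dir, and the two extra files, are made up for the example.

```python
# One flat "directory": full names map straight to file contents.
FLAT = {
    "/allan/gottlieb/courses/arch/class-notes.html": b"arch notes",
    "/allan/gottlieb/courses/os/class-notes.html": b"os notes",
    "/allan/gottlieb/letter.teq": b"a letter",
}

def list_dir(flat, prefix):
    """List the immediate 'children' of prefix by scanning every name."""
    if not prefix.endswith("/"):
        prefix += "/"
    children = set()
    for name in flat:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            children.add(rest.split("/", 1)[0])   # first component only
    return sorted(children)

print(list_dir(FLAT, "/allan/gottlieb"))
print(list_dir(FLAT, "/allan/gottlieb/courses"))
```

The cost of the fake shows up immediately: every listing examines every
file in the system, and there is no such thing as an empty directory.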

Dos (windows) is a forest, unix a tree. In dos there is no common
parent of a:\ and c:\.

But windows explorer makes the dos forest look quite a bit like a
tree. Indeed, the original gnome file manager for linux looks A LOT
like windows explorer.

You can get an effect similar to (but not the same as) one X per
user by having just one X in the system and having permissions that
permit each user to visit only a subset. Of course if the system
doesn't have permissions, this is not possible.

Today's systems have a tree per system or a forest per system.

6.2.4: Path Names

You can specify the location of a file in the file hierarchy by
using either an absolute or a relative path to the file.

An absolute path starts at the (or a, if we have a forest) root.

A relative path starts at the
current (a.k.a working) directory.

The special directories . and .. represent the current directory
and the parent of the current directory respectively.
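
Here is a minimal sketch of how a relative path, together with the
current directory, determines an absolute one. It uses Python's
posixpath module so that the Unix rules apply on any platform. The
function name resolve and the example paths are my own. Be aware that
normpath works purely on the text of the path, so this sketch ignores
symbolic links.

```python
import posixpath

def resolve(cwd, path):
    """Turn path into a normalized absolute path, given the current directory."""
    if not posixpath.isabs(path):           # relative: start at cwd
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)         # collapse . and .. textually

cwd = "/allan/gottlieb/courses"
print(resolve(cwd, "arch/class-notes.html"))
print(resolve(cwd, "../letter.teq"))
print(resolve(cwd, "./arch/../os"))
print(resolve(cwd, "/etc/passwd"))
```

The last call shows that an absolute path is unaffected by the current
directory.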

Homework: 1, 9.

6.2.5: Directory operations

Create: Produces an ``empty'' directory.
Normally the directory created actually contains . and .., so it is not
really empty.

Delete: Requires the directory to be empty (i.e., to just contain
. and ..). Commands are normally written that will first empty the
directory (except for . and ..) and then delete it. These commands
make use of file and directory delete system calls.

Opendir: Same as for files (creates a ``handle'')

Closedir: Same as for files

Readdir: In the old days (of unix) one could read directories as files
so there was no special readdir (or opendir/closedir). It was
believed that the uniform treatment would make programming (or at
least system understanding) easier as there was less to learn.

However, experience has taught that this was not a good idea since
the structure of directories then becomes exposed. Early unix had a
simple structure (and there was only one). Modern systems have more
sophisticated structures and more importantly they are not fixed
across implementations.

Unlink: Remove a directory entry. This is how a file is deleted.
But if there are many links and just one is unlinked, the file
remains. Discussed in more
detail below.
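
The directory operations above can be exercised from Python roughly as
follows. This is a sketch under assumptions: the names remove_tree and
demo are mine, and the hard-link part assumes a file system that
supports os.link (most Unix file systems do). os.scandir plays the role
of opendir/readdir/closedir, and like os.listdir it does not report
. and ..

```python
import os
import tempfile

def remove_tree(path):
    """Empty a directory (recursively) and then delete it."""
    with os.scandir(path) as it:            # opendir ... closedir
        entries = list(it)                  # readdir until exhausted
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            remove_tree(entry.path)
        else:
            os.unlink(entry.path)           # file delete system call
    os.rmdir(path)                          # directory delete system call

def demo(base):
    d = os.path.join(base, "dir")
    os.mkdir(d)                             # create
    listing = os.listdir(d)                 # . and .. are not reported

    a = os.path.join(d, "a")
    b = os.path.join(d, "b")
    with open(a, "w") as f:
        f.write("data")
    os.link(a, b)                           # second name for the same file
    links_before = os.stat(a).st_nlink
    os.unlink(a)                            # remove one directory entry
    with open(b) as f:
        survived = f.read()                 # the file itself remains
    links_after = os.stat(b).st_nlink

    try:
        os.rmdir(d)                         # refused: directory not empty
        refused = False
    except OSError:
        refused = True

    remove_tree(d)
    return listing, links_before, links_after, survived, refused, os.path.exists(d)

with tempfile.TemporaryDirectory() as base:
    print(demo(base))
```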

6.3: File System Implementation

6.3.2: Implementing Files

A disk cannot read or write a single word. Instead it can read or
write a sector, which is often 512 bytes.

Disks are written in blocks whose size is a multiple of the sector
size.
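
A little arithmetic makes the sector/block relationship concrete. The
512-byte sector comes from the text; the choice of 8 sectors per block
(giving 4096-byte blocks) is an assumption for the example, as are the
function names.

```python
SECTOR_SIZE = 512          # bytes, the common size mentioned above
SECTORS_PER_BLOCK = 8      # assumption: gives 4096-byte blocks
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK

def blocks_needed(file_size):
    """Number of whole blocks required to hold file_size bytes."""
    return -(-file_size // BLOCK_SIZE)      # ceiling division

def locate(offset):
    """Map a byte offset to (block index, sector in block, byte in sector)."""
    block, within_block = divmod(offset, BLOCK_SIZE)
    sector, byte = divmod(within_block, SECTOR_SIZE)
    return block, sector, byte

size = 10000
print(blocks_needed(size))                       # blocks occupied
print(blocks_needed(size) * BLOCK_SIZE - size)   # bytes wasted in last block
print(locate(5000))
```

A 10000-byte file occupies 3 blocks and wastes 2288 bytes of the last
one, and byte 5000 lives in block 1, sector 1, at byte 392 of that
sector. Larger blocks mean fewer transfers but more waste of this kind.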

When we study I/O in the next chapter I will bring in some
physically large (and hence old) disks so that we can see what they
look like and better understand sectors (and tracks, and cylinders,
and heads, etc.).

Contiguous allocation

This is like OS/MVT.

The entire file is stored as one piece.

Simple and fast for access, but ...

Problem with growing files

Must either evict the file itself or the file it is bumping
into.

Same problem with an OS/MVT kind of system if jobs grow.

Problem with external fragmentation.

Not used for general purpose systems. Ideal for systems where
files do not change size.
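
Both problems with contiguous allocation can be seen in a toy model.
The class below is a sketch of my own, not any real system's allocator:
the disk is a list of blocks, each file occupies one unbroken run, and
holes are found first-fit.

```python
class ContiguousDisk:
    """Toy disk in which every file occupies one unbroken run of blocks."""

    def __init__(self, nblocks):
        self.owner = [None] * nblocks       # owner[i] is a file name or None

    def _find_hole(self, length):
        """First-fit: return the start of the first free run of length blocks."""
        run = 0
        for i, o in enumerate(self.owner):
            run = run + 1 if o is None else 0
            if run == length:
                return i - length + 1
        return None

    def create(self, name, length):
        start = self._find_hole(length)
        if start is None:
            return False                    # no single hole is big enough
        for i in range(start, start + length):
            self.owner[i] = name
        return True

    def delete(self, name):
        self.owner = [None if o == name else o for o in self.owner]

    def can_grow_in_place(self, name):
        """True if the block right after the file is free."""
        last = max(i for i, o in enumerate(self.owner) if o == name)
        return last + 1 < len(self.owner) and self.owner[last + 1] is None

    def free_blocks(self):
        return self.owner.count(None)

disk = ContiguousDisk(10)
for name in ("A", "B", "C"):
    disk.create(name, 3)                    # A: 0-2, B: 3-5, C: 6-8
print(disk.can_grow_in_place("A"))          # A is bumping into B
disk.delete("B")
print(disk.free_blocks(), disk.create("D", 4))
```

After B is deleted there are 4 free blocks, yet a 4-block file cannot
be created because the free space is split into a hole of 3 and a hole
of 1. That is external fragmentation. Before the deletion, A could not
grow without moving either A or B.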

Linked allocation

The directory entry contains a pointer to the first block of the file.
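
As a hedged sketch of the idea, assume (as is standard for linked
allocation) that each block holds some data plus the number of the next
block of the file. The block numbers, the END marker, and the names
disk, directory, and read_file below are all made up for illustration.

```python
END = -1   # hypothetical end-of-chain marker

# Hypothetical disk: block number -> (data in block, next block number).
disk = {
    7: (b"AAAA", 2),
    2: (b"BBBB", 9),
    9: (b"CC", END),
}

# The directory entry records only where the chain begins.
directory = {"notes": 7}

def read_file(directory, disk, name):
    """Follow the chain from the first block; return (contents, blocks visited)."""
    block = directory[name]
    data = b""
    visited = 0
    while block != END:
        payload, block = disk[block]        # read block, learn the next one
        data += payload
        visited += 1
    return data, visited

print(read_file(directory, disk, "notes"))
```

The blocks need not be adjacent on the disk, so the file can grow by
linking in any free block, but reaching the last byte requires visiting
every block before it.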