[3.0] C Input & Output

v3.1.1 / chapter 3 of 5 / 01 sep 14 / greg goebel / public domain

* This chapter covers console (keyboard/display) and file I/O. You've
already seen one console-I/O function, "printf()", and there are several
others. C has two separate approaches toward file I/O, one based on library
functions that is similar to console I/O, and a second that uses "system
calls". These topics are discussed in detail below.

* Console I/O in general means communications with the computer's keyboard
and display. However, in most modern operating systems the keyboard and
display are simply the default input and output devices, and the user can easily
redirect input from, say, a file or other program and redirect output to,
say, a serial I/O port:

type infile | myprog > com

The program itself, "myprog", doesn't know the difference. The program uses
console I/O to simply read its "standard input (stdin)" -- which might be
the keyboard, a file dump, or the output of some other program -- and print
to its "standard output (stdout)" -- which might be the display or printer
or another program or a file. The program itself neither knows nor cares.

Console I/O requires the declaration:

#include <stdio.h>

Useful functions include:

printf()    Print a formatted string to stdout.
scanf()     Read formatted data from stdin.
putchar()   Print a single character to stdout.
getchar()   Read a single character from stdin.
puts()      Print a string to stdout.
gets()      Read a line from stdin.

PC-based compilers also have an alternative library of console I/O functions.
These functions require the declaration:

#include <conio.h>

The three most useful PC console I/O functions are:

getch()     Get a character from the keyboard (no need to press Enter).
getche()    Get a character from the keyboard and echo it.
kbhit()     Check to see if a key has been pressed.

* The "printf()" function, as explained previously, prints a string that may
include formatted data:

Using the wrong format code for a particular data type can lead to bizarre
output. Further control can be obtained with modifier codes; for example, a
numeric prefix can be included to specify the minimum field width:

%10d

This specifies a minimum field width of ten characters. If the field width
is too small, a wider field will be used. Adding a minus sign:

%-10d

-- causes the text to be left-justified. A numeric precision can also be
specified:

%6.3f

This specifies three digits after the decimal point in a field at least six
characters wide. A string precision can be specified as well, to indicate the
maximum number of characters to be printed. For example:

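The "scanf()" function reads formatted data from standard input into variables
whose addresses are passed as arguments, for example an "int" passed as "&val"
and a character array passed as "name". There is no "&" in front of "name",
since the name of a string is already a pointer. Input fields are separated by
whitespace (space, tab, or newline), though a count, for example "%10d", can
be included to define a specific field width. Formatting codes are the same as
for "printf()", except: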

The "%e", "%f", and "%g" format codes all work the same.

Plain "%f" reads a "float"; reading a "double" requires an "l" modifier, as
in "%lf".

An "h" modifier, as in "%hd", reads short integers.

If characters are included in the format code, "scanf()" will read in the
characters and discard them. For example, if the example above were modified
as follows:

scanf( "%d,%s", &val, name );

-- then "scanf()" will assume that the two input values are comma-separated
and swallow the comma when it is encountered.

If a format code is preceded with an asterisk, the data will be read and
discarded. For example, if the example were changed to:

scanf( "%d%*c%s", &val, name );

-- then if the two fields were separated by a ":", that character would be
read in and discarded.

The "scanf()" function returns the number of input fields it successfully
assigned. It returns the value EOF (an "int"), defined in "stdio.h", when its
input is terminated before any field is read.

* The "putchar()" and "getchar()" functions handle single character I/O. For
example, the following program accepts characters from standard input one at
a time:

The "getchar()" function returns an "int" rather than a "char" so that the
end-of-input value EOF can be told apart from every valid character.
Notice the neat way C allows a program to get a value and then test it in the
same expression, a particularly useful feature for handling loops.

One word of warning on single-character I/O: if a program is reading
characters from the keyboard, most operating systems won't send the
characters to the program until the user presses the "Enter" key, meaning
it's not possible to perform single-character keyboard I/O this way.

The little program above is the essential core of a character-mode text
"filter", a program that can perform some transformation between standard
input and standard output. Such a filter can be used as an element to
construct more sophisticated applications:

type file.txt | filter1 | filter2 > outfile.txt

The following filter capitalizes the first character in each word in the
input. The program operates as a "state machine", using a variable that can
be set to different values, or "states", to control its operating mode. It
has two states: SEEK, in which it is looking for the first character, and
REPLACE, in which it is looking for the end of a word.

In SEEK state, it scans through whitespace (space, tab, or newline), echoing
characters. If it finds a printing character, it converts it to uppercase
and goes to REPLACE state. In REPLACE state, it converts characters to
lowercase until it hits whitespace, and then goes back to SEEK state.

The program uses the "tolower()" and "toupper()" functions to make case
conversions. These two functions will be discussed in the next chapter.

* The "puts()" function is like a simplified version of "printf()" without
format codes. It prints a string that is automatically terminated with a
newline:

puts( "Hello world!" );

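The "gets()" function reads a line of text terminated by a newline, though it
doesn't read the newline into the string. It is much less finicky about its
inputs than "scanf()". However, it cannot limit the number of characters it
stores and so can overrun its buffer; it was removed from the language in the
C11 standard, and "fgets()" reading from "stdin" is the safe replacement.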

The "gets()" function returns a NULL, defined in "stdio.h", on input
termination or error.

* The PC-based console-I/O functions "getch()" and "getche()" operate much as
"getchar()" does, except that "getche()" echoes the character automatically.

The "kbhit()" function is very different in that it only indicates if a key
has been pressed or not. It returns a nonzero value if a key has been
pressed, and zero if it hasn't. This allows a program to poll the keyboard
for input, instead of hanging on keyboard input and waiting for something to
happen. As mentioned, these functions require the "conio.h" header file, not
the "stdio.h" header file.

* The file-I/O library functions are much like the console-I/O functions. In
fact, most of the console-I/O functions can be thought of as special cases of
the file-I/O functions. The library functions include:

fopen()     Create or open a file for reading or writing.
fclose()    Close a file after reading or writing it.
fseek()     Seek to a certain location in a file.
rewind()    Rewind a file back to its beginning and leave it open.
rename()    Rename a file.
remove()    Delete a file.
fprintf()   Formatted write.
fscanf()    Formatted read.
fwrite()    Unformatted write.
fread()     Unformatted read.
putc()      Write a single byte to a file.
getc()      Read a single byte from a file.
fputs()     Write a string to a file.
fgets()     Read a string from a file.

All these library functions depend on definitions made in the "stdio.h"
header file, and so require the declaration:

#include <stdio.h>

C documentation normally refers to these functions as performing "stream
I/O", not "file I/O". The distinction is that they could just as well handle
data being transferred through a modem as a file, and so the more general
term "data stream" is used rather than "file". However, we'll stay with the
"file" terminology in this document for the sake of simplicity.

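The "fopen()" function opens or creates a file. It has the syntax:

<file pointer> = fopen( <filename>, <access mode> );

The file pointer, of type "FILE *", will be returned with the value NULL,
defined in "stdio.h", if there is an error. The "access modes" are defined as
follows: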

r     Open for reading.
w     Open and wipe (or create) for writing.
a     Append -- open (or create) to write to end of file.
r+    Open a file for reading and writing.
w+    Open and wipe (or create) for reading and writing.
a+    Open a file for reading and appending.

The "filename" is simply a string of characters.

It is often useful to use the same statements to communicate either with
files or with standard I/O. For this reason, the "stdio.h" header file
includes predefined file pointers with the names "stdin" and "stdout".
There's no need to do an "fopen()" on them -- they can just be assigned to a
file pointer:

fpin = stdin;
fpout = stdout;

-- and any following file-I/O functions won't know the difference.

The "fclose()" function simply closes the file given by its file pointer
parameter. It has the syntax:

fclose( fp );

* The "fseek()" function call allows the byte location in a file to be
selected for reading or writing. It has the syntax:

fseek( <file_pointer>, <offset>, <origin> );

The offset is a "long" and specifies the offset into the file, in bytes. The
"origin" is an "int" and is one of three standard values, defined in
"stdio.h":

SEEK_SET    Start of file.
SEEK_CUR    Current location.
SEEK_END    End of file.

The "fseek()" function returns 0 on success and non-zero on failure.

The "rewind()", "rename()", and "remove()" functions are straightforward.
The "rewind()" function resets an open file back to its beginning. It has
the syntax:

rewind( <file_pointer> );

The "rename()" function changes the name of a file:

rename( <old_file_name_string>, <new_file_name_string> );

The "remove()" function deletes a file:

remove( <file_name_string> );

* The "fprintf()" function allows formatted ASCII data output to a file, and
has the syntax:

fprintf( <file pointer>, <string>, <variable list> );

The "fprintf()" function is identical in syntax to "printf()", except for the
addition of a file pointer parameter. For example, the "fprintf()" call in
this little program:

Field-width specifiers can be used as well. The "fprintf()" function returns
the number of characters it dumps to the file, or a negative number if it
terminates with an error.

The "fscanf()" function is to "fprintf()" what "scanf()" is to "printf()":
it reads ASCII-formatted data into a list of variables. It has the syntax:

fscanf( <file pointer>, <string>, <variable list> );

However, the "string" contains only format codes, no text, and the "variable
list" contains the addresses of the variables, not the variables themselves.
For example, the program below reads back the two numbers that were stored
with "fprintf()" in the last example:

* The "fwrite()" and "fread()" functions are used for binary file I/O. The
syntax of "fwrite()" is as follows:

fwrite( <array_pointer>, <element_size>, <count>, <file_pointer> );

The array pointer is of type "void *", and so the array can be of any type.
The element size and count, which give the number of bytes in each array
element and the number of elements in the array, are of type "size_t", an
unsigned integer type that is not necessarily the same as "unsigned int".
The "fwrite()" function returns the number of items it actually wrote.

The "fread()" function similarly has the syntax:

fread( <array_pointer>, <element_size>, <count>, <file_pointer> );

The "fread()" function returns the number of items it actually read.

The following program stores an array of data to a file, and then reads it
back using "fwrite()" and "fread()":

* File-I/O through system calls is simpler and operates at a lower level than
making calls to the C file-I/O library. The fundamental file-I/O system calls
include:

creat()     Create a file for reading or writing.
open()      Open a file for reading or writing.
close()     Close a file after reading or writing.
unlink()    Delete a file.
write()     Write bytes to file.
read()      Read bytes from file.

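These calls were devised for the UNIX operating system and are not part of
the ANSI C spec. Use of these system calls requires a header file named
"fcntl.h"; on POSIX systems, "close()", "unlink()", "write()", and "read()"
are declared in "unistd.h", which should be included as well:

#include <fcntl.h>
#include <unistd.h>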

* The "creat()" system call, of course, creates a file. It has the syntax:

<file descriptor variable> = creat( <filename>, <permission bits> );

This system call returns an integer, called a "file descriptor", which is a
number that identifies the file generated by "creat()". This number is used
by other system calls in the program to access the file. Should the
"creat()" call encounter an error, it will return a file descriptor value of
-1.

The "filename" parameter gives the desired filename for the new file. The
"permission bits" give the "access rights" to the file. A file has three
"permissions" associated with it:

Write permission:

Allows data to be written to the file.

Read permission:

Allows data to be read from the file.

Execute permission:

Designates that the file is a program that can be run.

These permissions can be set for three different levels:

User level:

Permissions apply to individual user.

Group level:

Permissions apply to members of user's defined "group".

System level:

Permissions apply to everyone on the system.

For the "creat()" system call, the permissions are expressed in octal, with
an octal digit giving the three permission bits for each level of
permissions. In octal, the permission settings:

0644

-- grant read and write permissions for the user, but only read permissions
for group and system. The following octal number gives all permissions to
everyone:

0777

An attempt to "creat()" an existing file (for which the program has write
permission) will not return an error. It will instead wipe the contents
of the file and return a file descriptor for it.

For example, to create a file named "data" with read and write permission for
everyone on the system would require the following statements:

The "open()" system call opens an existing file for reading or writing. It
has the syntax:

<file descriptor variable> = open( <filename>, <access mode> );

The "open()" call is similar to the "creat()" call in that it returns a file
descriptor for the given file, and returns a file descriptor of -1 if it
encounters an error. However, the second parameter is an "access mode", not
a permission code. There are three modes (defined in the "fcntl.h" header
file):

O_RDONLY    Open for reading only.
O_WRONLY    Open for writing only.
O_RDWR      Open for reading and writing.

For example, to open "data" for writing, assuming that the file had been
created by another program, the following statements would be used:

int fd;
fd = open( "data", O_WRONLY );

A few additional comments before proceeding:

A "creat()" call implies an "open()". There is no need to "creat()" a
file and then "open()" it.

There is an operating-system-dependent limit on the number of files that a
program can have open at any one time.

The file descriptor is no more than an arbitrary number that a program
uses to distinguish one open file from another. When a file is closed,
re-opening it will probably not give it the same file descriptor.

The "close()" system call is very simple. All it does is close an open
file when there is no further need to access it. The "close()" system call
has the syntax:

close( <file descriptor> );

The "close()" call returns a value of 0 if it succeeds, and returns -1 if it
encounters an error.

The "unlink()" system call deletes a file. It has the syntax:

unlink( <file_name_string> );

It returns 0 on success and -1 on failure.

* The "write()" system call writes data to an open file. It has the syntax:

write( <file descriptor>, <buffer>, <buffer length> );

The file descriptor is returned by a "creat()" or "open()" system call. The
"buffer" is a pointer to a variable or an array that contains the data; and
the "buffer length" gives the number of bytes to be written into the file.

While different data types may have different byte lengths on different
systems, the "sizeof" operator can be used to provide the proper buffer
length in bytes. A "write()" call could be specified as follows:

float array[10];
...
write( fd, array, sizeof( array ) );

The "write()" function returns the number of bytes it actually writes. It
will return -1 on an error.

The "read()" system call reads data from an open file. Its syntax is exactly
the same as that of the "write()" call:

read( <file descriptor>, <buffer>, <buffer length> );

The "read()" function returns the number of bytes it actually reads.
At the end of file it returns 0, or returns -1 on error.