Redirecting Standard Input

Many commands obtain their input from files specified on the
command-line. Most of these commands also work if no files
are specified. In these cases, the command reads input from
the standard input stream, which by default is the keyboard.
The cat command provides a good example.

$ cat /etc/group
The command opens the file /etc/group and uses
the content of the file for input.
$ cat
Now I am entering data (via the keyboard) that is
going to be used as input to the 'cat' command.
This is the standard input stream. You tell the
shell that you are done entering data by typing
a <CTRL-D> character. [Note: on an ASCII
system, <CTRL-D> is EOT.]
<CTRL-D>
The 'cat' command is executed without any arguments; therefore,
it gets input from the standard input stream.
$ cat < /etc/group
From a user perspective this has the same effect as if
/etc/group was specified as a command-line argument,
but internally the standard input stream was re-directed
from the keyboard to the file /etc/group.

In some cases you want to execute a command and have it get
input from both the standard input stream and a file.

$ cat - /etc/group
This is the content of /etc/group:
==================================
<CTRL-D>
On some systems, the use of - may not be supported by
all commands. The file /dev/stdin can be used instead.
$ cat /dev/stdin /etc/group
This is the content of /etc/group:
==================================
<CTRL-D>
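
The same combination of standard input and a file can be tried without typing at the keyboard. The lines below are only a sketch: the scratch file created with mktemp is a hypothetical stand-in for /etc/group, and a pipe supplies the header lines that would otherwise be typed.

```shell
# Scratch file standing in for /etc/group (made-up data).
tmpfile=$(mktemp)
printf 'root:x:0:\nstaff:x:50:\n' > "$tmpfile"

# The pipe becomes cat's standard input; '-' marks where it is read.
result=$(printf 'Groups:\n=======\n' | cat - "$tmpfile")
echo "$result"

rm -f "$tmpfile"
```

The header lines come first because - appears before the file name on the command-line.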

Redirecting Input and Output

You can execute a command and have both the input and output
streams re-directed.

$ cat < /etc/group > foo
Execute the 'cat' command re-directing standard input to
the file /etc/group (i.e. the content of /etc/group
becomes the standard input stream). The output of the
'cat' command is re-directed to the file foo.
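
As a small illustration of redirecting both streams at once, the sketch below uses sort instead of cat so that the effect is visible. The directory and the file names in.txt and out.txt are made up for the example.

```shell
workdir=$(mktemp -d)
printf 'pear\napple\n' > "$workdir/in.txt"

# sort reads in.txt as its standard input and writes to out.txt.
sort < "$workdir/in.txt" > "$workdir/out.txt"

sorted=$(cat "$workdir/out.txt")
echo "$sorted"
rm -rf "$workdir"
```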

Pipes

Recall that the output of a command can be redirected by using the
> operator.

Many commands receive their input from a file or from
the standard input stream. Input can be redirected into
a command by using the < operator.

$ sort /etc/group
The sort program displays the content of /etc/group
sorted in alphabetical order.
$ sort < /etc/group
Instead of getting its input from a file, the sort command
sorts the standard input stream (which in this case happens to be
the content of the /etc/group file).

In many cases you need to take the output of one command and
use it as input to another command. This is easily
accomplished by using the | (or pipe) operator.

$ ls -l | grep "Oct 28"
Do a long listing and pipe the output into the grep
command searching for the pattern "Oct 28". The output
of the command sequence will be a long listing of all
files that were created and/or modified on "Oct 28".
$ wc -l /etc/passwd
wc -l counts and prints the number of lines found
in the file argument /etc/passwd.
$ who | wc -l
The output of the who command is piped into the
word count program wc. When executed with
the -l option, wc prints the number of lines it finds
in its input.
$ cut -f1 -d":" /etc/passwd | sort | uniq | wc -l
The 'cut' command prints the values of field
one found in the /etc/passwd file having colon
delimited fields. The output is sorted using
the Unix 'sort' command and the sorted output
is sent into the 'uniq' program which eliminates
duplicate values. The unique list of values is
then counted by the 'wc -l' command.
$ cat /etc/passwd | tr '[a-z]' '[A-Z]' | grep THURM | wc -l
This pipeline prints the number of lines in the
/etc/passwd file that contain the string "thurm". Searching
is not case sensitive. Exercise: describe what is
going on.
$ tail -100 $logs | grep "^csnet:" 2>/dev/null | sort | pr -s | more
Exercise: describe what is going on.
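
A pipeline like the cut/sort/uniq/wc example can be tried safely on made-up data. In the sketch below the passwd-style records are hypothetical, and field 7 (the login shell) is used instead of field 1 so that duplicates actually occur.

```shell
# Hypothetical passwd-style records; field 7 is the login shell.
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/sh
jdoe:x:1000:1000:J Doe:/home/jdoe:/bin/ksh
asmith:x:1001:1001:A Smith:/home/asmith:/bin/sh
EOF

# Extract field 7, sort it, drop duplicates, and count what is left.
# Some versions of wc pad the count with spaces; tr removes them.
shell_count=$(cut -f7 -d":" "$passwd_sample" | sort | uniq | wc -l | tr -d ' ')
echo "distinct login shells: $shell_count"

rm -f "$passwd_sample"
```

Each command in the pipeline only sees the output of the command to its left; none of them knows the data started out in a file.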

Early Unix History and Evolution

The following was copy/pasted from Ritchie's website.

Pipes appeared in Unix in 1972, well after the PDP-11
version of the system was in operation, at the suggestion
(or perhaps insistence) of M. D. McIlroy, a long-time
advocate of the non-hierarchical control flow that
characterizes coroutines. Some years before pipes
were implemented, he suggested that commands should
be thought of as binary operators, whose left and
right operand specified the input and output files.
Thus a copy utility would be commanded by
inputfile copy outputfile

* An asterisk matches 0 or more characters in a file name.
? A question mark matches any single character.
[ ] Square brackets can surround a choice of characters to match.

Note: asterisk does not match file names that start with a dot
(i.e. hidden files).

$ ls -x
d01.shtml d02.shtml d03.shtml d04.shtml d05
Display all file names found in the current directory.
$ ls -x *.shtml
d01.shtml d02.shtml d03.shtml d04.shtml
Display all file names that end with the string .shtml
$ ls -x ?05
d05
Display all file names that are 3 characters long and
end in 05
$ ls -x d0[1-2]*
d01.shtml d02.shtml
Display all file names that start with d0 and are at
least three characters long. The third character must
fall in the range of 1 to 2 (inclusive) and can be
followed by 0 or more characters.
$ ls -x *5*l
d05.shtml
Display all file names that have a 5 somewhere in them
(except the last character) and that end with a lowercase
ell character.
$ ls -x d0[123].shtml
d01.shtml d02.shtml d03.shtml
Display all file names that start with d0 followed
by a 1, 2, or 3, followed by .shtml. Every file
name must be 9 characters long.
$ ls -x ???
d05
Display all file names that are three characters long.
$ cat d*
Display the contents of all files whose names start with a lowercase dee.
$ rm *.shtml
Removes all files that end with the string .shtml
$ grep "the end" d0?
Searches for the pattern "the end" in all files having names
that are three characters long and start with d0
$ mv *0* /tmp
Moves all files having a 0 in their name to the /tmp directory.
$ mv temp[0-9] /tmp
Moves all files having the name temp followed by a single
digit to the /tmp directory.
$ mv [A-Z]* /tmp
Moves all files beginning with an uppercase letter to
the /tmp directory.
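
The patterns above can be explored without risk by using echo, which simply prints whatever file names the shell substitutes. This is a sketch in a scratch directory; the file names mirror the listing used earlier, plus a hypothetical hidden file named .hidden.

```shell
workdir=$(mktemp -d)
start_dir=$PWD
cd "$workdir" || exit 1
touch d01.shtml d02.shtml d03.shtml d04.shtml d05 .hidden

star=$(echo *.shtml)        # names ending in .shtml
question=$(echo ?05)        # any one character followed by 05
bracket=$(echo d0[1-2]*)    # d0, then 1 or 2, then anything
all=$(echo *)               # every name except those starting with a dot

echo "$star"
echo "$question"
echo "$bracket"
echo "$all"

cd "$start_dir" || exit 1
rm -rf "$workdir"
```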

When you enter a command-line that contains meta-characters, the
shell expands the command-line to include the fully named
files. In other words, commands see complete file names; they do
not know about the meta-characters.

Assume we have a directory that has the following files:
x.out y.out z.err w.out t.err c.obj
$ rm *.out
The shell expands *.out and the following command-line
is executed: rm x.out y.out w.out
$ rm *.junk
No file names are found ending with .junk; therefore, *.junk
is the argument that is passed to the 'rm' program. The 'rm'
command will try to remove a file named *.junk and will fail,
resulting in an error message.
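
Placing echo in front of a command shows the command-line that the shell would hand to the program, without running it. The sketch below assumes a POSIX-style shell with its default settings (some shells can be configured to treat an unmatched pattern differently); the file names are made up.

```shell
workdir=$(mktemp -d)
start_dir=$PWD
cd "$workdir" || exit 1
touch x.out y.out z.err

matched=$(echo rm *.out)      # the pattern is replaced by file names
unmatched=$(echo rm *.junk)   # nothing matches, so the pattern is left as typed

echo "$matched"
echo "$unmatched"

cd "$start_dir" || exit 1
rm -rf "$workdir"
```

The same echo trick is a cheap way to preview a command containing meta-characters before running it for real.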

Caution needs to be exercised when using meta-characters on the
command-line. The following has happened to many users:

$ rm x *
The user wants to remove all files starting with x but
the space between the x and * causes all files to be removed.
The shell expands the * to all file names which in turn get
passed onto the 'rm' command.

There is a maximum length that the command-line can end up being. For
example, if you have a directory containing 1000 files having long file
names, then a command like rm * may not work.
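
One common way around the length limit is to let find pass the file names to the command in batches, so the shell never has to build one enormous command-line. The sketch below is an assumption about how this might be done, not part of the original notes; the .tmp suffix and file names are hypothetical.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.tmp" "$workdir/b.tmp" "$workdir/keep.txt"

# The pattern is quoted so that find, not the shell, does the matching.
# The trailing + tells find to pass as many names to each rm as will fit.
find "$workdir" -type f -name '*.tmp' -exec rm {} +

remaining=$(ls "$workdir")
echo "$remaining"
rm -rf "$workdir"
```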

What happens if you have a file name that has an asterisk in it? For
example, suppose we have a directory containing the following files:

Introduction to the grep Command

The grep command is used to search for a pattern
in a file or list of files. The pattern used by grep
is called a regular expression. On some Unix systems,
by default, the grep command supports basic-REs.
[grep: global regular expression print (g/re/p)
or general regular expression parser]

$ grep Unix d01.shtml
Print all lines from the file d01.shtml containing
the pattern Unix.
$ grep -i UNIX d*.shtml
Search all files whose names start with the character 'd' and end
in the string ".shtml" for lines that contain the pattern UNIX.
The search is not case sensitive; therefore, Unix, UNix,
uniX, UNIX, unix, ..., all match.
$ grep "#include <iostream.h>" *.c
Look for the string #include <iostream.h> in
all files ending in ".c". Since the pattern contains
spaces and shell metacharacters (#, < and >), it must be
quoted.
$ grep -l "while (" *.cpp
Search all "*.cpp" files for the pattern "while ("
and only display the names of the files with one
or more matching lines, not the lines themselves.
$ grep -v UNIX foo
Display only those lines from the file foo that
do not contain the pattern UNIX.
$ grep -v -c UNIX foo
Display a count of the number of lines from the file
foo that do not contain the pattern UNIX.
$ who | grep "^jdoe "
Find out if jdoe is logged in.
$ grep Unix foo >/dev/null 2>&1
$ echo $?
Search the file foo for the pattern Unix and re-direct
the output to /dev/null. Use the exit status of the grep
command to determine if the pattern was found (0 indicates
that it was, 1 implies it wasn't).
$ crypt some_key < roster.done | grep jdoe
Find the password entry for user jdoe in the encrypted
file roster.done that was encrypted using some_key.
$ grep -E "unix|UNIX" foo
$ egrep "unix|UNIX" foo
-E causes grep to run as egrep. egrep supports
extended-REs (Regular Expressions). Search for either
the pattern unix or the pattern UNIX.
$ grep -F hello foo
$ fgrep hello foo
-F causes grep to run as fgrep. If the pattern is
a string literal (i.e. fixed), then fgrep may be used.
$ pgrep -u root
pgrep is a customized grep that is used to search
the process table. -u LOGNAME displays a list of
PIDs for user LOGNAME.
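
Several of the grep options above, and the use of the exit status, can be seen together in a short experiment. This is a sketch on a made-up three-line file; the variable names are arbitrary.

```shell
sample=$(mktemp)
printf 'Unix is fun\nI use UNIX daily\nnothing here\n' > "$sample"

matches=$(grep -c -i unix "$sample")        # lines containing unix, any case
nonmatches=$(grep -v -c -i unix "$sample")  # lines that do not

# Exit status: 0 when at least one line matched, 1 when none did.
if grep Unix "$sample" >/dev/null 2>&1; then found=yes; else found=no; fi
grep Linux "$sample" >/dev/null 2>&1
status=$?

echo "matches=$matches nonmatches=$nonmatches found=$found status=$status"
rm -f "$sample"
```

Testing the exit status directly in an if statement avoids having to inspect $? by hand.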

Introduction to the find Command

The find command locates files that match a given set
of criteria in a hierarchy of directories. The criterion may
be a filename or a specified property of a file (such as its
modification date, size, or type). You can also direct the
command to remove, print, or otherwise act on the file.

find pathname search-options action-options
pathname -- directory from which find begins the search; the
search is recursive (sub-directories, if any, are
also searched)
search-options -- identifies the file you are interested in
action-options -- tells what to do once the file is found
$ find . -print
Begin searching from current working directory and print
the name of all files found.
$ find / -name foo.c -print
Begin searching for a file named foo.c from the root
directory and print its name if found. More than one
file can be found with the name foo.c. Caution: if you
are not super-user, then you will not have permission to
search many directories.
$ find $HOME -name "*.c" -print
Begin searching in the HOME directory for all files
that end in ".c". Print their names, if found.
$ find . -type d -print
Begin searching from current working directory and print
all files that are of type directory.
$ find /tmp -type f -print
Begin searching from the /tmp directory and print
all files that are regular files.
$ find . -mtime 10 -print
Find and display files last modified exactly 10 days ago.
$ find . -mtime -10 -print
Find and display files last modified less than 10 days ago.
$ find . -mtime +10 -print
Find and display files last modified more than 10 days ago.
$ find . -atime +10 -print
Find and display files last accessed (read) more
than 10 days ago.
$ find . -newer .lastbkup -print
Find and display files modified more recently than
.lastbkup was.
$ find . -user jdoe -size +50 -print
Find and display files owned by jdoe that are
larger than 50 blocks in size.
$ find . -type f -exec chmod 644 {} \;
Find and change permissions on all regular files.
$ find . -name foo.c -mtime +30 -exec rm {} \;
Find and remove all instances of the file foo.c
that were last modified more than 30 days ago.
$ find . -name foo.c -mtime +30 -ok rm {} \;
Find and interactively remove all instances of the
file foo.c that were last modified more than 30 days ago.
$ find . -inum 8888 -print
Find and display all files having i-node number 8888.
This is a handy way to remove files that have goofy
(e.g. non-printable) characters in their names.

What is a Process?

A program is an executable (or binary) file that resides on
some sort of secondary storage device (e.g. hard drive, floppy disk,
tape, CD-ROM, etc.). [Programs are sometimes called applications
or commands.]

A program is typically generated by translating some sort of
source code into a machine language that is readable by the CPU
(Central Processing Unit). Some programs contain source code
that is interpreted by some other program. These are commonly
referred to as scripts.

Typically, program files are located in a common directory or
collection of directories. On many Unix systems, programs are
found in the following locations:

/bin
/usr/bin
/usr/local/bin

Note that the term bin is used to represent the word
binary. Programs are often called binary files.
Binary files have a non-ASCII format to them and cannot
be modified using a regular text editor. Not all binary
files are programs -- in some cases, they are data base
type files.

Sadly, a binary file generated on one version of Unix may
not be executable on a different Unix system even though
the systems may be using the same CPU. [There are different
types of executable formats.]

To execute a program, the binary file must be loaded into
the memory of the computer. Once this is accomplished, the
program now becomes a process. Said another way:
A process is an
instance of a program.

The shell is usually the program responsible for getting
a program loaded into memory. [The shell -- when executing
-- is a process.]

Once a process has been created, it is assigned a PID
or process identifier that is used to track the process
while it executes.

Every process has a parent process to which it belongs.
In many cases, that parent process is the shell.
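
The shell makes its own PID and its parent's PID available, which gives a quick way to see the parent/child relationship. This is a minimal sketch; the scratch file is only there so the child shell can report back.

```shell
echo "PID of this shell:        $$"
echo "PID of its parent:        $PPID"

# Run a child shell as an ordinary command and let it report its parent.
# The report goes to a scratch file so that no extra subshell is involved.
report=$(mktemp)
sh -c 'echo $PPID' > "$report"
child_parent=$(cat "$report")
rm -f "$report"
echo "parent seen by the child: $child_parent"
```

When these lines are typed at a prompt or run from a script, the last number normally matches the first.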

The following are some crude notes that have not been incorporated
into the lecture notes. They are intended for the CSC178 --
Programming in the Unix Environment course.

More on Processes

A program file loaded into memory becomes a process.

Every process is assigned a PID (process id) by the kernel. PIDs
are assigned on a sequential basis and usually wrap when they
reach the value 32,767 (2-bytes might be used to store the PID
in the process table).

At the lowest-level, a process is created by a fork
system call. If the new process is to run a different program file,
then an exec is invoked (fork and exec).
[Webopedia.com::system call]

<side-bar>
At its core, Unix relies on six fundamental system calls.

open, read, write, close, fork, exec

</side-bar>
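
The difference between fork-and-exec and exec alone can be observed from the shell, which has a built-in exec command. The lines below are a sketch of the idea rather than a look at the system calls themselves.

```shell
# An ordinary command: the shell forks a child and the child execs
# the program. The original shell survives and continues afterwards.
sh -c 'echo "child running as PID $$"'
echo "back in the original shell"

# The exec built-in skips the fork: the inner shell is replaced by echo,
# so the command after exec is never reached.
output=$(sh -c 'exec echo "replaced by echo"; echo "never printed"')
echo "$output"
```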

Every process has a parent process. For shell users,
their shell is the parent process of the commands executed.

To aid with terminology, the following paragraph was taken
from page 656 of the text book for the course.

There is a special metaphor that applies to processes
in the Unix system. The processes have life:
they are alive or dead; they are
spawned (born) or die; they
become zombies or they become orphaned.
They are parents or children, and when you
want to get rid of one, you kill it.

The pkill Command

Daemons

Daemon processes are used to extend
the functionality of the OS. They are not part of
the kernel, but they play important roles in providing
applications not directly supported by the kernel.
[webserver, telnet, mail, line printer, cron, etc.]

A daemon is a process that starts at boot-time
and continues as long as the system is up. Some
daemons start and stop on an as needed basis
and some run at scheduled time periods.

Some daemons can be thought of
as service providers.

Many daemon program names end with a dee ('d').
[httpd, inetd, lpd, crond, ...]

About the Word: daemon

The term daemon was first used in
computing during the early 1960's. At one
time the term meant "an attendant spirit
that influences one's character or personality.
A daemon is neither good or evil; they are
creatures of independent thought and will."

Running Commands in the Background

Unix is a multi-tasking system. Not only can it support multiple
users, but each user can in turn have multiple tasks running.

Usually when you execute a command, the shell takes over your
terminal and you have to wait for the command to end before you
see the shell prompt again. In some instances, the command you
need to execute will take a long time and you don't want to sit
idle waiting for it to finish; in other words, you want to "spawn"
off the command letting it run in the background while you go
ahead and issue more commands. This can be accomplished by
using an ampersand at the end of the command-line.

$ make &
939
$ who
...
$ ps
...
$ date
...
Typically, the 'make' command can take a long time to
execute. Therefore, we start off the command and use
a & on the command-line to spin it off in the background.
The shell displays the PID (process id) of the command
and re-issues the shell prompt. Now we can execute more
commands while the 'make' program runs as a background job.

When you execute commands in the background, you have to be
careful with respect to processing output. If all the commands
write data to the standard output stream and/or error streams,
then the data from the various commands will be mixed together.
In many instances, commands executed in the background have
their output re-directed into a file.

$ make 1>make.out 2>make.err &
1411
$ who
...
$ date
...
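
The same pattern can be tried with a harmless stand-in for make. In the sketch below the slow job is simulated with sleep, the output file names are hypothetical, and $! (the PID of the most recent background job) and wait are used to collect the job's exit status.

```shell
outdir=$(mktemp -d)

# Start a slow stand-in for 'make' in the background, with both
# output streams redirected so they cannot mix with other output.
(sleep 1; echo "build finished"; echo "a warning" >&2) \
    1>"$outdir/job.out" 2>"$outdir/job.err" &
job_pid=$!                      # PID of the background job

echo "started job $job_pid; the shell is free for other commands"
date

wait "$job_pid"                 # block until the background job ends
job_status=$?
out=$(cat "$outdir/job.out")
err=$(cat "$outdir/job.err")
echo "status=$job_status out=$out err=$err"
rm -rf "$outdir"
```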

You need to be careful when executing interactive programs
in the background. When you do, multiple programs (at least
two: the shell and the program executed as a background
process) end up reading the standard input stream, and this
doesn't work. Typically, the standard input stream
is re-directed (i.e. input is obtained from an object other
than the keyboard) when interactive programs are executed
in the background.

When you use nohup, you should always re-direct
command output. If you don't, then the command
re-directs both output streams to a file called
nohup.out (generally, not a good
idea because what happens when you nohup
a couple of commands?).
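
Giving each nohup job its own log file avoids the shared nohup.out problem. The sketch below uses two trivial stand-in jobs; the log file names are hypothetical, and standard input is taken from /dev/null so the jobs never wait on the terminal.

```shell
logdir=$(mktemp -d)

# Each job gets its own log; 2>&1 sends errors to the same file.
nohup sh -c 'echo "job one done"' > "$logdir/job1.log" 2>&1 < /dev/null &
pid1=$!
nohup sh -c 'echo "job two done"' > "$logdir/job2.log" 2>&1 < /dev/null &
pid2=$!

wait "$pid1" "$pid2"
log1=$(cat "$logdir/job1.log")
log2=$(cat "$logdir/job2.log")
echo "$log1"
echo "$log2"
rm -rf "$logdir"
```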

A process can be written to ignore or catch signals,
except for SIGKILL (signal value 9) and SIGSTOP.
If the process contains no signal handling logic, then
the default behavior for most signals is for the process
to terminate. If you want to terminate a program, then first use
SIGTERM, followed by a SIGKILL
if the first kill doesn't work.

Typically, when working at the command-line,
typing a <ctrl-c> causes
a SIGINT (the interrupt signal
value 2) to be sent to a process.

The operating system can send signals to a process
when the process performs an illegal operation (e.g.
attempts a divide-by-zero or accesses an invalid memory
location).

Many daemon processes read configuration files
upon startup. If the configuration is modified
while the daemon is running, then a signal is
usually sent to the daemon to instruct it to
re-read its configuration files. SIGHUP
is commonly used for this purpose.
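
In a shell script, signal handling logic is written with the built-in trap command. The sketch below starts a small background job that catches SIGTERM, then terminates it the polite way first, falling back to SIGKILL only if needed. The job and its messages are made up for the illustration.

```shell
log=$(mktemp)

# Background job: install a SIGTERM handler, announce readiness, then idle.
sh -c '
    trap "echo received SIGTERM, exiting; exit 0" TERM
    echo ready
    while :; do sleep 1; done
' > "$log" 2>&1 &
pid=$!

# Wait (up to about 10 seconds) until the handler has been installed.
tries=0
while ! grep -q ready "$log" && [ "$tries" -lt 10 ]; do
    sleep 1
    tries=$((tries + 1))
done

kill -TERM "$pid"               # polite request first
sleep 2
if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid"           # last resort: cannot be caught or ignored
fi
wait "$pid"
status=$?
result=$(cat "$log")
echo "$result"
rm -f "$log"
```

Because the handler ends with exit 0, the job reports a normal exit status even though it was stopped by a signal.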

The crontab Command

The crontab command is used to maintain a "database"
of files named crontab that in turn are used by the
cron daemon process to execute jobs at specific times.

A daemon is a process that is not connected to a terminal;
daemons usually run in the background and are commonly used on Unix
systems.

Each user can have a crontab file. The system
administrator can prohibit you from using the cron facility
by placing your name in the /usr/lib/cron/cron.deny file.
If a /usr/lib/cron/cron.allow file exists, then only
those users listed in that file can use cron. If both
files exist, then /usr/lib/cron/cron.allow is the
file that is used. If neither file exists, then only root
is allowed to use cron.

You will want to use cron over at when you have
jobs that you want to run more than once (or you don't have
permission to use at).

Typically, when you run a command via cron, command output
is re-directed into a file. If it isn't, then the output is sent
to you via email.

17 5 * * 0 /etc/cleanup > /dev/null
At 5:17 on every Sunday, execute /etc/cleanup.
0 2 * * 0,4 /usr/lib/cron/logchecker
At 2:00 on every Sunday and Thursday, execute the
/usr/lib/cron/logchecker program.
30 12 15 * * ulimit 5000; /bin/su uucp -c "/usr/lib/uucp/uudemon.clean" > /dev/null
At 12:30 on the 15th day of every month, run the commands
ulimit and uudemon.clean.
0 1 1 * * cp /usr/adm/messages /usr/adm/messages.prev; >/usr/adm/messages
At 1:00 on the 1st day of every month, make a backup copy
of the /usr/adm/messages file and then clear the file.
55 16 1,15 1,4,7,10 * >/usr/spool/mail/uucp
At 16:55 on the 1st and 15th of Jan., Apr., Jul., and Oct.,
clear out the mail file for the uucp account.
0,15,30,45 0-3 * * 1,3,5 /usr/local/bin/happy >/tmp/happy.out 2>&1
Every Monday, Wednesday and Friday, beginning at mid-night
and ending at 3:45, run the command /usr/local/bin/happy
at 0, 15, 30 and 45 minutes after the hour.

It is important to note that cron executes with a
limited environment. In other words, the jobs that you
run using cron should not rely on a particular
environment setup. If they do, then the command needs
to take steps to get that environment established.
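
The effect of a limited environment can be imitated with env -i, which starts a command with an empty environment. This is only a rough stand-in for what cron provides, and MY_SETTING is a hypothetical variable invented for the example.

```shell
# With an empty environment, a setting the job expects is simply missing.
bare=$(env -i /bin/sh -c 'echo "${MY_SETTING:-unset}"')

# A job that sets what it needs at the top does not depend on the caller.
prepared=$(env -i /bin/sh -c '
    PATH=/usr/local/bin:/usr/bin:/bin; export PATH
    MY_SETTING=production; export MY_SETTING
    echo "$MY_SETTING"
')

echo "bare=$bare prepared=$prepared"
```

Running a cron script under env -i before installing it is one way to catch hidden dependencies on your login environment.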

[Realworld Experience] I once worked on a system (SCO)
where the cron program magically died on the
56th day of running. This was a major problem because
most customers never reported the problems to the support
group. We could never track the problem down; therefore,
we added a cron entry to cause the machine to reboot
once every month. [This is the worst, but common way to
"fix" a problem -- treat the symptom not the
illness.]

Shell Scripts

The shell -- in addition to being a command-line interpreter --
is also a script programming language.

The shell programming language is a 3rd-generation high-level
programming language that provides constructs for structured
programming; in other words, it supports sequence, selection
and iteration (repetition).

A shell script is a file that contains a series of commands
for a shell program to execute. As a shell script is executed,
each command in the file is interpreted by the shell. Shell
script programs are not compiled.

Shell script filenames are typically lower-case, but they
can contain upper-case letters, digits and other characters.
A suffix on a filename can be used to imply file content, but
the shell does not key off of filename suffixes.

After a shell script program has been written and saved to
a file, the file is typically "marked" executable
and it is copied to a "bin" directory that is
contained on the user's PATH.

Within a shell script file, the octothorp character
# is used to start a comment. Comments
are ignored by the shell executing the shell script.
The # starts a comment and it stays in
effect until the end-of-line.

Shell scripts can use variables to store values (data/information).
A variable is a piece of memory given a name to store a value. The
variable has no value until one is assigned to it. The equal sign =
is the assignment operator and it is used to assign a value to a variable.
Spaces are not allowed on either side of the equal sign. Variable names begin
with a letter or an underscore; you can use letters, numbers and underscores
for the rest of the name. The naming of variables is important and the programmer
often has to design and implement a naming convention. Variables do not have to
be given a type by the programmer.
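
The rules for variables can be seen in a few lines. The names and values below are arbitrary examples.

```shell
count=3                   # no spaces on either side of the equal sign
greeting="hello world"    # quotes keep the space inside a single value
_file_name=report.txt     # a name may begin with an underscore

echo "$greeting has $count parts in $_file_name"

# A variable that was never assigned expands to nothing; the :- form
# substitutes a visible placeholder so that this can be observed.
unset_value="${never_assigned:-<empty>}"
echo "never_assigned is $unset_value"
```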

Example Shell Script Programs

Suppose you are always curious as to how many regular files
are in your current working directory. The following shell
script program could be written.

file name: filecnt.sh
----------------------
echo "`pwd` has `ls -l | grep '^-' | wc -l` file(s)"
----------------------
$ sh filecnt.sh
From the shell's perspective, filecnt.sh is a regular file
containing plain-old-text; therefore, to execute it, another
shell is started and the file is supplied as an argument.
The shell command ('sh' in this example) expects a shell
script filename as an argument. If no filename is specified,
then sh issues a command-line prompt. The sh command opens
the file and sequentially executes each line in the file.
$ chmod 755 filecnt.sh
Typically, shell scripts that are created for re-use are marked
executable. This eliminates the need to invoke a shell command
in order to run the script.
$ ./filecnt.sh
The ./ must be prefixed to the command-name to tell the shell where
the script file is located. This is not necessary if the current
working directory (.) is part of the PATH. Recall, if . is on
your PATH, then it should be the last component.
$ PATH="$PATH:$HOME/bin:."
$ filecnt.sh
Placing our current working directory on our PATH eliminates
the need to prefix the command-name with dot-slash.
$ ln filecnt.sh $HOME/bin/filecnt
Re-usable commands are placed in a directory that
is included in our PATH.
$ filecnt
If this script is to be executed over and over, then it
should be stored in a directory that is on your PATH. If
you do this, then you can execute the command from any
directory. Typically, you do not see the ".sh" suffix
used on executable files.

Example

Let us assume you want to be presented with the date and time
each time you log off, and that you want an entry made in
a log file.

file name: bye.sh
------------------
now=`date`
echo "Good bye... it is now $now"
echo $now >>$HOME/.logout_times
exit
------------------
$ chmod 755 bye.sh
$ ln bye.sh $HOME/bin/bye
$ bye
Good bye... it is now ...
$
Our expectation is that the exit statement in the
script will result in a logout, but it doesn't. This
is because the shell spawns another shell to execute
the command and the exit statement causes that shell
to terminate.
$ . bye # or we can execute source bye
Good bye... it is now ...
Connection closed.
The dot and source commands are built-in shell commands
that let you execute a script in the current shell and
prevent the shell from creating a sub-shell in order to
execute the command-line.
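
The difference between running a script in a sub-shell and running it with the dot command can be demonstrated with a variable assignment. The one-line script and the variable name marker are hypothetical.

```shell
script=$(mktemp)
echo 'marker=set_by_script' > "$script"

marker=original
sh "$script"                 # runs in a child shell; our variable is untouched
after_child=$marker

. "$script"                  # runs in the current shell; our variable changes
after_dot=$marker

echo "after sh: $after_child, after dot: $after_dot"
rm -f "$script"
```

For the same reason, an exit statement in a script run with the dot command ends the current shell.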

Example

The following is an example of using the read command.

file name: whoRU.sh
--------------------
echo "Enter first name, last name, address, city, state."
echo "Example: john doe 2541 N. Whatever, Tempe, AZ"
echo -n " "
read firstname lastname address
echo "You are $lastname, $firstname and your address is $address"
--------------------
By default, whitespace is used to separate words on
the input stream. If there are not enough variables supplied to
read to hold all the words, then all the remaining words are assigned
to the last variable.
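
The word-splitting behavior of read can be checked without typing anything by supplying the input line from a here-document. This sketch reuses the sample input from the script above.

```shell
# The here-document takes the place of the line typed at the keyboard.
read firstname lastname address <<'EOF'
john doe 2541 N. Whatever, Tempe, AZ
EOF

echo "first:   $firstname"
echo "last:    $lastname"
echo "address: $address"
```

The first two words go to the first two variables, and everything left over lands in address.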