Contents of the TUTOR_C.DOC file

C is a computer language available on the GCOS and UNIX operating systems at Murray Hill and (in preliminary form) on OS/360 at Holmdel. C lets you write your programs clearly and simply: it has decent control flow facilities so your code can be read straight down the page, without labels or GOTO's; it lets you write code that is compact without being too cryptic; it encourages modularity and good program organization; and it provides good data-structuring facilities.

This memorandum is a tutorial to make learning C as painless as possible. The first part concentrates on the central features of C; the second part discusses those parts of the language which are useful (usually for getting more efficient and smaller code) but which are not necessary for the new user. This is not a reference manual. Details and special cases will be skipped ruthlessly, and no attempt will be made to cover every language feature. The order of presentation is hopefully pedagogical instead of logical. Users who would like the full story should consult the 'C Reference Manual' by D. M. Ritchie [1], which should be read for details anyway. Runtime support is described in [2] and [3]; you will have to read one of these to learn how to compile and run a C program.

We will assume that you are familiar with the mysteries of creating files, text editing, and the like in the operating system you run on, and that you have programmed in some language before.

2. A Simple C Program

main( ) {
    printf("hello, world");
}

A C program consists of one or more functions, which are similar to the functions and subroutines of a Fortran program or the procedures of PL/I, and perhaps some external data definitions. main is such a function, and in fact all C programs must have a main. Execution of the program begins at the first statement of main. main will usually invoke other functions to perform its job, some coming from the same program, and others from libraries.

One method of communicating data between functions is by arguments. The parentheses following the function name surround the argument list; here main is a function of no arguments, indicated by ( ). The {} enclose the statements of the function. Individual statements end with a semicolon but are otherwise free-format.

printf is a library function which will format and print output on the terminal (unless some other destination is specified). In this case it prints

hello, world

A function is invoked by naming it, followed by a list of arguments in parentheses. There is no CALL statement as in Fortran or PL/I.

Arithmetic and the assignment statements are much the same as in Fortran (except for the semicolons) or PL/I. The format of C programs is quite free. We can put several statements on a line if we want, or we can split a statement among several lines if it seems desirable. The split may be between any of the operators or variables, but NOT in the middle of a name or operator. As a matter of style, spaces, tabs, and newlines should be used freely to enhance readability.

There are also arrays and structures of these basic types, pointers to them and functions that return them, all of which we will meet shortly.

All variables in a C program must be declared, although this can sometimes be done implicitly by context. Declarations must precede executable statements. The declaration

int a, b, c, sum;

declares a, b, c, and sum to be integers.

Variable names have one to eight characters, chosen from A-Z, a-z, 0-9, and _, and start with a non-digit. Stylistically, it's much better to use only a single case and give functions and external variables names that are unique in the first six characters. (Function and external variable names are used by various assemblers, some of which are limited in the size and case of identifiers they can handle.) Furthermore, keywords and library functions may only be recognized in one case.

4. Constants

We have already seen decimal integer constants in the previous example: 1, 2, and 3. Since C is often used for system programming and bit-manipulation, octal numbers are an important part of the language. In C, any number that begins with 0 (zero!) is an octal integer (and hence can't have any 8's or 9's in it). Thus 0777 is an octal constant, with decimal value 511.
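As a small illustration of the octal rule, here is a sketch in present-day C. The dialect and the helper name octal_demo are assumptions of this example, not part of the memorandum. It reports whether the octal constant 0777 and the decimal constant 511 denote the same value.

```c
/* Hypothetical helper: compares an octal constant with its decimal value. */
int octal_demo(void)
{
    int oct = 0777;   /* leading 0 makes this octal: 7*64 + 7*8 + 7 */
    int dec = 511;    /* ordinary decimal constant */

    return oct == dec;   /* 1 if the two are equal, 0 otherwise */
}
```

A stray leading zero therefore changes the value of a constant: 010 is eight, not ten.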

A ``character'' is one byte (an inherently machine-dependent concept). Most often this is expressed as a character constant, which is one character enclosed in single quotes. However, it may be any quantity that fits in a byte, as in flags below:

char quest, newline, flags;
quest = '?';
newline = '\n';
flags = 077;

The sequence `\n' is C notation for ``newline character'', which, when printed, skips the terminal to the beginning of the next line. Notice that `\n' represents only a single character. There are several other ``escapes'' like `\n' for representing hard-to-get or invisible characters, such as `\t' for tab, `\b' for backspace, `\0' for end of file, and `\\' for the backslash itself.

float and double constants are discussed in section 26.

5. Simple I/O - getchar, putchar, printf

main( ) {
    char c;
    c = getchar( );
    putchar(c);
}

getchar and putchar are the basic I/O library functions in C. getchar fetches one character from the standard input (usually the terminal) each time it is called, and returns that character as the value of the function. When it reaches the end of whatever file it is reading, thereafter it returns the character represented by `\0' (ascii NUL, which has value zero). We will see how to use this very shortly.

putchar puts one character out on the standard output (usually the terminal) each time it is called. So the program above reads one character and writes it back out. By itself, this isn't very interesting, but observe that if we put a loop around this, and add a test for end of file, we have a complete program for copying one file to another.

printf is a more complicated function for producing formatted output. We will talk about only the simplest use of it. Basically, printf uses its first argument as formatting information, and any successive arguments as variables to be output. Thus

printf ("hello, world\n");

is the simplest use: the string ``hello, world\n'' is printed out. No formatting information, no variables, so the string is dumped out verbatim. The newline is necessary to put this out on a line by itself. (The construction

"hello, world\n"

is really an array of chars. More about this shortly.)

More complicated, if sum is 6,

printf ("sum is %d\n", sum);

prints

sum is 6

Within the first argument of printf, the characters ``%d'' signify that the next argument in the argument list is to be printed as a base 10 number.

Other useful formatting commands are ``%c'' to print out a single character, ``%s'' to print out an entire string, and ``%o'' to print a number as octal instead of decimal (no leading zero). For example,

Notice that there is no newline at the end of the first output line. Successive calls to printf (and/or putchar, for that matter) simply put out characters. No newlines are printed unless you ask for them. Similarly, on input, characters are read one at a time as you ask for them. Each line is generally terminated by a newline (\n), but there is otherwise no concept of record.
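The format letters described above can be tried out with a sketch like the following, written in present-day C. The helper name describe is hypothetical, and it writes into a buffer with snprintf rather than to the terminal so that the result can be inspected; printf accepts the same format string.

```c
#include <stdio.h>

/* Hypothetical helper: formats x in decimal and in octal into buf.
   Returns the number of characters the full text needs. */
int describe(char *buf, size_t size, int x)
{
    /* %s takes a string, %d prints decimal, %o prints octal
       (with no leading zero); the newline is there only because
       the format asks for it. */
    return snprintf(buf, size, "%s! %d decimal is %o octal\n",
                    "Right", x, x);
}
```

Calling describe(buf, sizeof buf, 8) leaves the text "Right! 8 decimal is 10 octal" followed by a newline in buf.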

The condition to be tested is any expression enclosed in parentheses. It is followed by a statement. The expression is evaluated, and if its value is non-zero, the statement is executed. There's an optional else clause, to be described soon.

The character sequence `==' is one of the relational operators in C; here is the complete set:

==    equal to (.EQ. to Fortraners)
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to

The value of ``expression relation expression'' is 1 if the relation is true, and 0 if false. Don't forget that the equality test is `=='; a single `=' causes an assignment, not a test, and invariably leads to disaster.

Tests can be combined with the operators `&&' (AND), `||' (OR), and `!' (NOT). For example, we can test whether a character is blank or tab or newline with

if( c==' ' || c=='\t' || c=='\n' ) ...

C guarantees that `&&' and `||' are evaluated left to right; we shall soon see cases where this matters.
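A sketch of why the left-to-right guarantee matters, in present-day C. Both helper names (is_white and is_flag) are hypothetical. In is_flag the second operand is examined only when the first is true, so a null pointer is never dereferenced.

```c
/* Hypothetical helper: 1 if c is a blank, tab or newline, else 0. */
int is_white(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: 1 if s is non-null and begins with '-'.
   The && stops at the first false operand, so s[0] is not
   looked at when s is a null pointer. */
int is_flag(const char *s)
{
    return s != 0 && s[0] == '-';
}
```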

One of the nice things about C is that the statement part of an if can be made arbitrarily complicated by enclosing a set of statements in {}. As a simple example, suppose we want to ensure that a is bigger than b, as part of a sort routine. The interchange of a and b takes three statements in C, grouped together by {}:

if (a < b) {
    t = a;
    a = b;
    b = t;
}

As a general rule in C, anywhere you can use a simple statement, you can use any compound statement, which is just a number of simple or compound ones enclosed in {}. There is no semicolon after the } of a compound statement, but there is a semicolon after the last non-compound statement inside the {}.

The ability to replace single statements by complex ones at will is one feature that makes C much more pleasant to use than Fortran. Logic (like the exchange in the previous example) which would require several GOTO's and labels in Fortran can and should be done in C without any, using compound statements.

7. While Statement; Assignment within an Expression; Null Statement

The basic looping mechanism in C is the while statement. Here's a program that copies its input to its output a character at a time. Remember that `\0' marks the end of file.

main( ) {
    char c;
    while( (c=getchar( )) != '\0' )
        putchar(c);
}

The while statement is a loop, whose general form is

while (expression) statement

Its meaning is

(a) evaluate the expression
(b) if its value is true (i.e., not zero) do the statement, and go back to (a)

Because the expression is tested before the statement is executed, the statement part can be executed zero times, which is often desirable. As in the if statement, the expression and the statement can both be arbitrarily complicated, although we haven't seen that yet. Our example gets the character, assigns it to c, and then tests if it's a `\0'. If it is not a `\0', the statement part of the while is executed, printing the character. The while then repeats. When the input character is finally a `\0', the while terminates, and so does main.

Notice that we used an assignment statement

c = getchar( )

within an expression. This is a handy notational shortcut which often produces clearer code. (In fact it is often the only way to write the code cleanly. As an exercise, rewrite the file-copy without using an assignment inside an expression.) It works because an assignment statement has a value, just as any other expression does. Its value is the value of the right hand side. This also implies that we can use multiple assignments like

x = y = z = 0;

Evaluation goes from right to left.

By the way, the extra parentheses in the assignment statement within the conditional were really necessary: if we had said

c = getchar( ) != '\0'

c would be set to 0 or 1 depending on whether the character fetched was an end of file or not. This is because in the absence of parentheses the assignment operator `=' is evaluated after the relational operator `!='. When in doubt, or even if not, parenthesize.
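The difference the parentheses make can be seen in a sketch like this one, in present-day C. The helper name assign_demo is hypothetical, and the character is passed in as ch instead of being read with getchar so that the two forms can be compared on the same input.

```c
/* Hypothetical helper: stores what c ends up holding with and
   without the extra parentheses, for one already-fetched character. */
void assign_demo(int ch, int *with_parens, int *without_parens)
{
    int c, t;

    t = (c = ch) != '\0';   /* c gets the character; t gets 0 or 1 */
    *with_parens = c;

    c = ch != '\0';         /* parsed as c = (ch != '\0'): c is 0 or 1 */
    *without_parens = c;

    (void)t;                /* t is only here to show the test value */
}
```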

Since putchar(c) returns c as its function value, we could also copy the input to the output by nesting the calls to getchar and putchar:

main( ) {
    while( putchar(getchar( )) != '\0' ) ;
}

What statement is being repeated? None, or technically, the null statement, because all the work is really done within the test part of the while. This version is slightly different from the previous one, because the final `\0' is copied to the output before we decide to stop.

8. Arithmetic

The arithmetic operators are the usual `+', `-', `*', and `/' (truncating integer division if the operands are both int), and the remainder or mod operator `%':

x = a%b;

sets x to the remainder after a is divided by b (i.e., a mod b). The results are machine dependent unless a and b are both positive.

In arithmetic, char variables can usually be treated like int variables. Arithmetic on characters is quite legal, and often makes sense:

c = c + 'A' - 'a';

converts a single lower case ascii character stored in c to upper case, making use of the fact that corresponding ascii letters are a fixed distance apart. The rule governing this arithmetic is that all chars are converted to int before the arithmetic is done. Beware that conversion may involve sign-extension: if the leftmost bit of a character is 1, the resulting integer might be negative. (This doesn't happen with genuine characters on any current machine.)

Characters have different sizes on different machines. Further, this code won't work on an IBM machine, because the letters in the ebcdic alphabet are not contiguous.
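A sketch of the case conversion wrapped in a function, in present-day C. The helper name upper is hypothetical, and the code assumes the ascii character set, as the text above warns. The range test is an addition of this example so that non-letters pass through unchanged.

```c
/* Hypothetical helper: converts an ascii lower case letter to upper
   case and leaves every other character alone. Assumes ascii, where
   'a'..'z' and 'A'..'Z' are each contiguous. */
int upper(int c)
{
    if (c >= 'a' && c <= 'z')
        c = c + 'A' - 'a';   /* chars are converted to int first */
    return c;
}
```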

9. Else Clause; Conditional Expressions

We just used an else after an if. The most general form of if is

if (expression) statement1 else statement2

The else part is optional, but often useful. The canonical example sets x to the minimum of a and b:

if (a < b)
    x = a;
else
    x = b;

Observe that there's a semicolon after x=a.

C provides an alternate form of conditional which is often more concise. It is called the ``conditional expression'' because it is a conditional which actually has a value and can be used anywhere an expression can. The value of

a<b ? a : b

is a if a is less than b; it is b otherwise. In general, the form

expr1 ? expr2 : expr3

means ``evaluate expr1. If it is not zero, the value of the whole thing is expr2; otherwise the value is expr3.''
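The two ways of computing a minimum can be set side by side, as a sketch in present-day C. The function names min_if and min_cond are hypothetical.

```c
/* Hypothetical helper: minimum of a and b using if and else. */
int min_if(int a, int b)
{
    int x;

    if (a < b)
        x = a;
    else
        x = b;
    return x;
}

/* Hypothetical helper: the same result from a conditional expression. */
int min_cond(int a, int b)
{
    return a < b ? a : b;
}
```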

If's and else's can be used to construct logic that branches one of several ways and then rejoins, a common programming structure, in this way:

if(...)
    {...}
else if(...)
    {...}
else if(...)
    {...}
else
    {...}

The conditions are tested in order, and exactly one block is executed: either the first one whose if is satisfied, or the one for the last else. When this block is finished, the next statement executed is the one after the last else. If no action is to be taken for the ``default'' case, omit the last else.

For example, to count letters, digits and others in a file, we could write

++n is equivalent to n=n+1 but clearer, particularly when n is a complicated expression. `++' and `--' can be applied only to int's and char's (and pointers which we haven't got to yet).

The unusual feature of `++' and `--' is that they can be used either before or after a variable. The value of ++k is the value of k AFTER it has been incremented. The value of k++ is k BEFORE it is incremented. Suppose k is 5. Then

x = ++k;

increments k to 6 and then sets x to the resulting value, i.e., to 6. But

x = k++;

first sets x to 5, and THEN increments k to 6. The incrementing effect of ++k and k++ is the same, but their values are respectively 6 and 5. We shall soon see examples where both of these uses are important.
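The two forms can be checked with a sketch like this, in present-day C. The helper names pre_value and post_value are hypothetical; each returns the value that x receives.

```c
/* Hypothetical helper: value of ++k, i.e. k AFTER incrementing. */
int pre_value(int k)
{
    int x;

    x = ++k;   /* k is incremented first, then assigned to x */
    return x;
}

/* Hypothetical helper: value of k++, i.e. k BEFORE incrementing. */
int post_value(int k)
{
    int x;

    x = k++;   /* x gets the old k, then k is incremented */
    return x;
}
```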

11. Arrays

In C, as in Fortran or PL/I, it is possible to make arrays whose elements are basic types. Thus we can make an array of 10 integers with the declaration

int x[10];

The square brackets mean subscripting; parentheses are used only for function references. Array indexes begin at zero, so the elements of x are

x[0], x[1], x[2], ..., x[9]

If an array has n elements, the largest subscript is n-1.

Multiple-dimension arrays are provided, though not much used above two dimensions. The declaration and use look like

Text is usually kept as an array of characters, as we did with line[ ] in the example above. By convention in C, the last character in a character array should be a `\0' because most programs that manipulate character arrays expect it. For example, printf uses the `\0' to detect the end of a character array when printing it out with a `%s'.

We can copy a character array s into another t like this:

i = 0;
while( (t[i]=s[i]) != '\0' )
    i++;

Most of the time we have to put in our own `\0' at the end of a string; if we want to print the line with printf, it's necessary. This code prints the character count before the line:

Here we increment n in the subscript itself, but only after the previous value has been used. The character is read, placed in line[n], and only then n is incremented.

There is one place and one place only where C puts in the `\0' at the end of a character array for you, and that is in the construction

"stuff between double quotes"

The compiler puts a `\0' at the end automatically. Text enclosed in double quotes is called a string; its properties are precisely those of an (initialized) array of characters.
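A sketch of what the terminating `\0' buys us, in present-day C. The helper name count_chars is hypothetical. It walks a character array until it meets the `\0' that the compiler (or the programmer) put at the end.

```c
/* Hypothetical helper: counts the characters before the '\0'. */
int count_chars(const char s[])
{
    int n = 0;

    while (s[n] != '\0')   /* stop at the terminator, do not count it */
        n++;
    return n;
}
```

Because the compiler appends the `\0', the array behind "hello" occupies six characters even though count_chars("hello") is five.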

13. For Statement

The for statement is a somewhat generalized while that lets us put the initialization and increment parts of a loop into a single statement along with the test. The general form of the for is

for( initialization; expression; increment ) statement

The meaning is exactly

initialization;
while( expression ) {
    statement
    increment;
}

Thus, the following code does the same array copy as the example in the previous section:

for( i=0; (t[i]=s[i]) != '\0'; i++ );

This slightly more ornate example adds up the elements of an array:

sum = 0;
for( i=0; i<n; i++ )
    sum = sum + array[i];

In the for statement, the initialization can be left out if you want, but the semicolon has to be there. The increment is also optional. It is NOT followed by a semicolon. The second clause, the test, works the same way as in the while: if the expression is true (not zero) do another loop, otherwise get on with the next statement. As with the while, the for loop may be done zero times. If the expression is left out, it is taken to be always true, so

for( ; ; ) ...

and

while( 1 ) ...

are both infinite loops.
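The equivalence between for and while stated in this section can be spelled out as a sketch in present-day C. The function names sum_for and sum_while, and the parameter n for the number of elements, are assumptions of this example.

```c
/* Hypothetical helper: adds up n elements with a for loop. */
int sum_for(const int array[], int n)
{
    int i, sum;

    sum = 0;
    for (i = 0; i < n; i++)
        sum = sum + array[i];
    return sum;
}

/* The same loop written as initialization; while; increment. */
int sum_while(const int array[], int n)
{
    int i, sum;

    sum = 0;
    i = 0;                      /* initialization */
    while (i < n) {             /* expression */
        sum = sum + array[i];   /* statement */
        i++;                    /* increment */
    }
    return sum;
}
```

With n equal to zero neither loop body runs at all, and both functions return 0.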

You might ask why we use a for since it's so much like a while. (You might also ask why we use a while because...) The for is usually preferable because it keeps the code where it's used and sometimes eliminates the need for compound statements, as in this code that zeros a two-dimensional array:

for( i=0; i<n; i++ )
    for( j=0; j<m; j++ )
        array[i][j] = 0;

14. Functions; Comments

Suppose we want, as part of a larger program, to count the occurrences of the ascii characters in some input text. Let us also map illegal characters (those with value >127 or <0) into one pile. Since this is presumably an isolated part of the program, good practice dictates making it a separate function. Here is one way:

We have already seen many examples of calling a function, so let us concentrate on how to define one. Since count has two arguments, we need to declare them, as shown, giving their types, and in the case of buf, the fact that it is an array. The declarations of arguments go between the argument list and the opening `{'. There is no need to specify the size of the array buf, for it is defined outside of count.

The return statement simply says to go back to the calling routine. In fact, we could have omitted it, since a return is implied at the end of a function.

What if we wanted count to return a value, say the number of characters read? The return statement allows for this too:

As is often the case, all the work is done by the assignment statement embedded in the test part of the for. Again, the declarations of the arguments s1 and s2 omit the sizes, because they don't matter to strcopy. (In the section on pointers, we will see a more efficient way to do a string copy.)

There is a subtlety in function usage which can trap the unsuspecting Fortran programmer. Simple variables (not arrays) are passed in C by ``call by value'', which means that the called function is given a copy of its arguments, and doesn't know their addresses. This makes it impossible to change the value of one of the actual input arguments.

There are two ways out of this dilemma. One is to make special arrangements to pass to the function the address of a variable instead of its value. The other is to make the variable a global or external variable, which is known to each function by its name. We will discuss both possibilities in the next few sections.
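A sketch of call by value, and of the external-variable way out, in present-day C. All the names here (total, try_to_zero, add_to_total, call_by_value_demo) are hypothetical.

```c
/* External variable: defined outside every function, so each
   function below can refer to it by name. */
int total = 0;

/* Receives a copy of the caller's value; the assignment changes
   only that copy. */
void try_to_zero(int x)
{
    x = 0;
    (void)x;   /* the copy is discarded on return */
}

/* Changes the external variable, which any caller can then inspect. */
void add_to_total(int x)
{
    total = total + x;
}

/* Returns the caller's variable after the call; it is unchanged. */
int call_by_value_demo(void)
{
    int a = 7;

    try_to_zero(a);
    return a;
}
```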

15. Local and External Variables

If we say

f( ) {
    int x;
    ...
}
g( ) {
    int x;
    ...
}

each x is LOCAL to its own routine: the x in f is unrelated to the x in g. (Local variables are also called ``automatic''.) Furthermore each local variable in a routine appears only when the function is called, and disappears when the function is exited. Local variables have no memory from one call to the next and must be explicitly initialized upon each entry. (There is a static storage class for making local variables with memory; we won't discuss it.)

As opposed to local variables, external variables are defined external to all functions, and are (potentially) available to all functions. External storage always remains in existence. To make variables external we have to define them external to all functions, and, wherever we want to use them, make a declaration.

Roughly speaking, any function that wishes to access an external variable must contain an extern declaration for it. The declaration is the same as others, except for the added keyword extern. Furthermore, there must somewhere be a definition of the external variables external to all functions.

External variables can be initialized; they are set to zero if not explicitly initialized. In its simplest form, initialization is done by putting the value (which must be a constant) after the definition:

int nchar 0;
char flag 'f';
etc.

This is discussed further in a later section.

This ends our discussion of what might be called the central core of C. You now have enough to write quite substantial C programs, and it would probably be a good idea if you paused long enough to do so. The rest of this tutorial will describe some more ornate constructions, useful but not essential.

16. Pointers

A pointer in C is the address of something. It is a rare case indeed when we care what the specific address itself is, but pointers are a quite common way to get at the contents of something. The unary operator `&' is used to produce the address of an object, if it has one. Thus

int a, b;
b = &a;

puts the address of a into b. We can't do much with it except print it or pass it to some other routine, because we haven't given b the right kind of declaration. But if we declare that b is indeed a pointer to an integer, we're in good shape:

int a, *b, c;
b = &a;
c = *b;

b contains the address of a and `c = *b' means to use the value in b as an address, i.e., as a pointer. The effect is that we get back the contents of a, albeit rather indirectly. (It's always the case that `*&x' is the same as x if x has an address.)

The most frequent use of pointers in C is for walking efficiently along arrays. In fact, in the implementation of an array, the array name represents the address of the zeroth element of the array, so you can't use it on the left side of an expression. (You can't change the address of something by assigning to it.) If we say

char *y;
char x[100];

y is of type pointer to character (although it doesn't yet point anywhere). We can make y point to an element of x by either of

y = &x[0];
y = x;

Since x is the address of x[0] this is legal and consistent.

Now `*y' gives x[0]. More importantly,

*(y+1) gives x[1]
*(y+i) gives x[i]

and the sequence

y = &x[0];
y++;

leaves y pointing at x[1].

Let's use pointers in a function length that computes how long a character array is. Remember that by convention all character arrays are terminated with a `\0'. (And if they aren't, this program will blow up inevitably.) The old way:

You can now see why we have to say what kind of thing s points to: if we're to increment it with s++ we have to increment it by the right amount.

The pointer version is more efficient (this is almost always true) but even more compact is

for( n=0; *s++ != '\0'; n++ );

The `*s' returns a character; the `++' increments the pointer so we'll get the next character next time around. As you can see, as we make things more efficient, we also make them less clear. But `*s++' is an idiom so common that you have to know it.
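Three ways of writing a length function are sketched below in present-day C: with a subscript, with a pointer, and with the `*s++' idiom. The function names are hypothetical, and these are illustrations written for this note rather than the memorandum's own listings.

```c
/* Hypothetical helper: length using an array subscript. */
int length_index(const char s[])
{
    int n;

    for (n = 0; s[n] != '\0'; n++)
        ;
    return n;
}

/* Hypothetical helper: length using a pointer that walks the array. */
int length_ptr(const char *s)
{
    int n;

    for (n = 0; *s != '\0'; s++)
        n++;
    return n;
}

/* Hypothetical helper: the compact form built on the *s++ idiom. */
int length_idiom(const char *s)
{
    int n;

    for (n = 0; *s++ != '\0'; n++)   /* fetch *s, then advance s */
        ;
    return n;
}
```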

Going a step further, here's our function strcopy that copies a character array s to another t.

strcopy(s,t)
char *s, *t;
{
    while(*t++ = *s++);
}

We have omitted the test against `\0', because `\0' is identically zero; you will often see the code this way. (You MUST have a space after the `=': see section 25.)

For arguments to a function, and there only, the declarations

char s[ ];
char *s;

are equivalent: a pointer to a type, or an array of unspecified size of that type, are the same thing.

If this all seems mysterious, copy these forms until they become second nature. You don't often need anything more complicated.

17. Function Arguments

Look back at the function strcopy in the previous section. We passed it two string names as arguments, then proceeded to clobber both of them by incrementation. So how come we don't lose the original strings in the function that called strcopy?

As we said before, C is a ``call by value'' language: when you make a function call like f(x), the VALUE of x is passed, not its address. So there's no way to ALTER x from inside f. If x is an array (char x[10]) this isn't a problem, because x IS an address anyway, and you're not trying to change it, just what it addresses. This is why strcopy works as it does. And it's convenient not to have to worry about making temporary copies of the input arguments.

But what if x is a scalar and you do want to change it? In that case, you have to pass the ADDRESS of x to f, and then use it as a pointer. Thus for example, to interchange two integers, we must write

flip(x, y)
int *x, *y;
{
    int temp;
    temp = *x;
    *x = *y;
    *y = temp;
}

and to call flip, we have to pass the addresses of the variables:

flip (&a, &b);
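For readers with a present-day compiler, the same flip can be sketched with the argument types written inside the parentheses. That declaration style is an assumption about the reader's dialect; the logic is unchanged from the version above.

```c
/* Interchanges the two integers that x and y point to. */
void flip(int *x, int *y)
{
    int temp;

    temp = *x;   /* save what x points to */
    *x = *y;     /* overwrite it through the pointer */
    *y = temp;
}
```

After int a = 1, b = 2; the call flip(&a, &b) leaves a equal to 2 and b equal to 1.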

18. Multiple Levels of Pointers; Program Arguments

When a C program is called, the arguments on the command line are made available to the main program as an argument count argc and an array of character strings argv containing the arguments. Manipulating these arguments is one of the most common uses of multiple levels of pointers (``pointer to pointer to ...''). By convention, argc is greater than zero; the first argument (in argv[0]) is the command name itself.

Step by step: main is called with two arguments, the argument count and the array of arguments. argv is a pointer to an array, whose individual elements are pointers to arrays of characters. The zeroth argument is the name of the command itself, so we start to print with the first argument, until we've printed them all. Each argv[i] is a character array, so we use a `%s' in the printf.

You will sometimes see the declaration of argv written as

char *argv[ ];

which is equivalent. But we can't use char argv[ ][ ], because both dimensions are variable and there would be no way to figure out how big the array is.

Here's a bigger example using argc and argv. A common convention in C programs is that if the first argument is `-', it indicates a flag of some sort. For example, suppose we want a program to be callable as

prog -abc arg1 arg2 ...

where the `-' argument is optional; if it is present, it may be followed by any combination of a, b, and c.

There are several things worth noticing about this code. First, there is a real need for the left-to-right evaluation that && provides; we don't want to look at argv[1] unless we know it's there. Second, the statements

--argc;
++argv;

let us march along the argument list by one position, so we can skip over the flag argument as if it had never existed; the rest of the program is independent of whether or not there was a flag argument. This only works because argv is a pointer which can be incremented.
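A sketch of this kind of flag handling, in present-day C. The function skip_flags and its parameters are hypothetical and are not the memorandum's own listing. It takes the argument count by address so the caller sees the decrement, and returns the advanced argv.

```c
/* Hypothetical helper: scans an optional "-abc" style flag argument.
   Sets *aflag, *bflag and *cflag to 0 or 1, decrements *argc if a
   flag argument was present, and returns argv moved past it. */
char **skip_flags(int *argc, char **argv,
                  int *aflag, int *bflag, int *cflag)
{
    int i;

    *aflag = *bflag = *cflag = 0;
    /* && guarantees argv[1] is examined only when it exists */
    if (*argc > 1 && argv[1][0] == '-') {
        for (i = 1; argv[1][i] != '\0'; i++) {
            if (argv[1][i] == 'a')
                *aflag = 1;
            else if (argv[1][i] == 'b')
                *bflag = 1;
            else if (argv[1][i] == 'c')
                *cflag = 1;
        }
        --*argc;   /* one fewer argument to process */
        ++argv;    /* step over the flag argument */
    }
    return argv;
}
```

After the call, element 1 of the returned pointer is the first non-flag argument whether or not a flag was given.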

19. The Switch Statement; Break; Continue

The switch statement can be used to replace the multi-way test we used in the last example. When the tests are like this:

The case statements label the various actions we want; default gets done if none of the other cases are satisfied. (A default is optional; if it isn't there, and none of the cases match, you just fall out the bottom.)

The break statement in this example is new. It is there because the cases are just labels, and after you do one of them, you fall through to the next unless you take some explicit action to escape. This is a mixed blessing. On the positive side, you can have multiple cases on a single statement; we might want to allow both upper and lower case:

case 'a': case 'A': ...
case 'b': case 'B': ...
etc.

But what if we just want to get out after doing case `a'? We could get out of a case of the switch with a label and a goto, but this is really ugly. The break statement lets us exit without either goto or label.

The break statement also works in for and while statements; it causes an immediate exit from the loop.

The continue statement works only inside for's and while's; it causes the next iteration of the loop to be started. This means it goes to the increment part of the for and the test part of the while. We could have used a continue in our example to get on with the next iteration of the for, but it seems clearer to use break instead.
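A sketch that uses switch, break and continue together, in present-day C. The function count_vowels is hypothetical and is chosen only because it needs several cases on one statement.

```c
/* Hypothetical helper: counts vowels of either case in s,
   passing over blanks with continue. */
int count_vowels(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        if (*s == ' ')
            continue;          /* on to s++ and the next test */
        switch (*s) {
        case 'a': case 'A':    /* several cases on one statement */
        case 'e': case 'E':
        case 'i': case 'I':
        case 'o': case 'O':
        case 'u': case 'U':
            n++;
            break;             /* leaves the switch, not the loop */
        default:
            break;
        }
    }
    return n;
}
```

Note that a break inside the switch ends only the switch; the enclosing for carries on with its increment.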

20. Structures

The main use of structures is to lump together collections of disparate variable types, so they can conveniently be treated as a unit. For example, if we were writing a compiler or assembler, we might need for each identifier information like its name (a character array), its source line number (an integer), some type information (a character, perhaps), and probably a usage count (another integer).

char id[10];
int line;
char type;
int usage;

We can make a structure out of this quite easily. We first tell C what the structure will look like, that is, what kinds of things it contains; after that we can actually reserve storage for it, either in the same statement or separately. The simplest thing is to define it and allocate storage all at once:

struct {
    char id[10];
    int line;
    char type;
    int usage;
} sym;

This defines sym to be a structure with the specified shape; id, line, type and usage are members of the structure. The way we refer to any particular member of the structure is

Although the names of structure members never stand alone, they still have to be unique: there can't be another id or usage in some other structure.

So far we haven't gained much. The advantages of structures start to come when we have arrays of structures, or when we want to pass complicated data layouts between functions. Suppose we wanted to make a symbol table for up to 100 identifiers. We could extend our definitions like

char id[100][10];
int line[100];
char type[100];
int usage[100];

but a structure lets us rearrange this spread-out information so all the data about a single identifier is collected into one lump:

struct {
    char id[10];
    int line;
    char type;
    int usage;
} sym[100];

This makes sym an array of structures; each array element has the specified shape. Now we can refer to members as

Suppose we now want to write a function lookup(name) which will tell us if name already exists in sym, by giving its index, or that it doesn't, by returning a -1. We can't pass a structure to a function directly; we have to either define it externally, or pass a pointer to it. Let's try the first way first.

This makes psym a pointer to our kind of structure (the symbol table), then initializes it to point to the first element of sym.

Notice that we added something after the word struct: a ``tag'' called symtag. This puts a name on our structure definition so we can refer to it later without repeating the definition. It's not necessary but useful. In fact we could have said

struct symtag { ... structure definition };

which wouldn't have assigned any storage at all, and then said

struct symtag sym[100];
struct symtag *psym;

which would define the array and the pointer. This could be condensed further, to

struct symtag sym[100], *psym;

The way we actually refer to a member of a structure by a pointer is like this:

ptr -> structure-member

The symbol `->' means we're pointing at a member of a

C Tutorial - 27 -

structure; `->' is only used in that context. ptr is apointer to the (base of) a structure that contains thestructure member. The expression ptr->structure-memberrefers to the indicated member of the pointed-to structure.Thus we have constructions like:

psym->type = 1;
psym->id[0] = 'a';

and so on.
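To pull these pieces together, here is a minimal sketch that declares the tag, the array and the pointer, and then sets members through the pointer. The function name mark_first is hypothetical, chosen only for this illustration.

```c
struct symtag {
    char id[10];
    int  line;
    char type;
    int  usage;
};

struct symtag sym[100], *psym;

/* Point psym at the first element of sym, then set members through it. */
void mark_first(void)
{
    psym = &sym[0];         /* the same as psym = sym */
    psym->type = 1;
    psym->id[0] = 'a';
    psym->id[1] = '\0';
}
```

After a call of mark_first, sym[0].type and psym->type name the same object.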

For more complicated pointer expressions, it's wise to use parentheses to make it clear who goes with what. For example,

struct { int x, *y; } *p;

p->x++        increments x
++p->x        so does this!
(++p)->x      increments p before getting x
*p->y++       uses y as a pointer, then increments it
*(p->y)++     so does this
*(p++)->y     uses y as a pointer, then increments p

The way to remember these is that ->, . (dot), ( ) and [ ] bind very tightly. An expression involving one of these is treated as a unit. p->x, a[i], y.x and f(b) are names exactly as abc is.
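If you would rather check these than memorize them, a small sketch like the following exercises several of the expressions on a two-element array. The tag pair, the function check_binding and the test values are all assumptions made up for the illustration.

```c
struct pair { int x, *y; };

/* Returns 0 if every expression behaved as described in the text. */
int check_binding(void)
{
    int vals[2] = { 10, 20 };
    struct pair a[2] = { { 1, vals }, { 5, vals } };
    struct pair *p = a;
    int ok = 1;

    p->x++;                         /* increments x: a[0].x is now 2 */
    ++p->x;                         /* so does this: a[0].x is now 3 */
    ok = ok && a[0].x == 3 && p == a;

    ok = ok && *p->y++ == 10;       /* fetches through y, then increments y */
    ok = ok && p->y == vals + 1 && p == a;

    ok = ok && *(p++)->y == 20;     /* fetches through y, then increments p */
    ok = ok && p == a + 1;

    p = a;
    ok = ok && (++p)->x == 5;       /* increments p before getting x */

    return ok ? 0 : 1;
}
```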

If p is a pointer to a structure, any arithmetic on p takes into account the actual size of the structure. For instance, p++ increments p by the correct amount to get the next element of the array of structures. But don't assume that the size of a structure is the sum of the sizes of its members: because of alignments of different sized objects, there may be ``holes'' in a structure.
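Both points can be checked with sizeof. The sketch below, written in present-day C, compares the step taken by p+1 with the size of the structure, and the size of the structure with the sum of its members; the function names member_sum and stride are hypothetical.

```c
#include <stddef.h>

struct symtag {
    char id[10];
    int  line;
    char type;
    int  usage;
};

/* Sum of the sizes of the members, ignoring any holes. */
size_t member_sum(void)
{
    struct symtag s;

    return sizeof s.id + sizeof s.line + sizeof s.type + sizeof s.usage;
}

/* Distance in bytes between p and p+1 for a structure pointer. */
size_t stride(void)
{
    struct symtag table[2], *p = table;

    return (size_t)((char *)(p + 1) - (char *)p);
}
```

stride() always equals sizeof(struct symtag). On a typical machine with 4-byte ints, member_sum() is likely to be smaller than that, but the exact figures depend on the compiler and the machine.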

In main we test the pointer returned by lookup against zero, relying on the fact that a pointer is by definition never zero when it really points at something. The other pointer manipulations are trivial.

The only complexity is the set of lines like

struct symtag *lookup( );

This brings us to an area that we will treat only hurriedly: the question of function types. So far, all of our functions have returned integers (or characters, which are much the same). What do we do when the function returns something else, like a pointer to a structure? The rule is that any function that doesn't return an int has to say explicitly what it does return. The type information goes before the function name (which can make the name hard to see). Examples:

char f(a)
int a;
{ ... }

int *g( )
{ ... }

struct symtag *lookup(s)
char *s;
{ ... }

The function f returns a character, g returns a pointer to an integer, and lookup returns a pointer to a structure that looks like symtag. And if we're going to use one of these functions, we have to make a declaration where we use it, as we did in main above.
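Here is a sketch, in present-day C, of a pointer-returning lookup together with a caller that declares it before use and tests the result against zero. The caller note_use, the counter nsym and the use of strcmp are assumptions for illustration; this is not the original program.

```c
#include <string.h>

struct symtag {
    char id[10];
    int  line;
    char type;
    int  usage;
};

struct symtag sym[100];
int nsym = 0;                           /* hypothetical count of entries in use */

struct symtag *lookup(const char *s);   /* declared before it is used */

/* Bump the usage count of name if present; return 1 if found, else 0. */
int note_use(const char *name)
{
    struct symtag *psym;

    if ((psym = lookup(name)) != 0) {   /* non-zero means a real entry */
        psym->usage++;
        return 1;
    }
    return 0;
}

/* Return a pointer to the entry for s, or 0 if there is none. */
struct symtag *lookup(const char *s)
{
    struct symtag *p;

    for (p = sym; p < &sym[nsym]; p++)
        if (strcmp(s, p->id) == 0)
            return p;
    return 0;
}
```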

Notice the parallelism between the declarations

struct symtag *lookup( );
struct symtag *psym;

In effect, this says that lookup( ) and psym are both used the same way, as a pointer to a structure, even though one is a variable and the other is a function.

21. Initialization of Variables

An external variable may be initialized at compile time by following its name with an initializing value when it is defined. The initializing value has to be something whose value is known at compile time, like a constant.

This last one is very useful: it makes keyword an array of pointers to character strings, with a zero at the end so we can identify the last element easily. A simple lookup routine could scan this until it either finds a match or encounters a zero keyword pointer:
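A minimal sketch of such a table and routine in present-day C follows. The particular keywords listed, the use of strcmp and the return convention (index, or -1 when there is no match) are assumptions made for the illustration.

```c
#include <string.h>

/* An assumed keyword list; the zero pointer marks the end. */
const char *keyword[] = { "if", "else", "for", "while", "break", "continue", 0 };

/* Return the index of str in keyword, or -1 if it isn't a keyword. */
int lookup(const char *str)
{
    int i;

    for (i = 0; keyword[i] != 0; i++)
        if (strcmp(str, keyword[i]) == 0)
            return i;
    return -1;
}
```

Because the loop stops at the zero pointer, adding a keyword means changing only the initializer, not the routine.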

A complete C program need not be compiled all at once; the source text of the program may be kept in several files, and previously compiled routines may be loaded from libraries. How do we arrange that data gets passed from one routine to another? We have already seen how to use function arguments and values, so let us talk about external data. Warning: the words declaration and definition are used precisely in this section; don't treat them as the same thing.

A major shortcut exists for making extern declarations. If the definition of a variable appears BEFORE its use in some function, no extern declaration is needed within the function. Thus, if a file contains

f1( ) { ... }

int foo;

f2( ) { ... foo = 1; ... }

f3( ) { ... if ( foo ) ... }

no declaration of foo is needed in either f2 or f3, because the external definition of foo appears before them. But if f1 wants to use foo, it has to contain the declaration

f1( ) { extern int foo; ... }

This is true also of any function that exists on another file; if it wants foo it has to use an extern declaration for it. (If somewhere there is an extern declaration for something, there must also eventually be an external definition of it, or you'll get an ``undefined symbol'' message.)
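The rule can be tried out in a single file. In this sketch, written in present-day C, f1 appears before the definition of foo and so carries its own extern declaration, while f2 appears after it and needs none; the function bodies and the initial value 41 are arbitrary choices for illustration.

```c
/* f1 comes before the definition of foo, so it needs the declaration. */
int f1(void)
{
    extern int foo;     /* declaration: foo is defined elsewhere */

    return foo + 1;
}

int foo = 41;           /* the one external definition (value is arbitrary) */

/* f2 comes after the definition, so no extern declaration is needed. */
int f2(void)
{
    foo = foo + 1;
    return foo;
}
```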

There are some hidden pitfalls in external declarations and definitions if you use multiple source files. To avoid them, first, define and initialize each external variable only once in the entire set of files:

int foo 0;

You can get away with multiple external definitions on UNIX, but not on GCOS, so don't ask for trouble. Multiple initializations are illegal everywhere. Second, at the beginning of any file that contains functions needing a variable whose definition is in some other file, put in an extern declaration, outside of any function:

extern int foo;

f1( ) { ... } etc.

The #include compiler control line, to be discussed shortly, lets you make a single copy of the external declarations for a program and then stick them into each of the source files making up the program.

23. #define, #include

C provides a very limited macro facility. You can say

#define name something

and thereafter anywhere ``name'' appears as a token, ``something'' will be substituted. This is particularly useful in parameterizing the sizes of arrays:
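As a small sketch of that idea in present-day C, the array size below is written once, in the #define, and used everywhere else by name. The names ARRAYSIZE, array and fill are hypothetical.

```c
#define ARRAYSIZE 100   /* hypothetical name; change the size in one place */

int array[ARRAYSIZE];

/* Fill the array with 0, 1, 2, ... and return the sum of its elements. */
int fill(void)
{
    int i, sum = 0;

    for (i = 0; i < ARRAYSIZE; i++) {
        array[i] = i;
        sum += array[i];
    }
    return sum;
}
```

Changing the 100 in the #define resizes the array and adjusts the loop limit together.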

Now we have meaningful words instead of mysterious constants. (The mysterious operators `&' (AND) and `|' (OR) will be covered in the next section.) It's an excellent practice to write programs without any literal constants except in #define statements.

There are several warnings about #define. First, there's no semicolon at the end of a #define; all the text from the name to the end of the line (except for comments) is taken to be the ``something''. When it's put into the text, blanks are placed around it. Good style typically makes the name in the #define upper case; this makes parameters more visible. Definitions affect things only after they occur, and only within the file in which they occur. Defines can't be nested. Last, if there is a #define in a file, then the first character of the file MUST be a `#', to signal the preprocessor that definitions exist.

The other control word known to C is #include. To include one file in your source at compilation time, say

#include "filename"

This is useful for putting a lot of heavily used data definitions and #define statements at the beginning of a file to be compiled. As with #define, the first line of a file containing a #include has to begin with a `#'. And #include can't be nested: an included file can't contain another #include.

24. Bit Operators

C has several operators for logical bit-operations. For example,

x = x & 0177;

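forms the bit-wise AND of x and 0177, effectively retaining only the last seven bits of x. Other operators are `|' (inclusive OR), `^' (exclusive OR), `~' (one's complement), `<<' (left shift) and `>>' (right shift).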
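A few one-line functions show the bit operators at work. This is only a sketch in present-day C; the function names are hypothetical, and unsigned operands are used so the shifts behave the same on every machine.

```c
/* Keep only the last seven bits of x. */
unsigned low7(unsigned x) { return x & 0177; }

/* Turn on the bits of x that are set in mask. */
unsigned setbits(unsigned x, unsigned mask) { return x | mask; }

/* Flip the bits of x that are set in mask. */
unsigned flipbits(unsigned x, unsigned mask) { return x ^ mask; }

/* Turn off the bits of x that are set in mask. */
unsigned clearbits(unsigned x, unsigned mask) { return x & ~mask; }

/* Shift left three places, which multiplies by 8. */
unsigned times8(unsigned x) { return x << 3; }

/* Shift right two places, which divides by 4 and drops the remainder. */
unsigned quarter(unsigned x) { return x >> 2; }
```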

All floating arithmetic is done in double precision. Mixed mode arithmetic is legal; if an arithmetic operator in an expression has both operands int or char, the arithmetic done is integer, but if one operand is int or char and the other is float or double, both operands are converted to double. Thus if i and j are int and x is float,

(x+i)/j       converts i and j to float
x + i/j       does i/j integer, then converts
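The difference shows up as soon as i/j has a remainder. The sketch below wraps the two expressions in functions with hypothetical names; note that a present-day compiler may do the arithmetic in float rather than double, but the distinction between the two expressions is the same.

```c
/* All of the operands are converted to floating point before dividing. */
double all_floating(float x, int i, int j)
{
    return (x + i) / j;
}

/* i/j is done in integer arithmetic first, and only then converted. */
double int_division_first(float x, int i, int j)
{
    return x + i / j;
}
```

With x = 0.5, i = 7 and j = 2, the first gives 3.75 and the second gives 3.5, because 7/2 is truncated to 3 before x is added.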

C has a goto statement and labels, so you can branch about the way you used to. But most of the time goto's aren't needed. (How many have we used up to this point?) The code can almost always be more clearly expressed by for/while, if/else, and compound statements.

One use of goto's with some legitimacy is in a program which contains a long loop, where a while(1) would be too extended. Then you might write

mainloop:
    ...
    goto mainloop;

Another use is to implement a break out of more than one level of for or while. goto's can only branch to labels within the same function.
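As an illustration of the second use, this sketch searches a table with two nested for loops and leaves both at once with a goto. The function find_zero, its three-column table and its return convention are assumptions made up for the example.

```c
/* Find the first zero element of a rows-by-3 table.
   Returns row*3 + col for the element found, or -1 if there is none. */
int find_zero(int a[][3], int rows)
{
    int i, j, where = -1;

    for (i = 0; i < rows; i++)
        for (j = 0; j < 3; j++)
            if (a[i][j] == 0) {
                where = i * 3 + j;
                goto found;     /* leaves both loops at once */
            }
found:
    return where;
}
```

A plain break here would leave only the inner loop, and the outer loop would go on to the next row.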

28. Acknowledgements

I am indebted to a veritable host of readers who made valuable criticisms on several drafts of this tutorial. They ranged in experience from complete beginners through several implementors of C compilers to the C language designer himself. Needless to say, this is a wide enough spectrum of opinion that no one is satisfied (including me); comments and suggestions are still welcome, so that some future version might be improved.

References

C is an extension of B, which was designed by D. M. Ritchie and K. L. Thompson [4]. The C language design and UNIX implementation are the work of D. M. Ritchie. The GCOS version was begun by A. Snyder and B. A. Barres, and completed by S. C. Johnson and M. E. Lesk. The IBM version is primarily due to T. G. Peterson, with the assistance of M. E. Lesk.