c book

I assume that all of you would have got the setup of running a C program ready. Some clarifications on my previous instructions:

Turbo C is a 16-bit compiler made for MS-DOS. Some school students may say that it’s easy to start with, but I say it is difficult to program in Turbo C. We now live in a world of 64-bit compilers, so we should learn to program for that.

Why Linux? There is absolutely no need to use Linux for learning C. But I would like to give an overview of the whole computer system (in this chapter itself), and it will be based on Linux. As a computer science student, if you don’t know how to use Linux, you are like a purse without money. So, start using it now at least.

Why AWS? It is not needed if you have some Linux distribution on your machine. But otherwise it is a very good idea to use AWS. It gives a shell on a Linux system and you have full control of the OS. It is also free (for normal usage). Moreover, AWS is used by many companies, and this would be a simple exercise to get used to it.

Now, if you have a gcc or icc compiler on Windows, that’s very much okay. But just one clarification: we won’t be using the word “Turbo C” in any further discussion.

What does a C program do?

First of all, why are we using the C language? We can start from there:

Digital Circuits

We all know that the most important thing inside our CPU is the processor. It is made up of digital components like flip-flops, and those who have studied digital circuits would understand that digital circuits produce output from the input when supplied with power and a clock. The speed at which the output is produced is determined by the clock speed of the CPU, because the clock determines the speed of transfer of bits between different units (and that’s why a 3 GHz processor is faster than a 1 GHz processor). But this will be effective only if the processor has the input data available. (Most times there is a delay for the CPU to get the input data, and that’s why the actual running time of a process on a 1 GHz processor is not 3× that of the same process running on a 3 GHz processor.)

Computer Organization

The reason for the above problem is that data is given to the processor from memory, and the time to bring data from memory to the processor is high, say 100× the time the processor takes to work on it. So, we use many techniques like buffering, caching etc. to minimize this effect.

Okay, so now our aim is to give data to the CPU, and it has circuits called Functional Units to perform the specified tasks. Each functional unit does some job like addition, multiplication etc. and gives the result when provided with the input(s). But these inputs and outputs are in binary, as all these units are digital. That is, the input to the CPU will be a series of 0’s and 1’s and the output will also be a series of 0’s and 1’s. So, what happens to the CPU when we give it some string of 0’s and 1’s? Not all such strings will produce a valid output, and this is entirely dependent on the processor. The processor manufacturer clearly specifies the meaning of each combination of 0’s and 1’s, and that is called the language of the processor, which is entirely determined by its architecture.

Computer Architecture

Most of our systems are x86 architecture, which has its own instruction set. So, let’s take an example string of 0’s and 1’s in it:

0000010100000000000000000000000000000001

This bit string will add 1 to the content of the EAX register (a register is a very fast memory inside the processor for holding data, and EAX is one among them in the x86 architecture), which is a 32-bit register. The first byte is 00000101, which is given to the instruction decode unit inside the processor, and it tells the CPU that it must add the next 32 bits to the contents of the EAX register. Now, the next 32 bits will be given to the ADD unit, which will add them to the contents of the EAX register. In our example, the 32 bits represent just 1 and so the addition results in an increment of 1. (The 32-bit value is written here with its most significant byte first for readability; an actual x86 encoding stores it in little endian byte order, least significant byte first.)

[Small question: Instead of ADD, if we use INC instruction, it results in better performance. How?]

So, this is how the CPU works: everything is in binary. We can get the meaning of each binary string from the Intel instruction manual.

To make things slightly easier they encode the bit-strings as HEX codes. So, the above bit string will become

0500000001

Even then, it is difficult to write a program like this, because the human mind is not good at remembering numbers. So, then came Assembly language. Here, we use mnemonics to represent each operation. For example, ADD is the mnemonic for addition, SUB is the mnemonic for subtraction etc. So, instead of the above binary string, now we can write

ADD EAX,01

and the assembler will translate it into

0500000001

Ahh! Much easier world for a programmer. But imagine writing an assembly program to print the factorial of n. How much time is needed to write it using these kinds of mnemonics, for each operation in the algorithm? Wouldn’t it be better if we say the algorithm and then that is translated to binary string by <someone>? Yes, that’s where C language comes. We can straight away represent most algorithms in C, and the C compiler will translate it into the binary string in the same way an assembler would do for assembly language.

So, let’s write the C code for the above addition.

int a;
a = a + 1;

This will do the same job as

ADD EAX,01

assuming a is the value we had in EAX.

Now, to start executing a program we need an entry point to the code. In C this is the function named main. The program starts executing from the first instruction inside main. So, to make our C code complete, we do the following

#include <stdio.h>

int main()
{
    int a = 0;   // initialised, so that reading it below is well defined
    a = a + 1;
    printf(" a = %d\n", a);
    return 0;
}

For now let’s assume that the printf statement prints the value of a. But the C language has a restriction that before any variable or function is used, its type must be specified. So, before using printf, we must tell its type. Its type is written in a file called “stdio.h”, so we can include that file (which will make all the contents of that file part of our file) or we can just give the correct type of printf as

int printf(const char *, ...);

Directly writing the declaration of printf is not recommended; I just gave it to show what #include <stdio.h> does. Many people still think that the code for printing is inside “stdio.h”, and that’s completely wrong. The code of printf is part of the standard C library, which is available as libc (there must be a file called libc.so in Linux, and the printf code is inside that). The use of the libc.so file is that the same code can be used by many programs (all linking to the same library code), which reduces the main memory required to run the programs.

So, sharing libc in this way saves on the total memory required when several programs (say 4 of them, all calling printf) are executing concurrently, as they all use a single copy of the library code. (If you want to see how many processes are executing at this moment on your system, just type top in a shell.)
By default gcc will link to any function inside libc, if we call them in our program. But if we use any other library function, we have to explicitly tell gcc to link to that library. For example, to use sin(x) in a code we have to link to math library (libm) as follows:

gcc prog.c -o prog -lm

(-l is written instead of lib, so libm becomes -lm, libc becomes -lc ...)

Once we compile the code, gcc will produce the output in a file called prog, the name given with the -o option. If we don’t give any -o option, the output by default goes to a file called a.out. This will be in binary format and we cannot read it as text. This binary contains the bit strings to be given to the processor, as we discussed in the beginning. But some code, like that of printf, is not inside this binary; it is at a common location and is called by our binary. Can we copy the printf code and other library functions into our binary? Yes, we can, with the following command:

gcc prog.c -o progS -lm -static

(progS is just a different name)

Now, just see the size difference of the two binaries using the ls command

ls -l prog progS

Now, to get the output by running the executable, we have to do

./prog

(./ just tells that prog is in the current directory)

Once the binary is produced by the compiler, before we get the output, there are many stages:

Loading: Copying the content of the binary file to memory

Linking: Fixing the calls in the binary which are to shared libraries as in the case of printf

Process starts: Now the OS makes a process for our program (it gets a pid) and it’ll be in ready state.

When the turn comes, our process will get executed (it can be stopped in between to give other processes their turn)

When our process is being executed, the binary strings inside it (which are instructions to the processor) are sent one by one to the processor, which executes them

Memory Management: For all the memory required by the process, the Memory Management Unit (MMU) ensures that the memory addresses used inside the object code (which are virtual addresses) are properly mapped to physical memory locations on the RAM

So, even after the compilation (which of course includes its own phases) there are so many phases before we get the output of a C program.

We’ll stop this chapter after mentioning how we get a segmentation fault in our programs.

Segmentation Fault

When a process is made by the OS, it allots some memory to it. This can be increased during its execution, upon request to the OS, and can go up to a limit set by the OS. So, when this process goes to the execution state, it can only access the memory allotted to it. (This is done by giving a page table to each process, and all memory accesses are done through it.) Whenever a process tries to access memory which is not allotted to it, a segmentation fault occurs.

A segmentation fault also occurs if a process tries to write something to a read only memory area. A program’s memory consists of many parts called segments: there are the code segment, the data segment and the stack segment. Of these, the data segment is again divided into a Read Only (RO) data segment and a Read Write (RW) data segment. Among these segments, only the RW data segment and the stack segment are allowed to be modified by a process. (Some systems allow the code segment also to be writable, which can be used for writing self-modifying code.) If a process tries to modify any other segment, a segmentation fault happens. (Segmentation faults also happen due to some special hardware instructions, but we can ignore them as this won’t happen for general programs compiled in a normal way.)

In this first chapter we have skimmed across compilers, memory management, process management, computer organization and computer architecture, which covers the basics of a Computer System. So, in order to run a very simple program itself we require all these. If you understand the basic functioning of these topics, that will be enough for an exam like GATE. From next chapter onward, we’ll go inside C.

* Operator

int *p;   // p is a pointer to int
char *y;  // y is a pointer to char

Both int* p and int *p are syntactically correct. It helps to read the * together with int, as int*: the type is “pointer to int” and p is the name of the variable. (Note, however, that in a declaration like int *p, a; only p is a pointer.)

Actually pointer operation is very simple, as there is just one operator for using a pointer, which is the dereferencing operator *.

*p gives the content of the location in p.

What the content is depends on the type of p. If p is an integer pointer, *p will return 4 bytes (assuming sizeof(int) is 4) from the location contained in p. If p is a char pointer, *p will return 1 byte from the location in p.

Example

int *p, a;
// The memory address of a is passed to scanf, which stores the keyboard-entered value in that location
scanf("%d", &a);
// The memory address of a is stored in p
p = &a;
// The content of the location held in p is printed. Since p is an int pointer, 4 bytes
// (assuming sizeof(int) is 4) from that location are read as an integer and printed with %d.
printf("*p = %d", *p);

The code below also does the same job, but in a slightly different manner.

int *p, a;
p = &a;
scanf("%d", p);
printf("a = %d *p = %d", a, *p);

Pointer Arithmetic

Why do we do pointer arithmetic? It is to access the next or previous elements in an array of elements. For that reason pointer arithmetic is restricted to just addition and subtraction. And though pointers can be represented using integer values, as they hold memory addresses, pointer arithmetic is different from integer arithmetic. Pointer arithmetic works as follows:

// Assume p and q are pointers
p + 1   // the address of the next object of p's data type, i.e., if p is an int pointer,
        // p + 2 is 8 bytes beyond the address in p (assuming sizeof(int) is 4)
p - 1   // the address of the previous object of p's data type
p - q   // works as above, and thus gives the no. of objects of p's type between the
        // memory addresses in p and q (p and q must be of the same data type, or else it is a compilation error)
p + q   // not allowed, and gives a compilation error

There are no multiplication or division operators in pointer arithmetic as they have no meaning.

Does a pointer have memory?

Yes, a pointer is a data type and so has memory for it. On a 32 bit system, a pointer typically requires 4 bytes (32 bits) of memory, while on a 64 bit system it requires 8 bytes. And it is the same for a pointer to any data type. Since a pointer has memory, we can have a pointer to a pointer as well, and this can be extended to any level.

Example

#include <stdio.h>

int main()
{
    int a = 10;   // initialised, so that the value printed below is well defined
    int *p;
    p = &a;
    int **q;      // q is a pointer to a pointer to int
    q = &p;
    printf("a = %d, *p =%d, **q = %d\n", a, *p, **q);
}

Assigning values to pointers

Since pointers should hold valid memory address (OS allocates certain memory region for a process and it should use only that region and should not try to access other memory region), we should not assign random values to a pointer variable. Assigning some values to a pointer is allowed, but if that value is not in the memory address for the process, dereferencing that pointer using * operator will cause segmentation fault.

One way of assigning a value to a pointer variable is by using the & operator on an already defined variable. Thus, the pointer variable will now be holding the address of that variable. The other way is to use the malloc function, which returns a pointer to a block of dynamic memory created on the heap. This method is usually used to create dynamic arrays.

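Example

#include <stdio.h>
#include <stdlib.h> // contains the declaration for malloc

int main()
{
    int a;
    int *p = &a;
    *p = 6;
    p = malloc(10 * sizeof(int));
    if (p == NULL)   // malloc returns NULL if the memory could not be allocated
        return 1;
    *(p + 5) = 7;
    printf("a = %d, *(p+5) = %d, p[5] = %d\n", a, *(p + 5), p[5]);
    free(p);         // give the memory back once we are done with it
    return 0;
}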

Problems with pointers

Pointers are very powerful, but with power come problems. Since a pointer allows direct access to a memory address, pointer usage can cause segmentation faults if the programmer is not careful. The other problem with pointers is that pointer dereferencing is difficult to debug, especially when there are more than two levels of indirection (pointers to pointers).

Exercise Questions

1. What will be the output of the following code?

#include <stdio.h>

int main()
{
    int a = 5;
    int *p = &a;
    printf("%d", ++*p);
}

→ p is pointing to the address of a. *p will return the content of a which is 5 and ++ will increment it to 6.

2. What will be the output of the following code?

#include <stdio.h>

int main()
{
    char a[] = "Hello World";
    char *p = a;
    printf("%s", p + 2);
}

Compiler Error

World

llo World

Runtime Error

→ Since p is a char pointer, p+2 will add 2 to p (since sizeof(char) is 1). So, p+2 will be pointing to the string “llo World”.

3. What will be the output of the following code?

#include <stdio.h>

int main()
{
    int a;
    int *p = &a;
    printf("%zu", sizeof(*(char *)p));
}

1

2

4

Compile Error

→ p is cast to a char pointer and then dereferenced. So, the resulting type will be char, and sizeof(char) is 1.

4. Is the following code legal?

#include <stdio.h>

int main()
{
    int a = 1;
    if (*(char *)&a)   // the byte stored at the lowest address of a
    {
        printf("My machine is little endian\n");
    }
    else
    {
        printf("My machine is big endian\n");
    }
}

Yes

No

→ On a little endian machine the lower address will contain the least significant byte. Suppose a is stored at address 1000 and contains 1; then the character at 1000 will be 1 if the machine is little endian.

5. What will be the output of the following code?

#include <stdio.h>

int main()
{
    int *a = (int *)1;
    printf("%d", a);
}

Garbage value

1

Compile error

Segmentation fault

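→ Assigning int values to a pointer variable is possible (with a cast). Only when we dereference the variable using * do we get a segmentation fault. (Strictly speaking, printing a pointer with %d is not portable; %p with a void * argument is the portable way to print a pointer.)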

6. What will be the output of the following code?

#include <stdio.h>

int main()
{
    int a = 1, *p, **pp;
    p = &a;
    pp = p;
    printf("%d", **pp);
}

Garbage value

1

Compile error

Segmentation fault

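→ Here, p has the address of a and the same is copied to pp. So, *pp will give 1, which is contained in a, but **pp will use 1 as an address and try to access memory location 1, giving a segmentation fault. (This assumes that a pointer and an int have the same size. Most compilers will also at least warn about the assignment pp = p, since int * and int ** are different types.)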

7. What will be the output of the following code?

#include <stdio.h>

int main()
{
    int a = 1, *p, **pp;
    p = &a;
    pp = p;
    printf("%d", *pp);
}

1

Garbage value

Compile error

Segmentation fault

→ Here, p has the address of a and the same is copied to pp. So, *pp will give 1, which is contained in a.

8. Assuming a little endian machine, what will be the output of the following program?

The code is printing the binary equivalent of the input number. Suppose a is stored starting from address 1000. Since, we assume a little endian machine, the LSB of a will be stored at address 1000 and MSB will be stored at address 1003. So, we make a char pointer to 1003 and take out the MSB. Then using shift operator we get the most significant 4 bits from it and then the least significant 4 bits. We repeat the same for the other three bytes of a.

Content

$\text{C}$ is just a language specification, and $\text{C}$ compilers (which translate $\text{C}$ code to machine code) are made by different organizations/individuals. So, in order to make a $\text{C}$ program work across multiple compilers, the $\text{C}$ standard is important.

A little about C standards

Until the $\text{ANSI}$ standard was published (it is almost the same as $\text{C90}$, and is also called $\text{C89}/ \text{C90}$), the specification in the $\text{K&R}$ book was used as the reference for C. Each new standard aimed at easing the programming difficulty as well as making use of new hardware features. We’ll be following as much of the $\text{C11}$ standard as possible, as it adds some significant changes to the $\text{C}$ language.

Data types

Data types are used to represent data and data comes from real world. In real world we have the following data types

Integer

Character

Real Numbers

All other data types can be formed from these basic types (for example, a string is just a sequence of characters). So, in $\text{C}$ language we have only these basic types but they are supported in various data sizes as follows

short int

int

long int

long long int

char

float

double

long double

Now, each of the integer types (including $\text{char}$) is supported in an unsigned version also. In the signed version, one bit is used to identify if the number is positive or negative. So, a $32$ bit signed integer can represent only up to $2^{31} - 1$, while a $32$ bit unsigned integer can represent up to $2^{32} - 1$. A boolean data type is also present in the C standard, which can be used to hold a bit. This data type can be used with the keyword $\text{\_Bool}$, or with the name $\text{bool}$ after including “stdbool.h”. If we use an $\text{int}$ (or $\text{char}$) to hold a single bit, we need to bitwise $\text{AND}$ (&) it with $1$ to get the boolean value. With $\text{bool}$, this conversion is not necessary, as any non-zero value stored in it becomes $1$, so the byte representing the boolean will always have the most significant $7$ bits as $0$.

So, let’s see what these data types mean to the computer. As we have seen in the previous chapter, all data must be converted into a bit stream before being given to the processor. So, even though we use alphabets and digits while writing programs, they are converted to bits, stored in memory and then given to the processor. How many bits a data type can have is defined by the size of the data type. In the $\text{C}$ language, the operator $sizeof$ gives the number of bytes a data type takes (so the number of bits is $8$ $\times$ this value). Since memory accesses are restricted to multiples of bytes ($\text{RAM}$ doesn’t allow access to data at a granularity lower than 8 bits at a time, due to practical reasons), $sizeof$ always returns at least $1$ for any data type. (A $\text{bool}$ also takes a byte of storage, as that is the smallest accessible unit in memory, though it actually requires a single bit of storage.)

Now, let us see the size of various data types in $\text{C}$. Since the data types are directly given to the processor, the size of data types depends on the processor architecture. So, the $\text{C}$ standard gives the minimum required size, and lets the compiler designers choose the size as per their processor architecture. $\text{C}$ compilers are used on everything from $8$ bit embedded processors to $64$ bit desktop processors, so this size variation does make sense.

Data Type        Min. Size (bytes)
char             1
short int        2
int              2
long int         4
long long int    8
float            4
double           8
long double      10

The size of $\text{char}$ is $1$ byte, as that was sufficient for $\text{ASCII}$ encoding. For wider character sets, the standard defines $wchar\_t$, whose size is implementation specific ($16$ bits on some implementations and $32$ bits on others). $\text{C11}$ also defines $char16\_t$ and $char32\_t$ in the “uchar.h” header file, thus supporting Unicode characters, which require up to $21$ bits. So, in today’s world, a character can take the same size as an $\text{int}$: an integer type ($char32\_t$ of $4$ bytes) is used to store Unicode characters, and the data type $\text{char}$ is used mainly to refer to a byte of data rather than an actual character.

Constants and variables

We have seen the data types, but to use them in a program we need to have a variable. A $\text{variable}$ is a named entity that holds a value of a specific data type. The type of a variable is fixed during the program run, but its value can be changed, and hence the name variable. (In an object oriented language like $\text{C++}$, a class can be taken as a data type and its instance becomes a variable.) To assign a value to a variable we use $constants$. The following is an example usage of variables and constants.

int a;   // 'a' is an int variable
a = 5;   // 5 is an int constant

Constant Types

In $\text{C}$ language we have the following constants

Integer constant

Decimal constant

Octal constant

Hexadecimal constant

Floating constant

Character constant

Enumeration constant

Integer Constant

C standard supports decimal, octal and hexa-decimal constants being used to assign integer values. Their example usage is as shown below.

#include <stdio.h>

int main()
{
    // jan has the int value 1, feb the value 2 and so on
    enum month {jan = 1, feb, mar, apr, may, jun, july, aug, sep, oct, nov, dec};
    int a, b, c;
    a = 10;             // 10 is a decimal constant
    b = 0xa;            // 0xa is a hexadecimal constant
    c = 012;            // 012 is an octal constant
    enum month d = oct; // oct is an enumeration constant
    printf("a =%d, b = %d, c =%d, d =%d ", a, b, c, d);
    return 0;
}

We can use an $\text{int}$ instead of an $\text{enum}$, as both usually take the same amount of memory. But the use of $\text{enum}$ signals that a variable is meant to hold only a particular set of integer values rather than the whole range of integers (the compiler does not strictly enforce this). Thus it leads to fewer program errors and makes the code more readable by providing a set of defined constants. Here, $a$ has the decimal value $10$. So, in memory $a$ will be like

000...1010

Similarly, $b$ will be in memory like

000...1010

and $c$ and $d$ will also be like

000...1010

i.e.; all $a$, $b$, $c$ and $d$ are having same integer values given using different constants. The memory to be allotted to an integer constant is determined by its value, minimum being the $sizeof(int)$. For example, $40$ is allotted the $sizeof(int)$ while 0xfffffffff is allotted $sizeof(long)$ as it won’t fit in $sizeof(int)$ bytes (assuming $sizeof(int)$ is $4$ and $sizeof(long)$ is $8$).

Floating Constant

To understand floating point conversion clearly, let us do a simple example. In the decimal number system, a real number can be represented in several ways. $23.75$, for instance, is the same as

$2375 \times 10^{-2}$

$0.2375 \times 10^{2}$

$2.375 \times 10^{1}$

Of these, the third manner, where the decimal point is placed after the first non-zero digit, is called the ‘normalized’ notation. If we were to develop a representation for a real number in the decimal system, it should allow us to represent the sign of the number along with the following details. For the above real number, $2.375$ is known as the significand, $10$ is the base of the decimal number system, and $1$ is the exponent (which can also be negative). Please follow Section 4.1 on number representation for a better understanding.

The representation of a floating point number is implementation specific. $\text{C11}$ does specify the $\text{IEC}$ $60559$ format for floating point representation, but it is not mandatory that all implementations support it. Most current implementations do support it, and hence it’s good to have a look at it. The $\text{IEEE}$ format is a good one to look at, as $\text{IEEE}$ $754$ is identical to $\text{IEC}$ $60559$.

Constant values can be assigned to float or double variables in various ways, as shown below. If a constant is not exactly representable in the float or double variable, the implementation is recommended to show a warning as per the C standard. But this is just a recommendation and not a strict requirement.

//%5.2f means the output will have a total of minimum 5 places including 2 decimal digits and a point.

//If lesser digits are there, then the remaining space is filled with white space.

//%05.2f is the same as %5.2f except that the remaining space, if any, is filled with 0s rather than white space

return 0;

}

Character Constant

Characters can be assigned a value either by using a character in single quotes or by giving the integer value from the character code. This int value can be given using hex or octal representation as well, as shown below. Escape sequences like '\n', '\t' etc. are applicable to character constants. Character code values can also be used in place of a char by using their octal or hexa-decimal representation. Octal values are given by a preceding ‘\’, where the next three octal digits are converted to the corresponding character ($3$ octal digits suffice to cover values up to $255$). Hexa-decimal codes are given by a preceding ‘\x’, where the next two hexa-decimal digits are converted to the corresponding character ($2$ hexa-decimal digits suffice to cover values up to $255$).

e = '\101';   // 101 is an octal value whose corresponding ASCII char is assigned to e

printf("a =%d, b =%d, c =%d, d =%c e =%c abc = %s\n", a, b, c, d, e, abc);

return 0;

}

Here, $a$ has the $ASCII$ value of 'a', which is $97$. So, in memory $a$ will be like

01100001

Similarly, $b$ and $c$ will be in memory like

00000000   // ASCII value of '\0' is 0

and $d$ and $e$ will be like

01000001

Enumeration Constant

Enumeration constants are assigned integer values starting from a given initial value, which by default is 0. An example is shown below:

enum player {Dhoni = 1, Kohli, Yuvraj, Aswin = 5, Jadeja, Mishra = 10};

Here, Dhoni is having an integer value 1, Kohli 2, Yuvraj 3, Aswin 5, Jadeja 6 and Mishra 10. That is, these names can be used wherever these values are needed.

String Literal

A character string literal is a sequence of zero or more characters enclosed in double-quotes, as in “xyz”. A $\text{UTF-8}$ string literal is the same, except prefixed by $u8$. A wide string literal is the same, except prefixed by the letter $L$, $u$, or $U$. All escape sequences applicable to a character constant are applicable to a string literal, except that for ' an escape sequence is not mandatory. Any sequence of adjacent string literals will be combined into a single string literal during the translation phase of the compiler. Thus, “abc” “de” is equivalent to “abcde”. Another important property of a string literal is that it must not be modified (the result of trying is undefined), and it is usually stored in the $\text{RO}$ data segment.

char p[] = "hello world";
char *q = "hello world";

Here, individual characters of $p$ can be modified, as the characters of the string literal “hello world” are copied to the memory allocated to $p$, which is $12$ bytes. But individual characters of the content of $q$ can only be read and are not modifiable, as $q$ is pointing to a string literal, which is stored in the $\text{RO}$ data segment of the program, since string literals are considered constants. i.e.

p[2] = 'p';      // valid
char c = q[2];   // valid
q[3] = 'q';      // invalid

The last statement typically causes a segmentation fault.

Implicit Type Conversion

We can round off this chapter with an important point about implicit type conversion. Whenever we do an operation with different data types, the lower ranked data type is promoted to the higher ranked one, as operations are meant to be performed on the same type of data. For example, when we add an $\text{int}$ and a $\text{float}$, the $\text{int}$ is converted to $\text{float}$ and the addition of two $floats$ takes place using two floating point registers. Similarly, when we add a $\text{char}$ and an $\text{int}$, the $8$ bits of the char are made into $32$ bits (assuming $4$ byte size for $\text{int}$), by extending it with its sign bit if the $\text{char}$ is signed, or with $0$’s if it is unsigned.

One important point about implicit type conversion is that, it depends only on the source operands and is independent of the resultant data type. So, if we multiply two $integers$ and store in a $long$, the result will be calculated as $\text{int}$ (usually 4 bytes) and then stored in $long$ (usually 8 bytes). Another common example of this behavior is for division operation. When we divide two $integers$, the result will be $\text{int}$ only, even if we assign it to a $\text{float}$. So, in these cases the programmer has to explicitly cast one operand to the desired output type.

What exactly happens during type conversion?
Before reading the description below, think how it can happen- you won’t think wrong.

$unsigned$ to $signed$ or vice versa: There is no change in the representation in memory. When interpreted as signed, the most significant bit is taken as a sign bit, which would otherwise be used for representing the magnitude of the number. So, care is necessary during conditional checks that mix the two, as a negative number when converted to unsigned will give a huge value.

$\text{char}$ to $\text{int}$: If $\text{int}$ is $4$ bytes, the top $3$ bytes are filled with the sign bit if the character is signed and with zero if the character is unsigned, and the bottom most byte is the same byte used to represent the $\text{char}$. The same holds for conversion from $\text{short int}$ to $\text{int}$ or from $\text{int}$ to $\text{long int}$.

$\text{int}$ to $\text{char}$: Only the lowermost byte of the $\text{int}$ is taken and made into a $\text{char}$. So, if the $\text{int}$ value is $511$, its value as an $\text{unsigned char}$ will be $255$.

00000000000000000000000111111111   // 511
                        11111111   // the lowermost byte, which is 255

$\text{int}$ to $\text{float}$ or $\text{double}$: The fixed integer is converted to a representable floating point value. So, this might cause a change to entire bits used to represent the integer.

$\text{float}$ or $\text{double}$ to $\text{int}$: The integral part of the floating point value is saved as integer. The decimal part is ignored. For example, $5.9$ will be changed to $5$.

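$\text{float}$ to $\text{double}$: Every $\text{float}$ value is exactly representable as a $\text{double}$, so the value does not change. For a normalized number, the mantissa bits of the float are used without modification and the extra mantissa bits supported in $\text{double}$ are filled with $0$’s, while the exponent is re-encoded in the wider exponent field.

$\text{double}$ to $\text{float}$: The extra mantissa bits supported in $\text{double}$ are dropped in the float representation (the value is rounded to a representable float), and the exponent is re-encoded in the narrower exponent field. If the exponent does not fit in the float’s exponent field, the number needs an approximation to fit in a float; on $\text{IEEE}$ $754$ implementations this is typically infinity for very large values, or a value with fewer significant bits (or zero) for very small ones.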

Exercise Questions

Consider an implementation where int is 4 bytes and long int is 8 bytes. Which of the following initializations are correct?

#include <stdio.h>

int main()
{
    long int a = 0x7fffffff * 0x7ffffff;
    long int b = 0x7ffffffff * 0x7ffffff;
    long int c = 0x7fffffff * 0x7fffffff;
    long int d = 0x7fffffff * 0x7fffffffl;
    printf("a = %ld, b = %ld, c = %ld, d = %ld\n", a, b, c, d);
    return 0;
}

Consider an implementation where int is 4 bytes and long int is 8 bytes. What will be the output of the following code?

#include <stdio.h>

int main()
{
    int i = 0;
    size_t a = sizeof i, b = sizeof(long);
    // %zu is the conversion for size_t: the z length modifier tells printf to expect
    // an argument of the size of size_t, whatever that is on the implementation.
    // This is useful for special data types like size_t whose size is implementation specific
    printf("a = %zu, b = %zu\n", a, b);
    return 0;
}

What will be the output of the following code?

#include <stdio.h>

int main()
{
    unsigned int a = 5;
    if (a > -1)
        printf("5 is > -1\n");
    return 0;
}

What will be printed by the following code?

#include <stdio.h>

int main()
{
    char buff[255] = "abc\
pee";
    printf("%s", buff);
    return 0;
}

What will be printed by the following code?

#include <stdio.h>
#include <string.h>

int main()
{
    char buff[] = "abc" "hello";
    printf("%zu\n", strlen(buff));
    return 0;
}

How can you print the following sentence exactly as it is by changing the assignment to buff? “Hello\\” “World\\”

#include <stdio.h>

int main()
{
    char buff[255] = "\0";
    printf("%s", buff);
    return 0;
}

Consider an implementation where int is 4 bytes and long int is 8 bytes. Which of the following initializations are correct?