COSC 1315

Programming Fundamentals

Data Types

This lesson was written specifically for the
benefit of my students in COSC 1315, Fundamentals of Programming. The
lesson was written under the assumption that those students have no prior
programming knowledge when they enroll in the course.

Another browser window

I recommend that you open another copy of this document in a separate browser
window so that you can view the code and the discussion of that code at the same
time.

C++, Java, C#, and some other modern programming
languages make heavy use of a concept that we refer to as type, or
data type. We refer to those languages as type-sensitive languages.

Not all languages are type-sensitive languages. In
particular, some languages hide the concept of type from the programmer and
automatically deal with type issues behind the scenes. The web
programming language known as
PHP is one such
language.

So, what do we mean by type?

One analogy that comes to my mind is international
currency. For example, many years ago, I spent a little time in Japan and quite
a long time on an island named Okinawa (I
believe that Okinawa is now part of Japan but that wasn't the case when I was
there).

Types of currency

At that time, as now, the type of currency used
in the United States was the dollar. The type of currency used in Japan
was the yen, and the type of currency used on the island of
Okinawa was also the yen. However, even though two of these currencies
had the same name, they were different types of currency, as determined
by the value relationships between them.

The exchange rate

As I recall, at that time, the exchange rate between
the Japanese yen and the U.S. dollar was 360 yen to the dollar. The exchange
rate between the Okinawan yen and the U.S. dollar was 120 yen to the dollar.

This suggests that the exchange rate between the Japanese yen and the Okinawan
yen would have been three Japanese yen to one Okinawan yen.

Analogous to different types of data

So, why am I telling you this? I am telling you this
to illustrate the concept that different types of currency are roughly analogous
to different data types in programming.

Purchasing transactions were type sensitive

In particular, because these were three different types
of currency, the differences in the types had to be taken into account in any
purchasing transaction to determine the price in that particular currency. In
other words, the purchasing process was sensitive to the type of currency being
used for the purchase (type sensitive).

Different types of data

Type-sensitive programming languages deal with
different types of data.

Some data types conceptually have nothing to do with
numeric values, but conceptually deal with the concept of true or false.

Letters, numbers, and other characters

Some types deal with the concept of the individual letters of the alphabet, individual numeric characters, and
individual
punctuation characters (or groups of such characters known as strings).

Type specification

For every different type of data used with a particular
programming language, there is a specification somewhere that defines two
important characteristics of the type:

What is the set of all possible data values that
can be stored in an instance of the type (we will learn some other
names for instance later)?

Once you have an instance of the type, what are
the operations that you can perform on that instance alone, or in
combination with other instances?

What do I mean by an instance of a
type?

Think of the type specification as being analogous to the plan or blueprint
for a model airplane. Assume that you build three model airplanes from the same
set of plans. You will have created three instances of the plans.

We might say that an instance is the physical manifestation of a plan or a
type.

Using mixed types

Somewhat secondary to this specification, but also
extremely important, is a set of rules that define what happens when you perform
an operation involving mixed types (such as making a purchase using some yen
currency in combination with some dollar currency).
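To make the idea of mixed types a little more concrete, here is a minimal sketch in C++. The function names are hypothetical, and the rate of 360 yen to the dollar is simply the figure quoted above. It shows that when a whole-number value and a value with a fractional part are combined, the rules of the language decide whether the fractional part is kept.

```cpp
// Hypothetical helper: combine a dollar amount and a yen amount into dollars.
// The rate of 360 yen per dollar is the historical figure quoted in the text.
double totalInDollars(double dollars, int japaneseYen) {
    const double yenPerDollar = 360.0;
    // japaneseYen is an int. Dividing it by a double converts it to a
    // double first, so the fractional part of the result is kept.
    return dollars + japaneseYen / yenPerDollar;
}

// When both operands are whole-number types, division discards the
// fractional part of the result.
int wholeDollars(int japaneseYen) {
    return japaneseYen / 360;
}
```

Calling totalInDollars(1.0, 720) yields 3.0, while wholeDollars(500) yields 1 because the fraction is thrown away.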

The short data type

There is a data type in C++ known as
short.

Assuming that the short type is maintained as a
16-bit two's complement entity, if you have an instance of the short type, the set of all
possible values that you can store in that instance is the set of all the whole
numbers ranging from -32,768 to +32,767.

There is also a type known as
unsigned short in which you can store all the whole numbers ranging from 0
to 65,535.

65,536 different combinations

This constitutes a set of 65,536 different values,
including the value zero.

No other values allowed

No values other than those listed above can be stored in an instance
of the type short or the type unsigned short.

For example, you cannot store the value 32,768 in an
instance of the type short. If you need to store that value, you will
need to use some type other than short.
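The following sketch shows one way to ask the compiler what range a type supports instead of memorizing it. The function names are my own invention. The limits come from the standard header named limits, and the comments assume a 16-bit short, which is the common case but is not guaranteed on every platform.

```cpp
#include <limits>

// Returns true if the value can be stored in a short on this platform.
// With a 16-bit short the allowed range is -32,768 to +32,767.
bool fitsInShort(long value) {
    return value >= std::numeric_limits<short>::min() &&
           value <= std::numeric_limits<short>::max();
}

// Returns true if the value can be stored in an unsigned short.
// With a 16-bit unsigned short the allowed range is 0 to 65,535.
bool fitsInUnsignedShort(long value) {
    return value >= 0 &&
           value <= std::numeric_limits<unsigned short>::max();
}
```

On such a platform, fitsInShort(32767) is true and fitsInShort(32768) is false, which matches the discussion above.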

Kind of like an odometer

This is somewhat analogous to the odometer in your car
(the thing that records how many miles the car has been driven). For
example, depending on the make and model of car, there is a specified set of
values that can appear in the odometer. The value that appears in the odometer
depends on how many miles your car has been driven.

00000 to 99999

It is fairly common for an odometer to be able to store
and to display the set of all positive values ranging from 00000 to 99999.

If
your odometer is designed to store that set of values and if you drive your car
more than 99999 miles, it is likely that the odometer will roll over and start
back at 00000 after you pass the 99999-mile mark.

Data becomes corrupt

In other words, that
particular odometer does not have the ability to store a value of 100,000
miles. Once you pass the 99999-mile mark, the data stored in the odometer is
corrupt. There is no way of knowing (based on the value in the odometer
alone) how many hundreds of thousands of miles the car has been driven.

Operations on the type named short

Assume that you have two instances of the type short
in a program. What are the operations that you can perform on those
instances? For example:

You can add them together.

You can subtract one from the other.

You can multiply one by the other.

You can divide one by the other.

You can compare one with the other to determine
which is algebraically larger.

There are other operations that are allowed as well.

There is a well defined set of operations that you are allowed to perform
on those instances, and that set of operations is defined in the specification
for the type short.

What if you want to do something different?

However, if you want to perform an operation that is
not allowed by the type specification, then you will have to find another way to
accomplish that purpose.

For example, some programming languages allow you to
raise whole-number types to a power (example: four squared, six cubed, nine
to the fourth power, etc.).

That operation is not allowed by the
C++ specification for the type short. If you need to do that operation
with a data value of the short type, you must find another way to do it.
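One such "other way" is to write the operation yourself using operations that are allowed. The sketch below is only an illustration, and the function name power is hypothetical. It raises a whole number to a non-negative whole power by repeated multiplication.

```cpp
// Hypothetical helper: raise a whole number to a non-negative whole power
// by repeated multiplication. The result is a long so that modest powers
// of a short fit; very large results would still overflow.
long power(short base, unsigned int exponent) {
    long result = 1;
    for (unsigned int i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}
```

With this function, four squared is power(4, 2), six cubed is power(6, 3), and nine to the fourth power is power(9, 4).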

What this means is that there is a core component to the language that is
always available. Beyond that core component, individual programmers can extend
the language to provide new capabilities.

The primitive types discussed in this
section are the types that are part of the core language.

A later section will
discuss user-defined types that become available when a programmer extends the
language.

More subdivision

It seems that when teaching programming, I constantly find myself subdividing
topics into sub-topics. I am going to subdivide the topic of Primitive Types
into five categories:

Whole-number types

Floating-point types

Character types

Boolean type

String type (not strictly a primitive type, although it has become a
core part of the C++ programming language)

Hopefully this categorization will make it possible for me to explain these
types in a way that is easier for you to understand.

Whole-number types

The whole-number types, often called integer types, are probably the easiest
to understand. These are types that can be used to represent data without
fractional parts.

Applesauce and hamburger

For example, consider purchasing applesauce and hamburger. At the grocery
store where I shop, I am allowed to purchase cans of applesauce only in
whole-number or integer quantities.

Can purchase integer quantities only

For example, the grocer is happy to sell me one can of applesauce and is even
happier to sell me 36 cans of applesauce.

However, she would be very unhappy if
I were to open a can of applesauce in the store and attempt to purchase 6.3 cans
of applesauce.

Counting doesn't require fractional parts

A count of the number of cans of applesauce that I purchase is somewhat
analogous to the concept of whole-number data types. Applesauce is not
available in fractional parts of a can at my grocery store. (However,
there is another grocery store nearby where you can probably scoop applesauce
out of a large container and purchase as much or as little as you want based on
weight.)

Fractional pounds of hamburger are available

On the other hand, the grocer is perfectly willing to sell me 6.3 pounds of
hamburger. This is somewhat analogous to floating-point data types.

Accommodating applesauce and hamburger in a program

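Therefore, if I were writing a program dealing with quantities of applesauce
and hamburger, I might elect to use a whole-number type for the number of cans
of applesauce and a floating-point type for the pounds of hamburger.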

There are both
signed and unsigned variations on each of the basic types.

The four types
differ primarily in terms of the range of values that they can accommodate and
the amount of computer memory required to store instances of the types.

Differences in operations?

Although there are some subtle differences among the different whole-number
types in terms of the operations that you can perform on them, those differences
are beyond the scope of this lesson.

Like a strange odometer (-128 to +127)

To form a crude analogy, the signed char type is sort of like a
strange odometer in a new (and unusual) car that shows a mileage value of
-128 when you first purchase the car.

As you drive the car, the negative values
shown on the odometer increment toward zero and then pass zero.

Beyond that
point they increment up toward the value of +127. When the value goes
beyond 127, it starts over at -128.

Oops, numeric overflow!

When the value passes (or attempts to pass) +127 miles, something bad
happens. From that point forward, the value shown on the odometer is not a
reliable indicator of the number of miles that the car has been driven.
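The strange odometer can be simulated in a few lines. This is a sketch under my own assumptions: the function name is hypothetical, and the wraparound is written out by hand using an int so that the program never depends on what a real signed char does when it overflows.

```cpp
#include <limits>

// Simulates one tick of the "strange odometer". The reading is held in an
// int, and the wrap from the highest value back to the lowest is done by
// hand rather than by letting a signed char overflow.
int nextReading(int reading) {
    const int lowest = std::numeric_limits<signed char>::min();   // -128
    const int highest = std::numeric_limits<signed char>::max();  // +127
    return (reading == highest) ? lowest : reading + 1;
}
```

Starting at -128 and calling nextReading repeatedly walks up through zero to +127, and the next call returns -128 again.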

Floating-point types

Floating-point types are a little more complicated than whole-number types.
I found the following definition of floating-point in the Free On-Line
Dictionary of Computing:

"A number representation consisting of a mantissa, M, an exponent, E, and
an (assumed) radix (or "base"). The number represented is M*R^E
where R is the radix - usually ten but sometimes 2."

So what does this really mean?

Assuming a base or radix of 10, I will attempt to explain it using an
example.

Consider the following value:

623.57185

I can represent this value in any of the following ways (where * indicates
multiplication):

In other words, I can represent the value as a mantissa (62357185)
multiplied by a factor where the purpose of the factor is to represent a
left or right shift in the position of the decimal point.
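As a small illustration of the mantissa-times-factor idea, the sketch below rebuilds the value 623.57185 from several different mantissa and factor pairs. The function names are hypothetical. Because floating-point arithmetic is approximate, the results are compared using a small tolerance rather than an exact equality test.

```cpp
#include <cmath>

// Rebuilds a value from a mantissa and a factor that shifts the
// position of the decimal point.
double fromMantissa(double mantissa, double factor) {
    return mantissa * factor;
}

// Floating-point results are compared with a small tolerance because
// the arithmetic is approximate.
bool nearlyEqual(double a, double b) {
    return std::fabs(a - b) < 1e-9;
}
```

For example, fromMantissa(6.2357185, 100.0), fromMantissa(0.62357185, 1000.0), and fromMantissa(62357.185, 0.01) all produce a value that is nearly equal to 623.57185.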

Now consider the factor

Each of the factors shown above represents the value of ten raised to some
specific power, such as ten squared, ten cubed, ten raised to the fourth power,
etc.

Exponentiation

If we allow the caret symbol (^) to represent exponentiation (raising
to a power) and allow the slash symbol (/) to represent division, then
we can write the values for the above factors in the following ways. Note in
particular the number that follows the caret, which I will refer to later as the exponent.

In the above notation, the term 10^+3 means 10 raised to the third power.

The zeroth power

By definition, any nonzero value raised to the zeroth power is 1. (Check this out in your old high-school algebra book.)

The exponent and the factor

Hopefully, at this point you will understand the relationship between the
exponent and the factor introduced earlier.

Different ways to represent the same value

Having reached this point, by using substitution, I can rewrite the original
set of representations of the value 623.57185 in the following ways. It is very
important for you to understand that these are simply different ways to
represent the same value.

Floating-point types represent values as a mantissa containing a decimal
point along with an exponent value that tells how many places to shift the
decimal point to the left or to the right in order to determine the true value.

Positive exponent values mean that the decimal point should be shifted to the
right. Negative exponent values mean that the decimal point should be shifted
to the left.
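The shifting rule can be sketched with the pow function from the standard cmath header, which raises one value to the power of another. The function name shiftDecimal is hypothetical. A positive exponent moves the decimal point to the right and a negative exponent moves it to the left.

```cpp
#include <cmath>

// Combines a mantissa with a base-ten exponent. Multiplying by ten raised
// to the exponent shifts the decimal point: right for positive exponents,
// left for negative exponents.
double shiftDecimal(double mantissa, int exponent) {
    return mantissa * std::pow(10.0, exponent);
}
```

Both shiftDecimal(6.2357185, 2) and shiftDecimal(62357.185, -2) produce a value very close to 623.57185, and an exponent of zero leaves the mantissa unchanged.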

Maintaining fractional parts

One advantage of floating-point types is that they can be used to maintain
fractional parts in data values.

Accommodating a very large range of values

Another advantage is that a very large range of values can be represented
using a reasonably small amount of computer memory for storage of the values.

Another example

For example (assuming that I counted the number of digits correctly)
the following very large value

62357185000000000000000000000000000000.

can be represented as

6.2357185E+37

Similarly, again assuming that I counted the digits correctly, the following
very small value

.0000000000000000000000000000062357185

can be represented as

6.2357185E-30
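If you would like to see this notation produced by a program, the sketch below formats a value using the scientific manipulator from the standard library. The function name is hypothetical, and a string stream is used only so that the text can be returned; inserting into cout in the same way should display the same characters.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a value in scientific (E) notation with seven digits after the
// decimal point, which is the style used in the text.
std::string toScientific(double value) {
    std::ostringstream out;
    out << std::scientific << std::setprecision(7) << value;
    return out.str();
}
```

Note that the stream writes a lowercase e, so the very large value shown above is formatted as 6.2357185e+37.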

When would you use floating-point?

As examples, you will need to use the floating-point types if you are working in an area where you:

Need to keep track of
fractional parts (such as the amount of hamburger in a package)

(Once again, the number of bits, the number of significant digits, and the
range for each type were gleaned from
http://www.cplusplus.com/doc/tutorial/tut1-2.html. Hopefully the
character used above indicating plus or minus will survive the HTML publishing process.)

These three types
differ primarily in terms of the range of values that they can support and the
amount of memory required to store an instance of the type.

Values
of any of the three types can be either positive or negative as indicated by the
± character.
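Rather than trusting a table, you can ask your own compiler how many decimal digits each floating-point type preserves. The sketch below assumes the three types in question are float, double, and long double; the function names are hypothetical, and the values come from the standard limits header.

```cpp
#include <limits>

// Number of decimal digits that each type can reliably preserve on the
// platform where the program is compiled.
int floatDigits() {
    return std::numeric_limits<float>::digits10;
}

int doubleDigits() {
    return std::numeric_limits<double>::digits10;
}

int longDoubleDigits() {
    return std::numeric_limits<long double>::digits10;
}
```

On platforms that use the IEEE 754 formats, floatDigits() returns 6 and doubleDigits() returns 15. The result for long double varies from one platform to another.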

The wide character type wchar_t is designed to store
international characters of a two-byte (16-bit) character set.

(This
type was recently added by the ANSI-C++ standard and some older compilers may
not support it.)

What are the numeric values representing characters?

As long as the characters that you use in your program appear on your
keyboard, you usually don't have a need to know the numeric value associated
with the different characters.

Representing a character symbolically

Insofar as keyboard characters are concerned, you usually represent a
character to the program by surrounding it with apostrophes as follows: 'A'.

Programming tools know how to cross reference that specific character symbol
against a character table to obtain the corresponding numeric value.

(A
discussion of the representation of characters that don't appear on your
keyboard is beyond the scope of this lesson.)
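If you are curious about the numeric value behind a character, the following sketch reveals it by converting the character to an int. The function name is hypothetical, and the value mentioned below assumes the ASCII character table, which is what nearly all current systems use for keyboard characters.

```cpp
// Returns the numeric value that the platform uses for a character.
int codeOf(char symbol) {
    return static_cast<int>(symbol);
}
```

On an ASCII system, codeOf('A') returns 65 and codeOf('B') returns 66.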

Boolean types

The bool type is conceptually the simplest type supported by C++. It can have only
two values:

true

false

In C++, the bool type is actually represented by a numeric value, 0 for
false and 1 for true. (When a numeric value is converted to bool, any
non-zero value is interpreted to be true.)

As a result, it is possible to perform arithmetic
operations on variables of type bool (but the resulting code may be
very confusing).

The bool type is commonly used in some sort of a
test to determine what to do next, such as:

if some test returns true, then
do this
otherwise
do that
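In C++, the outline above might be sketched as follows. The function name and the applesauce scenario are hypothetical and are borrowed from the grocery-store example only for illustration.

```cpp
#include <string>

// Hypothetical example: decide what to do next based on a bool value.
std::string nextStep(int cansInStock) {
    bool inStock = cansInStock > 0;   // the test produces true or false
    if (inStock) {
        return "sell";      // do this
    } else {
        return "reorder";   // do that
    }
}
```

Calling nextStep(3) returns "sell" and calling nextStep(0) returns "reorder".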

The string type

The string type is also relatively new to C++. Although it
isn't a true primitive type (it is supplied by the C++ standard library), it can be
used much the same way primitive types are used, provided that the string
header file is included in the program using a compiler directive such as:

#include <string>

The purpose of the string type is to store strings of characters, such as in
the following statements:
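As a minimal sketch, statements that store strings of characters might look like the ones in the function below. The function and variable names are hypothetical and were chosen only for illustration.

```cpp
#include <string>

// Hypothetical example: build a full name from two string values.
std::string fullName(const std::string& first, const std::string& last) {
    std::string name = first;   // store a string of characters
    name += " ";                // append a space character
    name += last;               // append another string
    return name;
}
```

For example, fullName("Suzie", "Smith") returns the eleven-character string "Suzie Smith".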

User-defined types

As mentioned earlier, C++ is an extensible programming
language. There is a core component to the language that is always available.

Beyond the core component, different programmers can extend the language in different ways to
meet their individual needs.

Creating new types

One of the ways that individual programmers can extend the
language is to create new types.

When creating a new type, the user must define:

The set of values that can be stored in an instance of the type

The
operations that can be performed on instances of the type.

No magic involved

While this might initially seem like magic, once you
get to the heart of the matter, it is really pretty straightforward.

New types
are created by combining instances of primitive types and instances of
previously defined user-defined types.

An example

For example, the string type, which can be used
to represent a person's last name, is just a grouping of a bunch of instances of
the primitive char or character type.

A user-defined Person type, which could be used
to represent a person's first name and last name, might simply be a
grouping of two instances of the string type.
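A Person type of that kind might be sketched as shown below. The member names and the makePerson helper are hypothetical; the point is only that the new type is a grouping of two instances of an existing type.

```cpp
#include <string>

// A hypothetical user-defined type that groups two string instances.
struct Person {
    std::string firstName;
    std::string lastName;
};

// Creates one instance of the Person type.
Person makePerson(const std::string& first, const std::string& last) {
    Person p;
    p.firstName = first;
    p.lastName = last;
    return p;
}
```

Each call to makePerson produces a separate instance, in the same way that each model airplane is a separate instance of the plans.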

The company telephone book

A programmer responsible for producing the company
telephone book might create a new type that can be used to store the first and
last names along with the telephone number of an individual.

Using this new type, the programmer could create an
instance for each employee in the company.

(At this point, let me sneak a little jargon in and tell you that we will be
referring to such instances as objects.)

The set of allowable values

The set of allowable values that could be stored in an instance of the type
would be all possible combinations of allowable first and last names and
allowable telephone numbers.

In order to limit the size of the set of
allowable values, for example, restrictions may be established relative to:

The maximum
lengths of the names

The characters in the names

The digits in the telephone numbers

A comparison operation

The programmer might define one of the allowable
operations for the new type to be:

A comparison between two objects of the new
type to determine which is greater in an alphabetic sorting sense.

This
operation could be used to sort the set of objects representing all of the
employees into alphabetic order.

The set of sorted objects could then be used
to print a new telephone book.
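Here is one possible sketch of such a comparison operation along with a sort that uses it. The type name Employee, its member names, and the function names are all hypothetical. The comparison relies on the way the string type compares characters, so uppercase and lowercase letters are not treated as equal.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical type holding one telephone-book entry.
struct Employee {
    std::string firstName;
    std::string lastName;
    std::string phone;
};

// Returns true if a belongs before b in the telephone book: ordered by
// last name, and by first name when the last names are the same.
bool comesBefore(const Employee& a, const Employee& b) {
    if (a.lastName != b.lastName) {
        return a.lastName < b.lastName;
    }
    return a.firstName < b.firstName;
}

// Sorts the whole set of employees using the comparison operation.
void sortAlphabetically(std::vector<Employee>& staff) {
    std::sort(staff.begin(), staff.end(), comesBefore);
}
```

After a call to sortAlphabetically, an entry for Tom Jones would appear ahead of an entry for Suzie Smith.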

A name-change operation

Another allowable operation that the programmer might
define would be:

The ability to change the name stored in an object representing
an employee.

For example when Suzie Smith marries Tom Jones, she
might elect to thereafter be known as:

Suzie Smith

Suzie Jones

Suzie Smith-Jones

Suzie
Jones-Smith

Some other variation on the names of her and her new spouse

In this case, there would be a need to modify the
object that represents her to reflect her newly-elected surname.

(Or perhaps Tom Jones
might elect to thereafter be known as Tom Smith or Tom Jones-Smith, in which case it would be
necessary to modify the object that represents him.)

A print operation

Still another useful operation might be an operation that formats the contents of
each object in a way that is suitable for printing the telephone book.

For example, the printing format might be:

Last name, First name, telephone number

An updated telephone book

Using these operations, the programmer could:

Use the name-changing operation to
modify the object

Use the sorting operation to re-sort the set of
objects

Use the print operation to print and distribute a modified version of the telephone book.

Many user-defined types already exist

Unlike the primitive types, which are predefined, I am
unable to give you much in the way of specific information about user-defined
types, simply because they don't exist until a user defines them.

I can tell you, however, that when you obtain modern
object-oriented programming tools for different programming languages, you not
only receive the core language containing the primitive types, you also usually
receive a large library containing several thousand user-defined types that have
already been defined.

(Note that in this case, the user who defines a new type may actually
be an employee working for the company from which you obtain your
programming tools. This person's job may be to create a large library
of standard user-defined types for use by you and other customers.)

A large documentation package is usually also available to
help you determine the individual characteristics of those user-defined types.

The most important thing

At this stage in your development as a programmer, the
most important thing for you to know about user-defined types is that they are
possible.

You can define new types in object-oriented programming.

Unlike earlier procedural
programming languages such as C and early versions of Pascal, you are no longer forced to adapt
your problem to the available tools.

Rather, you now have the opportunity to
extend the tools to make them better suited to solve your problem.

The class definition

One specific C++ mechanism that makes it possible for
you to define new types is a mechanism known as the class definition.

In C++, whenever you define a new class, you are at the
same time defining a new type. Your new type can be as simple, or as complex
as you want it to be.

An object of your new type can contain a very small
amount of data, or it can contain a very large amount of data.

The operations that you allow to be performed on an
object of your new type can be rudimentary, or they can be very powerful.
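To give you a feel for what a class definition looks like, here is a minimal sketch of the telephone-book type described earlier. The class name PhoneEntry and the names of its members are hypothetical. The class defines both the data stored in each object and two of the operations discussed above: the name-change operation and the print-formatting operation.

```cpp
#include <string>

// Hypothetical class representing one telephone-book entry.
class PhoneEntry {
public:
    PhoneEntry(const std::string& first, const std::string& last,
               const std::string& phone)
        : firstName(first), lastName(last), phoneNumber(phone) {}

    // Name-change operation.
    void changeLastName(const std::string& newLast) {
        lastName = newLast;
    }

    // Print operation: Last name, First name, telephone number.
    std::string formatted() const {
        return lastName + ", " + firstName + ", " + phoneNumber;
    }

private:
    // The data stored in each object of the type.
    std::string firstName;
    std::string lastName;
    std::string phoneNumber;
};
```

An object created as PhoneEntry("Suzie", "Smith", "555-0101") formats as "Smith, Suzie, 555-0101". After a call to changeLastName("Smith-Jones"), the same object formats as "Smith-Jones, Suzie, 555-0101".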

It is all up to you

Whenever you define a new class (type) you not
only have the opportunity to define the data definition and the
operations, you also have a responsibility to do so.

Much to learn and much to do

But, you still have much to learn and much to do before
you will need to define new types.

There are a lot of fundamental programming concepts
that you will need to learn before you seriously embark on a study involving the
definition of new types.

For the present then, simply remember that such a
capability exists, and if you work to expand your knowledge of modern
object-oriented programming one small step at a time, you will be ready and
eager to define new types when you reach that point in your education.

(Note that the character shown in the first line of output above is the
character from the Extended ASCII character set that is represented by the
numeric value 242 (see
http://www.lookuptables.com/). This character is not correctly depicted in the
comments at the beginning of the program in Listing 1, probably because the text
editor in Dev C++ doesn't recognize it. Also
note that the ± character used earlier is represented by the numeric value
241.)

The <string> header file

Several different things are worthy of note in Listing 1. For example,
the <string> header file must be included in order to use the insertion
operator along with cout to display a variable of type string when
the program is compiled using Visual Studio 6.0.
Otherwise, a compilation error results.

Not required for Dev C++

However, this is not the case when the program is compiled using Dev C++ and
run from the command line. With Dev C++, the program will compile and
execute successfully even if the <string> header file is not
included. This is another example of the kinds of compatibility issues
that you will encounter when programming in C++.

Type char displays as a character

The char type is an eight-bit signed or unsigned integer type that can be used
to store the eight bits that represent characters in the
Extended ASCII character set.

The char type can also be used as a general-purpose eight-bit type on which you
can perform arithmetic.

When you use the insertion operator along with
cout to display a variable of type char, the character represented by
the numeric value is displayed instead of the numeric value itself.
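The difference can be seen in the following sketch. The function names are hypothetical, and a string stream is used in place of cout only so that the displayed text can be returned; the insertion operator treats a char the same way in both cases.

```cpp
#include <sstream>
#include <string>

// Inserting a char displays the character itself.
std::string showAsCharacter(char value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Converting the char to an int first displays the numeric value instead.
std::string showAsNumber(char value) {
    std::ostringstream out;
    out << static_cast<int>(value);
    return out.str();
}
```

On an ASCII system, showAsCharacter('A') produces "A" while showAsNumber('A') produces "65".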

Integer overflow in a twos-complement system

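If you add one to a long variable containing the value 2147483647 (the largest
value that a 32-bit long can hold), it
overflows and becomes the erroneous value -2147483648. (Strictly speaking, the
C++ standard leaves the result of signed integer overflow undefined, so this is
what happens on typical machines rather than a guarantee.)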

This is a
characteristic of the twos-complement binary number system.

Integer
overflow in the positive direction wraps around to the negative side.

Similarly, integer overflow (or should I say underflow) in the negative direction wraps around to the
positive side.
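Because the C++ standard does not define what happens when a signed type overflows, the sketch below demonstrates wraparound using an unsigned type, where wrapping is guaranteed, and shows one way to check a signed value before adding to it. The function names are hypothetical.

```cpp
#include <limits>

// Unsigned arithmetic is defined to wrap around, so adding one to the
// largest unsigned int value produces zero.
unsigned int addOne(unsigned int value) {
    return value + 1u;
}

// For a signed type, check that there is room before adding instead of
// relying on wraparound.
bool canAddOne(long value) {
    return value < std::numeric_limits<long>::max();
}
```

Calling addOne with the largest unsigned int value returns 0, much like the odometer rolling over from 99999 to 00000.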

Representation of type int

Although type int may be represented differently on different
platforms and operating systems, it appears to be represented the same as type long
on my machine running Windows XP and Visual Studio 6.0. This also appears
to be the case when running Dev C++ on my machine.

Wide character display

When the insertion operator is used with cout to display a variable of
the wide character type wchar_t, it is the numeric value and not the
character that is displayed. This is just the opposite of the result of
displaying a variable of type char.

Display of floating-point types

When the insertion operator is used along with cout to display any of
the three floating point types, the number of significant digits that is
displayed is the same as type float even though types double and
long double maintain more significant digits than type float in
their internal representation.
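You can request more digits by using the setprecision manipulator from the standard iomanip header. The following sketch compares the default display with a display that asks for a specific number of significant digits. The function names are hypothetical, and a string stream stands in for cout so that the displayed text can be returned.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Default formatting displays six significant digits.
std::string defaultDisplay(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// std::setprecision requests a different number of significant digits.
std::string preciseDisplay(double value, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}
```

With the value used earlier in this lesson, defaultDisplay(623.57185) produces "623.572" while preciseDisplay(623.57185, 8) produces "623.57185".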

Not strictly a boolean type

The boolean type named bool is not strictly a boolean type.

You
can initialize it using the literal values true and false.

You can also initialize it with a numeric value where 0 represents false and any
non-zero numeric value represents true.

Also, it is possible to perform
arithmetic using a variable of type bool but you need to be very careful
in this regard. If the variable was initialized using a non-zero numeric
value with a fractional part, the arithmetic results are likely to be something
other than what you might expect.

Type bool displays as 1 or 0

When you use the insertion operator along with cout to display the
value of a variable of type bool, the result is either 1 or 0, and is not
true or false as you might expect.
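If you would rather see the words, the standard library provides a manipulator named boolalpha for that purpose. The sketch below shows both forms of display. The function name is hypothetical, and a string stream is used in place of cout so that the displayed text can be returned.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Displays a bool either as 1 or 0 (the default) or as true or false.
std::string showBool(bool value, bool asWord) {
    std::ostringstream out;
    if (asWord) {
        out << std::boolalpha;   // display true/false instead of 1/0
    }
    out << value;
    return out.str();
}
```

Calling showBool(true, false) produces "1" and calling showBool(true, true) produces "true".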

Copyright 2005, Richard G. Baldwin. Reproduction in whole or in part in any
form or medium without express written permission from Richard Baldwin is
prohibited.

About the author

Richard Baldwin is a
college professor (at Austin Community College in Austin, TX) and private
consultant whose primary focus is a combination of Java, C#, and XML. In
addition to the many platform and/or language independent benefits of Java and
C# applications, he believes that a combination of Java, C#, and XML will become
the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects and he
frequently provides onsite training at the high-tech companies located in and
around Austin, Texas. He is the author of Baldwin's Programming
Tutorials, which have gained a
worldwide following among experienced and aspiring programmers. He has also
published articles in JavaPro magazine.

In addition to his programming expertise, Richard has many years of
practical experience in Digital Signal Processing (DSP). His first job after he
earned his Bachelor's degree was doing DSP in the Seismic Research Department of
Texas Instruments. (TI is still a world leader in DSP.) In the following
years, he applied his programming and DSP expertise to other interesting areas
including sonar and underwater acoustics.

Richard holds an MSEE degree from Southern Methodist University and has
many years of experience in the application of computer technology to real-world
problems.