In my previous
article, I introduced the C# programming language and
explained how it works in the context of the Mono environment, an open-source
implementation of Microsoft's .NET framework. I will now go on to some details
on the data types supported by the C# programming language.

In the subsequent discussion I will use the following diagrammatic notation
to represent variables and objects:

The variable diagram is a cubic figure that depicts three traits (name, value
and type) relevant during the compilation and execution of a program. In the von
Neumann architecture tradition, we will consider a variable as a chunk of memory
in which we hold a value that can be read or overwritten. The object diagram is
a rounded edge rectangle that denotes an object created at runtime and allocated
in a garbage collectable heap. For any object in a certain point in time, we
know what type (class) it is, and the current values of its instance
variables.

Type Categories

In the C# programming language, types are divided in three categories:

value types

reference types

pointer types

In a variable that holds a value type, the data itself is directly
contained within the memory allotted to the variable. For example, the following
code

int x = 5;

declares an 32-bit signed integer variable, called x,
initialized with a value of 5. The following figure represents the corresponding
variable diagram:

Note how the value 5 is contained within the variable itself.

On the other hand, a variable that holds a reference type contains the
address of an object stored in the heap. The following code declares a variable
called y of type object which gets initialized, thanks
to the new operator, so that it refers to a new heap allocated object
instance (object is the base class of all C# types, but more of
this latter).

object y = new object();

The corresponding variable/object diagram would be:

In this case, we can observe that the "value" part of the variable
diagram contains the start of an arrow that points to the referred object.
This arrow represents the address of the object inside the memory
heap.

Now, let us analyze what happens when we introduce two new variables and do
some copying from the original variables. Assume we have the following code:

int a = x;
object b = y;

The result is displayed below:

As can be observed, a has a copy of the value of x.
If we modify the value of one of these variables, the other variable would
remain unchanged. In the case of y and b, both
variables refer to the same object. If we alter the state of the object using variable
y, then the resulting changes will be observable using variable b,
and vice versa.

Aside from references into the heap, a reference type variable may also
contain the special value null, which denotes a nonexistent object.
Continuing with the last example, if we have the statements

y = null;
b = null;

then variables y and b no longer refer to any
specific object, as shown below:

As can be seen, all references to the object instance have been lost. This
object has now turned into "garbage" because no other live reference to it
exists. As noted before, in C# the heap is garbage collected, which means that
the memory occupied by these "dead" objects is at sometime
automatically disposed and recycled by the runtime system. Other languages, such
as C++ and Pascal, do not have this kind of automatic memory management scheme.
Programmers for these languages must explicitly free any heap allocated memory
chunks that the program no longer requires. Failing to do so gives place to
memory leaks, in which certain portions of memory in a program are wasted
because they haven't been signaled for reuse. Experience has shown that explicit memory de-allocation is
cumbersome and error prone. This is why many modern programming languages (such as Java,
Python, Scheme and Smalltalk, just to name a few) also incorporate garbage
collection as part of their runtime environment.

Finally, a pointer type gives you similar capabilities as those found
with pointers in languages like C and C++. It is important to understand that
both pointers and references actually represent memory addresses, but that's
where their similarities end. References are tracked by the garbage collector,
pointers are not. You can perform pointer arithmetic on pointers, but not on
references. Because of the unwieldy nature associated to pointers, they can only
be used in C# within code marked as unsafe. This is an advanced
topic and I won't go
deeper into this matter at this time.

Predefined Types

C# has a rich set of predefined data types which you can use in your
programs. The following figure illustrates the hierarchy of the predefined data types
found in C#:

Here is a brief summary of each of these types:

Type

Size in
Bytes

Description

bool

1

Boolean value. The only valid literals are true
and false.

sbyte

1

Signed byte integer.

byte

1

Unsigned byte integer.

short

2

Signed short integer.

ushort

2

Unsigned short integer.

int

4

Signed integer. Literals may be in decimal (default)
or hexadecimal notation (with an 0x prefix). Examples: 26,
0x1A

C#'s has a unified type system such that a value of any type can
be treated as an object. Every type in C# derives, directly or indirectly, from
the object class. Reference types are treated as objects simply by
viewing them as object types. Value types are treated as objects by
performing boxing and unboxing operations. I will go deeper into
these concepts in my next article.

Classes and Structures

C# allows you to define new reference and value types. Reference types are
defined using the class construct, while value types are defined
using struct. Lets see them both in action in the following
program:

First we have the structure ValType. It defines two instance
variables, i and d of type int and double,
respectively. They are declared as public, which means they can be accessed
from any part of the program where this structure is visible. The structure
defines a constructor, which has the same name as the structure itself and,
contrary to method definitions, has no return type. Our constructor is in charge
of the initialization of the two instance variables. The keyword this
is used here to obtain a reference to the instance being created and has to be
used explicitly in order to avoid the ambiguity generated when a parameter name
clashes with the an instance variable name. The structure also defines a method
called ToString, that returns the external representation of a
structure instance as a string of characters. This method overrides the ToString
method (thus the use of the override modifier) defined in this
structure's base type (the object class). The body of this method
uses the string concatenation operator (+) to generate a string of the form
"(i, d)", where i and d represent the
current value of those instance variables, and finally returns the expected
result.

As can be observed, the RefType class has basically the same
code as ValType. Let us examine the runtime behavior of variables
declared using both types so we can further understand their differences. The Test
class has a Main method that establishes the program entry point.
In the first part of the program (marked with the "PART 1" comment) we
have one value type variable and one reference type variable. This is how they
look after the assignments:

The value type variable, v1, has its instance variables contained
within the variable itself. The new operator used in the assignment

v1 = new ValType(3, 4.2);

does not allocate any memory in the heap as we've learned from other
languages. Because ValType is a value type, the new operator is
only used in this context to call its constructor and this way initialize the
instance variables. Because v1 is a local variable, it's actually
stored as part of the method's activation record (stack frame), and it exists
just because it's declared.

Objects referred by reference type variables have to be created explicitly at
some point in the program. In the assignment

r1 = new RefType(4, 5.1);

the new operator does the expected dynamic memory allocation
because in this case RefType is a reference type. The corresponding
constructor gets called immediately afterwards. Variable v2 is also
stored in the method's activation record (because it's also a local variable)
but it's just big enough to hold the reference (address) of the newly created
instance. All the instance's data is in fact stored in the heap.

Now lets check what happens when the second part of the program (marked after
the "PART 2" comment) is executed. Two new variable are introduced and
they are assigned the values of the two original ones. Then, each of the
instance variables of the new variables are incremented by one (using the ++
operator).

When v1 is copied into v2, each individual instance
variable of the source is copied individually into the destination, thus
producing totally independent values. So any modification done over v2
doesn't affect v1 at all. This is not so with r1 and r2
in which only the reference (address) is copied. Any change to the object
referred by r2 is immediately seen by r1, because they
both refer in fact to the same object.

If you check the type hierarchy diagram above, you will notice that simple
data types such as int, bool and char are
actually struct value types, while object and string
are class reference types.

If you want to compile and run the source
code of the above example, type at the Linux shell prompt: