Brief Notes on Data Types

I’ve mentioned data types a couple of times by now, and will probably mention them again. Thinking about them is interesting because it’s a window, albeit small, into the way programming languages work. Here are some notes on the nature of data types I’ve thought about, and an invitation to think further.

Data

Data is the plural of datum, and a datum is a unit of information. All the sensory input we receive is data, and it’s stored in our brains in the form of neural connections. We can pass it along as sound waves (talking), as symbols (writing), and in general by encoding it in a medium using a predefined system of symbols. In the case of computers, we store data as arranged matter, usually magnetized media or electrical charge, in patterns of two states. Thus, binary (encoded) data. It’s also called digital information, although I find that name too generic: digital information covers any data encoded using a discrete set of values, as opposed to analog data, so while binary data is digital, not all digital data is binary.

Processors

Processors are collections of electronic components arranged so that they transform collections of electric signals in predictable ways. They are one of the main components of modern computer architectures. When you hear about an 8-, 16-, 32- or 64-bit computer, it usually refers to the size of the collection of signals that the processor handles at once. Each wire (or binary datum) carries a bit: a certain voltage represents a 0, and another one represents a 1. Those digits can be interpreted as part of an n-bit number. They can also be interpreted as instructions. Instructions are numbers, numbers are configurations of physical media, and so is any other kind of data, because the only way to express any and all data in a computer is through discrete digits. Understanding this is central to understanding data types.
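To make the "n-bit number" idea concrete, here is a small Python sketch. The four byte values are arbitrary and chosen only for illustration, and the big-endian byte order is an assumption of the example. It reads the same bytes as numbers of three different widths.

```python
# Four arbitrary bytes; the values are illustrative only.
raw = bytes([0x01, 0x02, 0x03, 0x04])

# Read as four separate 8-bit numbers.
as_8bit = list(raw)

# Read as two 16-bit numbers (big-endian: most significant byte first).
as_16bit = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, 4, 2)]

# Read as a single 32-bit number.
as_32bit = int.from_bytes(raw, "big")

print(as_8bit)   # [1, 2, 3, 4]
print(as_16bit)  # [258, 772]
print(as_32bit)  # 16909060
```

Nothing about the bytes changes between the three readings; only the decision about how many of them make up one number does.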

Data Types

Data types are what happens when you decide to treat sequences of data in a special way. In other words, when you make the meaning of certain patterns, and your actions on them, dependent on other, predetermined patterns. You need, first, to know what transformations the processor should perform on the data (the sequences of instructions to apply) if the data has a particular type. If we didn’t have any programming languages, we’d do this by hand: as instructions are just numbers, and numbers are just configurations of physical media, we could set up a particular medium arranged with data that the processor interprets the way we want. This is roughly what a computer’s BIOS is, by the way. So, we’d know what data we put where, and what we want to treat it as. If it’s letters, we can capitalize them or lower their case. If it’s numbers, we can add or subtract them.

We assign arbitrary meanings to particular arrangements of data. Thus the sequence 01000000 in the medium is the integer number 64, and it is also an ‘@’ if interpreted in the context of ASCII encoding. Agreeing on what a particular string of symbols means is the root of many, many issues in modern computing (read about: ISO-8859-1, ASCII, Unicode). In any case, this interpretation, together with the set of operations we define on a particular datum, is what makes a data type.
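The 64 / 01000000 / ‘@’ example can be checked with a few lines of Python. The second half is an illustrative assumption of mine rather than something from the notes above: it picks the single byte 0xE9 to show how two encodings can disagree about the same data.

```python
value = 64

# The same value written out as eight binary digits.
bits = format(value, "08b")

# The same value interpreted as an ASCII character.
char = chr(value)

print(bits, char)  # 01000000 @

# One byte, two encodings: 0xE9 is 'é' in ISO-8859-1,
# but on its own it is not a valid UTF-8 sequence.
byte = bytes([0xE9])
latin1_text = byte.decode("iso-8859-1")

try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(latin1_text, utf8_ok)  # é False
```

The byte is identical in both cases; whether it is a letter or an error depends entirely on which agreement the reader applies.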

Programming Languages & Data Types

As we do have programming languages, the work of determining what to do with a particular datum at a particular moment, what is allowed and what is not, is taken out of our hands. All languages enforce rules on what can be done with a particular datum; for instance, which datum can be transformed into which other one and how to go about doing it, or what happens if a datum that was stored as a number needs to be added to a datum that was stored as a letter.

The C programming language offers an experience very close to working in assembly language, as it places very few restrictions on what you can do with data. Want to add (or subtract, or compare) a number and a character? Go ahead. The operations are executed on the numeric values represented by the data.

Other languages, like JavaScript, resolve the mismatch by converting one operand in a particular way. Want to add a number and a character? Your number is now a character, and you get a collection of characters (i.e., 1+’a’ = ‘1a’). Want to multiply them? You get NaN (Not a Number).

Some languages, like Python, prevent you from doing this and tell you that you’re trying to execute an ambiguous operation. (It is surprising that 2*’a’ = ‘aa’, but that’s a solid, unambiguous way to interpret it. 2+’a’, though… is the answer ‘2a’ or 99? Python refuses to guess: it raises a TypeError and asks you to transform the character into a number, or vice versa.)
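Python's behaviour here is easy to try out. The sketch below uses variable names of my own choosing; it shows the repetition, the refusal, and the two explicit conversions that resolve the ambiguity.

```python
# Repetition is well defined for an integer and a string.
repeated = 2 * "a"          # 'aa'

# Addition of an integer and a string is refused with a TypeError.
try:
    2 + "a"
    refused = False
except TypeError:
    refused = True

# Either reading is available once the conversion is spelled out.
as_text = str(2) + "a"      # '2a'
as_number = 2 + ord("a")    # 99, since ord('a') is 97

print(repeated, refused, as_text, as_number)  # aa True 2a 99
```

The last line is the same arithmetic C would do without being asked; Python just wants the intent written down.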

Epilogue

Some data types are not represented in such a straightforward way as ASCII characters or integer numbers (see: floating-point numbers). Other data types are built from the types the language already knows how to work with; those are called composite, or compound, data types. Perhaps future musings will cover these.
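As a small taste of both ideas, here is a hedged Python sketch. It assumes the common 32-bit IEEE 754 layout for floating-point numbers (1 sign bit, 8 exponent bits, 23 fraction bits), and `Reading` is a hypothetical composite type invented for this example.

```python
import struct
from dataclasses import dataclass

# Pack 1.0 as a 32-bit IEEE 754 float (big-endian), then read
# the same four bytes back as an unsigned 32-bit integer.
packed = struct.pack(">f", 1.0)
as_int = struct.unpack(">I", packed)[0]
bits = format(as_int, "032b")

# Split the 32 digits into the three fields of the format.
sign, exponent, fraction = bits[0], bits[1:9], bits[9:]
print(sign, exponent, fraction)
# 0 01111111 00000000000000000000000


# A composite type built from types the language already knows.
@dataclass
class Reading:  # hypothetical name, for illustration only
    label: str
    value: float


r = Reading(label="temperature", value=1.0)
print(r)  # Reading(label='temperature', value=1.0)
```

The number 1.0 ends up looking nothing like the integer 1 in memory, which is why it gets its own set of operations, and `Reading` simply bundles two existing types under a new name.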