Copyright

For information about republishing or referencing material on this site, please email hello@endjin.com

Learning to Program – A Beginners Guide – Part Seven – Representing Numbers

June 26, 2013 by Matthew Adams

In this section we’re going to look at the Data Memory in more detail.

So far, we’ve assumed we know what “store 1 in the memory location at offset 4” means. If we’ve told it to store 1 at offset 4, when we read back the value at offset 4, we get 1 as expected.

We know, then, that a memory location can hold a whole number greater than or equal to zero (the “positive integers”). Is there a maximum number that we can hold in that location?

Let’s write a short program to find out. This uses a loop like the one we saw in the previous exercise to increment r0 all the way up to 255. When r0 reaches 255, we end the loop and proceed to add another 1 to the result, and write it back to the memory at offset 0.

(Don’t forget that you can get the tool we use to run these “assembly language” programs from here. The setup instructions for this F# environment are here.)

load r0 0 ; set the result to 0add r0 1 ; add 1 to the result (compare r0 255 ; is the result 255?jumpne -2 ; if not, jump back up 2add r0 1 ; add another 1 to the result (255 + 1=??)write 0 r0 ; write the result to the memory at offset 0exit

If you run this, you'll see R0 increase one-by-one, until it reaches 255. Then the final few lines of the output are as follows:

You can see that when we reached the value 255 in R0, the comparison set the FL register to 0, so the JUMPNE instruction did not decrement the program counter, and we went on to execute the ADD instruction.

When we added 1 again, R0 seemed to reset back to 0!

What's going on?

Well, it's all to do with the fact that each memory location in our computer is a fixed size called a byte (remember that the x86 MOV to memory instruction referred to a BYTE in the previous section). What's a byte? And what's the maximum number we can represent with a byte? To understand, we need to look at binary arithmetic.

Binary Arithmetic

Remember in the "How does a computer work" section we talked about transistors being switches that can be On and Off. It turns out that we can represent integers just using On and Off values.

Representing numbers in different bases

We're all familiar with decimal arithmetic - counting 0…1…2…3…4…5…6…7…8…9… When we run out of the symbols we use for numbers, we then add a second digit 10…11…12…13…14…15…16…17…18…19, then 20…21…22…23…24…25…26…27…28…29 and so on, until we run out of symbols again. Then we add a third digit 100…101…102…103…104… and so on. We start out by counting up digit1 (0-9). Then, when we add a second digit, the number is (ten x digit2) + digit1. When we add a third digit, the number is (ten x ten x digit3) + (ten x digit2) + digit1. And so on.

Here are a couple of examples (like you need examples of the number system you use every day!). But notice specifically that each digit represents an additional power of 10. So we call this "base 10" (or decimal). (Remember that anything to the power 0 is 1.)

Digit Multiplier

100

10

1

(As power)

102

101

100

Base 10: 234

2

3

4

Base 10: 84

0

8

4

This is so familiar to us, we've usually forgotten that we were once taught how to interpret numbers like this. (Often, this was done with cubes of wood or plastic in blocks of 1, 10, 100 etc. when we were very tiny.)

Imagine, instead, that we only had symbols up to 7.

We'd count 1…2…3…4…5…6…7… Then, when we ran out of symbols, we would have to add a digit and start counting again 11…12…13…14…15…16…17… and then 20…21…22…23…24…25…26…27… and so on until 100…101…102…103…104…105…106…107…

As with decimal, we start out by counting up digit1 (0-7). Then, when we add a second digit, the number is (eight x digit2) + digit1. When we add a third digit, the number is (eight x eight x digit3) + (eight x digit2) + digit1. And so on.

We call this counting system "base 8" (or octal). As you might expect, each digit represents an increasing power of 8.

Here are some examples of values in base 8

Digit Multiplier

64

8

1

(As power)

82

81

80

Base 10: 256Base 8: 400

4

0

0

Base 10: 84Base 8: 124

1

2

4

What if we had more symbols for numbers than the 10 (0-9) that we're familiar with? What if we had sixteen, say? We call this "base 16" or hexadecimal (hex for short). Rather than resort to Wingdings or Smiley Faces for the extra symbols, it is conventional to use the letters A-F as the 'numbers' greater than 9.

Digit Multiplier

256

16

1

(As power)

162

161

160

Base 10: 256Base 16: 100

1

0

0

Base 10: 84Base 16: 54

0

5

4

Base 10: 255Base 16: FF

0

F

F

Base 10: 12Base 16: C

0

0

C

Now, imagine we're really symbol-poor. We've only got 0 and 1 (the "off" and "on" of the transistor-switches in our processor). It works just the same. First, we count units: 0…1 We've run out of symbols, so we add a digit and count on 10…11. Out of symbols again, so add another digit: 100…101…110…111. We run out of symbols really quickly, so we need another digit: 1000…1001…1010…1011…1100…1101…1110…1111 and so on.

Here are some examples in "base 2" (or binary).

Digit multiplier

128

64

32

16

8

4

2

1

(As power of 2)

27

26

25

24

23

22

21

20

Base 10: 255

1

1

1

1

1

1

1

1

Base 10: 84

0

1

0

1

0

1

0

0

Look at the first example in that table. The decimal (base 10) number 255 is represented by a full set of 1s in each of 8 binary digits.

(Saying "binary digit" is a bit long winded, so we've shortened it in the jargon to the word bit.)

We call a number made up of 8 bits a byte.

We number the bits from the right-most (0) to the left-most (7). Because the right-most represents the smallest power of 2, we call it the least significant bit (LSB). The left-most bit represents the largest power of 2, so we call it the most significant bit (MSB).

MSB

LSB

Bit number

7

6

5

4

3

2

1

0

Power of 2

27

26

25

24

23

22

21

20

As we've seen, the maximum (decimal) number we can store in a byte is 255 (a '1' in all 8 bits).

MSB

LSB

Bit number

7

6

5

4

3

2

1

0

Power of 2

27

26

25

24

23

22

21

20

Base 10: 255

1

1

1

1

1

1

1

1

The fact that something screwy happens when we exceed this 8-bit maximum value hints that our register R0 is probably 1 byte in size.

But why does something screwy happen?

To understand that, we need to learn about binary addition.

Binary Addition

We're so used to doing decimal addition, that we probably don't even think about it. But let's remind ourselves how we add up regular decimal numbers by hand.

We write the numbers to be added in a stack, with each equivalent digit lined up in columns. Then, starting from the right hand column (which, remember, we call the least significant) we add up the total of that column and record it at the bottom. If the total is greater than 9 (the biggest number we can represent in a single column), we "carry" that number into the next column, and include it in the addition for that column.

If a particular number has no value in the column, we treat it as a 0

Here are a couple of examples

054 + 163 ----- 217carry 1

0099 +0999 ---- 1098carry 111

We can do exactly the same thing for binary addition. Here's an example of two bytes being added together:

00000101 +00001011 -------- 00010000carry 1111

Now, let's see what happens when we add 1 to a byte containing the decimal value 255:

11111111 +00000001 -------- 100000000carry 11111111

But, hang on a minute - that's 9 bits! And a byte can contain only 8 bits. We can't just make up a new '9th' bit.

What happens is that we're left with the least significant 8 bits (i.e. the right-most 8 bits) - and they are binary 00000000 - or 0 in decimal!

And that's why our program wrapped round to 0 when we added 1 to 255.

It's all very well knowing this limitation, but how can we represent larger numbers?

One way is to increase the number of bits in your storage.

It turns out that our computer has some larger registers called WR0, WR1, and WR2 that can do just that. Each of these registers can store values up to 16 bits (or two bytes) in size. We sometimes call a 16 bit value a short (and on computers with a 16-bit heritage, a word).

Spot test: What's the largest number you can represent with 16 bits?Hint: 28 = 256

Answer: If the largest decimal you can store in 8 bits is 255 (= 28 - 1), then the largest you can store in 16 bits is (216 - 1) = 65535

With n bits, you can store a positive decimal integer up to (2n - 1).

Let's load up a number larger than 255 into one of these 16 bit registers. We'll create a new program to do that.

LOAD WR0 16384EXIT

If you run that, you should see the following output. 16384 is loaded into the WR0 register.

It is rather like these two memory locations are storing the number in base 256!

We've seen that we can hold the numbers 0-255 in 1 byte, so when we need to store a larger number, we add a second byte, and count in the usual manner (256 * second byte) + (first byte). You could imagine storing a 24 bit number by adding a third byte, and a 32 bit number by adding a fourth byte and so on. (We sometimes call a 32-bit number a dword, from 'double word' and a 64-bit number a qword, from 'quadruple word'.)

Byte

1

0

(As power of 256)

2561

2560

Base 10: 256

1

0

Base 10: 16384

64

0

As with our Most Significant Bit and Least Significant Bit, we call the byte that stores the largest power of 256 the Most Significant Byte, and the other the Least Significant Byte - or sometimes High Byte and Low Byte.

Ordering the bytes

When we were ordering the bits in our byte, we numbered them from right-to-left, low-to-high. You'll notice that when we store the 16bit number in our 2 8-bit memory locations, we're storing the high byte in the memory location at offset 1, and the low byte in the memory location at offset 0.

High (most-significant) byte

Low (least-significant) byte

Offset in memory

1

0

(As power of 256)

2561

2560

We could equally well have chosen to do that the other way around!

Computers that choose this particular byte ordering are called little endian memory architectures. Our computer does it this way, as does Intel's x86 series. The Z80 and PowerPC are both big endian - they store the high byte in the lower memory offset, and the low byte in the higher memory offset. Some architectures (like the ARM) let the programmer choose which to use.

(Another quick jargon update: when we send data over a network, it is often encoded in a big-endian way, so you sometimes see big-endian ordering called network ordering.)

This is an example of encoding a value into computer memory, in this case, a positive integer. Notice that even with something as simple as a positive integer, we have to think about how it is represented in the computer! We'll see this need for encoding again and again for decimals, text, images and all sorts of data. We need to encode whenever we have to represent some information in a computer. Most of the time, higher-level languages or other people's code will take care of the details for us, but if we forget about it entirely, we open ourselves up to all sorts of seemingly-mysterious behaviour and bugs.

Displaying multi-byte values

Sometimes, displaying numbers in decimal is not the most convenient way to read them - particularly when we are looking at multi-byte values.

Here's the decimal number 65534 represented as two decimal bytes

Low byte

High byte

254

255

Now, look what happens if we represent it hexadecimal (base 16) instead of decimal.

Here's 65534 in hex:

FFFE

And here it is represented as two hexadecimal bytes in memory

Low byte

High byte

FE

FF

Notice that they are represented using exactly the same digits (barring the byte-ordering). Whereas the decimal values "255 254" look nothing like "65534".

It can be very convenient to get used to reading numbers in base 16 (hex), because each byte is always just a pair of hex digits, and their representation in memory is very similar to the way you would write them on the page.

We've got a handy switch in our program that lets us start to represent the output in hex, instead of decimal.

What if we want to write the values in our program in hex? How do we distinguish between the decimal value 99, and the hex value 99 (=153 decimal)? Different languages use different syntax, but our model computer supports two of the most common - both of which use a prefix. You either add the prefix 0x, or use the prefix #.

(You might have seen that # prefix used when specifying colours in HTML mark-up)

LOAD WR0 #FFFEWRITE 0 WR0EXIT

Or

LOAD WR0 0xFFFEWRITE 0 WR0EXIT

Try updating the program to use one of these hex representations and run it again.

If you want to, you can also use a binary notation to specify a number. For that we use the prefix 0b
(This is a less common notation, and is not supported across all languages, but it is quite useful.)

Representing negative numbers

So far, (almost) all of the numbers that we've represented have been the positive integers.

How do we represent negative numbers?

This is much the same question as "how do we do subtraction". Why? You'll probably recall from basic maths that subtracting the number B from the number A is equivalent to adding the negative of B to A. You might have seen that written down in this form:

(A - B) = A + (-B)

So, perhaps we can use the idea of subtraction to help us to represent a negative number?

We've already seen a sort of example of subtraction: what happened when we saw the addition of 8-bit numbers carry beyond an 8-bit value? Let's remind ourselves.

For 8-bit values, if we try to add two numbers such that the result would be greater than 255, we are left with the least significant 8-bits, and we drop the carried-over 9th bit.

Here are a few examples

255 + 1 = 0 (with carry)

255 + 2 = 1 (with carry)

255 + 3 = 2 (with carry)

(You can write a short program to test that, if you like.)

load r0 255load r1 255load r2 255add r0 1add r1 2add r2 3exit

What we know about basic arithmetic tells us that:

255 - 255 = 0
255 - 254 = 1
255 - 253 = 2

If we replace the right-hand-side of the first expression with the left-hand-side of the second expression, we get

255 + 1 = 255 - 255
255 + 2 = 255 - 254
255 + 3 = 255 - 253

This would seem to imply that

1 = (- 255)
2 = (-254)
3 = (-253)

How is this possible?!

Well, we've already answered that question: it is precisely because there is a maximum number we can represent in a fixed number of bits, and the way that larger numbers carry over.

Let's represent those numbers in binary:

00000001 = -(11111111)
00000010 = -(11111110)
00000011 = -(11111101)

A quick rearrangement of the expression shows us that that that is really true!

So, for every positive number that we can represent in binary, there is a complementary negative number.

We call this representation of a negative number the two's complement representation.

It is fairly easy to calculate - you just take the binary representation of the absolute number (the number without its sign) and flip all of the 0s to 1s, and 1s to 0s (we call this the one's complement of the number), then add 1, to make the two's complement.

Let's have an example of that: what's the two's complement representation of the number -27, in an 8-bit store?

First, we need to represent 27 in binary:

00011011

Then we flip all the bits to get the one's complement representation:

11100100

Finally, we add 1 to get the two's complement representation:

11100101

So, -27 in two's complement binary notation is 11100101

We can test this

In decimal, 60 - 27 = 33

60 in binary is 00111100
-27 in two's complement binary notation is 11100101

So, 60 - 27 in binary is 00111100 + 11100101

That is 00100001 (with a carry), which is 33 in decimal, as required!

Originally, we stressed that we could store positive integers from 0 - 255 in an 8-bit value.

Now, we have added the possibility of storing both positive and negative integers in an 8-bit value, by the use of a two's complement representation.

So, every number has an equivalent two's complement "negative" number. How do we tell the difference between 11100101 when it is meant to be -27 and 11100101 when it is meant to be 229?

The short answer is that you can't - it is up to the programmer to determine when a result might be negative and when it might be positive.

We could specify (by documentation, for example) that a particular value we were storing was unsigned (i.e. a positive integer) or signed (i.e. any integer, positive or negative).

It turns out we can remove the ambiguity in the case of the signed integer by limiting the maximum and minimum values we can represent.

Think about the decimal numbers 1…127.

These can be represented in binary as 0b00000001…0b01111111.

Now, think about the decimal numbers -127…-1.

These can be represented in binary as 0b10000001…0b11111111.

Notice how the positive numbers all have a 0 in the top bit, and the negative numbers all have a 1 in the top bit. We often call this the sign bit.

If we choose to limit the numbers we represent in a single byte to numbers that can be represented in 7 bits, rather than 8, then we have room for this sign bit, and there is no ambiguity about what we mean. But how can we tell whether some memory is signed or not? The short answer is that we can't. The only way to tell is to document how the storage is being used. We're reaching the limits of what we can conveniently express in such a low-level language. We need to move to a higher-level of abstraction, and a richer language, to help us with that.

Copying a value from a smaller representation into a larger representation

There's a little wrinkle that happens when we copy a value from a smaller representation. Here's decimal 1 as a byte
0b00000001. And here's decimal 1 as a 16-bit value: 0b0000000000000001. So far so good.

What about -127? That's 0b10000001 in a byte, but it's 0b1111111110000001 when stored in a word.

Notice that in each case, the sign bit gets extended across the whole of the most-significant byte when you copy from a byte to a word (the zero for the case of a positive number, or the 1 for a negative number.)

This sign extension when you copy between storage sizes is very important: 0b10000001 is -127 decimal when stored in a byte, but naively copy that into a word and it becomes 0b00000000100000001, which is 129 decimal!

Copying a value from a larger representation into a smaller representation

There's an obvious danger with copying numbers from a larger representation (e.g. a word) to a smaller one (e.g. a byte) and that's called truncation.

There's no problem if the number can be fully represented in the number of bits of the target representation e.g. 1 is 0x0001 in 16 bits, and 0x01 in 8 bits - you just lose the high byte. Similarly, -1 is 0xFFFF in 16 bits, and 0xFF in 8 bits - you just lose the high byte again, and it remains correct. But what about 258? That's 0x0102 in 16-bits, but if you lose the high byte, it becomes 0x02 - not the number you were thinking of at all!

Most higher level languages will warn you about possible truncation (or even prevent you from doing it directly). But it is a very common source of bugs in programs.

Summary

In this section, we've looked at how we can represent numbers (specifically the integers) in data memory.

We've learned how we can convert the number between different bases - in particular base 10 (decimal), base 16 (hexadecimal) and base 2 (binary).

We then looked at binary arithmetic, and saw what happens when we add binary numbers together.

We investigated the largest value we can store in one of our registers, and determined that it represented an 8-bit value (which we call a byte).

We then looked at what happens when we try to store a number larger than that which a particular register can hold, and introduced the concept of a 16-bit number.

Next, we looked at how a 16-bit (or larger) number can be represented in memory, including the concepts of big-endian and little-endian memory layouts. We also switched our computer over to displaying values using hex instead of decimal, and looked at why we might want to do that.

Finally, we looked at how to store negative numbers in a consistent way, such that we could perform basic addition and subtraction, by using the two's complement representation, and how to constrain the range of numbers we can store, to remove the ambiguity when storing positive and negative numbers.

Next time, we'll look at how we can bring logic to bear on our problems.

If you enjoyed this post, please consider sharing it on social media using the buttons below. Thanks!

Related

About the author

Matthew was CTO of a venture-backed technology start-up in the UK & US for 10 years, and is now a Founder of Endjin Ltd, which provides technology strategy, experience and development services to its clients who are seeking to take advantage of Microsoft Azure and the Cloud.
You can follow Matthew on twitter.