It's the endian of the world as we know it

CocoaHeads Boise on Twitter asked for an elaboration on big vs little endian. In the query was a very interesting statement: “surprised not mentioned in my C books.” I had never thought about endianness issues being covered in introductory materials.

Usually endianness is a topic that’s brought up when interoperability is involved, such as sending data over a network connection or constructing storage formats for different CPU families. C books are typically more worried about getting the reader up and running with the basics, explaining fun stuff like pointers and unions. The interoperability concepts don’t come up.

What is endianness?

The term “endian” refers to how integer data is actually stored as bytes in memory. An integer (aside from the char types) is a collection of multiple bytes, and the bytes have a significant order. Consider a four-byte integer like this:

int squeeze = 8533937;

When printed in hex it looks like this:

0x008237b1

You get the actual integer value by doing some basic math, taking each “digit”’s value and multiplying by the radix raised to the appropriate power, starting from the least significant digit:

0x1×16^0 + 0xb×16^1 + 0x7×16^2 + 0x3×16^3 + 0x2×16^4 + 0x8×16^5 + 0x0×16^6 + 0x0×16^7 = 8533937

When looking at the value in a human-readable format, such as 0x008237b1, the leftmost byte (0x00 here) is called the “most significant byte”. Numbers here contribute the most to the magnitude of the value. The rightmost byte (0xb1 here) is called the “least significant byte”. It contributes the least to the total magnitude.

OK, so we have an integer that takes up four bytes in memory. There are many ways of representing this in memory. Assuming memory addresses increase from left to right, you can store it the way we read the hex number:

00 82 37 b1

Or, you could store it like in the formula for computing the final value, least significant byte first:

b1 37 82 00

The first format, with the most significant byte coming first, is called “big endian”. The biggest part, the biggest end, the most significant part, comes first.

The second format, with the least significant byte coming first, is called “little endian”. The smallest part, the littlest end, the least significant part comes first.

The terms come from Gulliver’s Travels, where two kingdoms are at war over which end of a soft-boiled egg one should break. Compared to some holy wars in computerdom, this actually ranks as “kind of reasonable”.

Seeing Endianness

This little program (living at this gist) runs through an array of integers. It prints out the integers in human-readable format (which happens to be big endian), and then scans through the bytes of memory and outputs them in memory order:

Modern Macs make it difficult to test the big-endian case. Xcode 4 has dropped PowerPC compiling support, and 10.7 dropped Rosetta, the emulation layer that would let you run PowerPC programs on Intel systems.

So what. Big Deal.

Endianness is a pretty easy concept to grasp - it’s just the order in which the bytes that comprise an integer are laid out in memory. This is processor dependent. The Motorola 68K family and the PowerPC family of processors are big endian. The Intel x86 family, along with platforms of interest to us old folks like the VAX, the 6502, and the Z80, is little endian.

Some processors can go both ways. The Sparc processor has traditionally been a big endian processor, but now can be run either way. Same with the ARM that powers iDevices. It can be run in a big endian or little endian manner. iOS runs it in little endian manner, but you can, at the assembly language level, switch the endianness for sections of code.

We don’t usually have to worry about this stuff on a day-to-day basis. When it comes to interoperability, though, these low-level representational details can leak through. The classic example is sending integers over the network. It doesn’t really matter what format is sent over the wire, so long as both sides agree on a byte order.

To help standardize things, big endian is considered the “network byte order”. If you send integers over the network to someone, and you don’t already have other rules in place, then send your integers in network byte order. If you’re on a big endian machine you don’t have to do anything at all. This is your native byte order.

If you’re on a little endian machine, though, you’ve got some work to do. You need to permute your bytes around so that they’re in big endian order before you write them to your socket, and you need to scramble bytes around so they go back to little endian order after you read from your socket. It’s extra busywork, but necessary. The standard sockets API gives you four functions for this:

htonl – host to network long. If the machine is not already in network byte order, swap a 32-bit int’s worth, otherwise it’s a no-op

htons – same thing, but for a 16-bit int’s worth (s for short)

ntohl – network to host long (32-bit). Similarly, if the data is in network byte order, swap it around so that it’s consumable for a little-endian machine

ntohs – same thing, but for 16-bit values.

Cocoa has a bunch of calls, easily over 30, for different byte swapping situations. You can ask what the current byte order is with NSHostByteOrder. NSSwapInt will permute an int no matter what byte order it is. Useful if you know that an int is big endian and you know you want it in little endian. If you’re mainly just interested in going to network byte order no matter what byte order you’re running in, you can use the NSSwapHost* calls, like NSSwapHostIntToBig.

Endianness and Audio

Another part of CocoaHeadsBoise’s question related to Core Audio. I must admit, I’ve done next to nothing with Core Audio outside of working through the excellent Learning Core Audio by Chris Adamson and Kevin Avila.

Browsing through the Core Audio Data Types Reference there are some references to endianness, such as the ASBD (audio stream basic description, a common data type) format flags kAudioFormatFlagIsBigEndian and kAudioFormatFlagNativeEndian. “Native” endian means the byte order of the machine the code is running on, which is little endian on Intel Macs and iOS devices. So if you’re converting sound samples from one format to another, and they differ in endianness, you’ll have to swap the bytes around as part of your other work.