endian : Java Glossary

Java stores binary values internally and in files MSB (Most Significant Byte)
first, i.e. high order part first. This is referred to as big-endian byte sex or
sometimes network order. Java binary files, Java sockets and OpenType font files also
use big-endian order. What do you do if your data files are in little-endian format
as would be the case for most Windows-95 binary files?

Your Options for Solving the Endian Problem: a Summary

Everything in Java binary format files is stored big-endian, MSB (Most Significant
Byte) first. This is sometimes called network order. This is good news. It means that
if you use only Java, all files are done the same way on all platforms: Mac,
PC (Personal Computer), Solaris, etc. You can freely exchange binary data
electronically over the Internet or on CD/floppy without any concerns about
endianness. The problem comes when you must exchange data files with some program not
written in Java that uses little-endian order, most commonly C on the PC.
Some platforms use big-endian order internally (Mac, IBM (International Business
Machines) 390); some use little-endian order (Intel). Java hides that internal
endianness from you.

In a binary file, there are no separators between fields. The files are in binary,
not readable ASCII (American Standard Code for Information Interchange).

What do you do if you want to read data not in
this standard format, usually prepared by some non-Java program?

You have five options:

Rewrite the export program that is providing the imported file. It might export
directly in either big-endian binary DataOutputStream or
character DataOutputStream format. See binary formats.

Write a separate translator program that reads and rearranges bytes. You could
write this in any language.

Read the data as bytes and rearrange them on the fly.

Use LEDataInputStream, LEDataOutputStream and LERandomAccessFile, the
analogs of DataInputStream, DataOutputStream and RandomAccessFile
that work with little-endian binary data. You can read about LEDataStream. You can
download the code and source free. You can get help from the File I/O
Amanuensis to show you how to use the classes. Just tell it you have
little-endian binary data. This is the easiest way.

If you are using Java version 1.4 or later, you can use
nio and the ByteBuffer.order( ByteOrder.LITTLE_ENDIAN ) technique. This is the most
efficient way.

You Probably Don’t Even Have a Problem!

Most people new to Java coming from C think that they need to code differently
depending on whether the machine they are using internally represents integers as big
or little endian. In Java it does not matter. Further, without resorting to native
classes, there is no way you can even tell how they are stored. The
JVM (Java Virtual Machine)
may store them either way internally but Java is cleverly constructed so that it
never matters. Java has no struct I/O and no unions or any of the other
endian-sensitive language constructs.

The only time endianness becomes a concern is in communicating with legacy
little-endian C/C++
applications.

Code that takes a value apart with shifts and casts will produce the same result on
either a big or little endian machine.
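As a minimal sketch of such code (the class name SplitShortDemo and the sample value are illustrative assumptions, not taken from the original listing), here is a short taken apart into its high and low bytes purely with shifts and casts. Nothing in it depends on how the hardware lays out the short in memory:

```java
// Hypothetical demo class; the name and the sample value are illustrative only.
final class SplitShortDemo {
    // High-order byte of a 16-bit value, obtained by shifting, not by peeking at memory.
    static byte highByte(short x) {
        return (byte) (x >>> 8);
    }

    // Low-order byte; the cast to byte discards everything above bit 7.
    static byte lowByte(short x) {
        return (byte) x;
    }

    public static void main(String[] args) {
        short x = (short) 0xabcd;
        // prints: high=ab low=cd
        System.out.printf("high=%02x low=%02x%n", highByte(x) & 0xff, lowByte(x) & 0xff);
    }
}
```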

Reading Little-Endian Binary Files

The most common problem is
dealing with files stored in little-endian format.

I had to implement routines parallel to those in java.io.DataInputStream, which reads
raw binary, in my LEDataInputStream and LEDataOutputStream classes.
Don’t confuse this raw binary format with a human-readable, character-based
file-interchange format.

If you wanted to do it yourself, without the overhead of the full LEDataInputStream
and LEDataOutputStream classes, here is the basic technique. If you are not familiar
with how to fudge unsigned data in Java by masking off the high order bits, you might
want to read the unsigned and masking entries first.

Reading Little-Endian Shorts

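Presuming your integers are in 2’s complement little-endian format, shorts are pretty
easy to handle: read the low byte, then the high byte, mask each with 0xff and
combine them as high << 8 | low.

Or if you want to get clever and puzzle your readers, you can avoid the mask on the
high byte, since its sign-extended high bits will later be shaved off by the cast to
short.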
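Both variants can be sketched roughly as follows. The class and method names (LittleEndianShorts, readShortLittleEndian, readShortLittleEndianTerse) are illustrative assumptions; the only JDK call used is DataInput.readByte.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; these method names are illustrative, not JDK methods.
final class LittleEndianShorts {
    // Reads 2 bytes, least significant first, and assembles a signed 16-bit value.
    static short readShortLittleEndian(DataInput in) throws IOException {
        int low = in.readByte() & 0xff;  // mask to undo sign extension
        int high = in.readByte() & 0xff;
        return (short) (high << 8 | low);
    }

    // The "clever" variant: high stays sign-extended, because the cast to
    // short throws away bits 16..31 anyway.
    static short readShortLittleEndianTerse(DataInput in) throws IOException {
        int low = in.readByte() & 0xff;
        int high = in.readByte();
        return (short) (high << 8 | low);
    }
}
```

The mask on the low byte is not optional in either variant: without it, a low byte of 0x80 or more would sign-extend and corrupt the high byte when the two are ORed together.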

Reading Little-Endian Longs
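
A minimal sketch for longs, assuming 2’s complement little-endian data. The class and method names are illustrative. The one trap is that each byte must be widened to long before shifting; otherwise the shift is done on an int and the shift distance is taken modulo 32.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; readLongLittleEndian is an illustrative name.
final class LittleEndianLongs {
    // Reads 8 bytes, least significant first, and assembles a signed 64-bit value.
    static long readLongLittleEndian(DataInput in) throws IOException {
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8) {
            // Cast to long BEFORE shifting, or the shift happens in 32-bit arithmetic.
            accum |= (long) (in.readByte() & 0xff) << shiftBy;
        }
        return accum;
    }
}
```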

Reading Little-Endian Chars

In a similar way to short we handle char.
You can also use Character.reverseBytes
in Java version 1.5 or later.
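
A sketch of both approaches, with illustrative class and method names. The second method relies on DataInput.readChar assembling the two bytes big-endian, and then swaps them with Character.reverseBytes.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; the method names are illustrative.
final class LittleEndianChars {
    // Manual version: low byte first, then high byte. char is unsigned 16-bit,
    // and the cast discards any sign-extension bits left in high.
    static char readCharLittleEndian(DataInput in) throws IOException {
        int low = in.readByte() & 0xff;
        int high = in.readByte();
        return (char) (high << 8 | low);
    }

    // Java 1.5+ version: read the char big-endian, then swap its two bytes.
    static char readCharLittleEndianViaReverse(DataInput in) throws IOException {
        return Character.reverseBytes(in.readChar());
    }
}
```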

Reading Little-Endian Ints

In a similar way to short we handle int.
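
A minimal sketch for ints, again assuming 2’s complement little-endian data; the class and method names are illustrative.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; readIntLittleEndian is an illustrative name.
final class LittleEndianInts {
    // Reads 4 bytes, least significant first, and assembles a signed 32-bit value.
    static int readIntLittleEndian(DataInput in) throws IOException {
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8) {
            // Mask each byte so sign extension cannot smear into higher bytes.
            accum |= (in.readByte() & 0xff) << shiftBy;
        }
        return accum;
    }
}
```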

Reading Little-Endian Doubles

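Floating point doubles are a little trickier. Presuming your data are in IEEE 754
Floating Point little-endian format, you need to collect the eight bytes into a long,
least significant byte first, then convert the bit pattern with
Double.longBitsToDouble.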
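
A sketch of that technique, assuming IEEE 754 little-endian input; the class and method names are illustrative, while Double.longBitsToDouble is the standard JDK conversion.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; readDoubleLittleEndian is an illustrative name.
final class LittleEndianDoubles {
    // Collects 8 bytes, least significant first, into a long bit pattern,
    // then reinterprets that pattern as an IEEE 754 double.
    static double readDoubleLittleEndian(DataInput in) throws IOException {
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8) {
            // Cast to long before shifting so the shift is done in 64 bits.
            accum |= (long) (in.readByte() & 0xff) << shiftBy;
        }
        return Double.longBitsToDouble(accum);
    }
}
```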

Reading Little-Endian Floats

Floats are much like doubles. Again, presuming your data are in
IEEE 754 Floating Point little-endian format, you collect the four bytes into an int,
least significant byte first, then convert the bit pattern with Float.intBitsToFloat.
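
A sketch along the same lines, assuming IEEE 754 little-endian input; the class and method names are illustrative.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; readFloatLittleEndian is an illustrative name.
final class LittleEndianFloats {
    // Collects 4 bytes, least significant first, into an int bit pattern,
    // then reinterprets that pattern as an IEEE 754 float.
    static float readFloatLittleEndian(DataInput in) throws IOException {
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8) {
            accum |= (in.readByte() & 0xff) << shiftBy;
        }
        return Float.intBitsToFloat(accum);
    }
}
```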

Reading Little-Endian Bytes

You don’t need a readByteLittleEndian since the code would be identical to readByte, though you might create one just for consistency.
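
Such a consistency wrapper would be a one-liner. The class name here is an illustrative assumption:

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class, present only so that all the readXxxLittleEndian
// methods have the same shape.
final class LittleEndianBytes {
    // A single byte has no byte order, so this simply delegates to readByte.
    static byte readByteLittleEndian(DataInput in) throws IOException {
        return in.readByte();
    }
}
```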

Big and little endian byte data are identical. There is nothing to rearrange. If you
wanted to reverse the

Nio and ByteBuffer for handling Little Endian Files

In Java version 1.4 or later, you can use the java.nio classes to deal with little
endian data. You can use ByteBuffer.order( ByteOrder.LITTLE_ENDIAN )
to set the endian byte-sex of the buffer to little endian. Then when you use
ByteBuffer.getInt( int offset ), it will collect
the bytes least significant first. Note that the offset is specified in bytes, not ints.
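
A minimal sketch of the ByteBuffer approach. The class name, the method name intAt and the sample bytes are illustrative assumptions; ByteBuffer.wrap, order and the absolute getInt( int ) are standard java.nio calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; the names and sample data are illustrative only.
final class LittleEndianBufferDemo {
    // Returns the little-endian int that starts at the given BYTE offset.
    static int intAt(byte[] data, int byteOffset) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        buffer.order(ByteOrder.LITTLE_ENDIAN); // a new buffer defaults to BIG_ENDIAN
        return buffer.getInt(byteOffset);      // absolute get: position is unchanged
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12, 0x01, 0x00, 0x00, 0x00};
        // prints: 12345678 1
        System.out.println(Integer.toHexString(intAt(data, 0)) + " " + intAt(data, 4));
    }
}
```

Because the offset counts bytes, intAt( data, 1 ) in this sketch would straddle the two stored ints rather than fetch the second one.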

Unicode

Unicode comes in both big and little endian variants.
Sometimes the order is marked, sometimes not. For details read about
BOMs (Byte Order Marks) in the Unicode entry and read up on all
the variant Unicode UTF (Unicode Transformation Format) encodings.
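
As a small illustration for the 16-bit encodings, the JDK's standard charsets cover the marked and unmarked cases. The class and method names below are illustrative assumptions. UTF_16BE and UTF_16LE expect unmarked data of a known order, while UTF_16 looks for a BOM when decoding and falls back to big-endian if none is present.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the method names are illustrative only.
final class Utf16OrderDemo {
    // Unmarked data known to be big-endian.
    static String decodeBigEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    // Unmarked data known to be little-endian.
    static String decodeLittleEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // Data that may begin with a BOM (FE FF = big-endian, FF FE = little-endian).
    static String decodeUsingBom(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }
}
```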

History

In Gulliver’s Travels the Lilliputians liked to
break their eggs on the small end and the Blefuscudians on the big end. They fought
wars over this. There is a computer analogy. Should numbers be stored most or least
significant byte first? This is sometimes referred to as byte sex.

Those in the big-endian camp (most significant byte stored first) include the Java
VM virtual computer, the Java binary file format, the Motorola 68K, and the IBM
360 and its follow-on mainframes such as the 390, along with most other mainframes.
The Power PC is endian-agnostic.

Blefuscudians (big-endians) assert this is the way God intended integers to be
stored, most important part first. At an assembler level fields of mixed positive
integers and text can be sorted as if it were one big text field key. Real
programmers read hex dumps and big-endian is a lot easier to comprehend.

In the little-endian camp (least significant byte first) are the Intel 8080, 8086,
80286, Pentium and follow ons and the MOS 6502 popularised by the Apple ][.

Lilliputians (little-endians) assert that putting the low order part first is more
natural because when you do arithmetic manually, you start at the least significant
part and work toward the most significant part. This ordering makes writing
multi-precision arithmetic easier since you work up not down. It made implementing
8-bit microprocessors easier. At the assembler level (not in Java) it also lets you
cheat and pass the address of a 32-bit positive int to a routine expecting
only a 16-bit parameter and still have it work. Real
programmers read hex dumps and little-endian is more of a stimulating challenge.

If a machine is word addressable, with no finer addressing supported, the concept
of endianness means nothing since words are fetched from RAM (Random Access Memory)
in parallel, both ends first.

Oracle’s Solaris. Normally used as big-endian, but also has support for
operating in little-endian mode, including being able to switch endianness
under program control for particular loads and stores.

Univac 1100: word-addressable. 36-bit words.

Univac 90/30: big-endian. IBM 370 clone.

Zilog Z80: little-endian. Used in CP/M (Control Program for Microcomputers) machines.


Four Byte Sexes

In theory data can have two different byte
sexes but CPUs
can have four. Let us give thanks, in this world of mixed left and right hand drive,
that there are not real CPUs
with all four sexes to contend with.

The Four Possible Byte Sexes for CPUs

Which Byte Is Stored in the Lower-Numbered Address?   Which Byte Is Addressed?   Used In
LSB (Least Significant Byte)                          LSB                        Intel, AMD, Power PC, DEC.
LSB                                                   MSB                        None that I know of.
MSB                                                   LSB                        Perhaps one of the old word mark architecture machines.
MSB                                                   MSB                        Mac, IBM 390, Power PC.

reverseBytes

In Java version 1.5 or later, Integer, Long, Short and Character each have a static
method called reverseBytes that will reverse the byte sex. These will be most useful
to deal with a handful of little-endian fields. Unfortunately there is no such thing
as Float.reverseBytes or Double.reverseBytes.
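
One way to work around the missing Float and Double versions, sketched here with illustrative class and method names, is to read the raw bits with the ordinary big-endian readInt or readLong, reverse them with Integer.reverseBytes or Long.reverseBytes, and only then convert the bit pattern to floating point.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper class; the method names are illustrative only.
final class ReverseBytesReaders {
    // Little-endian int: read big-endian, then swap the four bytes.
    static int readIntLittleEndian(DataInput in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    // Little-endian float: swap the raw bits first, convert second.
    static float readFloatLittleEndian(DataInput in) throws IOException {
        return Float.intBitsToFloat(Integer.reverseBytes(in.readInt()));
    }

    // Little-endian double: same idea with 64 bits.
    static double readDoubleLittleEndian(DataInput in) throws IOException {
        return Double.longBitsToDouble(Long.reverseBytes(in.readLong()));
    }
}
```

The swap is done on the integer bit pattern, not on a float or double value, so the bits never pass through a floating point variable while they are in the wrong order.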