
Optimizing Code - Data types

Although I don't do a lot of open-source development, there is a lot I've learned in my almost 30 years of development that might be useful to share.

Most compilers these days have built-in optimizers, some of which (like IBM COBOL 3 and 4, or C# on .NET 2/3/4) are designed to make use of both advanced CPU features and compiled-code reorganization to achieve faster run times.

In practice, though, I have found that at times a little work by a developer can lead to even faster run times, which becomes very important when you are dealing with large quantities of data. In the mainframe world, that usually means over 80,000 records or rows being processed; in the .NET world it varies by CPU, but once you hit 15,000 records or rows it's something to consider.

One of the simplest tips to speed up code is to use the smallest data type possible. For example, say you need a field that stores one of a limited number of choices. Do you store it in an integer or a string? It would make sense at first that integer compares are faster than string compares, but that is not always quite accurate.

For example, an Int32 takes exactly 4 bytes; however, if the number of choices is limited to 36 or so values, you could see a performance gain by using a 1-byte string or char data type instead, because of the way compares work. Every time an integer is compared to another integer, the full four bytes have to be compared (the machine instruction is a memory block compare). In COBOL, strings are also memory block compares without character-set enhancements, so a 1-byte field is faster to compare than a 4-byte field. In C#, even though string compares have added logic for many character sets, if the first (leftmost) byte differs then the rest of the string is never tested, so a string compare may still be slightly faster than an integer compare in that case. (You may have to test the data types in C# for specific cases to make sure you actually get an improvement, though.)

The second useful tip concerns math functions: if you're going to be doing a lot of computations on a lot of data, use the fastest numeric type possible for the kind of math you are invoking, based on the CPU type. Int64 is the fastest number type on 64-bit CPUs and Int32 the fastest on 32-bit CPUs; following those, in order of efficiency, are Long, Short, Byte, Double, Single, and Decimal.

As you can guess, most database designers favor Decimal or its variants (Money, Smallmoney) for some numbers; however, that is actually the worst numeric type to choose for math performance over large quantities of data. Sometimes, even when dealing with money values, it's better to use integers with the decimal portion represented as a whole number, so 123.45 would be an int with a value of 12345. When you present it to a user on a report or for display, then and only then do you push it to a Double or Decimal and divide it by 10^(number of decimal places) (/ 100 in my example).

This can lead to the precision trap, though: if you need to jump from 2-decimal precision to 4-decimal precision, you should always multiply by 10^(additional decimals) first, before you do any math. The precision trap is not a common gotcha, but I've occasionally seen code using integers with implied 2 decimals, applying multiplication or division to them, and expecting the resulting integer to have an implied 4 decimals, which it won't. You always want to bump values to the precision you need before doing the math on them when you want increased precision over the base type.

This isn't as much of an issue in COBOL, where decimals are always implied; there, the fastest type is Binary, then Packed, then Display-Numeric.
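A minimal sketch in C of the scaled-integer approach (the variable names and the 2-decimal scale are illustrative assumptions, not from any particular system): the bump to 4 implied decimals happens before the divide, so the extra precision survives.

Code:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t amount = 12345;   /* 123.45 with 2 implied decimals */
    int64_t parts  = 7;       /* split the amount 7 ways        */

    /* Dividing directly keeps only 2 implied decimals and silently
       truncates: 12345 / 7 = 1763, i.e. 17.63. */
    int64_t share2 = amount / parts;

    /* Bump to 4 implied decimals FIRST (multiply by 10^2), then divide:
       1234500 / 7 = 176357, i.e. 17.6357 -- the precision survives. */
    int64_t share4 = (amount * 100) / parts;

    /* Only at display time push to a floating type and divide by the scale. */
    printf("2 implied decimals: %.2f\n", (double)share2 / 100.0);
    printf("4 implied decimals: %.4f\n", (double)share4 / 10000.0);
    return 0;
}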

Originally Posted by dgavin

Do you store it in an integer or a string? It would make sense at first that integer compares are faster than string compares, but that is not always quite accurate.

For example, an Int32 takes exactly 4 bytes; however, if the number of choices is limited to 36 or so values, you could see a performance gain by using a 1-byte string or char data type instead, because of the way compares work.

Generally, this is true, and it is an excellent suggestion for anyone who doesn't know the underlying structure of both the language and processor.
As you advance, you also want to watch how your language stores data. Many languages move primitive data types (like integers) as values. More complex data types involve a data descriptor that describes access to the value. In some languages, there is room in the descriptor for some of the data, so shorter strings are accessed directly in the descriptor (in AutoLISP, this amounted to 3 characters). In some, you can define fixed-length strings that are addressed directly rather than through a descriptor.
The movement of the data may negate some of the advantages you gain in optimizing word alignments for the processor.
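As an illustration of the descriptor idea (the layout below is invented for this sketch, not any particular language's actual descriptor), a value cell with room for short strings inline might look like this in C:

Code:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

enum Kind { KIND_INT, KIND_SHORT_STR, KIND_LONG_STR };

typedef struct {
    uint8_t kind;            /* which member of the union is live       */
    uint8_t len;             /* string length when kind is a string     */
    union {
        int64_t i;           /* primitives are moved by value           */
        char    inline_s[8]; /* short strings live right in the cell    */
        char   *heap_s;      /* longer strings go through a pointer     */
    } v;
} Value;

int main(void)
{
    Value s = { KIND_SHORT_STR, 3, { 0 } };
    memcpy(s.v.inline_s, "abc", 3);       /* no heap access needed */
    printf("%.*s\n", s.len, s.v.inline_s);
    return 0;
}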

Originally Posted by dgavin

The second useful tip concerns math functions: if you're going to be doing a lot of computations on a lot of data, use the fastest numeric type possible...

I was surprised to read that on a 64-bit processor 64-bit operations were faster than 32-bit ones; I expected them to be equally fast. But I did a test with a simple C program which seems to confirm this: 1 billion iterations of a loop containing 2 additions and an incrementation took 5.15 s for 64-bit integers and 6.27 s for 32-bit integers. Could someone explain what causes this significant difference? I tried to google it, but all I got was loads of comparisons of 32- and 64-bit operating systems, which is a slightly different topic.

Another thing which I think should be taken into account is that with large data sets, using a larger data type will cause more memory traffic between RAM and the CPU, which can slow things down. I ran another test, this time of array[i] += array[i-1] over a 200-million-element array, and the results were 4.20 s for 64-bit and 3.17 s for 32-bit integers.

Tests performed on an AMD 64-bit CPU running a 64-bit Linux operating system. Times are total execution times averaged over 3 executions, rounded to 2 decimal places.
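For anyone wanting to repeat the experiment, a sketch of that first test in C (not the poster's actual code; timings will vary with CPU, compiler, and flags, and the volatile qualifiers keep the optimizer from deleting the "unused" loops):

Code:

#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define ITERS 1000000000LL

int main(void)
{
    /* 2 additions plus the loop increment, with 64-bit accumulators. */
    volatile int64_t a64 = 0, b64 = 0;
    clock_t t0 = clock();
    for (int64_t i = 0; i < ITERS; i++) { a64 += 1; b64 += 2; }
    printf("64-bit: %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    /* Same loop body with 32-bit accumulators. */
    volatile int32_t a32 = 0, b32 = 0;
    t0 = clock();
    for (int64_t i = 0; i < ITERS; i++) { a32 += 1; b32 += 2; }
    printf("32-bit: %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);
    return 0;
}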

I almost forgot the other thing to consider with integer types: always make sure they are aligned on word boundaries when you are defining the numbers in a struct, when you're dealing with primitives. Pointers to non-primitives are always 1 or 2 words long, so you don't have to worry about those. On a 64-bit CPU, align to double-word boundaries. You usually only need to worry about this when you start getting into data set sizes of 200,000 rows/records or larger.

For example, the following struct is less efficient

struct LessEfficient {
    int32_t number1;
    char    flags[1];
    int32_t number2;
    char   *something;   /* reference type, pointer-sized */
};

than this one, which is faster

struct MoreEfficient {
    int32_t number1;
    char   *something;   /* reference type, pointer-sized */
    int32_t number2;
    char    flags[2];
};

Some compilers will attempt to insert 'slack bytes' in the first example to force things into word alignment, as COBOL does; in C++ or C#, however, that is not the case. Even COBOL doesn't always slack-byte correctly. It is usually better to have all your numbers and reference types defined in structures and objects before primitives or arrays that are variable or small sized.
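Since slack-byte behavior varies by compiler and ABI, it is worth measuring rather than assuming; a quick C check using sizeof and offsetof (the struct shapes mirror the examples above):

Code:

/* Measure, don't assume: print the layout the compiler actually produced. */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct Less { int32_t number1; char flags[1]; int32_t number2; char *something; };
struct More { int32_t number1; char *something; int32_t number2; char flags[2]; };

int main(void)
{
    printf("Less: size=%zu, number2@%zu, something@%zu\n",
           sizeof(struct Less), offsetof(struct Less, number2),
           offsetof(struct Less, something));
    printf("More: size=%zu, something@%zu, number2@%zu\n",
           sizeof(struct More), offsetof(struct More, something),
           offsetof(struct More, number2));
    return 0;
}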

In the case of aligning 32-bit integers on a 64-bit CPU, it would look something like the following when you intentionally add slack-byte areas to align the int32s, or the fields following them, on a 64-bit double-word boundary.
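The example itself appears to have been lost from the post; a reconstruction in C of what it describes, with explicit slack bytes so each int32 plus its padding fills an 8-byte double word:

Code:

#include <stdint.h>

/* Reconstruction of the missing example: explicit slack bytes pad each
 * 32-bit int so the next field starts on an 8-byte double-word boundary. */
struct Aligned64 {
    int32_t number1;
    char    slack1[4];   /* number1 + slack1 fill one 8-byte double word */
    char   *something;   /* pointer: naturally 8 bytes on a 64-bit CPU   */
    int32_t number2;
    char    slack2[4];   /* number2 + slack2 fill one 8-byte double word */
};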

Originally Posted by dgavin

For example, an Int32 takes exactly 4 bytes; however, if the number of choices is limited to 36 or so values, you could see a performance gain by using a 1-byte string or char data type instead, because of the way compares work. Every time an integer is compared to another integer, the full four bytes have to be compared (the machine instruction is a memory block compare).

Sorry, this is flat-out incorrect. Integer comparison uses the hardware CMP instruction, which operates on data the length of a machine word; that means 32 bits on a 32-bit computer. In fact, that's very easy to see for yourself.

The whole compare() function effectively compiles to two instructions. The first MOVes the first (32-bit) variable from the stack into the EAX register, while the second CoMPares the EAX register with the second variable on the stack.
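The listing itself did not survive the forum quoting; a plausible reconstruction (assuming GCC on 32-bit x86 and a function like int compare(int32_t a, int32_t b)) of the two instructions in question:

Code:

mov    eax, DWORD PTR [esp+4]   ; load the first 32-bit argument into EAX
cmp    eax, DWORD PTR [esp+8]   ; compare EAX with the second argument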

If we now switch from 32-bit integers to 8-bit integers (i.e. change int32_t to int8_t), the function looks like this:
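This listing was lost as well; a plausible shape for the int8_t version (the exact instruction mix depends on the compiler version), showing the extra moves the next paragraph refers to:

Code:

movsx  eax, BYTE PTR [esp+4]    ; extra move: load + sign-extend first byte
movsx  edx, BYTE PTR [esp+8]    ; extra move: load + sign-extend second byte
cmp    eax, edx                 ; compare the widened values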

As you can see, there is no profit to be had; even worse, there is a loss, because you can see the compiler doing two extra MOVs. (I am not sure why GCC is doing things that way, though. I'd simply cast both arguments to int32 and do a 32-bit hardware compare, resulting in code identical to the first version.)

Fun fact: optimized string compare functions actually use 32-bit compares to decrease the number of required CMP and MOV instructions by a factor of 4 (i.e. they compare 4 bytes at a time instead of just one). See e.g. http://www.opensource.apple.com/sour...n/ppc/memcmp.s

Code:

; int memcmp(const void *LHS, const void *RHS, size_t len);
;
; Memcmp returns the difference between the first two different bytes,
; or 0 if the two strings are equal. Because we compare a word at a
; time, this requires a little additional processing once we find a
; difference.
; r3 - LHS
; r4 - RHS
; r5 - len

Originally Posted by dgavin

The second useful tip concerns math functions: if you're going to be doing a lot of computations on a lot of data, use the fastest numeric type possible for the kind of math you are invoking, based on the CPU type. Int64 is the fastest number type on 64-bit CPUs and Int32 the fastest on 32-bit CPUs; following those, in order of efficiency, are Long, Short, Byte, Double, Single, and Decimal.

Theoretically speaking, on a 64-bit machine there should be no difference in performance between operating on 32-bit and 64-bit ints, because all operations will be done on 64-bit hardware registers anyway. On a 32-bit machine, operations on 64-bit ints will be slower than operations on (native) 32-bit ints, because 64-bit operations have to be emulated by combining multiple 32-bit operations. (Any book on assembly language will have the details.)
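A sketch in C of what that emulation amounts to (the u64_parts type is invented for illustration): one 64-bit add becomes two 32-bit adds plus a carry, the job the hardware ADD/ADC pair does on a 32-bit machine.

Code:

#include <stdio.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64_parts;

/* Low halves first, then high halves plus the carry out of the low half. */
static u64_parts add64(u64_parts a, u64_parts b)
{
    u64_parts r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* wrapped? then carry 1 */
    return r;
}

int main(void)
{
    u64_parts a = { 0xFFFFFFFFu, 0 }, b = { 1, 0 };
    u64_parts s = add64(a, b);           /* 0xFFFFFFFF + 1 = 0x1_00000000 */
    printf("hi=%u lo=%u\n", (unsigned)s.hi, (unsigned)s.lo);
    return 0;
}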

There is, however, a fundamental difference in performance between integer types and floating-point types. As a general rule, integer operations are faster; by how much depends on the hardware.

Also, on a machine which has hardware support for double-precision floating point (meaning any processor with an FPU nowadays), there is no profit to be had from using single precision, because the compiler will cast the variable to double, do the calculations on hardware doubles, and truncate the result back to single. So if anything, single will be slower because of the extra casting required.

I just wanted to share a little anecdote about code optimization.
Decades ago, we were comparing calculation speeds between VAX, RISC, and x86 architectures.
No matter how many iterations or how complex the processing done in the iterations, we saw virtually no elapsed time on the VAX.
It turned out that the optimizer was smart enough to know we weren't doing anything with the result, so it removed all of the code from the compiled version, deeming it unnecessary. A final print statement took care of that.

This is a very architecture-dependent subject. Some architectures can address 1-, 2-, or 4-byte (etc.) values equally well; others have to do more work for some sizes than for others. In some cases, things like sign extension must be done when converting to the native word type for computation. (x86 happens to have variants of the MOV instruction that do sign extension.)

Generally, you can avoid any overhead due to conversions or oddball addressing by using the machine type and ensuring alignment (which is generally done automatically by compilers, with you needing to specify otherwise if you want to pack data without regard to alignment). There are other factors to consider as well...unnecessarily large data types fit less useful data in caches and require more data to be sent across the system bus, written to/read from disk, transmitted over the network, etc. If you're using a SIMD unit or GPU for doing computations, using a smaller data type lets you operate on more values simultaneously, which can far outweigh the cost of converting to and from that data type.

The differences in performance directly due to data type are generally minor, though. They can be important in tight loops that a lot of time is spent in, and the memory requirements can be a big deal when handling large amounts of data, but there are often much larger gains to be had from data structure and algorithmic improvements. A brute-force nearest-neighbor search runs in linear time, while a nearest-neighbor query in a kd-tree structure takes only logarithmic time, which in a large set of points can mean several orders of magnitude of improvement.
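For a concrete baseline, a brute-force nearest-neighbor scan in C (2-D points assumed for brevity); this is the O(n)-per-query cost that a kd-tree reduces to roughly O(log n):

Code:

#include <stdio.h>
#include <stddef.h>

typedef struct { double x, y; } Point;

/* Brute force: scan every point, O(n) per query. */
static size_t nearest(const Point *pts, size_t n, Point q)
{
    size_t best = 0;
    double bestd = -1.0;
    for (size_t i = 0; i < n; i++) {
        double dx = pts[i].x - q.x, dy = pts[i].y - q.y;
        double d = dx * dx + dy * dy;    /* squared distance suffices */
        if (bestd < 0.0 || d < bestd) { bestd = d; best = i; }
    }
    return best;
}

int main(void)
{
    Point pts[] = { {0, 0}, {5, 5}, {2, 1} };
    Point q = { 1.5, 1.5 };
    printf("nearest index: %zu\n", nearest(pts, 3, q));  /* prints 2 */
    return 0;
}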

A related example: kd-tree construction using the obvious approach involves first sorting the points along each axis, a time-consuming (typically O(n log n)) process, but you can instead estimate a good splitting plane without sorting anything by sampling random points. The resulting tree may not be ideal, but it will probably be close, and far better than the brute-force approach while not being as slow to construct as with the sorting approach. Alternatively, if you're constantly modifying the data, a structure that's easier to change, like an octree, might be more appropriate.

Sorry, this is flat-out incorrect. Integer comparison uses the hardware CMP instruction, which operates on data the length of a machine word...

Theoretically speaking, on a 64-bit machine there should be no difference in performance between operating on 32-bit and 64-bit ints...

What I said is correct when you're talking about more than one specific architecture and example. CMP is Compare Operand, which can work on any two blocks of memory that are the same size; there are also CMPSW (Compare String Words), CMPSB (Compare String Bytes), CMPSD (Compare String Doublewords), and CMPSQ (Compare String Quadwords). On the z/OS platform there are various additional compares for halfwords, bytes, doublewords, and quadwords. So even your C compiler might use a CH (Compare Halfword) for an Int16, depending on the target CPU.

When you're talking about more than one architecture, the only thing certain is that the answer is "It's architecture dependent".

Your observations are generally wrong for most modern architectures, though, because they tend to optimize operations for one word size (or even for multiple words when you're talking about memory access, since accessing a whole cache line is faster than anything else), and smaller operations require additional time to mask off unwanted information.

And for several architectures, floating-point arithmetic is faster than integer arithmetic (when optimized), because they have more floating-point processing units for a deeper/faster/more parallel floating-point pipeline, so they can do multiple floating-point operations per clock cycle.



When you're talking about more than one architecture, the only thing certain is that the answer is "It's architecture dependent".

Your observations are generally wrong for most modern architectures, though, because they tend to optimize operations for one word size (or even for multiple words when you're talking about memory access, since accessing a whole cache line is faster than anything else), and smaller operations require additional time to mask off unwanted information.

I'm fairly certain I did say that earlier, yes, as well as that you should test optimizations in some circumstances to check whether they actually work. Generally speaking, the fastest data type is the one that is native to the CPU the code will run on, where native means that the compare compiles down to a single machine operation. The PS3 chip set, also a modern CPU by definition, supports halfword, word, doubleword, and quadword instructions, much like the z/OS CPU (x86 is one of the few that doesn't, actually). In fact, since the z9 processor, the Z platform CPU includes all three instruction sets: z-series, PS3, and x86-64 (based on the AMD Opteron instruction set).

In fact, the new z-198 CPU supports running multiple Windows server instances, PS3 server instances, Unix and Linux, and z/OS instances on a single mainframe, all -without- CPU opcode translation by VM software. As far as I know, that's the only CPU made that can provide 3 different complete instruction sets in two different native character sets, so when we're talking modern, that's where it's at.

Anyway, arguing about which instructions are fastest on which platform, or what is considered modern, is not the intention of this thread. As two other people have reiterated, while it is platform dependent, in simplest terms it boils down to using the fastest data type for the CPU you are targeting.

So decimals have much higher precision and are usually used in monetary (financial) applications that require a high degree of accuracy. But performance-wise, decimals are slower than double and float types. Double is probably the most commonly used data type for real values, except for handling money. More at: Decimal vs Double vs Float

Why not use long integers for finance and keep track of the money in pennies? It can still print reports in dollars by inserting decimal points. Or are banks keeping track of fractions of pennies now?

I've bounced around computing for a while. A good optimizer can easily beat the best hand-coding by pulling things like invariants out of loops (the compilers I worked with, in both the mainframe world and today's systems, could do this with a little help). When I was doing PLM programming, we also relied on the database (DB2), especially for sorting.

The machines I started on were, however, word-addressed rather than byte-addressed, so using bytes or half-word integers was actually slower than using full-word integers or floating-point numbers. For current systems this may not be true, but most of my more serious coding has been numerical, where bits drifting into oblivion can be a more severe problem than saving a few seconds.



So decimals have much higher precision and are usually used in monetary (financial) applications that require a high degree of accuracy. But performance-wise, decimals are slower than double and float types. Double is probably the most commonly used data type for real values, except for handling money. More at: Decimal vs Double vs Float

That page is terrible: it misses the most important characteristic of decimal types while giving outright misinformation about their capabilities.

The defining trait of decimal numbers is that they are base 10. The fact that .Net's particular implementation of decimal numbers uses 128 bits is actually completely irrelevant; 128-bit quad-precision binary floating-point implementations exist as well. The issue is that there are numbers, such as 0.1, which have finite, exact representations in base 10 but which form an infinitely repeating fraction in base 2: 0.1₁₀ = 0.000110011001100...₂. This is why decimal types are used in financial systems: being unable to exactly represent 10 cents would be somewhat of a problem for a bank.

And decimals are no more capable of exactly representing numbers in general than binary floating-point numbers are. Both binary and decimal representations are unable to exactly represent 1/3: 0.333...₁₀ = 0.010101...₂; the first has to be rounded to a finite number of digits, the second to a finite number of bits. Irrational numbers such as pi have no exact, finite representation in any rational base. What decimal avoids is the rounding error in the conversion between the binary and decimal representations.
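A quick C demonstration of the base-2 issue (the output shown in the comments is the typical IEEE-754 double result):

Code:

#include <stdio.h>

int main(void)
{
    double d = 0.1;               /* the nearest binary double to 0.1 */
    printf("%.20f\n", d);         /* 0.10000000000000000555...        */
    printf("%.20f\n", 0.1 + 0.2); /* 0.30000000000000004441, not 0.3  */
    return 0;
}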

Originally Posted by Chuck

Why not use long integers for finance and keep track of the money in pennies? It can still print reports in dollars by inserting decimal points. Or are banks keeping track of fractions of pennies now?

And yes, you can use large integers with a scale factor to achieve any desired fixed-point precision instead of using a floating-point representation. More specialized financial applications may benefit from having the system track the precision of the number, which is what floating-point representations do, and may need a wider range of numbers than can easily be handled in fixed point.