This is actually a question from my interview, and I have also run some code to find out that it is milliseconds slower than the random access of the array.

Abhishek Basu
Greenhorn

Joined: Aug 18, 2012
Posts: 3

posted Aug 18, 2012 10:45:24


Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …. array[999]) rather than in a random order (array[17], array[962], etc...)? You can assume that the array is preloaded with data.

If this is true (and I wouldn't be too certain that it is), then I would guess that hardware is often designed to read a block of data at once for efficiency, much like reading a byte[] from a stream is faster than reading one byte at a time. If the data you want is sequential, this works great; if not, it results in reading a bunch of stuff you don't actually need in order to get the few bytes you do need.
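The stream analogy is easy to try directly. Below is a minimal sketch (class and file names are my own, not from this thread) comparing byte-at-a-time reads from a FileInputStream against the same reads through a BufferedInputStream, which fetches 8 KB blocks underneath:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockReadDemo {
    // Sum every byte of the stream, one read() call at a time.
    static long sum(InputStream in) throws IOException {
        long total = 0;
        int b;
        while ((b = in.read()) != -1) total += b;
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blockread", ".bin");
        byte[] data = new byte[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(tmp, data);

        long t0 = System.nanoTime();
        long unbuffered;
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            unbuffered = sum(in);               // each read() goes to the file
        }
        long t1 = System.nanoTime();
        long buffered;
        try (InputStream in = new BufferedInputStream(new FileInputStream(tmp.toFile()))) {
            buffered = sum(in);                 // reads arrive in 8 KB blocks
        }
        long t2 = System.nanoTime();

        System.out.println(unbuffered == buffered);   // same data either way
        System.out.printf("unbuffered: %d ms, buffered: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        Files.delete(tmp);
    }
}
```

On most machines the buffered version is dramatically faster, though the exact numbers vary.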

Abhishek Basu wrote:Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …. array[999]) rather than in a random order (array[17], array[962], etc...)?

Don't accept questions with hidden assumptions. The answer to this one is "I don't think it is faster. Do you have some evidence to demonstrate that?"

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3019


posted Aug 18, 2012 16:32:10


To be fair...

Abhishek Basu wrote:...and also I have run some code to find out that it is milliseconds slower than the random access of the array.

This is good. But "milliseconds" isn't very concrete, and many measurements can easily have an error of several milliseconds. So you may need to scale the experiment up by a factor of 10,000 or so to make sure the difference is real.

When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method) then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the reason depends on the size of the array. The larger the array, the bigger the difference.
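For anyone who wants to reproduce this, here is a rough sketch of such a benchmark (my own code, not the one from the post; the size, repeat count, and the anti-dead-code trick are arbitrary choices). Both passes read through an index array, so the only variable is the order of access:

```java
import java.util.Random;

public class ArrayAccessBench {
    static final int SIZE = 4_000_000;  // large enough to spill out of cache
    static final int REPS = 5;

    // Sum the array in the order given by idx; same work, different order.
    static long sum(int[] a, int[] idx) {
        long total = 0;
        for (int i : idx) total += a[i];
        return total;
    }

    static long time(int[] a, int[] idx) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < REPS; r++) {
            long t0 = System.nanoTime();
            long s = sum(a, idx);
            long t1 = System.nanoTime();
            if (s == 42) System.out.print("");  // keep the JIT from discarding the loop
            best = Math.min(best, t1 - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] a = new int[SIZE];
        int[] seq = new int[SIZE];
        for (int i = 0; i < SIZE; i++) { a[i] = i; seq[i] = i; }

        // Shuffle a copy of the indexes (Fisher-Yates) for the random order.
        int[] rnd = seq.clone();
        Random r = new Random(1);
        for (int i = SIZE - 1; i > 0; i--) {
            int j = r.nextInt(i + 1);
            int t = rnd[i]; rnd[i] = rnd[j]; rnd[j] = t;
        }

        long tSeq = time(a, seq);
        long tRnd = time(a, rnd);
        System.out.println(sum(a, seq) == sum(a, rnd));  // same elements, same total
        System.out.printf("sequential: %d ms, random: %d ms%n",
                tSeq / 1_000_000, tRnd / 1_000_000);
    }
}
```

Taking the best of several repeats reduces noise from GC and warm-up; the sequential-vs-random ratio you see will still depend heavily on the machine and the array size.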

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3019


posted Aug 18, 2012 22:07:22


My guess, then, is that for a smaller array, the whole thing gets buffered/cached, and accessing the cache randomly is as fast as accessing it sequentially. But when you exceed the size that can be buffered or cached, you get more slowdown as the system starts jumping around and reading different blocks.

I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)

Paul Clapham wrote:I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)

I suspect for Java it would definitely be rows, since 2D arrays are arrays of arrays.
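A quick way to see this in Java (a sketch of my own, with an arbitrary size N): traverse the same int[][] both ways and compare timings. Row-wise walks each contiguous inner int[]; column-wise hops between N different arrays on every step:

```java
public class RowVsColumn {
    static final int N = 4_000;

    public static void main(String[] args) {
        int[][] m = new int[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i][j] = i ^ j;               // arbitrary non-zero data

        long t0 = System.nanoTime();
        long rowSum = 0;
        for (int i = 0; i < N; i++)            // row-wise: walks each inner
            for (int j = 0; j < N; j++)        // int[] contiguously
                rowSum += m[i][j];
        long t1 = System.nanoTime();

        long colSum = 0;
        for (int j = 0; j < N; j++)            // column-wise: hops between
            for (int i = 0; i < N; i++)        // N different inner arrays
                colSum += m[i][j];
        long t2 = System.nanoTime();

        System.out.println(rowSum == colSum);  // same total either way
        System.out.printf("row-wise: %d ms, column-wise: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

(Fortran stores 2D arrays column-major, which is why the advice there goes the other way around.)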

BTW, I tried a similar experiment to yours, except with int[]s instead of Integer[]s, and my results were similar, except that the jump for me came between array sizes of 10K and 100K (averaged over 100,000 sums), and random access was only about 70% longer, not several times. For array sizes of 100, the random selection was often faster; but even averaging over 100,000 sums, the results varied so much that I wouldn't trust them as far as I could throw them.

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here

My guess is that there is no OS paging involved, but that it is all due to the processor's built-in caches. In the past I used some computationally intensive software, and the size of the memory caches on the CPU (the "Level 1" and "Level 2" caches, sized in KBs and MBs) made a bigger difference than raw CPU speed. This was several years ago, though, and I don't know how CPU caches have evolved since then.

Paul Clapham wrote:When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method) then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the reason depends on the size of the array. The larger the array, the bigger the difference.

CPU caches have something to do with the reason, but are not themselves the reason.

The reason for this is something called striding. When you access memory in a predictable pattern, a modern CPU will use one of its units to prefetch future memory locations into cache, in the background, at the "stride" it detected. You have to access memory at a consistent stride for the CPU to recognize this. A primitive array generally gets a contiguous block of memory (at least in virtual memory), so if you walk its indexes sequentially, the CPU will recognize the stride and prefetch future memory locations into its cache.

The reason you don't see this effect on small array sizes is that the whole array lives on one memory page, so random versus sequential access no longer matters much.
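To make the stride idea concrete, here is a small sketch (my own, with arbitrary sizes): sum the same array at strides of 1, 16, and 256. Every element is touched exactly once in each pass, so the arithmetic work is identical; only the access pattern, and hence the cache and prefetcher behavior, changes:

```java
public class StrideDemo {
    // Sum every 'stride'-th int, starting from each offset in turn,
    // so all elements are visited exactly once regardless of stride.
    static long strideSum(int[] a, int stride) {
        long total = 0;
        for (int start = 0; start < stride; start++)
            for (int i = start; i < a.length; i += stride)
                total += a[i];
        return total;
    }

    public static void main(String[] args) {
        int[] a = new int[16_000_000];
        for (int i = 0; i < a.length; i++) a[i] = 1;

        for (int stride : new int[]{1, 16, 256}) {
            long t0 = System.nanoTime();
            long s = strideSum(a, stride);     // same sum every time: 16,000,000
            long t1 = System.nanoTime();
            System.out.printf("stride %3d: sum=%d, %d ms%n",
                    stride, s, (t1 - t0) / 1_000_000);
        }
    }
}
```

At stride 1 each 64-byte cache line serves 16 consecutive ints; at stride 16 or more, every access pulls in a fresh line, so the larger strides typically run noticeably slower even though the total work is the same.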

As far as I know, array elements are stored consecutively in memory, and I think accessing the next element in sequence takes less time than accessing a random one (e.g. if the array is on disk, accessing it randomly can incur disk-operation overhead).