In my programming so far, I haven't seen a case where an array stores information better than some other structure. I had assumed that the added "features" in programming languages had improved upon arrays and thereby replaced them. I see now that they aren't replaced, but rather given new life, so to speak.

So, basically, what's the point of using arrays?

This is not so much a question of why we use arrays from the computer's standpoint, but rather why we would use arrays from a programming standpoint (a subtle difference). What the computer does with the array was not the point of the question.



Why not consider what the computer does with an array? We have a house-numbering system because we have straight streets. The same goes for arrays.
–
lcnAug 28 '13 at 5:04

28 Answers

Maybe just because they are the first thing that comes to mind when we want to store a "collection" of items in a "set".

Maybe they are the oldest structure in programming languages for storing a data collection in a naturally ordered way (i.e. a one-to-one correspondence with the positive integers).

All other structures are, in one form or another, variations on arrays.

If your question is really why arrays are the first thing that comes to mind when we want to store a collection, then it may change to "Why do humans count?", "How do we categorise?", "Is there a definition of a set?", etc.

Time to go back in time for a lesson. While we don't think about these things much in our fancy managed languages today, they are built on the same foundation, so let's look at how memory is managed in C.

Before I dive in, a quick explanation of what the term "pointer" means. A pointer is simply a variable that "points" to a location in memory. It doesn't contain the actual value at that area of memory; it contains the memory address of it. Think of a block of memory as a mailbox: the pointer would be the address of that mailbox.

In C, an array is simply a pointer with an offset; the offset specifies how far into memory to look. This provides O(1) access time.

MyArray [5]
   ^     ^
Pointer  Offset

All other data structures either build upon this, or do not use adjacent memory for storage, resulting in poor random access look up time (Though there are other benefits to not using sequential memory).

For example, let's say we have an array with 6 numbers (6, 4, 2, 3, 1, 5) in it. In memory, it is one contiguous block:

| 6 | 4 | 2 | 3 | 1 | 5 |

Because we can directly access any element in the array by adding the offset to the pointer, we can look up any element in the same amount of time, regardless of the size of the array. This means that getting MyArray[1000] would take the same amount of time as getting MyArray[5].

An alternative data structure is a linked list. This is a linear chain of nodes, each holding a pointer to the next:

[P1] -> [P2] -> [P3] -> [P4] -> ...

Note that I made each "node" into its own block. This is because they are not guaranteed to be (and most likely won't be) adjacent in memory.

If I want to access P3, I can't directly access it, because I don't know where it is in memory. All I know is where the root (P1) is, so instead I have to start at P1, and follow each pointer to the desired node.

This is O(N) look-up time (the look-up cost increases with each element added). It is much more expensive to get to P1000 than to get to P4.

Higher level data structures, such as hashtables, stacks and queues, all may use an array (or multiple arrays) internally, while Linked Lists and Binary Trees usually use nodes and pointers.

You might wonder why anyone would use a data structure that requires linear traversal to look up a value instead of just using an array, but they have their uses.

Take our array again. This time, I want to find the array element that holds the value '5'. Since the array isn't sorted, I have to check each element in turn until I find it, which is an O(N) linear search.

When data is inserted into a binary search tree, a simple rule decides where each new node goes: if the new value is greater than the parent's, it is inserted to the right; if it is lower, it is inserted to the left.

When searching this tree for the value 75, we only need to visit 3 nodes ( O(log N) ) because of this structure:

        100
       /   \
      50    150
     /  \
    25    75

Is 75 less than 100? Look at the left node.

Is 75 greater than 50? Look at the right node.

There is the 75!

Even though there are 5 nodes in our tree, we did not need to look at the remaining two, because we knew that they (and their children) could not possibly contain the value we were looking for. In the worst case we still have to visit every node, but in most cases we only have to visit a small portion of them.

That is where arrays get beaten: they provide a linear O(N) search time, despite their O(1) access time.

This is an incredibly high level overview on data structures in memory, skipping over a lot of details, but hopefully it illustrates an array's strength and weakness compared to other data structures.

Your C example is off by one, MyArray[4] should be pointing to the 5th element in your diagram, not the 4th element because arrays are zero-indexed in C. Otherwise, very nice explanation.
–
Robert GambleDec 25 '08 at 6:14

It's so sad you don't get reputation from this answer, I've up-voted some other of your answers as well.
–
lubos haskoDec 25 '08 at 7:14


This is what bugs me about "community wiki": this post is worth "proper" rep
–
QuibblesomeDec 25 '08 at 18:18


Nice answer. But the tree you describe is a binary search tree - a binary tree is just a tree where every node has at most two children. You can have a binary tree with the elements in any order. The binary search tree is organized as you describe.
–
gnudJan 2 '09 at 20:37

On which point? What is O(1)? What is random access? Why can't it be beaten? Another point?
–
jasonDec 25 '08 at 1:24


O(1) means constant time. For example, if you want to get the n-th element of an array, you just access it directly through its index (array[n-1]); with a linked list, you have to find the head and then go to the next node sequentially n-1 times, which is O(n), linear time.
–
CMSDec 25 '08 at 2:04


Big-O notation describes how the speed of an algorithm varies based on the size of its input. An O(n) algorithm will take twiceish as long to run with twice as many items and 8ish times as long to run with 8 times as many items. In other words the speed of an O(n) algorithm varies with the [cont...]
–
GarethDec 25 '08 at 2:06


size of its input. O(1) implies that the size of the input ('n') doesn't factor into the speed of the algorithm, it's a constant speed regardless of the input size
–
GarethDec 25 '08 at 2:07

This is usually the answer to why various language features exist. Arrays are a core computer-science concept. Replacing arrays with lists/matrices/vectors/whatever advanced data structure would severely impact performance, and would be downright impracticable in a number of systems. That said, there are plenty of cases where one of these "advanced" data-collection objects should be used because of the program in question.

In business programming (which most of us do), we can target hardware that is relatively powerful. Using a List in C# or a Vector in Java is the right choice in these situations, because these structures let the developer accomplish the goals faster, which in turn allows this type of software to be more feature-rich.

When writing embedded software or an operating system an array may often be the better choice. While an array offers less functionality, it takes up less RAM, and the compiler can optimize code more efficiently for look-ups into arrays.

I am sure I am leaving out a number of the benefits for these cases, but I hope you get the point.

Ironically, in Java you should use an ArrayList (or a LinkedList) instead of a Vector. This is to do with a vector being synchronised which is usually unnecessary overhead.
–
ashirleyJan 5 '09 at 11:02

Arrays are fundamental data structures that are useful for building many different abstract data types. Even if your language provides you with structures such as stacks, queues, lists, etc. they may internally use arrays to implement these structures.

There are two kinds of programming (well, ok, lots of kinds, but two that matter in this case): high-performance programming and regular programming.

In the high-performance case you need to know what kind of memory access you will be using and you care significantly about things like the cpu's cache and things like that. In this case you will often use arrays directly; this is the case when you are doing things like scientific computing, or developing basic features such as a generic collection implementation.

In the normal case, such as most business programming or web programming, or much programming that takes place in a VM, your main focus will be on application correctness and overall performance, often including things like database access. In this case you usually won't use an array directly, and instead should use a container such as Java's List. Now, in Java, the List is actually just an interface and you can choose from several Lists, including the ArrayList, or the LinkedList. But your code doesn't care which kind you use, and as the programmer you only have to decide up front to use one or the other. The main body of code that does all the work won't know if you chose Linked or Array. And in many many cases it won't even matter, since it's very common for lists to be only appended or only iterated. You might also use Maps, which are conceptually like arrays but the keys are anything instead of just sequential numbers.

Many of the other answers to this question say "how would you implement Strings without Arrays?!" or things like that. But Strings need not be arrays; in Java they are not (well, more precisely, you don't have access to the character array). Sure, under the hood there's an array, but the business-logic programmer doesn't need to care. Only the guy at Sun who writes the String class cares.

Which guy are you? I am a business programmer, so I rarely use arrays. When I do, it's because Java uses arrays in some of its API calls, or because I have a function that returns two values and can't be bothered to create a class or a map for it, or because I need a temporary container for some bytes read from a stream. But 99.9% of my code is array-free.

Say you have a series of buckets, each tied to the next by a piece of rope, and you're holding a rope attached to the first bucket, but you want the contents of the 42nd bucket. You have to work your way along the ropes, bucket by bucket, until you reach the one you want.

This is like a "linked list". The buckets are memory locations that store the value, and the pieces of rope are the pointers to the next bucket. The lookup time for a random access like this is considered O(N) because it takes on the order of N "operations" to get there. As the size of the list increases, so does the lookup time linearly (i.e. linear time).

Now say you have a series of buckets spaced exactly 1 foot apart, as well as a really long ruler with markings every 1 foot, and you want to get to the 42nd bucket. Just go directly to the spot on the ruler marked 42 and you're there!

That's like an "array". Again, the buckets are the memory locations containing the values, but this time since they're in a straight line evenly spaced you can just jump directly to the offset (think of the memory addresses as the "ruler"). The lookup time for a random access is much faster for lots of buckets, only taking a constant number of operations (jumping directly to the right spot), called O(1). As the size of the array increases, the lookup time stays constant (i.e. constant time)

I think that is an oversimplification of the list concept. Sure, you can implement a list that does a sequential traversal to the i-th element, but that is not necessarily the case. You could have a tree structure for your list such that you have a log search instead of a linear search; or you could have forward pointers that jump arbitrary distances (say, each node has a pointer to its adjacent nodes and to nodes 8 indexes away).
–
MPavlakAug 30 '12 at 15:32

msdn.microsoft.com/en-us/library/0ebtbkkc.aspx Where are you getting O(n) anyway? Sure, we can talk about super basic concepts and try and compare structures, but I think framework implementation actually matters more than theoretical, basic discussion.
–
MPavlakAug 30 '12 at 15:45

@MPavlak O(n) is the random access time for a linked list. If .NET's List has O(1) random access time, then it must be implemented with an array, not a linked list.
–
Max NanasyJun 11 '13 at 21:01

@MaxNanasy I think the issue is what a linked list can be. There is no reason you cannot have forward pointers or other things helping the linked list operate faster (a dictionary at the head). If you talk about a list which only has a pointer to the next node, then yes, that would have O(n), but that is not a requirement of a linked list.
–
MPavlakJun 13 '13 at 18:59

I'd take a guess as to not too often for "most" people. I also think that it wasn't so much of a "You are going to need this" as "See the value of the array in terms of speed."
–
XesanielDec 26 '08 at 4:44


@Mamut - Are you asking how often do you need efficiency? I personally like it all the time. I wish more programmers did.
–
bruceatkJan 1 '09 at 15:15

As some other people said, strings are usually arrays. Another important use of arrays is cache locality: modern processors have a cache, a special kind of memory much smaller than the main RAM. Since an array's elements are adjacent in memory, they can be loaded completely into the cache, unlike many linked structures. This makes arrays much faster in practice, because walking across the elements of a linked list can result in cache misses (accesses to data not in the cache), which can be orders of magnitude slower.

A main feature of the different types of collection is their asymptotic performance: the cost in terms of CPU cycles or memory consumed as the number of elements increases. Almost as important is the marginal cost when the collection is very small.

At the large end it is very difficult to know beforehand which type of structure will work best. If your application mostly does reads or replaces, arrays often win, but when your application has to do a lot of inserts, linked lists or balanced trees are often preferable. In either case, no structure has better memory performance than arrays, because the only data stored is the values, with no metadata related to the structure. This can be dominant when your application is very I/O- or cache-dependent.

On the other side, when the size of the collection is small, arrays are often a clear winner because there is very little overhead for most operations. Linear scans can even be faster than binary searches in sorted structures because the entire collection might fit in a single cache entry.

These days it often makes sense to start by choosing any structure without regard to performance in the initial development phase. Once the application begins to mature and you (or your customers) start to experience bottlenecks, you can optimize your data structures with the help of a profiler and a real-world workload.

Some platforms, mobile applications using J2ME and BREW for example, only have array access due to hardware limitations. No STL or Container classes available, so you end up having to create your own data structures using arrays.

Arrays are used everywhere in computer science. What are you comparing an array to? A linked list? Arrays have much faster random access speeds than linked lists, especially as the lists get long. They also make much more efficient use of memory.

How else would you store a "list" of 10 numbers where access time is important?

E.g., I need to read the n-th element of my list and write the (n+1)-th element several thousand times per second.

Also, the natural layout of system memory lends itself well to arrays of values that are directly adjacent in a contiguous block of memory. "Instances" and such tend to be allocated individually, and allocating many small pieces of memory typically wastes memory (since allocations are often word- or cache-line-aligned, where cache lines can be 128 bytes or more) and causes problems for the allocator when it needs to allocate a large chunk, because of "fragmentation".

If your application processes data sequentially, then an array places data into sequential memory addresses which guarantees decent caching performance. The difference in speed between iterating through the elements of a linked list (whose elements may be scattered around in memory) and an array can be a factor of 10-20x.

Taking a single language as an example, in C++, why would you ever use an array instead of an std::vector?

Sure, the vector, after compilation, will be an array perhaps with various guarantees, checks and balances, but that's how I understand Xesaniel's question.

He's not asking why you would ever use an array in the underlying implementation, but why would you ever use an array up at the top abstraction level when writing your program?

Why not always at least use an std::vector if you are in C++? In some other language use their resources for variably managed collections. If your language doesn't provide an abstraction like that, then write one and use it. In the case of an std::vector, performance isn't an issue because when all's said and done, your compilation will contain an array.

Again, it will "be" an array at the end of its compilation lifecycle, but I think you should always use an abstraction level appropriate for what you are doing. In other words, get in the habit of using something other than an array.

Besides performance, etc., one thing that makes arrays indispensable is the fact that all languages (I don't know of any that doesn't) have an implementation of arrays. Thus, from an interoperability standpoint, you cannot beat arrays. Most public web services use arrays to return sets of data, as it's something almost any language can understand.

Binary trees sacrifice insertion speed for search speed. If you insert sorted data into a binary tree, you end up with a linked list. That is why real implementations of binary trees are typically red-black trees: these rebalance while inserting to keep lookups optimal.
–
FlySwatDec 27 '08 at 0:26

Arrays of value-type elements in .Net allow those elements to be accessed by reference (e.g. arrayOfPoints[index].x = 5). There is no way for any other type of container of value-type elements to allow such access.