targetsmart has asked for the
wisdom of the Perl Monks concerning the following question:

Is there any very good technical reason for starting the array index at 0? Why not at 1?
Pardon me if this seems to be a simple question to you.

UPDATE
In Perl the $[ variable lets us set the index of the first element in an array, and we can set that number as we wish.
But my doubt is this: it makes a lot of sense to call the first index '1', not '0'.
"First" always refers to '1', not '0'.

Vivek
-- 'I' am not the body, 'I' am the 'soul', which has no beginning or no end, no attachment or no aversion, nothing to attain or lose.

I suppose the "technical reason" is that it's something we inherited from C. And in C, an array index is actually an offset from the start of the array. The first element is therefore at index zero as it's offset zero elements from the start of the array.

There's no real technical reason why it should still be implemented that way. But generations of programmers expect it to work like that.

Update: Please do not be tempted to use $[ to change the index base. Really, don't do it. Just forget it exists. If you use it you'll just create a maintenance nightmare for the people who look after your program after you.

It might make sense to you, but chances are that it won't for the maintainers of your code. Arrays in most languages I know start at index 0, and I am not sure that going against such a common characteristic is a good idea.

Please tell me who in India taught you how to code, or who otherwise put the idea in your head that array indices (or anything else programming-wise) should begin with 1, so I can punch them in the neck.

There are some data structures that can be easily put into arrays, and where operations on the data structure involve arithmetic on the index. In many such cases it's easier and less hassle with zero-based indexes. (I can't think of a particular example right now, sorry. But I remember that I came across some of them.)

And if you then find an algorithm where it's the other way round, you can still leave the first item empty and work as if you had 1-based indexes, with minimal overhead. Doing it the other way round (ie emulating 0-based indexes with 1-based indexes) would involve an arithmetic operation on every array access.
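
One possible illustration of both points, as a rough sketch rather than anything from the original post: a grid stored row by row in a flat array (the sub names flat_index0 and flat_index1 and the array @one_based are invented here), followed by the "leave the first item empty" trick.

```perl
use strict;
use warnings;

# Zero-based: cell (row, col) of a grid with $cols columns, stored row by row.
sub flat_index0 {
    my ($row, $col, $cols) = @_;
    return $row * $cols + $col;
}

# One-based rows, columns and result: every term needs a correction.
sub flat_index1 {
    my ($row, $col, $cols) = @_;
    return ($row - 1) * $cols + ($col - 1) + 1;
}

# Emulating one-based indexes on a zero-based array: slot 0 stays unused.
my @one_based = (undef, qw(first second third));

print "zero-based (1,2) of 3 columns: ", flat_index0(1, 2, 3), "\n";
print "one-based  (2,3) of 3 columns: ", flat_index1(2, 3, 3), "\n";
print "item 1: $one_based[1]\n";
```

The padding costs one unused element once; the one-based formula pays its corrections on every lookup.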

But in the end it's a topic about which you can have very strong opinions, and no amount of arguing will change them. Like coding style.

Update: I thought a bit more about it, and came to the conclusion that more integer operations as performed by the CPU stay within range if your numbers start from 0, not from 1:

So you see that if your numbers start from 0, more operations have identical domain (for the left operand) and codomain, at the cost of having some disallowed operations for 0 as the right operand (like 1/0, and 0**0). IMHO that's a plus for choosing 0 as the start.

It also feels nice to have the neutral element of addition inside the range.

Let's say that you have an array of 3 "things" starting at address "4". I am going to make a simplifying assumption here that each "thing" fits within the native "word size" of the hardware machine. So what is the machine address of the first thing in the array? It is address 4+0 (it is at address 4). What is the address of the 2nd thing? That is address 5=4+1, etc.
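
The same arithmetic as a small Perl sketch. This is plain number-crunching, not real memory access; the names $base, $size and the two subs are made up for the illustration, and the one-word element size is the assumption stated above.

```perl
use strict;
use warnings;

my $base = 4;   # address of the first element, as in the example above
my $size = 1;   # assumption: each element takes exactly one machine word

# Address of the element at zero-based index $i: base plus offset.
sub address_of {
    my ($i) = @_;
    return $base + $i * $size;
}

# With one-based indexes, 1 has to be subtracted before scaling.
sub address_of_one_based {
    my ($i) = @_;
    return $base + ($i - 1) * $size;
}

print "index $_ -> address ", address_of($_), "\n" for 0 .. 2;
```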

Some languages will do an index-1 for you automatically, I seem to remember that FORTRAN does this. Maybe Pascal did this also? It has been a very long time since I've used these languages and my memory is hazy.

Anyway some languages do a "-1" for you. This slows things down a very slight bit, but does it matter? Probably not in the case of a single array.

This address 4+0 idea is very powerful when working with loop constructs and has the ability to simplify the code when working with "pointers" or "references".

That's actually a very good question. The most common explanation you'll find is that the index represents an offset from some memory address, so naturally the first element will be at offset 0. But there's another reason which I came across some time ago and which was very well explained by some mathematician. It had something to do with boundary checks, and the argument was that i<10 is better than i<=10. Unfortunately, I can't remember the details nor can I find the web page.

There are some situations where having the index start at 1 is needed like in some matrix computations if I am not mistaken. But that's not a problem in any language.

Update: It was really bugging me that I couldn't recall who put the really good range argument, so I kept looking for it and here it is:

When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number.
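
A short sketch of why the "lower bound included, upper bound excluded" convention is convenient in practice. The sub range_length and the sample data are invented for this illustration and are not part of the quoted note.

```perl
use strict;
use warnings;

# A half-open range [$lo, $hi) holds the subscripts $lo, $lo+1, ..., $hi-1.
sub range_length {
    my ($lo, $hi) = @_;
    return $hi - $lo;          # no +1 correction needed
}

my @seq = qw(a b c d e f g);
my $n   = @seq;                # 7 elements, subscripts 0 <= i < 7

# Split at $mid: the upper bound of one part is the lower bound of the next,
# so the parts neither overlap nor leave a gap.
my $mid   = 3;
my @left  = @seq[0 .. $mid - 1];     # 0 <= i < $mid
my @right = @seq[$mid .. $n - 1];    # $mid <= i < $n

print "left=@left right=@right\n";
```

With subscripts starting at 0, the upper bound of the whole sequence is simply its length.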

YES! this has some mathematical justification aside from "how the machine works"!

On another point, I just finished a 'C' project yesterday and I can barf out these C style for(;;) loops with code using [$i] indices easily.

I think the BIG point here since we are talking about Perl, is who cares? A HUGE factor in Perl is that we don't care! Or for the most part! Consider: foreach my $thing(@array){..blah..} Normally for the most part, $array[0] vs $array[1] doesn't matter because it never comes up in the code. Perl also reduces the dreaded "off-by-one" error possibility.
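
For comparison, a minimal sketch of the two loop styles side by side (the variable names are just for illustration):

```perl
use strict;
use warnings;

my @array = (10, 20, 30);

# C-style loop: the index base and the bound are both spelled out.
my $sum_c = 0;
for (my $i = 0; $i < @array; $i++) {
    $sum_c += $array[$i];
}

# foreach: no index at all, so the base never appears in the code.
my $sum_each = 0;
foreach my $thing (@array) {
    $sum_each += $thing;
}

# When the index really is needed, $#array gives the last valid one.
my @labels = map { "$_=$array[$_]" } 0 .. $#array;

print "$sum_c $sum_each @labels\n";
```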

This argument by Dijkstra is silly. Yes, I said it. Doesn't matter who says it; if the argument is aesthetic, that's not a reason. Consider:

N1..Nn (N sub 1 to N sub n)

My notation is "nicer," therefore it's better? No! The offset argument is better for the 0 discussion, but in the end it comes down to the generally accepted culture of using 0. If you work in a vacuum, by all means set it to be whatever your favorite number is.

I don't think it's silly, rather it just moves the question a bit. As you indicate, the question is merely moved to "which is the better way of expressing a range". It seems that Dijkstra prefers "0 <= i < N" to "1 <= i < N+1". However, he doesn't say why that's any better. I agree that it looks better, but there's another formulation "0 < i <= N" that looks just as good. Why isn't that just as good? He leaves the question open.

Changing $[ is nearly always evil. The only other language I know which has such a bizarre mechanism is APL.

I remember that back when I was programming in languages such as PL/1, Algol etc., I found it very cool to be able to declare arbitrary index ranges for arrays, so I could for example say that array indexing started at 4711, if I wanted to. I was able to choose for each array individually the way indexing worked. In practice, it turned out that nearly all arrays started at 0. In a few cases (in particular in mathematical problems involving matrices and vectors), my arrays started at 1, simply because for some obscure reasons, mathematicians seem to prefer starting at 1 instead of 0 (or -100). Occasionally, I found it handy to have an array starting at -1. The pleasure of this flexibility comes, however, at the price of additional headaches for the poor fellows who had to understand my code, because they now had to take into account the lower bound of each array as well.

So, if you feel so comfortable to count from one, just take a zero-based array and use element zero for storing your teabags. Fortunately, our computers nowadays have enough storage space that we don't have to worry about wasting an array element...

The only other language I know which has such a bizarre mechanism is APL.

I don't know about current versions, but, back in the Visual Basic 3 era, VB had the option base statement which could be used to change the starting index of arrays. As I recall, even VB programmers quickly figured out that using it was generally a bad idea.

Now that you mention it, I dimly remember that option base already existed at a time when VB did not yet have the fancy "visual" sticker attached and was just called "BASIC". It is interesting that the same blunder has been incorporated at least three times in history: BASIC being first, followed by APL, then Perl.

When you were in your 1st year of life, you could say that was year "0".

Things get confusing (for many people) when numbers start to get bigger: The 20th century were the years 19xx (mostly). Did the 2nd millennium end on New Year's Eve 1999 or 2000?

For mathematicians or scientists in general (as mathematics is nowadays a big part of every science) it is quite natural to count from zero.

Programming languages were developed mostly by scientists. So there is the historic background.

About the only language coming to my mind with 1-based arrays is BASIC. But looking at the BASIC mainline nowadays (VB.NET that is), Microsoft has also evened that out. While you have the possibility to declare arrays with arbitrary lower and upper bounds, a standard array will index its elements from zero.

Going for the "natural" argument again: as an example in Perl, look at the possibility of accessing the last array element via $array[-1]. This seems just right, considering that the first element is $array[0]. Conversely, with $array[1] as the first element, allowing round-robin indexing would make $array[0] the last element, which looks weird, in my eyes at least.
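
A tiny sketch of that wrap-around, walking the index downwards past zero (the sample array is made up):

```perl
use strict;
use warnings;

my @array = qw(a b c d);

# Walk the index down from 2 to -2: 2, 1, 0, -1, -2.
# With a zero-based array the step from 0 to -1 goes straight from the
# first element to the last, with no unused index in between.
my @walk = map { $array[$_] } reverse -2 .. 2;

print "@walk\n";
```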

Things get confusing (for many people) when numbers start to get bigger: The 20th century were the years 19xx (mostly). Did the 2nd millennium end on New Year's Eve 1999 or 2000?

The latter, logically speaking. The mass media managed to convince the ignorant masses otherwise, but there you go.

It is ironic in this context to note that people would have got this right intuitively if years, and centuries, had been zero-based rather than the supposedly "natural" one-based system we have. If we'd used a zero-based system, with century 0 starting in the year 0, then the year 2000 would have represented the start of century 20, rather than the end of the 20th century. Problem solved.
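
The difference shows up in the arithmetic. A rough sketch, where century_zero_based models the hypothetical scheme described above and century_one_based the conventional one (both sub names are invented here):

```perl
use strict;
use warnings;

# Hypothetical zero-based scheme: century 0 covers the years 0..99.
sub century_zero_based {
    my ($year) = @_;
    return int($year / 100);
}

# Conventional one-based scheme: the 1st century covers the years 1..100.
sub century_one_based {
    my ($year) = @_;
    return int(($year - 1) / 100) + 1;
}

for my $year (1999, 2000, 2001) {
    printf "%d: zero-based century %d, one-based century %d\n",
        $year, century_zero_based($year), century_one_based($year);
}
```

In the zero-based scheme the century is just the leading digits of the year; the one-based scheme needs a subtraction and an addition to get the boundary right.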

Instead we have a silly situation where technically centuries are 1-based, but people constantly try to press for a 0-based system that seems more natural to them. People want the 21st century to begin with the year 00, not the year 01!

When you were in your 1st year of life, you could say that was year "0".

Actually that is very culturally determined. The daughter of my Thai girlfriend was born in 2005. This year she turned 8 years old, but got 9 candles on her birthday cake! In Thailand one gets best wishes for the next year of one's life, not for the year that just passed.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

In N bits, an integer can represent 2^N indexes starting at zero. If you start at index 1, you'll only be able to access 2^N-1 indexes. Using 1-based indexes therefore requires indexes twice as big and requires more code to support the same array sizes. This was not a luxury that could be afforded.
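
The counting argument, worked through for an 8-bit index as a small sketch (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $bits     = 8;
my $patterns = 2 ** $bits;            # 256 distinct bit patterns: 0..255

# Zero-based: every pattern is a valid index.
my $reachable_zero_based = $patterns;         # indexes 0..255

# One-based: the pattern 0 is never a valid index.
my $reachable_one_based  = $patterns - 1;     # indexes 1..255

print "zero-based: $reachable_zero_based elements\n";
print "one-based:  $reachable_one_based elements\n";
```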

This may not be the case anymore, but you can't ignore how it's been done for ages, and there are many other good reasons as you've seen throughout this thread.

Some languages, such as Pascal, do start with an index of '1' although for whatever reason they are in the minority. “It's just one of those things.”

In my experience, though, “arrays” are not that common. Usually you see people using either hash-tables or lists. Perl's “array,” of course, really is (in practical effect...I do not wish to slice words here...) a list.

A Perl n-D structure is not an "array", certainly not for n>1. To me an
array has a fixed, regular memory layout, like a checker or chess board. If I
am on row 4, square 6 and I want to know what row 3, square 5 contains,
I just go: left 1 and up 1 from where I am at. That's it!

A Perl LoL (List of Lists), a Perl 2-D structure, doesn't work that way.
I've written FORTRAN code with 2-D arrays and some ASM code,
but never any C code yet and certainly not any Perl code. It is simply not the
way that it is done.

In Perl, every dimension until the last one is a
"reference". It works the same as 'C'. If you take a 'C' class, somewhere
along the path to the first year, you will learn that this: int x [8][8];
is total BS! There is a HUGE flaw with this because you cannot pass "x" to a
subroutine! How big is it? What do I do? The answer to this is similar to
how Perl does it. The first dimension is a list of pointers to the 2nd
dimension. In the case of a 2-D array, you have to allocate memory
for the list of pointers to lists and also for the "rows" themselves and
it's a pretty huge hassle!

Anyway what you wind up with is a "list" of pointers to
"lists". Now I can give you "x" and tell you to add say 5 to every element in
this structure. I don't have to tell you how many rows there are, I don't have to tell
you how many columns there are (and they may even vary between rows).

Perl
automates a lot of this "grunt work". A Perl 2-D structure is not an array. It is a
list of lists.
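
A minimal sketch of the "add 5 to every element" example on a ragged list of lists. The sub name add_to_all and the sample data are made up for the illustration.

```perl
use strict;
use warnings;

# A ragged list of lists: an array of references to row arrays.
my @x = ( [1, 2, 3], [4, 5], [6] );

# Add $delta to every element without being told any dimensions.
sub add_to_all {
    my ($lol, $delta) = @_;
    for my $row (@$lol) {
        $_ += $delta for @$row;   # $_ aliases each element, so this modifies in place
    }
    return;
}

add_to_all(\@x, 5);
print join(" | ", map { join " ", @$_ } @x), "\n";
```

No row count, column count or index base appears anywhere in add_to_all.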

let's look at it this way...
We can access an array from both directions, from the start as well as from the end. To access it from the end you use -x, where x is the position of the element we want, counted from the end.
So, look at this line now and tell me: if 0 were not there, would it not be bizarre that we are interfering with the laws of algebra?

...you find that 0 is central: it spares your brain from having to dare to change $[, or from having to wonder where the central "head" of the array is. The first of everything is its head, the first element of an array is its head, and the number that sits between the two sides of the sign is the big round fat 0.

Actually, you bring up points that lead me to prefer that Perl index such things starting at 1. I understand the original quoted justification for using 0. And it makes sense particularly in C++, where we have pointers to arrays and pointer arithmetic and "beg <= i < end" ranges.

We don't have any of those things in Perl, so I find the argument pretty pointless when applied to Perl.

But there are several relevant things we have in Perl. 0 is false. $x[-1] is the last element of an array. $x[-0] can't sanely be made to be the last element of an array. Because positions start at 0, index() reports failure with a true value (-1) and reports one success case, a match at the very start, with a false value (0). I'd be happy for $x[undef] to return undef, preserving the undefinedness. This becomes easy if $x[0] is likewise 'reserved'.
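
For reference, a short sketch of how index() and negative subscripts behave in a stock Perl 5. The sample strings and variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $text = "perl monks";

my $at_start = index($text, "perl");   # match at offset 0
my $missing  = index($text, "java");   # no match, index() returns -1

# 0 is false and -1 is true, so a bare boolean test gets both cases wrong.
my $naive_found_start   = $at_start ? 1 : 0;
my $naive_found_missing = $missing  ? 1 : 0;

# The usual idiom tests for >= 0 (or compares against -1) instead.
my $found_start   = $at_start >= 0 ? 1 : 0;
my $found_missing = $missing  >= 0 ? 1 : 0;

# -0 is just 0, so $x[-0] is the first element, not the last.
my @x     = qw(a b c);
my $first = $x[-0];
my $last  = $x[-1];

print "at_start=$at_start missing=$missing first=$first last=$last\n";
```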

Give a child a handful of items and ask them to hand them back to you, numbering them as they go. They won't number them from zero to $N-1. The first item will be number 1, of course. That is why we call it the "first" item, the "1st" item, not the zeroth item. "zeroth" is a word you almost never hear from ordinary citizens.

I expect to hear "zeroth" used in connection with unbounded sequences, for example: "the zeroth power of 2". Perl arrays are not unbounded. They are indexed with values from 0..$#array and -@array..-1. Ugh. It would be so much nicer to have them indexed with values from 1..@array and -@array..-1. I would also appreciate the separation between those ranges so that an off-by-one error in one range can't accidentally land you in the other range.

I appreciate the elegance with which K&R resolved the long-running argument for the C language, by defining array[i] as a syntactic alias for *(array+i) and thus making array indices unambiguously offsets and therefore something that must start from 0.

I appreciate the elegance of "beg <= i < end" range definitions. I've used them quite a bit in C++. I've even used them in Perl code. But part of the point of such things is that 'end' is defined as 'the mythical item just off the end of our list'. $N+1 makes perfect sense as that. So I find "1 <= i < N+1" an even clearer representation of such ranges. It even leads to more regularity in defining such ranges, because they all become "start <= i < start+length" (which might lead you to write "1 <= i < 1+N").

So I've long found the arguments much more convincing for starting from 1 in Perl. Perl isn't glorified assembly (how I've heard many people refer to C) which is why it doesn't have pointer arithmetic which is also why it should number things like real humans do, not like how machine language would.

As to "zero as a most natural number", I think that belittles the fundamental innovation that was its invention/discovery long after 1 and its followers were being heavily used.

But all of this is pretty academic (as in "useless") in relation to Perl. Perl long ago standardized on numbering things starting from 0. That decision is not easily undone (or redone).

For better or worse "offset" was spelled "index", and thus the source of your confusion. The fact of the matter is that we are stuck with "index" whether it makes sense or not and whether we like it or not. Just think "offset" when you see "index". And never use the $[ trick!!!
