skef@SPICE.CS.CMU.EDU (Skef Wholey) (03/03/88)
In a couple of recent articles, people have asserted that zero-based arrays are somehow more burdensome and less natural to use than one-based arrays: >From: sommar@enea.se (Erland Sommarskog) >[...] > The idea of having all arrays starting on index zero is really to >put the burden on the programmer. If Wirth really must have a fixed >lower index, he could at least have chosen one! How many of you do >often use arrays with zero as the lower bound? I do it very seldom. >And his argument about saving execution time for index computation >is truely naive. The result will just be references like: > String(char_no + 1), >just making the code more verbose and complex for no use. >[...] >From: pattis@june.cs.washington.edu (Richard Pattis) > >I too detest 0-based arrays (and think 1-based arrays are better). In trying >to rationalize this choice, I came up the the following argument: When we >process arrays, we often need to know the size of the array and/or the >position of the last element. In 0-based arrays, these quantites are off by >one; in 1-based arrays they are the same. Hence, I think that people will >make fewer errors with 1-based arrays. Here I argue the opposite: that zero-based arrays are God's Gift To Programmers, and that one-based arrays are Satan's Way Of Complicating Code. I find that add-or-subtract-one code is more likely to arise in situations in which one-based arrays are used. I find that zero-based arrays lead to simpler code. Consider the task of processing subsequences of a one-dimensional array. There are two obvious ways to describe a subsequence: with a start and an end, or with a start and a length. Consider now two problems: iterating over a subsequence (while correctly dealing with zero length subseqences), and moving from one subsequence to the next. The notion of subsequence in Common Lisp (which, like C, provides only zero-based arrays) is that the start index specifies the first element of a subsequence and the end index specifies the index *one-beyond* the last element of the subsequence. Given the length and start of a subsequence, one computes the end simply by adding (i.e., End = Start + Length). Likewise, the length of a subsequence can be computing by subtracting the start from the end (i.e., Length = End - Start). A zero-length sequence has Start = End. 0 1 2 3 4 5 6 7 8 +-+-+-+-+-+-+-+-+-+ |X|X|X| | | | | | | +-+-+-+-+-+-+-+-+-+ ^ ^ | | start end The subsequence of three X's has a start of 0 and an end of 3. Its length is 3 - 0 = 3. The length of this array is 9 -- it has nine elements; as a subseqeunce it can be described as having start = 0 and end = 9 (or length = 9). The Common Lisp Make-Array function uses this number, and the Array-Length function will return this number. Likewise the declared size in C would be this number, and sizeof will return something proportional to it. Within this beautifully consistent end-exclusive model, *none* of these quantities are "off by one." To frob the elements of a subsequence of this sort, we do something like: (do ((i start (1+ i))) ; Do for I beginning at Start, adding one each time ((= i end)) ; Until I = End (frob (aref array i))) ; Frobbing the I'th element of the Array or (do ((i1 start (1+ i)) (i2 0 (1+ i))) ; I2 tells us where we are relative to the start ((= i2 length)) (frob (aref array i1) (aref another-array i2))) The first loop says, in non-Lispy terms: For I from Start (inclusive) to End (exclusive) Do Frob(Array[I]); The end test must always come before the body of the loop so that zero-length subsequences are processed correctly. The subsequence following the one defined by the interval [start,end) is [end,new-end). (do ((i old-end (1+ i))) ((= i new-end)) ...) The *only* +1's here are in the iteration steps; there are *none* in the initialization or termination clauses. The start-inclusive end-exclusive property is captured in the simple iteration construct Dotimes. For example, (dotimes (i 4) ...) executes its body four times, for I=0, I=1, I=2, and I=3. Now, one can of course use the end-exclusive model with one-based arrays, but then the subsequence containing the whole array does *not* have an end that is the highest index of the array; it is one more than that index. So when using one-based arrays people usually use an end-INclusive model, which also leads to nastiness. The Length of a subsequence is now End - Start + 1. To represent a zero-length sequence the end index must be one LESS than the start index. (do ((i start (1+ i))) ((= i (1+ end))) (frob (oref array i))) ; "Oref" for "One-based Aref" or (do ((i1 start (1+ i)) (i2 1 (1+ i))) ; I2 tells us where we are relative to the start ((= i2 (1+ length))) (frob (oref array i1) (oref another-array i2))) What was a clean, crisp, equality test with the end index in the first loop above is now a sloppy-looking test against the end plus one. What was a comparison with the length is now a comparison with the length plus one. And a following subsequence starts at one MORE then the end of the current subsequence. (do ((i (1+ old-end) (1+ i)) ((< i new-end)) ...) So the +1's suddenly start creeping into the code, into the initialization or termination clauses. I've written tens of thousands of lines of code in Common Lisp and C, and after I understood the beauty and just plain right-ness of the end-exclusive model, zero-based arrays became a joy to use. I seldom, if ever, need to adjust starts, ends, lengths, or indices by one. But I had to do that kind of stuff frequently when using one-based arrays. The more random incrementing and decrementing you do, the more fence-post errors your code will have. Remember: Zero is your friend. --Skef (Preferred mail address: Wholey@C.CS.CMU.EDU)
barmar@think.COM (Barry Margolin) (03/03/88)
In article <1012@PT.CS.CMU.EDU> skef@SPICE.CS.CMU.EDU (Skef Wholey) writes: >In a couple of recent articles, people have asserted that zero-based arrays >are somehow more burdensome and less natural to use than one-based arrays: [references deleted] >Here I argue the opposite: that zero-based arrays are God's Gift To >Programmers, and that one-based arrays are Satan's Way Of Complicating Code. I've used both PL/I (which allows arbitrary array indices, but defaults to 1-based, and only has 1-based strings, and Lisp, which only has 0-based arrays. It hardly needs to be said that the best thing is to allow the programmer to specify the index base. I've had my share of fencepost errors in both languages, but I think I prefer 1-based arrays. >Now, one can of course use the end-exclusive model with one-based arrays, but >then the subsequence containing the whole array does *not* have an end that is >the highest index of the array; it is one more than that index. That's true for 0-based arrays, too. For a 10-element 0-based array, the end-exclusive subsequence is [0,10), but the highest index of the array is 9. If it is 1-based, the end-exclusive subsequence is [1,11), and the highest index is 10. What's the difference? > So when using >one-based arrays people usually use an end-INclusive model, which also leads >to nastiness. The Length of a subsequence is now End - Start + 1. To >represent a zero-length sequence the end index must be one LESS than the start >index. Yes, I admit that arrays declared (1 .. 0) seem strange, and I think some languages don't even allow it. > > (do ((i start (1+ i))) > ((= i (1+ end))) > (frob (oref array i))) ; "Oref" for "One-based Aref" actually, we generally used (> i end) as our end test. Or we use traditional FROM-TO loops, as in the PL/I do i = start to end; frob (array(i)); end; In your posting you made mention of something like this when you trandlated your Lisp DO to pseudo-Pascal: > For I from Start (inclusive) to End (exclusive) Do > Frob(Array[I]); Most languages don't have an FOR-loop that is exclusive of the End (Zetalisp LOOP is an exception, with its FOR <index> FROM <start> BELOW <end> option); the End is usually inclusive, which works perfectly when you use the end-inclusive model of array subranges. For example, to loop over an entire array in PL/I you simply do: do i = lbound(array,1) to hbound(array,1); frob (array(i)); end; This code is actually independent of the base of the array, because it uses the LBOUND (Lower BOUND) and HBOUND (Higher BOUND) builtin functions to determine the actual index range (the second argument is the dimension whose bound should be returned). >And a following subsequence starts at one MORE then the end of the current >subsequence. > > (do ((i (1+ old-end) (1+ i)) > ((< i new-end)) > ...) > >So the +1's suddenly start creeping into the code, into the initialization or >termination clauses. Well, in PL/I, at the end of a "do i = start to end" loop, the variable i is left with the value end+1. So, it is common to continue with: do i = i to new-end;... >I've written tens of thousands of lines of code in Common Lisp and C, and >after I understood the beauty and just plain right-ness of the end-exclusive >model, zero-based arrays became a joy to use. I seldom, if ever, need to >adjust starts, ends, lengths, or indices by one. But I had to do that kind of >stuff frequently when using one-based arrays. The more random incrementing >and decrementing you do, the more fence-post errors your code will have. About the only time in PL/I where you have to add 1 is when going from loops like the above to calls to the SUBSTR builtin, which extracts a portion of a string. This function takes start and length rather than start and end. As I pointed out above, there's no real reason why end-inclusive subranges and 1-based arrays must be related. By the way, Lisp isn't even consistent about whether it is 1-based or 0-based. (FIRST <list>) is equivalent to (NTH 0 <list>), (SECOND <list>) is (NTH 1 <list>), etc. Mind you, I'm not complaining; the ordinal functions are intended for those cases where the programmer knows the position a priori, and we humans tend to think 1-based. Barry Margolin Thinking Machines Corp. barmar@think.com uunet!think!barmar
eugene@eos.UUCP (Eugene Miya) (03/04/88)
Given the choice of a 0 or 1 start, I now prefer 0, there are a few applications where it is handy and you just make the array +1 larger, or ignore element 0. but this is a cludge as well. Actually, I think that Pascal and Ada handle this problem even better with subranges and enumerations, but this is my quirk, Pascal's problem is those stupid pred() and succ() operators. Yes, range checking can be a pain (read "reduced performance"). Another problem is how to handle subarrays. Implementation is row or column major (See comp.lang.fortran) and this is an unfortunate detail. There are other array tricks coming down the pipe (read Fortran 8X) but when it boils down to either 1 based or 0 based are both arbitrary (ridiculous since either side can find counter arguments). From the Rock of Ages Home for Retired Hackers: --eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA "You trust the `reply' command with all those different mailers out there?" "Send mail, avoid follow-ups. If enough, I'll summarize." {uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
skef@SPICE.CS.CMU.EDU (Skef Wholey) (03/04/88)
>From: barmar@think.COM (Barry Margolin) >[...] >I've used both PL/I (which allows arbitrary array indices, but >defaults to 1-based, and only has 1-based strings, and Lisp, which >only has 0-based arrays. It hardly needs to be said that the best >thing is to allow the programmer to specify the index base. I've had >my share of fencepost errors in both languages, but I think I prefer >1-based arrays. I can't contest your preference. The articles I was responding to, though, asserted that zero-based arrays had the "obvious" disadvantage that things were "off by one a lot." I hope I've shown that this isn't really the case. You (Barmar) bring up the good point that if you've got the right set of iteration constructs, one-based arrays can be dealt with as cleanly as zero-based arrays. I agree with this, and almost said as much in my previous post. In Lisp the user can define these things as macros if they aren't provided (as by Zetalisp's LOOP). PL/I has these things, but then, what doesn't it have? This zero-based/one-based discussion arose from the description of the Oberon programming language. Oberon, like Pascal and Modula-2 and even C, provides a few iteration constructs and no practical, general, way of defining more (the C preprocessor leaves much to be desired as a macro-writing language). So, given these restrictions, I would say that if you're going to have a limited number of iteration constructs AND fixed-based arrays, the iteration constructs should be chosen to reflect the base (be it zero or one) of your arrays. (To put it another way: If your language is going to put you in a straightjacket, it should at least be a comfortable one.) >> (do ((i start (1+ i))) >> ((= i (1+ end))) >> (frob (oref array i))) ; "Oref" for "One-based Aref" > >actually, we generally used (> i end) as our end test. Yeah, the "system" code I write usually does that -- you can't trust lusers not to pass you bad indices. But my personal taste is that = looks much nicer than >. No biggie. On moving on to the next subsequence: >... in PL/I, at the end of a "do i = start to end" loop, the >variable i is left with the value end+1. So, it is common to continue >with: > > do i = i to new-end;... Yes, it's nice when the value of a loop index variable is defined after the loop. Unfortunately, not all languages do that, presumably to give the implementor more leeway in the name of "efficiency." (If a compiler unrolls the loop, for example, it would still have to assign something to the iteration variable afterwards.) >About the only time in PL/I where you have to add 1 is when going from >loops like the above to calls to the SUBSTR builtin, which extracts a >portion of a string. This function takes start and length rather than >start and end. Is that the only builtin that uses lengths? That seems like a kind of odd wart on the language, if so. If not, I'd argue as before that the end-exclusive property Length = End - Start is something worth trying to preserve. >As I pointed out above, there's no real reason why end-inclusive >subranges and 1-based arrays must be related. Yeah, I agree, given that your iteration constructs are flexible enough, or that you can roll your own. I'll make the vague and unsupported claim that people usually manipulate 1-based arrays using an end-inclusive model, though. The articles which prompted me to post here seemed to be written by people thinking along those lines. >... we humans tend to think 1-based. Hmm. I think I think zero-based. I don't know if that comes from having taken too many math courses, or from dealing with icky innards of low-level software (computer memory is, after all, zero-based). I guess you could argue that it took thousands of years of counting things before people discovered zero, so it can't be all THAT natural. Maybe one-based counting IS "natural," but that doesn't necessarily make it "right." --Skef (Preferred mail address: Wholey@C.CS.CMU.EDU)
turner@sdti.UUCP (Prescott K. Turner) (03/05/88)
As someone said, people do 1-based counting. (Sorry, I can't find the reference to quote.) A child learns to count by pointing at the objects and reciting the numbers, as "one, two, three". The counting process naturally labels the objects in a one-based fashion. Are any of you zero-based partisans teaching your children to recite "zero, one, two" and then deduce the cardinality to be three? I have seen pros and cons to zero-based arrays, and don't really mind them in a programming language. What I do mind is when it spills over into user interfaces and even human discourse. When the cursor is blinking in the first column of my screen, my text editor should report the column number as 1, not 0. (I know, my text editor is wishing it could put the cursor just to the left of the first column, not smack on top of the first column, but when it resorts to the latter real problems result.) And in Dijkstra's letter in the June 1987 CACM, he numbers his points and his references starting with 0. I find this annoying and wrong-headed. Since computer programs must relate to humans at some point, it is desirable to stick with the human convention that when labeling things by counting, you start with 1. And this applies to arrays. -- Prescott K. Turner, Jr. Software Development Technologies, Inc. 375 Dutton Rd., Sudbury, MA 01776 USA (617) 443-5779 UUCP:necntc!necis!mrst!sdti!turner
g-rh@cca.CCA.COM (Richard Harter) (03/07/88)
Most of the fuss and confusion comes about because there are three different kinds of 'numbers' involved: cardinals, ordinals, and displacement measures. A cardinal number is the measure of the size, e.g. this array has five elements. An ordinal number is the count sequence, e.g. this is the 1st element, ... this is the 5'th element. A displacement measure is the distance between an element and the base, e.g. this element is displaced four elements from the base. Ordinals match cardinals, e.g. if a sequence has five elements, the last ordinal corresponds to the cardinal number of the sequence. There are fundamental reasons for not mucking with the definition of ordinal, which I won't go into. Displacement measures are more useful than ordinals in programming. The fundamental reason is that names in programming specify addresses, i.e a[i] is the item located at address 'a[i]'. Regardless of the language array basing index, 'i' is used to determine a displacement from the array address. The fencepost problem is intrinsic -- in a sequence containing 'n' items the displacement between the first and the last element in the sequence is 'n-1'. This fact is not altered by the array basing scheme used. -- In the fields of Hell where the grass grows high Are the graves of dreams allowed to die. Richard Harter, SMDS Inc.
faustus@ic.Berkeley.EDU (Wayne A. Christopher) (03/08/88)
If you're looking for mathematical justifications for 0-based or 1-based arithmetic, remember that the usual definition of n in set theory is { n-1, n-2, ... 1, 0 }. Thus by analogy an array of n elements contains elements n-1, n-2, ... 1, 0. Really it's a matter of taste, like hex vs octal. Wayne
eric@snark.UUCP (Eric S. Raymond) (03/09/88)
In article <213@sdti.UUCP> Prescott K. Turner, Jr. writes: >And in Dijkstra's letter in the June 1987 CACM, he numbers his points and >his references starting with 0. I find this annoying and wrong-headed. I agree that it's usually appropriate to have lists for humans start with 1, and that Dijkstra was being a tad precious (this sort of behavior is an every once in a while requirement for continued perks and status in the academic bondage-and-discipline school of language design, doncha know). However: there is one sort of list-for-humans where zero-basing is appropriate, in my opinion -- a list that refers to itself. Consider the following examples: ------------------------------------------------------------------------------ This distribution package contains 3 files: 0. This READ.ME 1. t&a.txt -- phone numbers and vital statistics of the Dallas Cowboys' cheerleading squad. 2. hunk.txt -- phone numbers and vital statistics of the Dallas Cowboys. ------------------------------------------------------------------------------ or ------------------------------------------------------------------------------ To install the superduper thingy metacompiler, do the following: 0. Read this file carefully. 1. Insert a peanut butter sandwich in your drive slot. 2. Wrap a maypole with the distribution tape. ------------------------------------------------------------------------------ Also, sometimes a list of instructions will start with one that has sort of a background prep or general-precondition character, as opposed to the remainder which are specific well-defined actions. For example: ------------------------------------------------------------------------------ Instructions for going over Niagara Falls in a barrel: 0. Make sure you really want to do this. 1. Procure a barrel. 2. Pad the insides generously. (etc.) ------------------------------------------------------------------------------ In these particular sorts of cases, I find zero-indexing a useful cue. {Mild apologies for the silliness of the examples. I just stumbled out of bed and haven't showered or breakfasted yet. And BTW I am *not* a football fan...} -- Eric S. Raymond (the mad mastermind of TMN-Netnews) UUCP: {{seismo,ihnp4,rutgers}!cbmvax,sdcrdcf!burdvax,vu-vlsi}!snark!eric Post: 22 South Warren Avenue, Malvern, PA 19355 Phone: (215)-296-5718