karl@haddock.UUCP (Karl Heuer) (06/09/87)
In article <20540@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes: >[haddock!karl (Karl Heuer) writes:] >>Given the suggested "pointer-like" semantics for arithmetic, and the >>ability to declare an array indexed by enums (or other types!), I could >>find more use for them. >>int hue[enum color] = { RED: 0x00F, GREEN: 0x0F0, BLUE: 0xF00 }; /* I wish */ > >I agree that it would be nice to support declaring arrays indexed by >"enum"s. (What "other types" would be indices? "char", presumably; >the only other types I could see would be subrange types - are you >proposing adding them to C?) Maybe someday -- but no, that wasn't what I was thinking. Some machines can support a[short] with no trouble, but that would raise the question of which types a conforming implementation is *required* to support (and hence can be used in a portable program)... >One interesting possibility is suggested by the fact that the set of >numerical encodings of "enum" values need not be "dense", and thus that > enum cookie { chocolate_chip = 1, oreo = 10, oatmeal = 100 }; > int price[enum cookie]; >might cause "price" to be a sparse array, with lookups in that array >done by some technique other than indexing. This is analogous to >"switch" statements; the compiler selects different implementations >based on the set of values used, with indexing used for a >sufficiently "dense" set, various combinations of hashing and >table-search used for less-"dense" sets, and a cascade of comparisons >used for other sets. I glad you posted this -- I was going to mention the same idea myself, but I prefer the feedback so I can tell someone is listening to my ravings. :-) In your example, the implementation might choose to allocate three ints and use a cascade of comparisons (actually, one comparison against "oreo" would suffice). Now, we could assert that a conforming implementation must support *all* types as subscripts, and that "int a[int]" will, if necessary, cause some sort of "smart" lookup. The problem now is that the amount of space required is not known at compile time. If the compiler is going to use a hash table, it can probably make good use of a "hint" from the programmer. (Which probably means another new keyword. Ugh.) Incidentally, I am not suggesting that this feature be incorporated into the new standard -- I realize there's very little hope for that. However, I *do* believe it is a good idea (at least for enum), and would like to see it implemented as an extension. If it catches on, it may be a candidate for the __STDC__==2 release. Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
rwa@auvax.UUCP (Ross Alexander) (06/12/87)
In article <526@haddock.UUCP>, karl@haddock.UUCP (Karl Heuer) writes: > Some machines can support a[short] with no trouble, but that would raise > the question of which types a conforming implementation is *required* > to support (and hence can be used in a portable program) [...] and a little further along, > Now, we could assert that [the compiler] must support *all* types > as subscripts, and that "int a[int]" will, if necessary [...] and yet further, > Incidentally, I am not suggesting that this feature be incorporated into the > new standard -- I realize there's very little hope for that. However, I *do* > believe it is a good idea (at least for enum) [...] I can think of only one little quibble with the above: C arrays are considered to begin with the zero'th element and end at the N-1'th element, right? So to declare something as int array[ short ]; implies that array[] has members indexed by -1, -2, -3, and so on ('short' is a signed type, no?). So Karl will also have to think of some way to introduce the declaration of array index bases, as well as array index ranges, a la Algol-60 or Pascal; and this then introduces a rather large can of worms when pointers to based arrays are passed around. Say I declare an array int demo[ 21 : -10 ]; which I define to mean 'an array of ints, called demo, 21 ints long, first element is demo[ -10 ]'. Fine. Now, what is the value of the token 'demo' ? Is it '& demo[ -10 ]' ? Is it '& demo[ 0 ]' (this is K&R's idea of 'demo', I think) ? And further, when I then code up something like 'frobozz( 2, "hi there", demo );' and frobozz() looks like double frobozz( a, b, c ); int a; char * b; int c[]; { /* and so on... */ what is the allowable range of indicies of c[] ? -10 .. +10 ? 0 .. 20 ? Beats _me_ :-). As a quick out (because I _do_ like the basic idea), how about saying that subscript types must be unsigned? ...!ihnp4!alberta!auvax!rwa Ross Alexander @ Athabasca University
ford@crash.UUCP (06/14/87)
In article <184@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes: >I can think of only one little quibble with the above: C arrays are >considered to begin with the zero'th element and end at the N-1'th >element, right? So to declare something as > > int array[ short ]; > >implies that array[] has members indexed by -1, -2, -3, and so on No. A supscript must simply be some sort of integral type which the copmiler can perform multiplication on or stick into an index register (i.e. it must be something you can add to a pointer). In particuar, pointers may be dereferenced with a negative subscript. According to K&R's C Reference Manual, section 14.3: "By definition, the subscript operator [] is interpreted in such a way that E1 [ E2 ] is identical to * ( (E1) + (E2) )." This means that, since negative numbers can be added to pointers, it is possible to reference negatively indexed members of an array. This would only be useful if you had a pointer to the middle of an array, just as it is only useful to use subscripts less than the size of an array. As an example, consider the code to remove a newline from the end of a string: void trim(string) char *string; { while (*string) ++string; if ( string[-1] == '\n' ) string[-1] = '\0'; } This code is optimized for situations where the newline will not usually be present, otherwise the first string[-1] would be *(--string) and the second would be *string. This is because using a non-zero subscript is usually faster than actually decrementing string, but decrementing it is faster than using the non-zero subscript twice. Note that for simplicity, the example does not properly handle the case where a zero-length string is passed. It would reference the byte immediately before the string in memory, which is undefined in general. > [...] Say I declare an array > > int demo[ 21 : -10 ]; > >which I define to mean 'an array of ints, called demo, 21 ints long, >first element is demo[ -10 ]'. This is not a bad idea, but I would prefer something like the pascal syntax: int demo[ -10 .. 10 ]; >Now, what is the value of the token 'demo' ? I think it should be the address of the FIRST element in demo (i.e. demo[-10]). This allows such idiomatic constructs as write(fd, demo, sizeof demo); which would become very non-intuitive if it were necessary to specify the starting or ending indices. > And further, when I then code up >something like 'frobozz( 2, "hi there", demo );' and frobozz() looks >like > > double frobozz( a, b, c ); > int a; > char * b; > int c[]; > { /* and so on... */ > >what is the allowable range of indicies of c[] ? -10 .. +10 ? 0 .. >20 ? > The problem here is that the declaration "int c[]" is misleading, it implies that an array is passed to this function, when only a pointer is passed. It makes sence to declare c as "int *", and then you can pass the address of whichever element of demo[] you like, with "demo" alone refering to the add- ress of the FIRST (not necessarily the zero-th) element. I see no real reason to have negative (or even specifiable) ranges of indices for arrays. This is another "educational" feature of pascal that detracts from C's philosophy of being a 'high-level assembly language', i.e., C is meant to provide a portable representation for operations present in most machines' instruction sets. But negative subscripts to POINTERS can be very useful, almost always has a direct mapping to machine instructions, and is already used in a lot of code that I have seen, including most C compilers' C libraries (in things like strcat, strtok, et al.). Michael "Ford" Ditto -=] Ford [=- P.O. Box 1721 ford@crash.CTS.COM Bonita, CA 92002 ford%oz@prep.mit.ai.edu -- Michael "Ford" Ditto -=] Ford [=- P.O. Box 1721 ford@crash.CTS.COM Bonita, CA 92002 ford%oz@prep.mit.ai.edu
edw@ius2.cs.cmu.edu.UUCP (06/14/87)
In article <1226@crash.CTS.COM>, ford@crash.CTS.COM (Michael Ditto) writes: > > > [...] Say I declare an array > > > > int demo[ 21 : -10 ]; > > > >which I define to mean 'an array of ints, called demo, 21 ints long, > >first element is demo[ -10 ]'. > > This is not a bad idea, but I would prefer something like the pascal syntax: > > int demo[ -10 .. 10 ]; > > >Now, what is the value of the token 'demo' ? > > I think it should be the address of the FIRST element in demo (i.e. > demo[-10]). This allows such idiomatic constructs as > > write(fd, demo, sizeof demo); > > which would become very non-intuitive if it were necessary to specify the > starting or ending indices. We have a problem here. If demo is actually the address of the first element of the array then what about the code generated for a = demo[-10] presuming that you what to index over the subrange. Does the compiler have to keep around subrange values so the code generated for the above is a = *(demo + index - range_base) > > > And further, when I then code up > >something like 'frobozz( 2, "hi there", demo );' and frobozz() looks > >like > > > > double frobozz( a, b, c ); > > int a; > > char * b; > > int c[]; > > { /* and so on... */ > > > >what is the allowable range of indicies of c[] ? -10 .. +10 ? 0 .. > >20 ? > > > > The problem here is that the declaration "int c[]" is misleading, it implies > that an array is passed to this function, when only a pointer is passed. It > makes sence to declare c as "int *", and then you can pass the address of > whichever element of demo[] you like, with "demo" alone refering to the add- > ress of the FIRST (not necessarily the zero-th) element. This is how you propose to solve the problem of indexing for subranges in C. Why bother putting them into the language in the first place if you can't subscript over them in the most natural way ie. for (i = -10; i <= 10; i++) sum += c[i]; The better solution would be to require the subrange declaration for c. ie int c[-10..10]; > > I see no real reason to have negative (or even specifiable) ranges of indices > for arrays. This is another "educational" feature of pascal that detracts > from C's philosophy of being a 'high-level assembly language', i.e., C is > meant to provide a portable representation for operations present in most > machines' instruction sets. But negative subscripts to POINTERS can be very > useful, almost always has a direct mapping to machine instructions, and is > already used in a lot of code that I have seen, including most C compilers' > C libraries (in things like strcat, strtok, et al.). > > Michael "Ford" Ditto -=] Ford [=- > P.O. Box 1721 ford@crash.CTS.COM > Bonita, CA 92002 ford%oz@prep.mit.ai.edu Points to be made. If subranges are added to the language then 1) the indexing code for arrays would be let efficient array base address + index - base subrange (constant folding can remove some inefficencies but not all - if the array is an automatic var the address isn't known until the function is called) 2) would require the code generation to be a lot smarter about array indexing can nolonger assume the base index will start at the integer value 0 - indexing subrange information on arrays would have to be keep around. 3) make the language more strongly typed To properly handle the frobozz example, the subrange information has to be present. Hence, one could say that int ??[-10..10] is a new type and to insure proper preformance only variables of the same type should be given as input parameters. -- Eddie Wyatt e-mail: edw@ius2.cs.cmu.edu
rwa@auvax.UUCP (Ross Alexander) (06/15/87)
In article <1226@crash.CTS.COM>, ford@crash.CTS.COM (Michael Ditto) writes: > In article <184@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes: > > int array[ short ]; > > > >implies that array[] has members indexed by -1, -2, -3, and so on > > No. A subscript must simply be some sort of integral type which the copmiler > can perform multiplication on or stick into an index register (i.e. it must > be something you can add to a pointer). yes, yes, we all know that. float and double subscript values are strictly for BASIC, right :-) ?? what i was driving at was "how much storage is the compiler going to ALLOCATE for array[ short ] ?", and the corollory question is "what is the value of the index of the first and last allocated elements of array[]?". by analogy with Pascal, one might expect the first allocated element to have a negative index value, since 'short' is a signed type, and is really "MIN_SHORT_VALUE .. MAX_SHORT_VALUE" when you think of it as a subrange type (subrange of int). on 'typical' architectures, this might be -0x8000 .. 0x7fff or whatever. this was my meaning (poorly expressed, i grant you) when i said 'indexed by -1, -2, -3, and so on'. at any rate, the array will have (some) elements indexed by negative values, and (some) by non-negative values. now, getting back to the can of worms (smile, gentle people), say i declare int char_count[ char ]; (I want to count character frequencies in some input, or something), and further let us say that chars are signed, so this declaration is equivalent to int char_count[ MIN_CHAR .. MAX_CHAR ]; with MIN_CHAR == -128, and MAX_CHAR == 127. (As an aside, this is precisely the case locally). now, you suggest that the token 'char_count' would have the value '& char_count[ MIN_CHAR ]'. (actually, it's a pointer constant, but never mind). then i can say that '* ( char_count + 3 )' references an int three int's further on in memory than '& char_count[ MIN_CHAR ]'. ok? and of course, 'char_count[ 3 ]' is the third int after 'char_count[ 0 ]', which to my way of thinking is in the middle of the array. but we get into a small problem when we observe that '* ( char_count + 3 )' and 'char_count[ 3 ]' are formally equivalent, but talk about entirely different places (whoops!!). this _reductio ad absurdem_ leads me to think that fooling with the current semantics of the pointer constant 'char_count' is not a good idea, i.e., it should still mean '& char_count[ 0 ]' regardless of the fact that happens to be the middle of the array. And you yourself observed how much grief that is going to cause. has anybody got a bright idea? i am (reluctantly) forced to think that fiddling with the semantics of C array declarations is not wise. as any aside, any one have a good way to introduce sets into C? we already have enum's, i really would like to have sets and set operations. i know that you can fake it for small sets ( <= number of bits in a 'long' ) very easily, but it gets ugly after that. and writing enum days { sunday, monday, /* ... */, friday, saturday }; long days_off; int x; days_off = ( 1 << sunday ) | ( 1 << saturday ); if ( ( days_off & ( 1 << x ) ) != 0 ) { /* do something */ }; is a little less intuitive than ( i create syntax freely here, the semantics are pretty much the same ) set days_off[ days ]; days_off = [ sunday, saturday ]; if ( days_off[ x ] ) { /* do something */ }; i don't like the notation especially, but... ...!ihnp4!alberta!auvax!rwa Ross Alexander, Athabasca University
grk@sfsup.UUCP (06/15/87)
[Original article not included for brevity--type `p' if you want to see it] There have been several people who suggested that it would be nice if C had non-zero origined arrays. How about the following (albeit kludgy) solution: int a[21]; /* really meant to be a[ -10 .. 10 ] */ int *na; /* na stands for new-a, good software engineering type descriptive variable name :-) */ na = &a[10]; /* na[-10] is now a[0], na[0] is a[10], etc. */ --Ralph -- G. Ralph Kuntz N2HBN UUCP: {ihnp4,allegra}!attunix!grk ARPA: rutgers.rutgers.edu!pisc2b!grk PACKET: N2HBN @ NN2Z
franka@mmintl.UUCP (Frank Adams) (06/15/87)
In article <1226@crash.CTS.COM> ford@crash.CTS.COM (Michael "Ford" Ditto) writes: |In article <184@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes: |>I can think of only one little quibble with the above: C arrays are |>considered to begin with the zero'th element and end at the N-1'th |>element, right? | |According to K&R's C Reference Manual, section 14.3: | | "By definition, the subscript operator [] is interpreted in such a | way that E1 [ E2 ] is identical to * ( (E1) + (E2) )." | | int demo[ -10 .. 10 ]; |... |>Now, what is the value of the token 'demo' ? | |I think it should be the address of the FIRST element in demo (i.e. |demo[-10]). But this is incompatible with the quotation from K&R, above. 'demo' would have to mean '&demo[0]' for this definition to work. -- Frank Adams ihnp4!philabs!pwa-b!mmintl!franka Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
karl@haddock.UUCP (Karl Heuer) (06/16/87)
In article <184@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes: >In article <526@haddock.UUCP>, karl@haddock.UUCP (Karl Heuer) writes: [proposal for a new syntax allowing a type name instead of an array size] > >I can think of only one little quibble with the above: C arrays are >considered to begin with the zero'th element and end at the N-1'th >element, right? So to declare something as > int array[ short ]; >implies that array[] has members indexed by -1, -2, -3, and so on >('short' is a signed type, no?). Agreed. The set of valid subscripts would be -32768 to +32767 (on a vax). >So Karl will also have to think of some way to introduce the declaration of >array index bases, as well as array index ranges, a la Algol-60 or Pascal; What for? The notation "int array[short]" does that implicitly, by (my) definition. The more general case may also be useful, but that can be an independent issue. (And the best solution may turn out to be pascal-like subrange types.) >and this then introduces a rather large can of worms when pointers to based >arrays are passed around. No problem. You can already pass arrays around without knowing the base or size: "int a[10]; p = a+3;" gives you a pointer whose valid subscripts are -3 to +6. >Say I declare an array [named demo, size 21 ints, based at -10]. Now, what >is the value of the token 'demo' ? Is it '&demo[-10]'? Is it '&demo[0]'? If the definition of subscripting is to remain intact, demo must be equal to &demo[0]. This does create some minor problems, but then so does the original idea of an enum-indexed array in a strongly typed world. It may be that this can only be implemented properly by making the array a real datatype. >As a quick out (because I _do_ like the basic idea), how about >saying that subscript types must be unsigned? You get all of the same problems with "enum foo { X=11, Y=12, Z=13 }". Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
karl@haddock.UUCP (Karl Heuer) (06/16/87)
In <516@haddock.UUCP> and <526@haddock.UUCP> I suggested the possibility of a type-indexed array, the particular point of interest being enums: enum color { RED, GREEN, BLUE }; int hue[enum color] = { RED: 0x00F, GREEN: 0x0F0, BLUE: 0xF00 }; This is supposed to create an array which must be subscripted by enum color; using an int subscript would be illegal. The idea was that this could make enums more useful in the strong model. (In the strong model, enums are separate types as opposed to disguised integers.) Somehow I overlooked the fact that this is at odds with the definition of the subscript operator. If "hue[RED]" is an int, and is the same as "*(hue+RED)", then "hue" must have a type T such that T+enum yields "int *". There is no such type. (Logically, it's the type you get if you subtract an enum from a pointer, but this is an illegal operation in the strong model.) I see a few ways to salvage this: [0] Accept the weak model of enums. In this case we aren't adding anything. (If enums are just ints, they can already be used as subscripts; the only advantage is the notational convenience in declaring the array.) We lose the typechecking (detection of non-enum subscripts). [1] Legalize the operation "pointer - enum". You can add an integer to this type, yielding the same type. You cannot dereference it. You can add an enum (of the correct type), yielding a normal (dereferencable) pointer. This seems quite kludgy, but may be workable. (How do you declare an object of this new type?) [2] Discard the idea that a[i] means *(a+i), and the automatic conversion of arrays to pointers. This is probably best in the long run, but is a rather substantial change in the language, and would have to be phased in. >[I] would like to see it implemented as an extension. If it catches on, it >may be a candidate for the __STDC__==2 release. Suddenly, even __STDC__==3 seems optomistic. Unless I've missed something? Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
guy@gorodish.UUCP (06/16/87)
> [2] Discard the idea that a[i] means *(a+i), and the automatic conversion of > arrays to pointers. This is probably best in the long run, but is a rather > substantial change in the language, and would have to be phased in. Note that C++ has implicitly acknowledged that the "a[i] means *(a+i)" model isn't always right; when you add a new data type you can define a subscripting operator for it. I agree that it would be best in the long run, but one is reminded of Keynes' remark about the long run.... Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
karl@haddock.UUCP (Karl Heuer) (06/19/87)
In article <21213@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes: >[karl@haddock.isc.com (Karl Heuer) writes:] >>[2] Discard the idea that a[i] means *(a+i), and the automatic conversion of >>arrays to pointers. This is probably best in the long run, but is a rather >>substantial change in the language, and would have to be phased in. > >Note that C++ has implicitly acknowledged that [this model] isn't always >right; [you can redefine subscripting for new types]. I agree that it would >be best in the long run, but one is reminded of Keynes' remark [...] So maybe we should just shorten the long run. If we want to phase in this change, I would expect that the first step would be to redefine subscripting and array/pointer conversion as follows: 3.2.2.1 [Conversion of] Arrays, functions and pointers Add the following paragraph: When used as an operand where an array is required, an expression that has type "pointer to _type_" is converted to an expression that has type "array of _type_ with unknown number of elements" whose initial member is pointed to by the pointer expression. 3.3.2.1 Array subscripting Change the "constraints" paragraph: The left expression shall have type "array of _type_", the right expression shall have integral type, and the result has type "_type_". Change the first "semantics" paragraph: E1[E2] designates the E2-th member of E1 (counting from zero). Because of the conversion rule that applies to the subscripting operator, if E1 is a pointer to the initial member of an array object, E1[E2] is identical to (*(E1+(E2))). This is just the reverse of the current wording*; the notational equivalence of brackets and star-plus becomes a consequence instead of a definition. I claim that this change (along with appropriate rewording elsewhere in the document) would not affect the behavior of the abstract machine at all.** (Am I right?) Thus, this change would be transparent to existing programs and could be made immediately. The hard part would be deprecating the use of "a" when "&a[0]" is meant. Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint -------- *Note that X3J11 has already done something very similar by changing the definition of the function-call operator: instead of the left expression being a function locator, and allowing the use of non-dereferenced function pointers by implicitly dereferencing them, they asserted that the expression is *always* a function pointer, and that the normal usage implicitly takes the address of the function locator. Personally I think they went the wrong direction; I'd rather have functions and function pointers be distinct, and require the "&" as well as the "*". **Well, actually my wording does forbid the barbaric construct i[a], but this could be either allowed-but-deprecated or a Common Extension.