maart@cs.vu.nl (Maarten Litmaath) (06/21/89)
An often-heard complaint by Pascal dweebs on C is the absence of the equivalence of

    VAR foo: array[-5..-2] of bar;

C arrays always begin with subscript 0. Some time ago someone (Chris Torek?) suggested using a macro:

    #define HIGH -2
    #define LOW -5
    bar foo[HIGH - LOW + 1];
    #define foo_addr(n) &foo[(n) - LOW]

By this scheme every `zork(n)' might be an array reference instead of a function call/function-like macro invocation. :-(

I doubt Chris was the person who suggested this; the solution below seems so straightforward:

    bar _foo[HIGH - LOW + 1];
    #define foo (_foo - LOW)

Now:

    foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4) == *(_foo + 1) == _foo[1]

There's only one (small) objection: name space pollution - an invisible extra identifier `_foo' is needed.

Example of the usefulness of negative subscripts: in the MINIX kernel's `proc' table, user processes have positive indices, while kernel tasks have negative ones.
--
"I HATE arbitrary limits, especially when | Maarten Litmaath @ VU Amsterdam:
 they're small." (Stephen Savitzky)       | maart@cs.vu.nl, mcvax!botter!maart
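A minimal sketch of the biased-base idea applied to a MINIX-like table, where LOW <= 0 <= HIGH. The bounds -4..7 and the names real_proc, proc_tab, proc_slot and pid are hypothetical, chosen for illustration. None of them are taken from MINIX.

```c
/* Hypothetical bounds: "tasks" use -4..-1, "user processes" 0..7. */
#define LOW  (-4)
#define HIGH 7

struct proc {
    int pid;                /* hypothetical field */
};

/* The real storage: HIGH - LOW + 1 == 12 slots, indexed 0..11. */
static struct proc real_proc[HIGH - LOW + 1];

/* Biased base: proc_tab[n] is real_proc[n - LOW].
 * Here real_proc - (-4) == real_proc + 4, which lies inside the
 * array because LOW <= 0 <= HIGH. */
#define proc_tab (real_proc - (LOW))

/* Returns the slot for process number n, LOW <= n <= HIGH. */
struct proc *proc_slot(int n)
{
    return &proc_tab[n];
}
```

With these bounds, proc_slot(-1) refers to real_proc[3] and proc_slot(7) refers to the last slot, real_proc[11].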
karl@haddock.ima.isc.com (Karl Heuer) (06/22/89)
In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>C arrays always begin with subscript 0. ... the solution below seems so
>straightforward:
>    bar _foo[HIGH - LOW + 1];
>    #define foo (_foo - LOW)

If LOW <= 0 <= HIGH, no problem. But this is not portable if you're trying to emulate, say, origin-1 arrays (LOW==1), since the expression (_foo-1) could be outside your address space.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
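One way to get origin-1 indexing without forming the pointer (_foo-1) is to apply the bias to the subscript instead of the base. This is a sketch under assumptions: the names N, real_v, V and fill_squares are hypothetical, and double is an arbitrary element type.

```c
/* Origin-1 vector, indices 1..N.
 * The bias is applied to the subscript, so the code never forms the
 * pointer real_v - 1 that could fall outside the address space. */
#define N 10

static double real_v[N];

/* V(i) is an lvalue for element i, 1 <= i <= N. */
#define V(i) (real_v[(i) - 1])

/* Fills V(1..N) with i*i, written in origin-1 style. */
void fill_squares(void)
{
    int i;

    for (i = 1; i <= N; i++)
        V(i) = (double)i * i;
}
```

The price is the `zork(n)' look Maarten complains about: every element access is written as a macro call.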
mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) (06/22/89)
In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
> An often-heard complaint by Pascal dweebs on C is the absence of the
> equivalence of
>     VAR foo: array[-5..-2] of bar;

I am by no means a "Pascal dweeb", because I don't particularly care for the language. However, there are many problems in which it is natural to have the subscript range of an array be other than 0..N-1. (But, presumably, Maarten meant "one is a Pascal dweeb => one tends to complain about C subscripts" rather than "one tends to complain about C subscripts => one is a Pascal dweeb".) One application for non-0-based subscripting is, in fact,

> in the MINIX kernel's `proc' table user processes have positive
> indices, while kernel tasks have negative.

> #define HIGH -2
> #define LOW -5
> bar foo[HIGH - LOW + 1];
> #define foo_addr(n) &foo[(n) - LOW]
>
> By this scheme every `zork(n)' might be an array reference instead
> of a function call/function-like macro invocation. :-(

Not an array "reference", whatever that means, but an expression evaluating to a pointer (into an array). It is certainly permitted in C to have a function or a macro return a pointer, as in:

    *foo_addr(3) = 15;

It might be disconcerting, as you note, to see

    zork(5) = 20;

in a C program.

> I doubt Chris was the person who suggested this; the solution below
> seems so straightforward:

The keyword here is "seems". If you ever think that Chris Torek has made a mistake, it is prudent to think again. The odds say that you made the mistake.

> bar _foo[HIGH - LOW + 1];                                  /* #1 */
> #define foo (_foo - LOW)                                   /* #2 */
[with the example]
> foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4) ==        /* #3 */
>     *(_foo + 1) == _foo[1]

Objection 1: in pANS C, many identifiers starting with "_" are reserved for the implementation. Unfortunately, I don't have the rules handy, so I can't tell the circumstances under which this declaration would be legal. If "_foo" is extern, I'm pretty sure it is illegal. "real_foo" would be a better choice.

Objection 2 (minor): with this #define, constructs like

    a = func foo;

would be syntactically legal, and

    foo = 10;

would give a confusing error message. I might prefer

    #define foo &_foo[-LOW]

which is (more or less) equivalent.

Objection 3: the result of the computation is undefined in pANS C.

> foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4)

OK so far, in a syntactic sense at least.

> == *(_foo + 1)

Wrong, at least in pANS C. pANS C is not permitted to rearrange expressions so as to ignore parentheses. "(a+b)+c" must be computed by adding a to b, and then adding the result to c.%

Since HIGH-LOW+1 is 4, the declaration of _foo is

    bar _foo[4];

Elements _foo+0 through _foo+3 exist, and the address _foo+4 may be computed. But the expression given is *((_foo + 5) - 4), and the address _foo+5 is undefined. In particular, an implementation is permitted to abort the program or generate a random address.

Under what conditions is Maarten's scheme guaranteed to work? (From here on, assume there's no integer overflow.) First, the declaration

    bar id[HIGH - LOW + 1];

must be legal. Since C does not allow 0-sized arrays (currently),

    HIGH - LOW + 1 >= 1
    so  HIGH - LOW >= 0
    or  HIGH >= LOW

The other conditions are derived from the requirement that

    #define foo (id - LOW)

generate a valid address. The valid addresses from id are id+0 through id+(HIGH-LOW+1) inclusive, so the first condition is

    id - LOW >= id + 0
    hence  -LOW >= 0
    or     LOW <= 0

and the second condition is

    id - LOW <= id + HIGH - LOW + 1
    hence  -LOW <= HIGH + 1 + (-LOW)
    or     0 <= HIGH + 1
    or     HIGH >= -1

So the three preconditions for Maarten's scheme to be guaranteed to work are

    HIGH >= LOW
    LOW <= 0
    HIGH >= -1

Maarten's second example fails the third constraint, and thus is not portable under pANS C. (His first example, about the MINIX process table, would work.) In fact, almost all architectures will do it "right", but that's no consolation when you try to port to the odd one.
The first #define, attributed to Chris Torek,

> #define foo_addr(n) &foo[(n) - LOW]

might have a similar problem. "a[b]" was defined by K&R to be identical to "*(a+b)", so

    &foo[(n) - LOW]  <==>  (foo + (n) - LOW)

I don't know whether pANS C specifies the same identity,# or what it says about evaluating expressions without parentheses. If compilers are allowed to rearrange, a compiler might instead compute

    (foo - LOW + (n))

which would have the same overflow conditions as Maarten's scheme. With extra parentheses, any LOW and HIGH pair may be used when LOW <= HIGH (if there's no integer overflow):

    #define foo_addr(n) (&(foo)[ ((n)-(LOW)) ])

There's a general rule of thumb: in the right-hand side of a macro definition, parenthesize everything, even where you think parentheses are unnecessary. Here's yet one more example of where the rule of thumb is useful.

% Actually, there's the "as if" rule, which permits an implementation to do anything it wants as long as the result derived is correct for all cases in which pANS C defines an answer. For example, if "a+b" would overflow, the value of "(a+b)+c" is not defined under pANS C, and an implementation may do whatever it likes. It might produce, in this case, "a+(b+c)", which might happen to be the numerically correct answer. Or it might send 110 volts at 100 amps AC through your chair.

# Actually, I'm implicitly assuming another identity, that &*x == x for any address x. I don't know about this identity under pANS C, either.
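Tim's three preconditions and his fully parenthesized foo_addr can be combined. First test the preconditions at preprocessing time, and when they fail (as they do for origin-1 bounds), index through foo_addr instead of a biased base pointer. This is a sketch only. The names biased_base_ok, BIASED_BASE_PORTABLE and foo_addr_bad are hypothetical, and the origin-1 bounds LOW == 1, HIGH == 4 are assumed for the example.

```c
/* Hypothetical origin-1 bounds, as in Karl's example. */
#define LOW  1
#define HIGH 4

/* The three preconditions for (id - LOW) to be a valid address
 * into an array declared as id[HIGH - LOW + 1]. */
int biased_base_ok(long low, long high)
{
    return high >= low      /* at least one element */
        && low <= 0         /* id - LOW is not before id + 0 */
        && high >= -1;      /* id - LOW is not past id + (HIGH - LOW + 1) */
}

/* The same test at preprocessing time. */
#if HIGH >= LOW && LOW <= 0 && HIGH >= -1
#define BIASED_BASE_PORTABLE 1
#else
#define BIASED_BASE_PORTABLE 0   /* origin-1 bounds end up here */
#endif

static int foo[HIGH - LOW + 1];

/* Fully parenthesized form: the bias is applied to the index inside
 * the brackets, so any LOW <= HIGH works. */
#define foo_addr(n) (&(foo)[((n) - (LOW))])

/* Under-parenthesized variant, kept only for contrast. */
#define foo_addr_bad(n) &foo[n - LOW]
```

An argument such as `c ? 2 : 3` shows why the rule of thumb matters. Without parentheses around n, the `- LOW` binds to the last operand of ?:, so with c true the two macros select foo[1] and foo[2] respectively.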
mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/23/89)
In article <2784@solo8.cs.vu.nl>, maart@cs.vu.nl (Maarten Litmaath) writes:
> An often-heard complaint by Pascal dweebs on C is the absence of the
> equivalence of
. . .
> bar _foo[HIGH - LOW + 1];
>
> #define foo (_foo - LOW)
. . .
> There's only one (small) objection: name space pollution - an invisible
> extra identifier `_foo' is needed.

I can think of another objection: it breaks the C convention that the name of an array equals the address of its first element. Not to mention that this "foo" can no longer usefully be passed to sizeof(), since sizeof foo now gives the size of a pointer rather than of the array. And I'm sure some cleverer people will notice that "foo" may not be calculable in some implementations of C, since it involves an address BEFORE the actual start of an array.

Mike Lindner
attunix!mpl
AT&T Bell Laboratories
190 River Rd.
Summit, NJ 07901
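A small sketch of the two convention breaks described above, using bounds where the biased pointer is itself valid (LOW == -2, HIGH == 5). The names real_foo, count_from_real and size_of_biased are hypothetical.

```c
#include <stddef.h>

#define LOW  (-2)
#define HIGH 5

static int real_foo[HIGH - LOW + 1];

/* Biased name as in Maarten's scheme (valid here, since LOW <= 0). */
#define foo (real_foo - (LOW))

/* Element count recovered from the real array: 8 here. */
size_t count_from_real(void)
{
    return sizeof real_foo / sizeof real_foo[0];
}

/* sizeof applied to the biased name measures a pointer, not the array. */
size_t size_of_biased(void)
{
    return sizeof foo;
}
```

Likewise, `foo' no longer equals the address of its first element. That address is &foo[LOW], which is real_foo itself.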
diamond@diamond.csl.sony.junet (Norman Diamond) (06/23/89)
In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>>C arrays always begin with subscript 0. ... the solution below seems so
>>straightforward:
>>    bar _foo[HIGH - LOW + 1];
>>    #define foo (_foo - LOW)

In article <13788@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>If LOW <= 0 <= HIGH, no problem.

Well, as long as LOW <= 0 and LOW <= HIGH. The latter is trivially a requirement of all languages' array support. This solution also works for LOW == -5 and HIGH == -3.

>But this is not portable if you're trying to emulate, say, origin-1
>arrays (LOW==1), since the expression (_foo-1) could be outside your
>address space.

True indeed. The entire expression would yield a valid location (except when the program really does have a subscript error). For example, foo[i] == *((_foo-LOW)+i) would be legal if the compiler were permitted to rewrite the expression as *(_foo+(i-LOW)). Looks like ANSI has standardized existing practice once again, by forbidding compilers from making such a rearrangement. If it were permitted, maybe this would have become a quality-of-implementation issue.
--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
The above opinions are claimed by your machine's init process (pid 1), after being disowned and orphaned. However, if you see this at Waterloo, Stanford, or Anterior, then their administrators must have approved of these opinions.
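The rearrangement *(_foo+(i-LOW)) does not have to be left to the compiler; it can simply be written by hand. The sketch below does this for LOW == -5, HIGH == -3 and adds a check for genuine subscript errors. The names real_foo and foo_at, and int as the element type, are hypothetical assumptions.

```c
#include <assert.h>

#define LOW  (-5)
#define HIGH (-3)

typedef int bar;                        /* hypothetical element type */

static bar real_foo[HIGH - LOW + 1];    /* 3 elements */

/* The rearranged form *(_foo + (i - LOW)), written by hand:
 * i - LOW is computed first, so for a valid subscript no pointer
 * outside real_foo is ever formed. */
bar *foo_at(int i)
{
    assert(LOW <= i && i <= HIGH);      /* a real subscript error stops here */
    return real_foo + (i - LOW);
}
```

Because the integer subtraction happens before any pointer arithmetic, the result does not depend on what the compiler may or may not rearrange.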
dg@lakart.UUCP (David Goodenough) (06/23/89)
From article <2784@solo8.cs.vu.nl>, by maart@cs.vu.nl (Maarten Litmaath):
> Example of the usefulness of negative subscripts: in the MINIX kernel's `proc'
> table user processes have positive indices, while kernel tasks have negative.

And then there was yacc :-)
--
dg@lakart.UUCP - David Goodenough
IHS
....... !harvard!xait!lakart!dg
AKA: dg%lakart.uucp@xait.xerox.com
rns@se-sd.NCR.COM (Rick Schubert) (07/11/89)
In article <10420@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>>>C arrays always begin with subscript 0. ... the solution below seems so
>>>straightforward:
>>>    bar _foo[HIGH - LOW + 1];
>>>    #define foo (_foo - LOW)
>In article <13788@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>>But this is not portable if you're trying to emulate, say, origin-1
>>arrays (LOW==1), since the expression (_foo-1) could be outside your
>>address space.
>True indeed. The entire expression would yield a valid location
>(except when the program really does have a subscript error),
>e.g. foo[i] == *((_foo-LOW)+i) would be legal if the compiler were
>permitted to rewrite the expression as *(_foo+(i-LOW)).
>Looks like ANSI has standardized existing practice once again, by
>forbidding compilers from making such a rearrangement. If permitted,
>maybe this would have become a quality-of-implementation issue.

I'm not sure what existing practice you're flaming, but the problem is NOT that ANSI forbids compilers from making such a rearrangement.

1. Compilers are allowed to make such rearrangements. The new rule about honoring parentheses still allows the compiler to rearrange expressions in certain situations. For integer arithmetic (including, for the purpose of this discussion, pointer arithmetic), the associative and commutative laws may be used if they do not affect the result. This is the case if any one of the following holds:

   a. no overflow can occur either before or after the rearrangement;
   b. no overflow can occur before the rearrangement and overflow can happen after it, but silent mod 2^n arithmetic takes place (at least as far as the rearrangement of additive operators is concerned);
   c. overflow would occur before the rearrangement but not after it.

   Case c applies to your situation. ((_foo-LOW)+i) may overflow (i.e. (_foo-LOW) may yield an invalid address), whereas (_foo+(i-LOW)) would not if `i' is a valid subscript. The compiler would be allowed to make this rearrangement.

2. The problem is that the compiler should not be REQUIRED to make such a rearrangement. For the macro to be legal, compilers would have to make this rearrangement on every architecture on which computing (_foo-LOW) would cause problems. That would be applying the Don't-Do-What-I-Said, Do-What-I-Meant principle.

Disclaimer: I can't believe this topic is still active; sorry for prolonging it.
--
Rick Schubert (rns@se-sd.sandiego.NCR.COM)
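Case b can be illustrated at the C level with unsigned arithmetic, which C defines to wrap modulo 2^n. Regrouping the additions then cannot change the result, even when an intermediate sum wraps. This is only an analogy under stated assumptions. Signed overflow is undefined in C, and pointer arithmetic has no such wraparound guarantee. The function names sum_left and sum_right are hypothetical.

```c
#include <limits.h>

/* (a + b) + c: the inner sum may wrap around modulo 2^n. */
unsigned sum_left(unsigned a, unsigned b, unsigned c)
{
    return (a + b) + c;
}

/* a + (b + c): a different grouping of the same additions. */
unsigned sum_right(unsigned a, unsigned b, unsigned c)
{
    return a + (b + c);
}
```

With a == UINT_MAX, b == 2 and c == UINT_MAX - 1, the left grouping wraps at a + b and the right grouping wraps at the final addition. Both still return UINT_MAX.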