[comp.lang.c] A nice macro

maart@cs.vu.nl (Maarten Litmaath) (06/21/89)

An often-heard complaint by Pascal dweebs on C is the absence of the
equivalence of

	VAR
		foo:	array[-5..-2] of bar;

C arrays always begin with subscript 0.
Some time ago someone (Chris Torek?) suggested using a macro:

	#define		HIGH		-2
	#define		LOW		-5

	bar	foo[HIGH - LOW + 1];

	#define		foo_addr(n)	&foo[(n) - LOW]

By this scheme every `zork(n)' might be an array reference instead of a
function call/function-like macro invocation. :-(

I doubt Chris was the person who suggested this; the solution below seems so
straightforward:

	bar	_foo[HIGH - LOW + 1];

	#define		foo		(_foo - LOW)

Now:
	foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4) == *(_foo + 1) ==
	_foo[1]

There's only one (small) objection: name space pollution - an invisible
extra identifier `_foo' is needed.

Example of the usefulness of negative subscripts: in the MINIX kernel's `proc'
table user processes have positive indices, while kernel tasks have negative.
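
A minimal sketch of that kind of table, using the offset-base macro
from above.  The names NR_TASKS, NR_PROCS, proc_slot, proc_storage and
proc_table are made up for this illustration and are not taken from
the MINIX source.  The bounds are chosen so that the shifted base
pointer still points inside the array.

```c
#include <assert.h>

#define NR_TASKS 3              /* hypothetical: kernel tasks use -3..-1 */
#define NR_PROCS 4              /* hypothetical: user processes use 0..3 */
#define LOW  (-NR_TASKS)
#define HIGH (NR_PROCS - 1)

struct proc_slot { int pid; };

static struct proc_slot proc_storage[HIGH - LOW + 1];   /* 7 slots */

/* Base pointer shifted so that proc_table[LOW] is proc_storage[0].
   proc_storage - LOW is proc_storage + 3, which lies inside the array. */
#define proc_table (proc_storage - LOW)

/* Store each slot's logical index in it; return the number of slots. */
static int fill_proc_table(void)
{
    int i, n = 0;

    assert(&proc_table[LOW] == &proc_storage[0]);
    for (i = LOW; i <= HIGH; i++) {
        proc_table[i].pid = i;
        n++;
    }
    return n;
}
```

After fill_proc_table() runs, proc_table[-3] names the first kernel
task slot and proc_table[0] the first user process slot.
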
-- 
"I HATE arbitrary limits, especially when |Maarten Litmaath @ VU Amsterdam:
   they're small."  (Stephen Savitzky)    |maart@cs.vu.nl, mcvax!botter!maart

karl@haddock.ima.isc.com (Karl Heuer) (06/22/89)

In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>C arrays always begin with subscript 0. ... the solution below seems so
>straightforward:
>	bar	_foo[HIGH - LOW + 1];
>	#define		foo		(_foo - LOW)

If LOW <= 0 <= HIGH, no problem.  But this is not portable if you're trying to
emulate, say, origin-1 arrays (LOW==1), since the expression (_foo-1) could be
outside your address space.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) (06/22/89)

In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
> An often-heard complaint by Pascal dweebs on C is the absence of the
> equivalence of
>	VAR foo: array[-5..-2] of bar;

I am by no means a "Pascal dweeb", because I don't particularly care
for the language.  However, there are many problems in which it is
natural to have the subscript range of an array be other than 0..N-1.
(But, presumably, Maarten meant
    "one is a Pascal dweeb => one tends to complain about C subscripts"
rather than
    "one tends to complain about C subscripts => one is a Pascal dweeb").

One application for non-0-based-subscripting is, in fact,
> in the MINIX kernel's `proc' table user processes have positive
> indices, while kernel tasks have negative. 

>	#define		HIGH		-2
>	#define		LOW		-5
>	bar	foo[HIGH - LOW + 1];
>	#define		foo_addr(n)	&foo[(n) - LOW]
>
> By this scheme every `zork(n)' might be an array reference instead
> of a function call/function-like macro invocation. :-(

Not an array "reference", whatever that means, but an expression
evaluating to a pointer (into an array).  It is certainly permitted in
C to have a function or a macro return a pointer, as in:
        *foo_addr(3) = 15;
It might be disconcerting, as you note, to see
        zork(5) = 20;
in a C program.

> I doubt Chris was the person who suggested this; the solution below
> seems so straightforward:

The keyword here is "seems".  If you ever think that Chris Torek has
made a mistake, it is prudent to think again.  The odds say that you
made the mistake.

>	bar	_foo[HIGH - LOW + 1];                   /* #1 */
>	#define foo         (_foo - LOW)                /* #2 */
[with the example]
>	foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4) == /* #3 */
>           *(_foo + 1) == _foo[1]

Objection 1: in pANS C, many identifiers starting with "_" are
reserved for the implementation.  Unfortunately, I don't have the
rules handy, so I can't tell the circumstances under which this
declaration would be legal.  If "_foo" is extern, I'm pretty sure it
is illegal.  "real_foo" would be a better choice.

Objection 2 (minor): with this #define, constructs like
        a = func foo;
would be syntactically legal, and
        foo = 10;
would give a confusing error message.  I might prefer
        #define foo     &_foo[-LOW]
which is (more or less) equivalent.

Objection 3: the result of the computation is undefined in pANS C.
>	foo[-4] == (_foo - -5)[-4] == *((_foo + 5) - 4)
OK so far, in a syntactic sense at least.
>           == *(_foo + 1)
Wrong, at least in pANS C.  pANS C is not permitted to rearrange
expressions so as to ignore parentheses.  "(a+b)+c" must be computed
by adding a to b, and then adding the result to c.%  Since HIGH-LOW+1
is 4, the declaration of _foo is
        bar _foo[4];
Elements _foo+0 through _foo+3 exist, and the address _foo+4 may be
computed, but the expression given is
        *((_foo + 5) - 4)
and the address _foo+5 is undefined.  In particular, an implementation
is permitted to abort the program or generate a random address.

Under what conditions is Maarten's scheme guaranteed to work?  (From
here on, assume there's no integer overflow.)

First, the declaration
        bar id[HIGH - LOW + 1];
must be legal.  Since C does not allow 0-sized arrays (currently),
        HIGH - LOW + 1 >= 1
so      HIGH - LOW >= 0
or      HIGH >= LOW

The other conditions are derived from the requirement that
	#define foo         (id - LOW)
generate a valid address.  The valid addresses from id are id+0
through id+(HIGH-LOW+1) inclusive, so the first condition is
        id - LOW >= id + 0
hence   -LOW >= 0
or      LOW <= 0

and the second condition is
        id - LOW <= id + HIGH - LOW + 1
hence   -LOW <= HIGH + 1 + (-LOW)
or      0 <= HIGH + 1
or      HIGH >= -1

So the three preconditions for Maarten's scheme to be guaranteed to
work are
        HIGH >= LOW
        LOW <= 0
        HIGH >= -1
Maarten's array[-5..-2] example has HIGH == -2, so it fails the third
constraint and thus is not portable under pANS C.  (His other example,
the MINIX process table, would work.)  In fact, almost all
architectures will do it
"right", but that's no consolation when you try to port to the odd
one.
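
The three preconditions can be written down as a small predicate.
This is only a sketch; the function name offset_base_is_valid is
invented here, and like the derivation it assumes that high - low + 1
does not overflow.

```c
#include <assert.h>   /* for callers that assert on the result */

/* Returns 1 when (id - low) is guaranteed to be a computable address
   for an array declared as  bar id[high - low + 1];  otherwise 0. */
static int offset_base_is_valid(int low, int high)
{
    return high >= low      /* the array has at least one element     */
        && low  <= 0        /* base is not before id + 0              */
        && high >= -1;      /* base is not past id + (high - low + 1) */
}
```

For the bounds discussed in this thread, (-5, -2) and (1, 10) are
rejected, while a range that straddles zero such as (-3, 3) is
accepted.
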

The first #define, attributed to Chris Torek,

>	#define		foo_addr(n)	&foo[(n) - LOW]

might have a similar problem.  "a[b]" was defined by K&R to be
identical to "*(a+b)", so
        &foo[(n) - LOW] <==> (foo + (n) - LOW)
I don't know whether pANS C specifies the same identity,# or what it
says about evaluating expressions without parentheses.  If compilers
are allowed to rearrange, a compiler might instead compute
         (foo - LOW + (n))
which would have the same overflow conditions as Maarten's scheme.
With extra parentheses, any LOW and HIGH pair may be used when
LOW <= HIGH (if there's no integer overflow):

>	#define		foo_addr(n)	(&(foo)[  ((n)-(LOW))  ])

There's a general rule of thumb: in the right-hand side of a macro
definition, parenthesize everything, even where you think parentheses
are unnecessary.  Here's yet one more example of where the rule of
thumb is useful.
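
A short sketch of the fully parenthesized macro with the bounds from
the original posting.  The typedef of bar as int and the helper
store_and_fetch are placeholders added for the illustration.

```c
#include <assert.h>

#define LOW  (-5)
#define HIGH (-2)

typedef int bar;                  /* placeholder element type */
static bar foo[HIGH - LOW + 1];   /* 4 elements: logical -5..-2 */

/* (n) - (LOW) is integer arithmetic and is finished before any
   pointer is formed, so only &foo[0] .. &foo[3] are ever computed. */
#define foo_addr(n) (&(foo)[(n) - (LOW)])

/* Store a value at logical index n and read it back. */
static bar store_and_fetch(int n, bar value)
{
    assert(LOW <= n && n <= HIGH);   /* logical bounds check */
    *foo_addr(n) = value;
    return *foo_addr(n);
}
```

With these definitions *foo_addr(-4) is the same object as foo[1].
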


% Actually, there's the "as if" rule, which permits an implementation
to do anything it wants as long as the result derived is correct for
all cases in which pANS C defines an answer.  For example, if "a+b"
would overflow, the value of "(a+b)+c" is not defined under pANS C,
and an implementation may do whatever it likes.  It might produce, in
this case, "a+(b+c)", which might happen to be the numerically correct
answer.  Or it might send 110 volts at 100 amps AC through your chair.

# Actually, I'm implicitly assuming another identity, that
        &*x == x
for any address x.  I don't know about this identity under pANS C,
either.
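
For what it is worth, the later C99 revision of the standard spells
both identities out (section 6.5.3.2): &*x is treated as x, and
&a[i] is treated as a + i.  A minimal check, with the array a and the
function identities_hold invented for the example:

```c
#include <assert.h>

static int a[4];

/* Returns 1 if both identities hold for element index i of a
   (0 <= i <= 3). */
static int identities_hold(int i)
{
    int *p = a + i;

    return (&*p == p) && (&a[i] == a + i);
}
```
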

mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/23/89)

In article <2784@solo8.cs.vu.nl>, maart@cs.vu.nl (Maarten Litmaath) writes:
> An often-heard complaint by Pascal dweebs on C is the absence of the
> equivalence of
	.
	.
	.
> 	bar	_foo[HIGH - LOW + 1];
> 
> 	#define		foo		(_foo - LOW)
	.
	.
	.
> There's only one (small) objection: name space pollution - an invisible
> extra identifier `_foo' is needed.

I can think of another objection: it breaks the C convention that the
name of an array yields the address of its first element.  Not to mention
that sizeof applied to this "foo" no longer gives the size of the array.
And I'm sure some cleverer people will
notice that "foo" may not be calculable in some implementations of C, since
it involves an address BEFORE the actual start of an array.
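
A sketch of the first two points.  The bounds are chosen so that the
base pointer stays inside the array; real_foo, the int typedef for
bar, and the two helper functions are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>

#define LOW  (-2)
#define HIGH 3

typedef int bar;                       /* placeholder element type */
static bar real_foo[HIGH - LOW + 1];   /* 6 elements */
#define foo (real_foo - LOW)           /* a pointer expression, not an array */

/* sizeof does not evaluate its operand; it only looks at the type. */
static size_t size_via_macro(void)  { return sizeof(foo); }       /* sizeof(bar *)   */
static size_t size_of_storage(void) { return sizeof(real_foo); }  /* 6 * sizeof(bar) */
```

Here foo compares unequal to &foo[LOW], the address of the first
element, which a real array name would not do.
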

Mike Lindner
attunix!mpl
AT&T Bell Laboratories
190 River Rd.
Summit, NJ 07901

diamond@diamond.csl.sony.junet (Norman Diamond) (06/23/89)

In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>>C arrays always begin with subscript 0. ... the solution below seems so
>>straightforward:
>>	bar	_foo[HIGH - LOW + 1];
>>	#define		foo		(_foo - LOW)

In article <13788@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:

>If LOW <= 0 <= HIGH, no problem.

Well, as long as LOW <= 0 and LOW <= HIGH.  The latter is trivially a
requirement of all languages' array support.  This solution also works
for LOW == -5 and HIGH == -3.

>But this is not portable if you're trying to emulate, say, origin-1
>arrays (LOW==1), since the expression (_foo-1) could be outside your
>address space.

True indeed.  The entire expression would yield a valid location
(except when the program really does have a subscript error),
e.g. foo[i] == *((_foo-LOW)+i) would be legal if the compiler were
permitted to rewrite the expression as *(_foo+(i-LOW)).

Looks like ANSI has standardized existing practice once again, by
forbidding compilers from making such a rearrangement.  If permitted,
maybe this would have become a quality-of-implementation issue.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
 The above opinions are claimed by your machine's init process (pid 1), after
 being disowned and orphaned.  However, if you see this at Waterloo, Stanford,
 or Anterior, then their administrators must have approved of these opinions.

dg@lakart.UUCP (David Goodenough) (06/23/89)

From article <2784@solo8.cs.vu.nl>, by maart@cs.vu.nl (Maarten Litmaath):
> Example of the usefulness of negative subscripts: in the MINIX kernel's `proc'
> table user processes have positive indices, while kernel tasks have negative.

And then there was yacc :-)
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

rns@se-sd.NCR.COM (Rick Schubert) (07/11/89)

In article <10420@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>In article <2784@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>>>C arrays always begin with subscript 0. ... the solution below seems so
>>>straightforward:
>>>	bar	_foo[HIGH - LOW + 1];
>>>	#define		foo		(_foo - LOW)

>In article <13788@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>>But this is not portable if you're trying to emulate, say, origin-1
>>arrays (LOW==1), since the expression (_foo-1) could be outside your
>>address space.

>True indeed.  The entire expression would yield a valid location
>(except when the program really does have a subscript error),
>e.g. foo[i] == *((_foo-LOW)+i) would be legal if the compiler were
>permitted to rewrite the expression as *(_foo+(i-LOW)).

>Looks like ANSI has standardized existing practice once again, by
>forbidding compilers from making such a rearrangement.  If permitted,
>maybe this would have become a quality-of-implementation issue.

I'm not sure what existing practice you're flaming, but the problem is NOT
that ANSI forbids compilers from making such a rearrangement.

1. Compilers are allowed to make such rearrangements.  The new rule about
   honoring parentheses still allows the compiler to rearrange expressions
   in certain situations.  For integer arithmetic (including, for the purpose
   of this discussion, pointer arithmetic), associative and commutative laws
   may be used if they do not affect the result; this is the case if either:

	a. no overflow can occur either before or after the rearrangement;

	b. no overflow can occur before the rearrangement, but it can after
	   the rearrangement, and silent mod 2^n arithmetic takes place
	   (at least as far as the rearrangement of additive operators); or

	c. overflow would occur before the rearrangement but not after the
	   rearrangement.

   Case c applies to your situation.  ((_foo-LOW)+i) may overflow (i.e.
   (_foo-LOW) may yield an invalid address), whereas (_foo+(i-LOW)) would
   not if `i' is a valid subscript.  The compiler would be allowed to
   make this rearrangement.

2. The problem is that the compiler should not be REQUIRED to make such a
   rearrangement.  In order for the macro to be legal, compilers would be
   required to make this rearrangement on architectures for which computing
   (_foo-LOW) would cause problems.  This would be applying the
   Don't-Do-What-I-Said, Do-What-I-Meant principle.
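
One way to say what is meant, sketched for the origin-1 case: do the
subtraction on the integer subscript so that no rearrangement is
needed.  The names real_foo and foo_at and the int typedef for bar
are invented for this example.

```c
#include <assert.h>

#define LOW  1
#define HIGH 10

typedef int bar;                       /* placeholder element type */
static bar real_foo[HIGH - LOW + 1];   /* 10 elements: logical 1..10 */

/* i - LOW is integer arithmetic and is a valid subscript whenever
   LOW <= i <= HIGH.  The pointer real_foo - LOW is never formed. */
static bar *foo_at(int i)
{
    assert(LOW <= i && i <= HIGH);
    return &real_foo[i - LOW];
}
```

The price is writing *foo_at(i) instead of foo[i].
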

Disclaimer: I can't believe this topic is still active;
sorry for prolonging it.

-- Rick Schubert (rns@se-sd.sandiego.NCR.COM)